Exclude specific spans from a string when substituting by regular expression in Python

Post by Guest » 11 Mar 2024, 22:56

I have the following problem.
Suppose there is a fairly long and complex text file in which special blocks like these, enclosed in delimiters, can occur (they may be empty):
[code]some text
----
text inside special block
----
some text
----
----
some text
----
text inside special block
----
[/code]
The task is to make some substitutions in the whole text but exclude text inside these borders ([code]----[/code]) from these substitutions.
So for example we need to replace all [code]text[/code] substrings with [code]TEXT[/code] strings but not inside special blocks. The result should be:
[code]some TEXT
----
text inside special block
----
some TEXT
----
----
some TEXT
----
text inside special block
----
[/code]
We cannot use lookahead or lookbehind here, because at a given position we don't know whether we are inside a special block or not (the delimiters are not oriented: the opening and closing markers look the same).
What I currently do is this: first I scan the whole text for the delimiters of the special blocks, then I collect the indexes of the "bad" lines, and then I apply my regex substitutions line by line, skipping the "bad" lines. But this gets more complicated if a regex has to apply to more than one line. I'm sure there are smarter and easier ways to handle this.
So basically, what I need is to be able to exclude some fragments of the text (by their spans) from [code]re.sub[/code] when it is applied to the whole text, even if a match only intersects a span (and does not necessarily contain it). That way I could apply the first regex, take the spans of the special blocks by their begin and end indexes, and exclude these spans from the second regex. How is this possible?
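One possible way to express this idea, as a rough sketch rather than a finished solution: run [code]re.sub[/code] with a callback over the whole text and let the callback return the match unchanged whenever it overlaps a protected span. The names [code]BLOCK_RE[/code] and [code]sub_outside_spans[/code] are hypothetical, and the block pattern assumes that every delimiter is a line consisting of exactly [code]----[/code].

```python
import re

# Assumed block shape: a "----" line, anything (possibly nothing), the next "----" line.
BLOCK_RE = re.compile(r'^----\n[\s\S]*?^----$', re.MULTILINE)

def sub_outside_spans(pattern, repl, text, protected):
    """Run re.sub over the whole text, but keep any match that touches a protected span.

    protected: list of (start, end) character offsets, end exclusive.
    """
    def replace(match):
        start, end = match.span()
        for p_start, p_end in protected:
            # Half-open interval overlap test: a partial intersection counts too.
            if start < p_end and p_start < end:
                return match.group(0)  # leave the matched text unchanged
        return match.expand(repl)      # same template syntax as re.sub

    return re.sub(pattern, replace, text)

text = "some text\n----\ntext inside special block\n----\nsome text\n"
spans = [m.span() for m in BLOCK_RE.finditer(text)]
result = sub_outside_spans(r'text', 'TEXT', text, spans)
print(result)
```

Because the regex still sees the whole text, lookarounds and anchors behave as usual. One caveat: a match that is skipped still consumes its characters, so another match that would start inside it is not tried.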
Right now I have this solution (the example above is simplified, sorry):
[code]import re

def find_code_lines(data):
    # Search for blocks by regex. They can be empty!
    r = re.compile(r'(\n----(?=\n)([\s\S]*?\n)----\n)')

    # Delete all '\n' which are not line breaks (there are some of them in formulas etc.)
    data_edited = data.replace('\\n', '')

    # Save spans by character indexes
    char_spans = []
    for m in r.finditer(data_edited):
        char_spans.append(m.span(1))

    # Convert character spans to line-index spans (first protected line, end exclusive)
    line_spans = []
    for span in char_spans:
        begin = data_edited[:span[0]].count("\n") + 2
        end = data_edited[:span[1]].count("\n") - 1
        line_spans.append((begin, end))

    return line_spans

# Check if index is inside one of spans
def in_spans(spans, line_index):
    return any(begin <= line_index < end for begin, end in spans)

# Parse text by blocks
code_lines = find_code_lines(data)

lines_edited = []
data_lines = data.splitlines()
replace_count = 0
for i, line in enumerate(data_lines):
    if in_spans(code_lines, i):
        # Line is inside a special block: keep it unchanged
        lines_edited.append(line)
    else:
        new_line, count = re.subn(r'(\s|^|\s\()\$([^\$`\r\n]{1,1000}?)\$',
                                  r'\1stem:[\2]',
                                  line)
        lines_edited.append(new_line)
        replace_count += count
lines_edited.append('')
data = '\n'.join(lines_edited)
log_it('Replaced Math blocks', replace_count)
[/code]
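For comparison, the same math substitution could perhaps be done in one pass over the whole text. This is a sketch only: it puts a whole-block alternative in front of the math pattern, so a block is consumed as a single match and handed back unchanged. The name [code]convert_math[/code] and the block pattern are assumptions (the delimiter is taken to be a line consisting of exactly [code]----[/code]); the math pattern is the one from the code above.

```python
import re

# Assumed block shape: from one "----" line to the next "----" line.
BLOCK = r'^----\n[\s\S]*?^----$'
# Math pattern from the code above, with plain capturing groups.
MATH = r'(\s|^|\s\()\$([^\$`\r\n]{1,1000}?)\$'
# Group "block" is group 1, so the math prefix is group 2 and the formula is group 3.
COMBINED = re.compile('(?P<block>' + BLOCK + ')|' + MATH, re.MULTILINE)

def convert_math(text):
    """Return (new_text, number_of_replacements), leaving special blocks untouched."""
    count = 0

    def replace(match):
        nonlocal count
        if match.group('block') is not None:
            return match.group(0)  # a whole special block: keep it as is
        count += 1
        return f'{match.group(2)}stem:[{match.group(3)}]'

    return COMBINED.sub(replace, text), count

sample = 'Let $a+b$ hold\n----\ncost is $5 and $6\n----\nand ($c$) too'
converted, replaced = convert_math(sample)
print(converted, replaced)
```

Since the block alternative is tried first at every position, nothing inside a block is ever offered to the math pattern, and the replacement count only includes real substitutions.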
UPD
I added more text to the input example because some of the solutions below can handle only specific versions of inputs (which are easier). So the most difficult one so far is like this:
[code]some text
----
text inside special block
----
some text
----
----
some text
----
text inside special block
----
some text
[/code]
Expected output:
[code]some TEXT
----
text inside special block
----
some TEXT
----
----
some TEXT
----
text inside special block
----
some TEXT
[/code] 
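A third possible sketch, checked here against the extended example: split the text on whole blocks with a capturing group, substitute only in the pieces between blocks, and join everything back. The names [code]BLOCK_SPLIT_RE[/code] and [code]sub_between_blocks[/code] are hypothetical, and the delimiter is again assumed to be a line consisting of exactly [code]----[/code].

```python
import re

# Assumed block shape: from one "----" line to the next "----" line.
# The capturing group makes re.split keep the blocks in the result list.
BLOCK_SPLIT_RE = re.compile(r'(^----\n[\s\S]*?^----$)', re.MULTILINE)

def sub_between_blocks(pattern, repl, text):
    parts = BLOCK_SPLIT_RE.split(text)
    # With one capturing group, odd indexes hold the blocks
    # and even indexes hold the text between them.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(pattern, repl, parts[i])
    return ''.join(parts)

source = (
    "some text\n----\ntext inside special block\n----\n"
    "some text\n----\n----\n"
    "some text\n----\ntext inside special block\n----\n"
    "some text\n"
)
output = sub_between_blocks(r'text', 'TEXT', source)
print(output)
```

Each piece is substituted on its own, so a multi-line regex works within a piece but can never match across a block. The trade-off is that anchors and lookarounds see the piece boundaries instead of the surrounding text.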

Source: [url]https://stackoverflow.com/questions/78141599/exclude-specific-spans-from-string-when-substituting-by-regular-expression-in-py[/url]
