I have the following problem.
Suppose there is a fairly long and complex text file in which special blocks (possibly empty) can appear between delimiters:
```
some text
----
text inside special block
----
some text
----
----
some text
----
text inside special block
----
```
The delimiter is a line consisting of `----`. So, for example, if we need to replace every `text` with `TEXT` outside the special blocks, the result should be:
```
some TEXT
----
text inside special block
----
some TEXT
----
----
some TEXT
----
text inside special block
----
```
What I currently do to solve this: first I scan the whole text for the special-block delimiters, then I collect the indexes of the "bad" lines (the ones inside blocks), and then I apply my regex substitutions line by line, skipping the "bad" lines. But if a regex has to match across more than one line, this gets much more complicated, and I'm sure there are smarter and simpler ways to handle it.
So basically, what I need is a way to exclude some fragments of the text (given by their spans) from `re.sub`.
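One possible approach, sketched here under the assumption that a delimiter is a line consisting exactly of `----`: collect the block spans once, run the substitution over the whole text (so multi-line patterns work), and give back any match that overlaps a block unchanged. `BLOCK_RE` and `sub_outside_blocks` are hypothetical names, not part of the `re` module.

```python
import re

# Assumption: a special block is a "----" line, any (possibly zero) lines,
# and the next "----" line.
BLOCK_RE = re.compile(r'^----\n.*?^----$', re.M | re.S)


def sub_outside_blocks(pattern, repl, text):
    """Like re.sub(pattern, repl, text), but matches touching a special block are left as they are."""
    spans = [m.span() for m in BLOCK_RE.finditer(text)]

    def overlaps_block(start, end):
        # True if the match lies inside a block span or overlaps one.
        return any(start < b_end and end > b_start for b_start, b_end in spans)

    def replace(m):
        if overlaps_block(*m.span()):
            return m.group(0)   # inside (or across) a block: keep the original text
        return m.expand(repl)   # outside: expand \1, \2, ... like re.sub would

    return re.sub(pattern, replace, text)
```

Because `re.sub` resumes scanning after each match, a skipped match that starts outside a block and runs into it is not retried at a later offset. For patterns like `text` that cannot cross a delimiter line this does not matter.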
Right now I have this solution (the example above is simplified, sorry):
```python
import re


def find_code_lines(data):
    """Return (begin, end) line-index ranges of special-block contents; end is exclusive."""
    # Search for blocks by regex. They can be empty!
    r = re.compile(r'(\n----(?=\n)([\s\S]*?\n)----\n)')
    # Delete all literal '\n' sequences that are not line breaks
    # (there are some of them in formulas etc.)
    data_edited = data.replace('\\n', '')
    # Save spans by character indexes
    char_spans = [m.span(1) for m in r.finditer(data_edited)]
    # Convert character spans to line indexes:
    # begin = first line after the opening '----', end = the closing '----' line
    line_spans = []
    for start, stop in char_spans:
        begin = data_edited[:start].count('\n') + 2
        end = data_edited[:stop].count('\n') - 1
        line_spans.append((begin, end))
    return line_spans


def in_spans(spans, line_index):
    """Check whether line_index is inside one of the (begin, end) spans."""
    return any(begin <= line_index < end for begin, end in spans)


# Parse text by blocks (`data` holds the whole input text)
code_lines = find_code_lines(data)
lines_edited = []
replace_count = 0
for i, line in enumerate(data.splitlines()):
    if in_spans(code_lines, i):
        # Line inside a special block: keep it as is
        lines_edited.append(line)
    else:
        # Replace $...$ math with stem:[...] and count the substitutions
        new_line, n = re.subn(r'(\s|^|\s\()\$([^\$`\r\n]{1,1000}?)\$',
                              r'\1stem:[\2]',
                              line)
        lines_edited.append(new_line)
        replace_count += n
lines_edited.append('')
data = '\n'.join(lines_edited)
# log_it is the author's own logging helper (not shown)
log_it('Replaced Math blocks', replace_count)
```
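The line-index bookkeeping can probably be avoided by splitting the text on the blocks themselves. When the pattern given to `re.split` contains a capturing group, the matched separators are kept in the result, so the text between blocks lands at even indexes and the blocks at odd indexes. A minimal sketch, assuming the same `----` delimiter lines; `sub_outside` is a hypothetical helper:

```python
import re

# The capturing group makes re.split keep the blocks in the result list.
# Assumption: blocks are delimited by lines consisting exactly of "----".
BLOCK_RE = re.compile(r'(^----\n.*?^----$)', re.M | re.S)


def sub_outside(pattern, repl, text, flags=0):
    """Apply re.sub only to the text between special blocks (pattern given as a string)."""
    parts = BLOCK_RE.split(text)
    # Even indexes: text outside blocks; odd indexes: the blocks themselves.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(pattern, repl, parts[i], flags=flags)
    return ''.join(parts)


MATH = r'(\s|^|\s\()\$([^\$`\r\n]{1,1000}?)\$'
print(sub_outside(MATH, r'\1stem:[\2]', "a $x$ b\n----\n$y$\n----\n$z$", flags=re.M))
```

Passing `flags=re.M` keeps `^` matching at the start of every line, as it did in the line-by-line loop. A multi-line pattern can now span several lines of one segment, but it can never reach into a block.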
I have extended the input example because some of the solutions below only handle specific, easier variants of the input. The most difficult case so far is this one:
```
some text
----
text inside special block
----
some text
----
----
some text
----
text inside special block
----
some text
```
Expected output:
```
some TEXT
----
text inside special block
----
some TEXT
----
----
some TEXT
----
text inside special block
----
some TEXT
```
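Another common regex idiom, offered as a sketch rather than a verified answer for every possible input: put the block pattern and the target pattern into one regex as alternatives and decide in the replacement function. Since the engine scans left to right, a block is consumed as a whole before the second alternative can see anything inside it. `COMBINED_RE` and `replace_outside` are hypothetical names, and the sketch again assumes `----` delimiter lines.

```python
import re

# Alternative 1 matches a whole special block (it may be empty);
# alternative 2 is the pattern we actually want to replace outside blocks.
COMBINED_RE = re.compile(r'(?P<block>^----\n.*?^----$)|(?P<target>text)', re.M | re.S)


def replace_outside(data):
    def repl(m):
        if m.group('block') is not None:
            return m.group('block')  # special block: copy it unchanged
        return 'TEXT'                # target pattern outside any block
    return COMBINED_RE.sub(repl, data)


data = """some text
----
text inside special block
----
some text
----
----
some text
----
text inside special block
----
some text"""

print(replace_outside(data))  # prints the expected output above
```

Because `^` relies on `re.M` here, a block at the very start of the file is recognized too, which the `\n----(?=\n)` pattern in the code above does not do.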
Source: https://stackoverflow.com/questions/781 ... sion-in-py