Why does Python handle strings with escaped quotes differently? [closed]

Posted by Anonymous » 07 Jan 2026, 07:22

I am trying to split a string on several (complex) delimiters. The delimiters should only take effect outside quotes (single ' or double "), and a quote should only count as a quote if it is not escaped with a \.
So these DELIMITERs should be recognized:

```
'A string with a DELIMITER to be used.'
```

```
"Another string. It\'s got a DELIMITER within escaped \' single-quotes."
```

But not these:

```
'A string with a "DELIMITER" inside quotes.'
```

```
'Another string with a \'DELIMITER\' inside quotes.'
```

I am stuck with:

```
'A string with a " \"DELIMITER\" within escaped quotes within quotes", which should not be recognized.'
```

I cannot get these escaped double quotes \" handled correctly. The escaping backslashes seem to vanish from the string, while escaping the backslash itself gives a confusing string:

```
>>> '"\"'
'""'
>>> len('"\"')
2
>>> '"\\"'
'"\\"'
>>> len('"\\"')
3
```
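
The transcript is consistent with how Python resolves escape sequences in ordinary string literals: `\"` in source code collapses to a plain `"` before any function sees the string, so no backslash is left to detect. A small sketch, assuming the goal is to build test inputs that really contain a backslash, compares a normal literal with a raw literal (`r'...'`) and with a doubled backslash; the variable names are illustrative only:

```python
# Escape sequences are resolved when the literal is compiled, not at runtime.
normal = '"a\"b"'     # \" becomes a plain "   -> characters: " a " b "
raw = r'"a\"b"'       # raw literal keeps the \ -> characters: " a \ " b "
doubled = '"a\\"b"'   # \\ becomes one backslash, same text as `raw`

print(len(normal), len(raw), len(doubled))  # 5 6 6
print(raw == doubled)                        # True
print('\\' in normal)                        # False: the backslash is gone
```

The doubled backslash shown by the REPL (`'"\\"'`) is only the `repr` of a three-character string; the string itself holds a single backslash.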

My (abbreviated) code:

```
from typing import Dict, List


def split_around_needles(expression: str, needles: Dict[str, List[str]]) -> List[str]:
    buffer: str = ''
    parts: List[str] = []
    open_quotes: str = ''

    for char in expression:
        buffer += char

        if char in '"\'' and buffer[-1] != '\\':
            # if the character is an unescaped quote (no \ before the ' or ")
            if open_quotes == '':
                # we are entering a quoted area
                open_quotes = char
            elif open_quotes == char:
                # we are leaving the quoted area
                open_quotes = ''

            continue

        if open_quotes != '' or (char not in needles.keys()):
            # ignore every character inside valid quotes
            # and every character that does not belong to a needle
            continue

        for needle in needles[char]:
            if len(buffer) >= len(needle) and buffer[-1 * len(needle):] == needle:
                # found a needle
                buffer = buffer[0:-1 * len(needle)].strip()
                if buffer != '':
                    parts.append(buffer)
                buffer = ''
                parts.append(needle)

    if buffer != '':
        parts.append(buffer)

    return parts


# it does not matter at this point, but needles is a dictionary with the last
# character of the delimiter as its key:
needles: Dict[str, List[str]] = {}
needles['+'] = ['+']
needles['R'] = ['DELIMITER']
# ....
```
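
One likely cause of the unexpected results: `buffer += char` runs before the quote test, so `buffer[-1]` is always the quote that was just appended and the comparison with `'\\'` can never be false. A minimal sketch of one possible fix follows. It assumes that a backslash escapes exactly the next character (so `\\"` is an escaped backslash followed by a real quote), that an escaped character can never end a delimiter, and that `needles` maps the last character of each delimiter to a list of delimiters. The name `split_outside_quotes` is hypothetical.

```python
from typing import Dict, List


def split_outside_quotes(expression: str, needles: Dict[str, List[str]]) -> List[str]:
    buffer = ''
    parts: List[str] = []
    open_quotes = ''
    escaped = False  # True when the previous character was an unescaped backslash

    for char in expression:
        buffer += char

        if escaped:
            # Assumption: an escaped character is plain text, never a quote or needle end.
            escaped = False
            continue
        if char == '\\':
            escaped = True
            continue

        if char in '"\'':
            if open_quotes == '':
                open_quotes = char      # entering a quoted area
            elif open_quotes == char:
                open_quotes = ''        # leaving the quoted area
            continue

        if open_quotes != '' or char not in needles:
            continue

        for needle in needles[char]:
            if buffer.endswith(needle):
                head = buffer[:-len(needle)].strip()
                if head != '':
                    parts.append(head)
                parts.append(needle)
                buffer = ''
                break

    if buffer != '':
        parts.append(buffer)
    return parts


demo_needles = {'+': ['+'], 'R': ['DELIMITER']}
# Raw literals keep the backslashes that a normal literal would drop.
print(split_outside_quotes(r'"a\"b+c" + de', demo_needles))  # ['"a\\"b+c"', '+', ' de']
print(split_outside_quotes(r'a\"+bc', demo_needles))         # ['a\\"', '+', 'bc']
```

Like the original loop, this sketch strips whitespace only from the part before a delimiter, so the trailing part keeps its leading space.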

```
>>> split_around_needles('"abc" + de', needles)
['"abc"', '+', ' de']
```

-> correct

```
>>> split_around_needles('"ab+c" + de', needles)
['"ab+c"', '+', ' de']
```

-> correct

```
>>> split_around_needles('\"ab+c\" + de', needles)
['"ab+c"', '+', ' de']
```

-> I don't quite understand why the backslashes vanish, since they are not needed to escape the double quotes

```
>>> split_around_needles('"a\"b+c" + de', needles)
['"a"b', '+', 'c" + de']
```

-> this follows the logic of the previous example, but it is not the desired result... the \" before b should be ignored because it is escaped; the result should be: ['"a\"b+c"', '+', 'de']

```
>>> split_around_needles('a\\"+bc', needles)
['a\\"+bc']
```

-> I cannot make sense of this one; the \" should be ignored. The result should be: ['a\\"', '+', 'bc'] or maybe ['a\"', '+', 'bc']
Can anyone point me in the right direction?
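
As one possible direction, the character loop could be replaced by a regular-expression scanner. This is a swapped-in technique, not the approach from the question. The sketch below assumes a non-empty list of non-empty delimiters and the same escaping rule (a backslash protects the next character). The name `split_with_regex` is hypothetical, and unlike the loop, an unterminated quote does not suppress later delimiters here.

```python
import re
from typing import List


def split_with_regex(expression: str, delimiters: List[str]) -> List[str]:
    # Try longer delimiters first so a short one cannot shadow a longer one.
    ordered = sorted(delimiters, key=len, reverse=True)
    delim_pattern = '|'.join(re.escape(d) for d in ordered)
    token = re.compile(
        r'\\.'                      # escaped character outside quotes
        r'|"(?:\\.|[^"\\])*"'       # double-quoted run, escapes allowed
        r"|'(?:\\.|[^'\\])*'"       # single-quoted run, escapes allowed
        r'|(?P<delim>' + delim_pattern + r')',
        re.DOTALL,
    )

    parts: List[str] = []
    start = 0
    for match in token.finditer(expression):
        if match.group('delim') is None:
            continue  # quoted run or escaped character: not a split point
        head = expression[start:match.start()].strip()
        if head:
            parts.append(head)
        parts.append(match.group('delim'))
        start = match.end()

    tail = expression[start:]
    if tail:
        parts.append(tail)
    return parts


print(split_with_regex(r'"a\"b+c" + de', ['+', 'DELIMITER']))
```

Because quoted runs and escaped characters are consumed as whole tokens, a delimiter is only ever matched at a position outside quotes.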

More details here: https://stackoverflow.com/questions/79861926/why-does-python-seem-to-handle-strings-with-escaped-quotes-differently
