When a piece of data is surrounded by double pipes (||), it is a column name; when it is surrounded by single pipes (|), one on each side, it is a value for the corresponding column.
Code:
||Name||Age||Address||Phones||size||
|Edwards|22|London|06 45 06 06 06
06 75 85 06 06
07 85 22 15 48|180cm|
I should have this:
| Name | Age | Address | Phones | size |
| --- | --- | --- | --- | --- |
| Edwards | 22 | London | 06 45 06 06 \\n 06 06 75 85 06 06 \\n 07 85 22 15 48 | 180 |
But this is what I get:
| Name | Age | Address | Phones | size |
| --- | --- | --- | --- | --- |
| Edwards | 22 | London | | 06 45 06 0606 06 75 85 06 0607 85 22 15 48180 |
My function:
Code:
import re


def process_description_column(df):
    data_dictionary = {}
    columns = []  # header names; filled in once a '||' line is seen
    for index, row in df.iterrows():
        modified_text = row['Description'].replace('*', '')
        description_text = ""
        lines = modified_text.strip().split('\n')
        in_description_section = False
        line_counter = 0
        for line in lines:
            line = line.strip()
            # process_line_description is a separate helper (not shown here)
            in_description_section, line_counter, description_text = process_line_description(
                line, in_description_section, line_counter, description_text)
            if '||' in line:
                # Header row: column names are separated by double pipes
                column_names = re.split(r'\|\|', line)
                columns = [col.strip() for col in column_names if col.strip()]
                for col in columns:
                    if col not in data_dictionary:
                        data_dictionary[col] = []
            elif '|' in line and columns:
                # Value row: cells are separated by single pipes
                line_without_bar = line.replace('|', '').strip()
                # Note: line comes from split('\n'), so it never contains '\n'
                if '\n' in line_without_bar:
                    values = re.split(r'\|', line)
                    values = [value.strip() for value in values if value.strip()]
                    while len(values) < len(columns):
                        values.append('')
                    for col, value in zip(columns, values):
                        data_dictionary[col].append(value)
                else:
                    if data_dictionary[columns[-1]]:
                        # Append the text to the last value of the last column
                        data_dictionary[columns[-1]][-1] += ' ' + line_without_bar
                    else:
                        values = re.split(r'\|', line)
                        values = [value.strip() for value in values if value.strip()]
                        if len(columns) == len(values):
                            for col, value in zip(columns, values):
                                data_dictionary[col].append(value)
                        else:
                            # Pad short rows with empty strings
                            while len(values) < len(columns):
                                values.append('')
                            for col, value in zip(columns, values):
                                data_dictionary[col].append(value)
    data_dictionary['Description'] = [description_text.strip()]
    return data_dictionary
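One possible way to handle the multi-line cells is sketched below. It is not the function above, and it rests on assumptions that may not hold for real data: every complete row ends with a trailing |, header rows start with ||, and the first cell of a data row is never empty. Physical lines are buffered until a line ends with |, then joined into one logical row. The names parse_wiki_table and to_markdown are hypothetical, and the literal \n marker between joined lines is only a guess at the desired output format.

```python
def parse_wiki_table(text):
    """Parse a ||header|| / |value| table into {column: [values]}.

    Assumption: a physical line that does not end with '|' is continued
    on the next line (it is part of a multi-line cell).
    """
    columns, rows, pending = [], [], []
    for raw in text.strip().split('\n'):
        line = raw.strip()
        if not line:
            continue
        if line.startswith('||'):
            # Header row: names are separated by double pipes.
            columns = [c.strip() for c in line.strip('|').split('||')]
            continue
        pending.append(line)
        if line.endswith('|'):
            # The row is complete: join its physical lines, marking each
            # original line break with a literal backslash-n.
            logical = ' \\n '.join(pending)
            pending = []
            cells = [c.strip() for c in logical.strip('|').split('|')]
            # Pad short rows and drop surplus cells so rows match the header.
            cells += [''] * (len(columns) - len(cells))
            rows.append(cells[:len(columns)])
    return {col: [row[i] for row in rows] for i, col in enumerate(columns)}


def to_markdown(table):
    """Render the dictionary from parse_wiki_table as a Markdown table."""
    columns = list(table)
    n_rows = len(table[columns[0]]) if columns else 0
    lines = [
        '| ' + ' | '.join(columns) + ' |',
        '| ' + ' | '.join('---' for _ in columns) + ' |',
    ]
    for i in range(n_rows):
        lines.append('| ' + ' | '.join(table[c][i] for c in columns) + ' |')
    return '\n'.join(lines)


sample = (
    "||Name||Age||Address||Phones||size||\n"
    "|Edwards|22|London|06 45 06 06 06\n"
    "06 75 85 06 06\n"
    "07 85 22 15 48|180cm|"
)
table = parse_wiki_table(sample)
print(to_markdown(table))
```

Buffering until the closing | keeps the continuation lines inside the Phones cell instead of appending them to the last column, which appears to be what goes wrong in the function above.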
More details here: https://stackoverflow.com/questions/791 ... -in-python