How to get text from a class? [closed]

Post by Anonymous » 28 Nov 2024, 16:48

I just want to get information about a business. I managed to scrape all of the details except for a few. The problem is that there are several elements with the same class name.
The website I am trying to get data from:
https://www.zivefirmy.cz/pohrebni-sluzba-vsetin-orgonik_f1770820?cz=15&loc=10000141
I tried collecting every element with class_='part' and then, within each of those, looking for class="text-label" in order to show all of the extracted text. This displays most of the text, but for some reason it does not show "Počet zaměstnanců: 6 - 9" or "Obrat Firmy: 5 mil - 9 999 999Kč". Only "Právní forma: Společnost s ručením omezeným" and "Datová schránka: 6rey2pd" are displayed.
Screenshot:
https://drive.google.com/file/d/1Wv_34hEZBN7cUkDe00yBuqRG5NaSQsGw/view?usp=drive_link
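One way to narrow down a symptom like this is to check whether the missing labels appear in the raw HTML returned by the server at all. The sketch below is only an illustration: `labels_missing_from_html` is a hypothetical helper, and `sample_html` is a made-up stand-in for `response.text`, since the real page markup is not shown in the post. If a label is absent from the raw HTML, it is probably added later by JavaScript or served only under other request conditions, and in that case no BeautifulSoup selector would find it.

```python
def labels_missing_from_html(html: str, labels: list[str]) -> list[str]:
    """Return the labels that do not occur anywhere in the raw HTML text.

    The comparison is case-insensitive, so "Obrat Firmy" also matches
    "Obrat firmy".
    """
    haystack = html.casefold()
    return [label for label in labels if label.casefold() not in haystack]


# Hypothetical stand-in for response.text (the real markup is unknown).
sample_html = """
<div class="part">
  <span class="text-label">Datová schránka:</span> <span>6rey2pd</span>
</div>
"""

wanted = ["Počet zaměstnanců", "Obrat Firmy", "Datová schránka"]
missing = labels_missing_from_html(sample_html, wanted)
print(missing)
```

With the real page, `sample_html` would be replaced by `response.text`. An empty result would suggest the data is in the HTML and the selectors are what need adjusting.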
My code:
import os

import requests
from bs4 import BeautifulSoup

business_detail_url = "https://www.zivefirmy.cz/pohrebni-sluzba-vsetin-orgonik_f1770820?cz=15&loc=10000141"

response = requests.get(business_detail_url, headers={"User-Agent": "Mozilla/5.0"})

if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Locate all <div class="part"> elements
    part_elements = soup.find_all('div', class_='part')

    # Initialize list to store extracted data
    extracted_data = []

    # Iterate through each part div
    for index, part in enumerate(part_elements, start=1):
        # Find all <span class="text-label"> elements inside the current part div
        text_labels = part.find_all('span', class_='text-label')

        # Extract text from each label and its sibling
        for label in text_labels:
            label_text = label.get_text(strip=True)

            # Try to find the next sibling element, which is expected to hold the value
            sibling = label.find_next_sibling()
            sibling_text = sibling.get_text(strip=True) if sibling else "No sibling found"

            # Combine label and sibling text
            extracted_data.append(f"Part {index}: {label_text} {sibling_text}")

    # Define the file path for the output in the project folder
    output_file_path = os.path.join(os.getcwd(), 'text_labels_with_siblings.txt')

    # Save the extracted information to a file
    with open(output_file_path, 'w', encoding='utf-8') as file:
        for data in extracted_data:
            file.write(data + "\n")

    # Print summary to terminal
    if extracted_data:
        print("Extracted Data:")
        for data in extracted_data:
            print(data)
    else:
        print("No text-labels found within parts.")

    print(f"Extracted text saved to '{output_file_path}'.")
else:
    # The request did not succeed, so there is nothing to parse
    print(f"Failed to retrieve business detail page. Status code: {response.status_code}")
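The code above pairs each label with `label.find_next_sibling()`, which searches for tags. One possible reason for missing or mismatched values (an assumption, since the post does not show the page markup) is that some values are plain text placed right after the label rather than wrapped in an element of their own. The sketch below walks `next_siblings` instead and accepts either form. The `sample_html` markup and the `value_after` helper are hypothetical and only illustrate the idea.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: the real structure of the page is not shown in the post.
sample_html = """
<div class="part">
  <span class="text-label">Datová schránka:</span> <span>6rey2pd</span>
</div>
<div class="part">
  <span class="text-label">Počet zaměstnanců:</span> 6 - 9
</div>
"""


def value_after(label):
    """Return the first non-empty text that follows the label in its parent."""
    for node in label.next_siblings:
        if isinstance(node, Tag):
            # The value is wrapped in its own element
            text = node.get_text(strip=True)
        else:
            # The value is a bare text node
            text = str(node).strip()
        if text:
            return text
    return None


soup = BeautifulSoup(sample_html, "html.parser")
pairs = {
    label.get_text(strip=True): value_after(label)
    for label in soup.find_all("span", class_="text-label")
}
print(pairs)
```

Whitespace-only text nodes between the label and its value are skipped, so the same helper handles both layouts in the sample.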
 

More details here: https://stackoverflow.com/questions/79233807/how-to-get-text-from-class