Scraping a real estate website using Python

Anonymous

Post by Anonymous » 24 Jan 2025, 08:12

I am trying to get the MLS number, price, and address of real estate listings from a website using BeautifulSoup.

Code:

import requests
from bs4 import BeautifulSoup

# search page URL
str_url = 'https://www.utahrealestate.com/search/map.search'

# get response
response = requests.get(str_url)

# parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# find the element that holds the number of result pages
# (I can't get this to work; find() returns None)
tag_n_pages = soup.find('li', {'class': 'view-results'})

# take the third word of the element's text as the number of pages
# (this does not work because the previous line returns None)
int_n_pages = int(tag_n_pages.get_text().split(' ')[2])
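The `NoneType` result mentioned in the comments means that `find` did not locate a matching `<li>` in the HTML that `requests` received, which often happens when the element is added later by JavaScript or the server returns a different page to scripts. Below is a minimal sketch of a defensive version of this step. The helper name `parse_page_count` and the sample markup (`1 of 25`) are assumptions made up for illustration; the real text on the site may look different.

```python
from typing import Optional

from bs4 import BeautifulSoup


def parse_page_count(html: str) -> Optional[int]:
    """Return the page count from the 'view-results' <li>, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("li", {"class": "view-results"})
    if tag is None:
        # The element is not in the static HTML (e.g. rendered by JavaScript or blocked)
        return None
    tokens = tag.get_text(strip=True).split()
    # Assumption: the third word is the page count, mirroring split(' ')[2] above
    if len(tokens) < 3 or not tokens[2].isdigit():
        return None
    return int(tokens[2])


# Hypothetical markup, for illustration only (not copied from the site)
sample_html = '<ul><li class="view-results">1 of 25</li></ul>'
page_count = parse_page_count(sample_html)
missing_count = parse_page_count("<ul><li>no results element</li></ul>")
```

Returning `None` instead of raising makes it easy to tell "the element is missing from the response" apart from "the text has an unexpected shape" by printing `response.text` when the helper returns `None`.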

Next, I plan to iterate over all the pages and extract the information from each listing.
Something like...

Code:

import pandas as pd

# requests, BeautifulSoup and int_n_pages come from the snippet above

# collected listings, one dict per property card
list_dict_cards = []

# iterate through pages
for int_page in range(1, int_n_pages + 1):

    # build the URL for this page
    str_url = f'https://www.utahrealestate.com/search/map.search/page/{int_page}/vtype/map'

    # get response
    response = requests.get(str_url)

    # parse the HTML
    soup = BeautifulSoup(response.text, 'html.parser')

    # get property cards
    property_cards = soup.find_all(class_='property___card')

    # iterate through property cards
    for card in property_cards:

        # empty dict
        dict_card = {}

        # get mls number (second word of the element's text)
        int_mls = card.find(class_='mls___number').text.split(' ')[1]

        # put into dict_card
        dict_card['mls'] = int_mls

        # I would get other info here as well and put into dict_card

        # append dict_card to list_dict_cards
        list_dict_cards.append(dict_card)

# make df
df_cards = pd.DataFrame(list_dict_cards)

# save
df_cards.to_csv('./output/df_dict_cards.csv', index=False)
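The card loop above can be tried out without touching the network by running it against a small HTML string. The following is a sketch under stated assumptions: the class names `property___card` and `mls___number` come from the post, while the helper name `extract_cards`, the sample markup, and the `MLS# 1234567` text format are hypothetical.

```python
from typing import Dict, List, Optional

from bs4 import BeautifulSoup


def extract_cards(html: str) -> List[Dict[str, Optional[str]]]:
    """Collect one dict per property card, skipping cards without an MLS element."""
    soup = BeautifulSoup(html, "html.parser")
    rows: List[Dict[str, Optional[str]]] = []
    for card in soup.find_all(class_="property___card"):
        mls_tag = card.find(class_="mls___number")
        if mls_tag is None:
            # Avoid AttributeError on cards that lack the MLS element
            continue
        parts = mls_tag.get_text(strip=True).split()
        # Assumption: the text looks like "MLS# 1234567", so the number is the second word
        rows.append({"mls": parts[1] if len(parts) > 1 else None})
    return rows


# Hypothetical markup, for illustration only (not copied from the site)
sample_html = """
<div class="property___card"><span class="mls___number">MLS# 1234567</span></div>
<div class="property___card"><span>no MLS element here</span></div>
<div class="property___card"><span class="mls___number">MLS# 7654321</span></div>
"""
cards = extract_cards(sample_html)
```

The resulting list of dicts can be passed to `pd.DataFrame` exactly as in the snippet above.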

I'm fairly sure the site tries to prevent programmatic access to most of the information it displays.
Can this be worked around, and if so, how?
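One way to see what a site states about programmatic access is its `robots.txt` file, which Python's standard `urllib.robotparser` module can evaluate. The sketch below parses a made-up rule set offline; the `Disallow: /search/` rule is an assumption for illustration and is not the real content of the site's `robots.txt`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only (not the site's actual robots.txt)
robots_lines = [
    "User-agent: *",
    "Disallow: /search/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# Ask whether a generic crawler may fetch each URL under these rules
search_allowed = parser.can_fetch("*", "https://www.utahrealestate.com/search/map.search")
home_allowed = parser.can_fetch("*", "https://www.utahrealestate.com/")
```

To check the live file instead, `parser.set_url(...)` followed by `parser.read()` downloads and parses it before `can_fetch` is called.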

More details here: https://stackoverflow.com/questions/70715374/scraping-real-estate-website-using-python
