Beautiful Soup: list index out of range when scraping fbref.com


Anonymous

Posted by Anonymous » 26 Oct 2025, 12:13

How do I fix this error (web scraping)?

import requests

standings_url = "https://fbref.com/en/comps/9/Premier-League-Stats"
data = requests.get(standings_url)

from bs4 import BeautifulSoup
import pandas as pd
import time

soup = BeautifulSoup(data.text)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.expand_frame_repr', False)

standings_table = soup.select('table.stats_table')[0]

links = standings_table.find_all('a')
links = [l.get("href") for l in links]
links = [l for l in links if '/squads/' in l]

team_urls = [f"https://fbref.com{l}" for l in links]
data = requests.get(team_urls[0])

matches = pd.read_html(str(data.text), match="Scores & Fixtures")[0]

soup = BeautifulSoup(data.text)

links = soup.find_all('a')
links = [l.get("href") for l in links]
links = [l for l in links if l and 'all_comps/shooting/' in l]

data = requests.get(f"https://fbref.com{links[0]}")
shooting = pd.read_html(str(data.text), match="Shooting")[0]

shooting.columns = shooting.columns.droplevel()

team_data = matches.merge(shooting[["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="Date")

years = list(range(2023, 2020, -1))
all_matches = []

standings_url = "https://fbref.com/en/comps/9/Premier-League-Stats"

for year in years:
    data = requests.get(standings_url)
    soup = BeautifulSoup(data.text)
    standings_table = soup.select('table.stats_table')[0]

    links = [l.get("href") for l in standings_table.find_all('a')]
    links = [l for l in links if '/squads/' in l]
    team_urls = [f"https://fbref.com{l}" for l in links]

    previous_season = soup.select("a.prev")[0].get("href")
    standings_url = f"https://fbref.com{previous_season}"

    for team_url in team_urls:
        team_name = team_url.split('/')[-1].replace("-Stats", "").replace("-", " ")
        data = requests.get(team_url)
        matches = pd.read_html(str(data.text), match="Scores & Fixtures")[0]
        soup = BeautifulSoup(data.text)
        links = [l.get("href") for l in soup.find_all('a')]
        links = [l for l in links if l and 'all_comps/shooting/' in l]
        data = requests.get(f"https://fbref.com{links[0]}")
        shooting = pd.read_html(str(data.text), match="Shooting")[0]
        shooting.columns = shooting.columns.droplevel()
        try:
            team_data = matches.merge(shooting[["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="Date")
        except ValueError:
            continue

        team_data["Season"] = year
        team_data["Team"] = team_name
        all_matches.append(team_data)
        time.sleep(1)
        break

match_df = pd.concat(all_matches)
match_df.columns = [c.lower() for c in match_df.columns]

match_df.to_csv("matches.csv")
print(match_df)

Error:

Traceback (most recent call last):
  File "C:\Users\user\PycharmProjects\WebScrapingEPL\EPL webscrape.py", line 48, in <module>
    standings_table = soup.select('table.stats_table')[0]
IndexError: list index out of range
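
The traceback shows `[0]` being applied to the result of `soup.select('table.stats_table')`, so the selector matched nothing in the HTML that was actually returned. As a minimal sketch, the guard below replaces the bare `IndexError` with a message that includes the HTTP status. `first_or_raise` is a hypothetical helper introduced here for illustration; it is not part of Beautiful Soup or of the original script.

```python
def first_or_raise(matches, description, status_code=None):
    """Return the first element of `matches`, or raise a descriptive error.

    `matches` is any list-like result, e.g. soup.select('table.stats_table').
    `status_code` is optional context, e.g. the status_code of a requests response.
    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    if not matches:
        raise LookupError(
            f"No match for {description!r} (HTTP status: {status_code}). "
            "The response may not be the page you expected."
        )
    return matches[0]


# Possible usage with the variables from the script (left as a comment so the
# block runs without network access):
# standings_table = first_or_raise(
#     soup.select('table.stats_table'), 'table.stats_table', data.status_code
# )
```

If the reported status is not 200, the problem is likely in the request rather than in the selector.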

I am trying to extract match logs for the last several years for all Premier League teams, and I get an error related to Beautiful Soup. Can anyone help me resolve it?
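
One possible cause, which the post itself does not confirm, is that the site returned an error page (for example a rate-limit response) instead of the standings page, because the script sends several requests in quick succession. The sketch below, under that assumption, checks the status before parsing and waits before retrying on HTTP 429. `fetch_text` is a hypothetical helper, and the retry count and delay are illustrative values, not limits documented by fbref.com.

```python
import time


def fetch_text(get, url, retries=3, delay=5.0, sleep=time.sleep):
    """Fetch `url` with `get` (e.g. requests.get) and return the body text.

    Hypothetical helper: retries on HTTP 429 after waiting `delay` seconds and
    raises on any other non-200 status. The defaults are illustrative
    assumptions, not limits documented by fbref.com.
    """
    for attempt in range(retries + 1):
        response = get(url)
        if response.status_code == 200:
            return response.text
        if response.status_code == 429 and attempt < retries:
            # Rate limited: pause, then try the same URL again.
            sleep(delay)
            continue
        raise RuntimeError(f"GET {url} returned HTTP {response.status_code}")


# Possible usage in the script (left as a comment so the block runs offline):
# html = fetch_text(requests.get, standings_url)
# soup = BeautifulSoup(html, "html.parser")
```

Passing `get` and `sleep` as parameters keeps the helper independent of any particular HTTP library and makes it easy to exercise without network access.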

More details here: https://stackoverflow.com/questions/763 ... -fbref-com
