Error message when scraping multiple movies via bs4 [duplicate]


Anonymous


Post by Anonymous » 30 Oct 2025, 18:11

Python code:

from bs4 import BeautifulSoup
import requests

#####################################################
# Extracting the links of multiple movie transcripts
#####################################################

# How To Get The HTML
root = 'https://subslikescript.com'  # this is the homepage of the website
website = f'{root}/movies'  # concatenating the homepage with the movies section
result = requests.get(website)
content = result.text
soup = BeautifulSoup(content, 'lxml')
# print(soup.prettify())  # prints the HTML of the website

# Locate the box that contains a list of movies
box = soup.find('article', class_='main-article')

# Store each link in "links" list (href doesn't consider root aka "homepage", so we have to concatenate it later)
links = []
for link in box.find_all('a', href=True):  # find_all returns a list
    links.append(link['href'])

#################################################
# Extracting the movie transcript
#################################################

# Loop through the "links" list and sending a request to each link
for link in links:
    result = requests.get(f'{root}/{link}')
    content = result.text
    soup = BeautifulSoup(content, 'lxml')

    # Locate the box that contains title and transcript
    box = soup.find('article', class_='main-article')
    # Locate title and transcript
    title = box.find('h1').get_text()
    title = ''.join(title.split('/'))
    transcript = box.find('div', class_='full-script').get_text(strip=True, separator=' ')

    # Exporting data in a text file with the "title" name
    with open(f'{title}.txt', 'w') as file:
        file.write(transcript)
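Separately from the encoding problem: the loop builds each transcript URL with `f'{root}/{link}'`. If the collected `href` values are site-relative paths that already start with `/` (an assumption; the live markup was not checked), that f-string produces a double slash. A minimal sketch using the standard library's `urllib.parse.urljoin` instead. `build_transcript_urls` is a hypothetical helper, and its same-host filter and de-duplication are our own additions, in case the article box also contains links that are not transcripts:

```python
from urllib.parse import urljoin, urlparse


def build_transcript_urls(root, hrefs):
    """Hypothetical helper: absolute, same-site, de-duplicated URLs from hrefs."""
    root_host = urlparse(root).netloc
    urls = []
    for href in hrefs:
        url = urljoin(root, href)  # copes with '/movie/x', 'movie/x' and full URLs
        if urlparse(url).netloc == root_host:  # skip links to other sites
            urls.append(url)
    return list(dict.fromkeys(urls))  # drop duplicates, keep first-seen order


if __name__ == '__main__':
    root = 'https://subslikescript.com'
    # Hypothetical hrefs; the real ones come from box.find_all('a', href=True)
    hrefs = ['/movie/Example-123', '/movie/Example-123', 'https://example.org/ad']
    print(f'{root}/{hrefs[0]}')  # https://subslikescript.com//movie/Example-123
    print(build_transcript_urls(root, hrefs))
    # ['https://subslikescript.com/movie/Example-123']
```

In the original loop, this would mean iterating over `build_transcript_urls(root, links)` and calling `requests.get(url)` directly.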

How can I fix the following error by modifying the Python code above?

Traceback (most recent call last):
  File "C:\Users\Administrator\PycharmProjects\WebScraping\2.bs4-multiple-movies.py", line 43, in <module>
    file.write(transcript)
    ~~~~~~~~~~^^^^^^^^^^^^
  File "C:\Program Files\Python313\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u266a' in position 438: character maps to <undefined>

Process finished with exit code 1
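The traceback shows `open()` falling back to the Windows locale encoding, cp1252 (see the `encodings\cp1252.py` frame), because no `encoding` argument was passed. cp1252 has no mapping for U+266A (♪, EIGHTH NOTE), a character that subtitle text commonly uses around song lyrics. A small stdlib-only check that reproduces the failure; `can_encode` is a hypothetical helper for illustration:

```python
def can_encode(text, encoding):
    """Hypothetical helper: True if every character of text fits in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


note = '\u266a'  # the character named in the traceback
print(can_encode(note, 'cp1252'))  # False: cp1252 has no mapping for U+266A
print(can_encode(note, 'utf-8'))   # True: UTF-8 can represent this character
print(note.encode('utf-8'))        # b'\xe2\x99\xaa'
```

Passing `encoding='utf-8'` to `open()` overrides the locale default for that one file; Python's UTF-8 mode (`python -X utf8`, or the `PYTHONUTF8=1` environment variable) changes the default for every `open()` call in the process.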

The error was fixed by changing line 42 to: with open(f'{title}.txt', 'w', encoding="utf-8") as file: But how do I fix the problem that some of the output files come out empty?
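The question alone does not show why some files are empty, so the following is a sketch under assumptions, not a confirmed diagnosis. First, any file left over from the run that crashed is empty, because `open(..., 'w')` creates the file before `write()` fails. Second, a plausible cause on Windows is the title itself: the script strips only `/`, and a title containing `:` (such as `Mission: Impossible`) is, on NTFS, read as a file name plus an alternate data stream, so Explorer shows a 0-byte file named after the text before the colon. Other characters Windows forbids in names (`< > " \ | ? *`) make `open()` raise instead. `safe_filename` and `save_transcript` below are hypothetical helpers, not part of bs4 or requests:

```python
import os
import re
import tempfile

# Characters Windows forbids in file names, plus ASCII control characters.
# ':' is the sneaky one: on NTFS, "Mission: Impossible.txt" is parsed as file
# "Mission" with an alternate data stream, so the visible file stays empty.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, max_length=150):
    """Hypothetical helper: make a scraped title usable as a Windows file name.

    Reserved device names such as CON or NUL are not handled in this sketch,
    and max_length=150 is an arbitrary cap.
    """
    name = ' '.join(_INVALID_CHARS.sub('', title).split())  # drop bad chars, tidy spaces
    name = name[:max_length].rstrip(' .')  # Windows strips trailing dots/spaces
    return name or 'untitled'


def save_transcript(title, transcript, out_dir='.'):
    """Hypothetical helper: write transcript as UTF-8, or return None if it is empty."""
    if not transcript.strip():
        print(f'Skipping {title!r}: transcript is empty')
        return None
    path = os.path.join(out_dir, safe_filename(title) + '.txt')
    with open(path, 'w', encoding='utf-8') as file:
        file.write(transcript)
    return path


if __name__ == '__main__':
    print(safe_filename('Mission: Impossible'))  # Mission Impossible
    with tempfile.TemporaryDirectory() as tmp:
        path = save_transcript('Mission: Impossible', '\u266a theme \u266a', tmp)
        print(os.path.basename(path))  # Mission Impossible.txt
        print(save_transcript('Empty page', '   ', tmp))  # skip message, then None
```

In the original loop, the final with open(...) block would become save_transcript(title, transcript). Removing the characters, rather than replacing them, mirrors how the original code already handles `/`.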

More details here: https://stackoverflow.com/questions/79804573/error-message-when-scraping-mutliple-movies-via-bs4
