Here is what I have so far; this is basically all of it.
Code:
import requests
from bs4 import BeautifulSoup

url = 'https://economictimes.indiatimes.com/news/international/business/volkswagen-sets-5-7-revenue-growth-target-preaches-cost-discipline/articleshow/101168014.cms'
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
h1_tags = soup.body.find_all('h1')
if len(h1_tags) > 1:  # check whether there is more than one <h1> tag
    if url.endswith(".cms"):  # check whether the URL ends in .cms (I have my doubts about this part)
        # find_all() returns tags in document order, so index 0 is the first <h1>
        print(h1_tags[0].get_text(strip=True))
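For trying out the "text of the first h1" step without hitting the network, here is a minimal sketch that uses only the standard library's html.parser in place of BeautifulSoup. The names FirstH1Parser, first_h1_text and SAMPLE_HTML are made up for illustration, and the sample markup is a hypothetical stand-in for the real page, not its actual content.

```python
from html.parser import HTMLParser


class FirstH1Parser(HTMLParser):
    """Collects the text of the first <h1> element and ignores later ones."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # greater than 0 while inside the first <h1>
        self._done = False   # True once the first <h1> has been closed
        self._parts = []     # text fragments found inside the first <h1>

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._done:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "h1" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self._done = True

    def handle_data(self, data):
        # Text inside nested tags such as <b> is kept as well.
        if self._depth > 0:
            self._parts.append(data)

    @property
    def text(self):
        joined = "".join(self._parts).strip()
        return joined or None


def first_h1_text(html):
    """Return the stripped text of the first <h1>, or None if there is none."""
    parser = FirstH1Parser()
    parser.feed(html)
    parser.close()
    return parser.text


# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
  <h1>Volkswagen sets <b>5-7%</b> revenue growth target</h1>
  <h1>Another headline</h1>
</body></html>
"""

print(first_h1_text(SAMPLE_HTML))
```

With BeautifulSoup already in use, the same idea is usually a single call: soup.find('h1') returns the first matching tag (or None), and get_text(strip=True) on that tag gives its text.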
More details here: https://stackoverflow.com/questions/765 ... rst-h1-tag