I am using the code below to fetch and parse data from a website. The error seems to indicate that the returned XML is malformed. For example, I ran the response through an XML validator, which reported the following problem in the returned XML:
Error : InvalidTag
Line : 50
Message : Closing tag 'p' is expected inplace of 'P'.
I am new to XML. How can I work around formatting errors in what the server returns?
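The validator message is what one would expect when HTML is handed to an XML parser: XML tag names are case-sensitive, so an element opened as `<P>` and closed as `</p>` is not well-formed, while HTML treats the two as the same tag. A minimal sketch with a made-up fragment (not taken from the SEC response; `try_parse` is a hypothetical helper) shows how `xml.etree.ElementTree` reacts:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: acceptable as HTML, but not well-formed XML,
# because the opening and closing tag names differ in case.
fragment = "<root><P>some text</p></root>"


def try_parse(text):
    """Return (True, root element) on success or (False, error message) on failure."""
    try:
        return True, ET.fromstring(text)
    except ET.ParseError as exc:
        return False, str(exc)


ok, result = try_parse(fragment)
print(ok, result)

# The same fragment with consistent casing parses without complaint.
ok_fixed, root = try_parse("<root><p>some text</p></root>")
print(ok_fixed, root[0].text)
```

If the real response fails for this reason, patching individual tags is unlikely to scale; it is usually a sign that the payload is HTML and needs an HTML parser or a different URL.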
import pandas as pd
import requests
from datetime import datetime, timedelta
import xml.etree.ElementTree as ET

cik = "0000320193"
BASE_URL = "https://data.sec.gov"
USER_AGENT = "alias (alias199@gmail.com)"
ACC_ENCODING = "gzip, deflate"
HOST_NAME = "www.sec.gov"
headers = {
    "User-Agent": USER_AGENT,
    "Accept-Encoding": ACC_ENCODING,
    "Host": HOST_NAME
}
filing_data = []
test = pd.DataFrame(
    {'accessionNumber': ['0000320193-24-000132', '0000320193-24-000130', '0000320193-24-000129', '0000320193-24-000126', '0000320193-24-000116'],
     'filingDate': ['2024-12-18', '2024-11-19', '2024-11-19', '2024-11-07', '2024-10-17'],
     'form': ['4', '4', '4', '4', '4'],
     'primaryDocument': ['xslF345X05/wk-form4_1734564614.xml', 'xslF345X05/wk-form4_1732059096.xml', 'xslF345X05/wk-form4_1732059042.xml', 'xslF345X05/wk-form4_1731022209.xml', 'xslF345X05/wk-form4_1729204211.xml']
     }
)
for i, filing in test.iterrows():
    # Build the document URL from the CIK, the accession number without dashes, and the document path
    filing_url = f"{BASE_URL}/Archives/edgar/data/{int(cik)}/{filing['accessionNumber'].replace('-', '')}/{filing['primaryDocument']}"
    try:
        response = requests.get(filing_url, headers=headers)
        if response.status_code == 200:
            try:
                # Clean up the response content before parsing
                content = response.content.decode('utf-8', errors='ignore')
                # Fix malformed XML by identifying unclosed tags and repairing them
                try:
                    root = ET.fromstring(content)
                except ET.ParseError:
                    # Attempt to auto-correct malformed XML
                    # (the two string literals below are empty in the post as published;
                    # any tag they contained appears to have been stripped)
                    if not content.strip().startswith(""):
                        content = f"{content}"
                    root = ET.fromstring(content)
                filing_info = {
                    "accessionNumber": filing["accessionNumber"],
                    "filingDate": filing["filingDate"],
                    "form": filing["form"],
                    "content": {},
                }
                # Flatten the tree into a tag -> text mapping (repeated tags overwrite earlier ones)
                for child in root.iter():
                    filing_info["content"][child.tag] = child.text
                filing_data.append(filing_info)
            except ET.ParseError as e:
                print(f"Error parsing XML for filing: {filing_url}. Error: {e}")
        else:
            print(f"Failed to retrieve filing document: {filing_url}. HTTP Status: {response.status_code}")
    except Exception as e:
        print(f"Error retrieving filing document: {filing_url}. Error: {e}")
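One possible cause, offered as an assumption rather than something the post confirms: on EDGAR, a `primaryDocument` value that starts with a directory such as `xslF345X05/` appears to select the stylesheet-rendered HTML view of the filing, while the raw XML sits at the same path without that directory. The hypothetical helper `raw_document_path` below strips such a prefix before the URL is built. It is a sketch of the idea, not a documented EDGAR rule.

```python
import re

# Assumption: a leading "xsl.../" path segment selects the rendered HTML view,
# and dropping it yields the path of the raw XML document.
_XSL_PREFIX = re.compile(r"^xsl[^/]*/")


def raw_document_path(primary_document: str) -> str:
    """Return the document path with any leading 'xsl.../' directory removed."""
    return _XSL_PREFIX.sub("", primary_document, count=1)


print(raw_document_path("xslF345X05/wk-form4_1734564614.xml"))
print(raw_document_path("wk-form4_1734564614.xml"))
```

In the loop above this would mean using `raw_document_path(filing['primaryDocument'])` in place of `filing['primaryDocument']` when composing `filing_url`, and then checking whether `ET.fromstring` accepts the response.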
A sample of the returned XML is below:
print(response.content)
b'\n\n\n\nSEC FORM \n 4\n\n .FormData {color: blue; background-color: white; font-size: small; font-family: Times, serif;}\n .FormDataC {color: blue; background-color: white; font-size: small; font-family: Times, serif; text-align: center;}\n .FormDataR {color: blue; background-color: white; font-size: small; font-family: Times, serif; text-align: right;}\n .SmallFormData {color: blue; background-color: white; font-size: x-small; font-family: Times, serif;}\n .FootnoteData {color: green; background-color: white; font-size: x-small; font-family: Times, serif;}\n .FormNumText {font-size: small; font-weight: bold; font-family: arial, helvetica, sans-serif;}\n .FormAttention {font-size: medium; font-weight: bold; font-family: helvetica;}\n .FormText {font-size: small; font-weight: normal; font-family: arial, helvetica, sans-serif; text-align: left;}\n .FormTextR {font-size: small; font-weight: normal; font-family: arial, helvetica, sans-serif; text-align: right;}\n .FormTextC {font-size: small; font-weight: normal; font-family: arial, helvetica, sans-serif; text-align: center;}\n .FormEMText {font-size: medium; font-style: italic; font-weight: normal; font-family: arial, helvetica, sans-serif;}\n .FormULText {font-size: medium; text-decoration: underline; font-weight: normal; font-family: arial, helvetica, sans-serif;}\n .SmallFormText {font-size: xx-small; font-family: arial, helvetica, sans-serif; text-align: left;}\n .SmallFormTextR {font-size: xx-small; font-family: arial, helvetica, sans-serif; text-align: right;}\n .SmallFormTextC {font-size: xx-small; font-family: arial, helvetica, sans-serif; text-align: center;}\n .MedSmallFormText {font-size: x-small; font-family: arial, helvetica, sans-serif; text-align: left;}\n .FormTitle {font-size: medium; font-family: arial, helvetica, sans-serif; font-weight: bold;}\n .FormTitle1 {font-size: small; font-family: arial, helvetica, sans-serif; font-weight: bold; border-top: black thick solid;}\n .FormTitle2 {font-size: 
small; font-family: arial, helvetica, sans-serif; font-weight: bold;}\n .FormTitle3 {font-size: small; font-family: arial, helvetica, sans-serif; font-weight: bold; padding-top: 2em; padding-bottom: 1em;}\n .SectionTitle {font-size: small; text-align: left; font-family: arial, helvetica, sans-serif; \n \t\tfont-weight: bold; border-top: gray thin solid; border-bottom: gray thin solid;}\n .FormName {font-size: large; font-family: arial, helvetica, sans-serif; font-weight: bold;}\n .CheckBox {text-align: center; width: 5px; cell-spacing: 0; padding: 0 3 0 3; border-width: thin; border-style: solid; border-color: black:}\n body {background: white;}\n \n\nSEC Form 4 \n \n\nFORM 4\n\nUNITED STATES SECURITIES AND EXCHANGE COMMISSION
Washington, D.C. 20549
STATEMENT OF CHANGES IN BENEFICIAL OWNERSHIP
Filed pursuant to Section 16(a) of the Securities Exchange Act of 1934
or Section 30(h) of the Investment Company Act of 1940\n\n\nOMB APPROVAL\n\n\nOMB Number:\n3235-0287\n\nEstimated average burden\n\nhours per response:\n0.5\n\n\n\n\n\n\xc2\xa0\xc2\xa0\nCheck this box if no longer subject to Section 16. Form 4 or Form 5 obligations may continue. \n See\n\n Instruction 1(b).\n\n\n\xc2\xa0\xc2\xa0\nCheck this box to indicate that a transaction was made pursuant to a contract, instruction or written plan for the purchase or sale of equity securities of the issuer that is intended to satisfy the affirmative defense conditions of Rule 10b5-1(c). See Instruction 10. \n\n\n\n\n\n1. Name and Address of Reporting Person*KONDO CHRIS\n\n\n(Last)\n(First)\n(Middle)\n\n\nONE APPLE PARK WAY\n\n\n\n(Street)\nCUPERTINO\nCA\n95014\n\n\n\n(City)\n(State)\n(Zip)\n\n\n\n2. Issuer Name and Ticker or Trading Symbol\n Apple Inc.\n [ AAPL ]\n \n\n5. Relationship of Reporting Person(s) to Issuer\n
(Check all applicable)\n\n\nDirector\n\n10% Owner\n\n\nX\nOfficer (give title below)\n\nOther (specify below)\n\n\n\nPrincipal Accounting Officer\n\n\n\n\n\n\n\n3. Date of Earliest Transaction\n (Month/Day/Year)
10/15/2024\n\n\n\n4. If Amendment, Date of Original Filed\n (Month/Day/Year)
\n\n\n6. Individual or Joint/Group Filing (Check Applicable Line)\n \n\nX\nForm filed by One Reporting Person\n\n\n\nForm filed by More than One Reporting Person\n\n\n\n\n\n\n\nTable I - Non-Derivative Securities Acquired, Disposed of, or Beneficially Owned\n\n1. Title of Security (Instr. \n 3)\n \n2. Transaction Date\n (Month/Day/Year)\n2A. Deemed Execution Date, if any\n (Month/Day/Year)\n3. Transaction Code (Instr. \n 8)\n \n4. Securities Acquired (A) or Disposed Of (D) (Instr. \n 3, 4 and 5)\n \n5. \n Amount of Securities Beneficially Owned Following Reported Transaction(s) (Instr. \n 3 and 4)\n \n6. Ownership Form: Direct (D) or Indirect (I) (Instr. \n 4)\n \n7. Nature of Indirect Beneficial Ownership (Instr. \n 4)\n \n\n\nCode\nV\nAmount\n(A) or (D)\nPrice\n\n\n\n\nCommon Stock\n10/15/2024\n\nM\n\n8,115\nA\n(1)\n23,534\nD\n\n\n\n\nCommon Stock(2)\n\n10/15/2024\n\nF\n\n3,985\nD\n\n$233.85\n\n19,549\nD\n\n\n\n\n\n\n\nTable II - Derivative Securities Acquired, Disposed of, or Beneficially Owned(e.g., puts, calls, warrants, options, convertible securities)\n\n\n1. Title of Derivative Security (Instr. \n 3)\n \n2. Conversion or Exercise Price of Derivative Security\n \n3. Transaction Date\n (Month/Day/Year)\n3A. Deemed Execution Date, if any\n (Month/Day/Year)\n4. Transaction Code (Instr. \n 8)\n \n5. \n Number of Derivative Securities Acquired (A) or Disposed of (D) (Instr. \n 3, 4 and 5)\n \n6. Date Exercisable and Expiration Date \n (Month/Day/Year)\n7. Title and Amount of Securities Underlying Derivative Security (Instr. \n 3 and 4)\n \n8. Price of Derivative Security (Instr. \n 5)\n \n9. \n Number of derivative Securities Beneficially Owned Following Reported Transaction(s) (Instr. \n 4)\n \n10. Ownership Form: Direct (D) or Indirect (I) (Instr. \n 4)\n \n11. Nature of Indirect Beneficial Ownership (Instr. \n 4)\n \n\n\nCode\nV\n(A)\n(D)\nDate Exercisable\nExpiration Date\nTitle\n
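The dump above contains CSS rules and form labels, which suggests the payload is an HTML page rather than an XML data document. A minimal sketch, using only the standard library, of checking for that before calling `ET.fromstring` and of pulling out the visible text when it is HTML. The names `looks_like_html`, `html_to_text` and `TextCollector`, and the tiny `sample` payload, are made up for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text, skipping the contents of <style> and <script>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def looks_like_html(payload: bytes) -> bool:
    """Heuristic check on the first kilobyte of the response body."""
    head = payload[:1024].lstrip().lower()
    return head.startswith(b"<!doctype html") or b"<html" in head


def html_to_text(payload: bytes) -> list:
    """Return the non-empty text fragments of an HTML payload, in document order."""
    parser = TextCollector()
    parser.feed(payload.decode("utf-8", errors="replace"))
    parser.close()
    return parser.chunks


# Hypothetical payload with mixed-case tags, which HTMLParser tolerates.
sample = (b"<!DOCTYPE html><html><head><style>.a{color:blue}</style></head>"
          b"<body><P>FORM 4</p><td>Common Stock</td></body></html>")
print(looks_like_html(sample))
print(html_to_text(sample))
```

Text scraped this way loses the field structure, so it is a fallback at best; requesting the underlying XML document, if one is available, keeps the tags that the `root.iter()` loop relies on.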
More details here: https://stackoverflow.com/questions/79359703/getting-a-not-well-formed-invalid-token-error-when-parsing-a-returned-xml-fi
Getting a "not well-formed (invalid token)" error when parsing a returned XML file using requ...
1736975255
Anonymous