Reading a PDF file with Azure Synapse Notebooks


Posted by Guest


This is my first post asking for help. Until now I have relied on examples from Stack Overflow, but this time I can't find an answer. I am sorry if the formatting of my post is not great; I will try to improve it in the future.

I am struggling to read PDF files from Azure Data Lake Storage Gen2 with Azure Synapse Notebooks.

Reading a CSV file is not a problem. I can access a CSV file with this command:

    %%pyspark
    df = spark.read.load('abfss://**accountname**.dfs.core.windows.net/**file.csv**'
    ## If header exists uncomment line below
    ##, header=True
    )
    display(df.limit(10))

But when I try to read a PDF, it always fails. I have tried libraries such as PyPDF2 and camelot.

    pdf_file = "abfss://**accountname**.dfs.core.windows.net/**file.pdf**"
    # Open the PDF using PyPDF2
    pdf_reader = PyPDF2.PdfReader(pdf_file)

I receive an error:

FileNotFoundError: [Errno 2] No such file or directory
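A likely explanation for this error, offered as a sketch rather than a confirmed diagnosis: when PyPDF2 is given a string, it opens it with Python's built-in open(), which only understands local filesystem paths and treats an abfss:// URI as an ordinary (nonexistent) relative path. Spark's reader, by contrast, resolves abfss:// through its own storage connector. The helper name can_open_locally and the URI below are hypothetical placeholders used only for illustration.

```python
def can_open_locally(path: str) -> bool:
    """Return True if Python's built-in open() can read `path` as a local file."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        # FileNotFoundError is a subclass of OSError, so it is caught here too.
        return False


# Placeholder URI in the same shape as the one in the question (not a real account).
uri = "abfss://container@accountname.dfs.core.windows.net/file.pdf"

# The built-in open() has no notion of the abfss scheme, so this is treated
# as a local path that does not exist.
print(can_open_locally(uri))
```

If this prints False in the notebook as well, the FileNotFoundError comes from the path scheme rather than from the PDF itself.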

I tried to mount the storage location as described in the post "How can I read pdf or pptx or docx files in python from ADLS gen2 using Synapse?"

I can still read a CSV file from the mounted storage, but not a PDF.

    mssparkutils.fs.mount(
        "abfss://container@accountname.dfs.core.windows.net/",
        "/TR",
        {"LinkedService": "linkedservice"}
    )

    # can get a path, this command is working:
    path = mssparkutils.fs.getMountPath("TR")
    print(path)

    import PyPDF2
    with open("/synfs/mount#/TR/file.pdf") as f:
        pdf_reader = PyPDF2.PdfReader(f)

This gives an error:

OSError: [Errno 5] Input/output error:

I also tried reading the file using the path returned by getMountPath, but it still does not work.

    file_name = path + "/file.pdf"
    print(file_name)
    reader = PyPDF2.PdfReader(open(file_name, 'rb'))

This gives an error:

    OSError: [Errno 5] Input/output error

I also tried passing the path directly to PyPDF2:

    pdf_reader = PyPDF2.PdfReader(file_name)

This gives an error:

    logger_warning(
        "PdfReader stream/file object is not in binary mode. "
        "It may not be read correctly.",
        __name__,
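The binary-mode warning refers to how the file object was opened: open(path) without a mode defaults to text mode, while a PDF has to be read as bytes. The sketch below shows one way to read a file in binary mode into an in-memory stream before handing it to a PDF parser. It is only an illustration under stated assumptions: load_pdf_stream and the temporary demo file are hypothetical, the header check assumes the %PDF- marker sits at byte 0, and it does not address the Input/output error raised by the mount itself.

```python
import io
import os
import tempfile

PDF_MAGIC = b"%PDF-"  # marker that normally starts a PDF file


def load_pdf_stream(path: str) -> io.BytesIO:
    """Read `path` in binary mode and return its contents as an in-memory stream."""
    with open(path, "rb") as f:  # "rb" avoids text-mode decoding of the bytes
        data = f.read()
    if not data.startswith(PDF_MAGIC):
        raise ValueError(f"{path} does not start with a PDF header")
    return io.BytesIO(data)


# Demo with a tiny stand-in file (only a header, not a complete PDF).
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.7\n")
    demo_path = tmp.name

stream = load_pdf_stream(demo_path)
header = stream.read(5)
stream.seek(0)  # rewind so a parser would start from the beginning
os.remove(demo_path)
print(header)
```

In a notebook, the returned stream could then be passed to a reader such as PyPDF2.PdfReader(stream), assuming the underlying path is readable in the first place.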

Please advise if you know how to solve this. I am using Azure Synapse Studio, not the SDK.


Source: https://stackoverflow.com/questions/780 ... -notebooks