Extracting a logo and the most accurate possible text from a scanned PDF with OCR and Python
I want to extract the logo and the tabular data from a scanned invoice PDF. When I try to extract the logo as an image with the PyPDF2 library, the result is the whole page as a single image, so I cannot isolate the logo.
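A scanned page is usually stored in the PDF as one full-page image, which would explain why an image extractor hands back the whole document. One possible way around it is to rasterize the page (for example with pdf2image's `convert_from_path`) and crop the logo out of the pixels. The sketch below shows only the cropping logic on a plain 2D list of grayscale values, so it runs without any imaging library. The function name `ink_bounding_box`, the `threshold` value, and the assumption that the logo sits in a known corner of the page are all hypothetical and not from the question.

```python
def ink_bounding_box(pixels, region, threshold=200):
    """Return (left, top, right, bottom) of the dark pixels inside a search window.

    pixels:    2D list of grayscale values (0 = black, 255 = white), rows first.
    region:    (left, top, right, bottom) window to search, right/bottom exclusive.
    threshold: values below this count as "ink" (assumed value, tune per scan).
    Returns None when the window contains no dark pixels.
    """
    left, top, right, bottom = region
    xs, ys = [], []
    for y in range(top, min(bottom, len(pixels))):
        row = pixels[y]
        for x in range(left, min(right, len(row))):
            if row[x] < threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    # +1 so the box is exclusive on the right/bottom, like a typical crop box.
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


# Tiny synthetic "page": white background with a dark 3x2 block standing in for a logo.
page = [[255] * 10 for _ in range(8)]
for y in range(1, 3):
    for x in range(2, 5):
        page[y][x] = 0

# Assumption for illustration: the logo lies somewhere in the top-left part of the page.
logo_box = ink_bounding_box(page, region=(0, 0, 5, 4))
empty_box = ink_bounding_box(page, region=(6, 4, 10, 8))
```

With a real page image, the resulting box could be passed to Pillow's `Image.crop` to save the logo as its own file. A noisy scan would likely need denoising or a stricter threshold first.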
Another problem: when I use Pytesseract to extract the tabular text, the output does not make sense and the OCR results are incorrect.
Which approach should I follow to extract these?
I tried converting the PDF to an image, manipulating that image, and converting it back to a PDF. Then I tried reading the text with PDF reader libraries (e.g. pdfplumber, PyPDF2), but I did not get any acceptable results.
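For the tabular part, one thing that may help is to ask Tesseract for word positions rather than plain text, then rebuild the rows from the coordinates. `pytesseract.image_to_data` with `output_type=Output.DICT` returns parallel lists such as `text`, `left`, `top` and `conf`. The sketch below groups such a dictionary into rows. It uses hand-written sample data instead of a real OCR call, and the function name `group_words_into_rows` and the `row_tolerance` and `min_conf` values are assumptions chosen for illustration.

```python
def group_words_into_rows(data, row_tolerance=10, min_conf=0):
    """Group OCR word boxes into table-like rows, each sorted left to right.

    data:          dict of parallel lists with keys 'text', 'left', 'top', 'conf'
                   (the shape pytesseract.image_to_data gives with Output.DICT).
    row_tolerance: max vertical distance in pixels from the first word of a row
                   for a word to still belong to that row (assumed value).
    min_conf:      words with lower confidence are dropped; Tesseract reports
                   -1 for entries that are not words.
    """
    words = []
    for text, left, top, conf in zip(
        data["text"], data["left"], data["top"], data["conf"]
    ):
        if text.strip() and float(conf) >= min_conf:
            words.append((top, left, text.strip()))
    words.sort()  # top to bottom first

    rows, current, current_top = [], [], None
    for top, left, text in words:
        # Start a new row when the word is too far below the row's first word.
        if current and top - current_top > row_tolerance:
            rows.append(current)
            current = []
        if not current:
            current_top = top
        current.append((left, text))
    if current:
        rows.append(current)

    # Order each row by horizontal position and keep only the text.
    return [[text for _, text in sorted(row)] for row in rows]


# Hand-written sample in the image_to_data layout, deliberately out of order.
sample = {
    "text": ["Price", "Item", "Qty", "", "2", "Widget", "9.99"],
    "left": [300, 10, 200, 0, 202, 12, 298],
    "top": [49, 50, 52, 0, 91, 90, 88],
    "conf": [96, 95, 93, -1, 88, 90, 91],
}
rows = group_words_into_rows(sample)
```

Column boundaries could be inferred from the `left` values in a similar way. Recognition errors themselves are a separate matter; they often depend on scan resolution and preprocessing more than on how the output is grouped.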
Source: https://stackoverflow.com/questions/781 ... -and-pytho