Extract columns of text from a PDF file using iText

Post by Anonymous » 11 Nov 2024, 05:34

I need to extract text from PDF files using iText.

The problem is that some PDF files contain 2 columns, and when I extract the text I get a text file in which the columns are merged (i.e. text from both columns ends up on the same line).

This is the code:

Code:

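// Assumes iText 5.x package names (com.itextpdf); older 2.x releases use com.lowagie instead.
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;

public class pdf
{
    private static String INPUTFILE = "http://www.revuemedecinetropicale.com/TAP_519-522_-_AO_07151GT_Rasoamananjara__ao.pdf";
    private static String OUTPUTFILE = "c:/new3.pdf";

    public static void main(String[] args) throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(OUTPUTFILE));
        document.open();

        PdfReader reader = new PdfReader(INPUTFILE);
        int n = reader.getNumberOfPages();

        PdfImportedPage page;

        // Go through all pages (page numbers start at 1)
        for (int i = 1; i <= n; i++) {
            // ... the loop body and everything after it are cut off in the post
        }
    }
}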
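
For what it's worth, the merging described above is what you would expect when text fragments are ordered purely by vertical position: fragments from the left and right columns share a baseline, so they end up on the same output line. Below is a minimal sketch of the usual workaround, which is to split the fragments by a column boundary first and only then read each side top to bottom. It does not use iText at all: `Chunk`, `readInOrder`, `readByColumns`, the coordinates and the split position `300` are hypothetical and only illustrate the idea. In iText 5 the comparable approach would likely be to restrict extraction to one rectangle per column (the parser package has `RegionTextRenderFilter` for that), but check this against the version you are using.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ColumnSplitSketch {

    /** Hypothetical stand-in for a positioned text fragment reported by a PDF parser. */
    static final class Chunk {
        final float x;      // left edge of the fragment
        final float y;      // baseline; PDF coordinates grow upward
        final String text;

        Chunk(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Reads fragments top to bottom, joining fragments that share exactly the same baseline. */
    static String readInOrder(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        // Higher y first (top of the page), then left to right within a line.
        sorted.sort(Comparator.comparingDouble((Chunk c) -> -c.y).thenComparingDouble(c -> c.x));

        StringBuilder out = new StringBuilder();
        boolean first = true;
        float lastY = 0f;
        for (Chunk c : sorted) {
            if (!first) {
                // Simplification: exact equality; real data would need a tolerance.
                out.append(c.y == lastY ? " " : "\n");
            }
            out.append(c.text);
            lastY = c.y;
            first = false;
        }
        return out.toString();
    }

    /** Splits fragments at splitX, then reads the left column fully before the right one. */
    static String readByColumns(List<Chunk> chunks, float splitX) {
        List<Chunk> left = new ArrayList<>();
        List<Chunk> right = new ArrayList<>();
        for (Chunk c : chunks) {
            if (c.x < splitX) {
                left.add(c);
            } else {
                right.add(c);
            }
        }
        return readInOrder(left) + "\n" + readInOrder(right);
    }

    public static void main(String[] args) {
        // Made-up two-column page: two lines per column.
        List<Chunk> page = Arrays.asList(
                new Chunk(50, 700, "Left line 1"),
                new Chunk(320, 700, "Right line 1"),
                new Chunk(50, 680, "Left line 2"),
                new Chunk(320, 680, "Right line 2"));

        System.out.println(readInOrder(page));          // columns merged line by line
        System.out.println("---");
        System.out.println(readByColumns(page, 300f));  // left column, then right column
    }
}
```

The split position has to come from somewhere: for a fixed layout it can be hard-coded (for example half the page width), otherwise it would need to be detected from the gap between the columns.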

More details here: [url]https://stackoverflow.com/questions/4028240/extract-columns-of-text-from-a-pdf-file-using-itext[/url]
