



In the Tags panel image, the first six tags are paragraphs, followed by a header, followed by three more paragraph tags. The Content panel shows the same thing: the first six and last three paragraph tags, the header, and the annotation as well. But in the Reading Order panel image, the first six paragraphs appear as a single reading-order block, and the last three paragraphs also appear as a single block. This is the main problem I am facing.
Likewise, if a page consists of ten paragraphs and nothing else, they are all treated as a single reading-order block. In short, whenever a page has consecutive structure elements of the same type, they are read as a single structure element in the reading order. (The structure elements may be headers, lists, or paragraphs.)
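To make the symptom concrete, here is a small toy model in plain Java. It does not use PDFBox and is not how Acrobat actually builds the reading order; it only reproduces the grouping described above, assuming that consecutive elements with the same tag get merged. The class name `ReadingOrderRuns` and the method `collapse` are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the observed symptom only (assumption: same-tag neighbours are merged).
class ReadingOrderRuns {

    // Collapse consecutive identical tags into one entry of the form "TAG xCOUNT".
    static List<String> collapse(List<String> tags) {
        List<String> runs = new ArrayList<>();
        String current = null; // tag of the run being built
        int count = 0;         // number of elements in that run
        for (String tag : tags) {
            if (tag.equals(current)) {
                count++;
            } else {
                if (current != null) {
                    runs.add(current + " x" + count);
                }
                current = tag;
                count = 1;
            }
        }
        if (current != null) {
            runs.add(current + " x" + count);
        }
        return runs;
    }

    public static void main(String[] args) {
        // Tag order from the Tags panel: six paragraphs, a header, three paragraphs.
        List<String> page = Arrays.asList(
                "P", "P", "P", "P", "P", "P", "H2", "P", "P", "P");
        System.out.println(collapse(page)); // prints [P x6, H2 x1, P x3]
    }
}
```

The expected reading order would have ten separate entries; the three merged entries printed here correspond to what the Reading Order panel shows.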
Below is the code that draws text into the content stream of the current page.

    private PDStructureElement addTextCharByChar(List<TextPositionsInfo> textinfoList, String elementType,
            PDPage currentPage, PDStructureElement Parent) throws IOException {
        PDResources res = currentPage.getResources();
        PDStructureElement currParent = null;
        currentContentStream.beginText();
        if (elementType.toLowerCase().equals("h2")) {
            // Header: open a marked-content sequence, draw each glyph, close it, then add the structure element.
            beginMarkedConent(COSName.H);
            for (TextPositionsInfo textInfo : textinfoList) {
                PDFont font = getFonts(res, textInfo.fontName);
                if (font != null) {
                    currentContentStream.setFont(font, 1);
                    Matrix _tm = textInfo.textMatrix;
                    currentContentStream.setTextMatrix(_tm);
                    currentContentStream.showText(textInfo.unicode);
                }
            }
            currentContentStream.endMarkedContent();
            currParent = addStructEleToStructEleTree(elementType, Parent, currentPage, COSName.H);
        } else if (elementType.toLowerCase().equals("p")) {
            // Paragraph: same steps, but the structure element is added before the marked content is closed.
            beginMarkedConent(COSName.P);
            for (TextPositionsInfo textInfo : textinfoList) {
                PDFont font = getFonts(res, textInfo.fontName);
                if (font != null) {
                    currentContentStream.setFont(font, 1);
                    currentContentStream.setTextMatrix(textInfo.textMatrix);
                    currentContentStream.showText(textInfo.unicode);
                }
            }
            currParent = addStructEleToStructEleTree(elementType, Parent, currentPage, COSName.P);
            currentContentStream.endMarkedContent();
        }
        currentContentStream.endText();
        return currParent;
    }

    private PDStructureElement addStructEleToStructEleTree(String elementtype, PDStructureElement Parent,
            PDPage currentPage, COSName name) {
        PDStructureElement StructEle = new PDStructureElement(elementtype, Parent);
        StructEle.setPage(currentPage);
        // Link the structure element to the most recently opened marked-content dictionary.
        PDMarkedContent markedContent = new PDMarkedContent(name, currentMarkedContentDictionary);
        StructEle.appendKid(markedContent);
        Parent.appendKid(StructEle);
        return StructEle;
    }

    private COSDictionary beginMarkedConent(COSName name) throws IOException {
        // Each marked-content sequence gets a fresh dictionary with the next MCID.
        currentMarkedContentDictionary = new COSDictionary();
        currentMarkedContentDictionary.setInt(COSName.MCID, mcid);
        mcid++;
        currentContentStream.beginMarkedContent(name, PDPropertyList.create(currentMarkedContentDictionary));
        return currentMarkedContentDictionary;
    }

Please help me understand where things are going wrong.
Source: https://stackoverflow.com/questions/781 ... ral-elemen