Character index sequencing counts extra characters

mossdude
Topic Author
New Member

IP: 192.168.0.71 11 years 8 months ago #7652 by mossdude

Character index sequencing counts extra characters was created by mossdude

I'm developing an Android app that exchanges PDF text highlights via database objects with similar apps using other PDF parsers in Flash and iOS. To that end I need to be able to extract the following information about a selection:

The highlighted text
The pixel coordinates of the highlight rectangles
The start and end indices of the selected text.

So far so good: I have figured out how to accomplish this. However, the character indices (start, end) don't match those derived by the other two PDF parsers; the parser in the Radaee component seems to be counting extra characters. Except for the first few characters in the sequence, ObjsGetCharIndex() returns a value greater than the character's actual position in the text stream. I can't think of a way to correct for this, and without matching indices this app won't be compatible with the other apps.

I can attach a single page example PDF and a log of the character sequencing that I expect to see, if useful.

radaee
Platinum Member

IP: 192.168.0.71 11 years 8 months ago #7654 by radaee

Replied by radaee on topic Character index sequencing counts extra characters

This is not a bug.
Different PDF libraries may produce different results when extracting text.
The algorithm for extracting text is not (and need not be) defined in the PDF Reference.
Instead of indices, you can save the highlights to the database as rectangles, using [PDFAnnot getMarkupRects];
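
To make the rectangle suggestion a little more concrete, here is a minimal Java sketch of a parser-independent highlight record. It does not call the Radaee API. The class HighlightRects, its methods, and the flat [left, top, right, bottom, ...] array layout are all hypothetical names and assumptions for illustration, so check the actual return format of getMarkupRects before adapting it. The idea is to convert on-screen pixel rectangles to PDF page units and compare highlights with a small tolerance, since different parsers may report slightly different glyph bounds.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: stores a highlight as rectangles instead of character indices. */
final class HighlightRects {

    /** One rectangle; the unit (pixels or PDF page units) depends on the caller. */
    static final class Rect {
        final float left, top, right, bottom;

        Rect(float left, float top, float right, float bottom) {
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }

        /** Converts a pixel rectangle to page units, given pixels per page unit. */
        Rect toPageUnits(float pixelsPerUnit) {
            return new Rect(left / pixelsPerUnit, top / pixelsPerUnit,
                            right / pixelsPerUnit, bottom / pixelsPerUnit);
        }
    }

    /** Assumed layout: [l, t, r, b, l, t, r, b, ...], four floats per rectangle. */
    static List<Rect> fromFlatArray(float[] coords) {
        if (coords.length % 4 != 0) {
            throw new IllegalArgumentException("expected 4 floats per rectangle");
        }
        List<Rect> rects = new ArrayList<>();
        for (int i = 0; i < coords.length; i += 4) {
            rects.add(new Rect(coords[i], coords[i + 1], coords[i + 2], coords[i + 3]));
        }
        return rects;
    }

    /** True if both lists have the same size and every edge differs by at most tol. */
    static boolean sameHighlight(List<Rect> a, List<Rect> b, float tol) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            Rect x = a.get(i);
            Rect y = b.get(i);
            if (Math.abs(x.left - y.left) > tol || Math.abs(x.top - y.top) > tol
                    || Math.abs(x.right - y.right) > tol || Math.abs(x.bottom - y.bottom) > tol) {
                return false;
            }
        }
        return true;
    }
}
```

Storing page units rather than pixels keeps the records independent of zoom level and screen density. Note that sameHighlight assumes each app splits a multi-line highlight into the same number of rectangles in the same order, which may not hold across parsers.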
