It is possible to handle this in the Java layer, if you can work with UTF-16.
Page.ObjsGetString(0, objsCharCount) returns the whole text of the page as a string, and the index of each char in that string is in UTF-16 order.
Another method, Page.ObjsGetCharRect(), returns the rectangle of each char.
That means you can make the judgement yourself, using the char rectangles and the space between two adjacent chars.
With this approach, however, you have to handle the UTF-16 codes yourself.
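Handling UTF-16 yourself mostly means remembering that a character outside the Basic Multilingual Plane takes two char values (a surrogate pair). Here is a minimal sketch using only standard java.lang calls. The class name and sample string are made up for illustration, and whether the library's char index counts each half of a surrogate pair separately is an assumption based on the "UTF-16 order" remark above, so please verify it against your document.

```java
class Utf16Walk {
    /** Number of UTF-16 code units used by the character that starts at index i. */
    public static int unitsAt(String s, int i) {
        return Character.charCount(s.codePointAt(i));
    }

    public static void main(String[] args) {
        // Made-up sample: 'A', one emoji stored as a surrogate pair, 'B'.
        String text = "A\uD83D\uDE00B";
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // full code point, even for a pair
            int n = Character.charCount(cp);   // 1 for BMP chars, 2 for a pair
            System.out.printf("index %d: U+%04X (%d code unit(s))%n", i, cp, n);
            i += n;                            // skip the low surrogate if present
        }
    }
}
```

If you step through the page text this way, you can treat both halves of a pair as one visible character when you look at the rectangles.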
I'm still not sure what you really want, so I'll share some code here.
For example, if you want to get the char code and the rectangle of each char by index:
String sval = page.ObjsGetString(start, end);
float[] rect = new float[4];
for (int i = start; i < end; i++) {
    char code = sval.charAt(i - start); // UTF-16 code unit
    page.ObjsGetCharRect(i, rect); // fills rect as [left, top, right, bottom]
    // TODO: process each char.
}
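To show what judging by the char rectangles and the space between two chars could look like, here is a rough sketch. It does not call the PDF library: it takes the string and an array of rectangles that you would have filled with ObjsGetCharRect(). The class name, method name, gapRatio parameter, and sample coordinates are all hypothetical, and it assumes the chars sit on one line, left to right.

```java
class SpaceJudge {
    /**
     * Inserts a space between two chars when the horizontal gap between them
     * is larger than gapRatio times the width of the previous char.
     * rects[i] is [left, top, right, bottom] for text.charAt(i).
     * gapRatio is a hypothetical tuning value, not something the library defines.
     */
    public static String insertSpaces(String text, float[][] rects, float gapRatio) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                float gap = rects[i][0] - rects[i - 1][2];       // left of this char minus right of previous
                float width = rects[i - 1][2] - rects[i - 1][0]; // width of previous char
                if (gap > width * gapRatio) {
                    out.append(' ');
                }
            }
            out.append(text.charAt(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up layout: each char is 10 units wide with a 6 unit gap after it.
        String text = "WORD";
        float[][] rects = new float[text.length()][];
        float x = 0f;
        for (int i = 0; i < text.length(); i++) {
            rects[i] = new float[] { x, 0f, x + 10f, 12f };
            x += 16f;
        }
        System.out.println(insertSpaces(text, rects, 0.3f)); // prints "W O R D"
        System.out.println(insertSpaces(text, rects, 1.0f)); // prints "WORD"
    }
}
```

The same text comes out with or without spaces depending only on the threshold, so you may need to tune it for widely letter-spaced text.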
Hi, we checked this PDF file and tested it in the latest version of Edge (Chromium core).
The extracted text is:
T U M U LT U A
We also tested it in PDF XChange Pro, and the result is the same as in Edge.
This depends on how each PDF software judges where the blanks (spaces) are.