Questions about Android development and PDF
TOPIC:

OnPDFSelectEnd return text without white spaces 3 years 2 months ago #15405

  • radaee
  • Moderator
Dear user,
This can be handled in the Java layer, provided you can work with UTF-16.
Page.ObjsGetString(0, objsCharCount) returns the whole text of the page as one string, and the index of each char in that string follows UTF-16 order.
Another method, Page.ObjsGetCharRect(), returns the rectangle of each char.
That means you can decide for yourself where the spaces belong, using the char rectangles and the gap between two adjacent chars.
With this approach, though, you have to handle the UTF-16 codes yourself.
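The gap-based judgement described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the SDK's own logic: the class GapSpacing, the method rebuild and the gapRatio threshold are hypothetical names, plain arrays stand in for the values read from Page.ObjsGetString() and Page.ObjsGetCharRect(), each rectangle is assumed to be [left, top, right, bottom], and every char is assumed to be a single UTF-16 code unit (no surrogate pairs). Whitespace already present in the string is dropped and re-decided from geometry alone.

```java
final class GapSpacing {
    /**
     * Rebuilds text from per-char rectangles (hypothetical helper).
     * rects[i] is assumed to be [left, top, right, bottom] for chars.charAt(i).
     * A space is inserted when the horizontal gap between two neighbouring
     * chars on the same line exceeds gapRatio * average char width.
     */
    static String rebuild(String chars, float[][] rects, float gapRatio) {
        // Average width of the non-whitespace chars, used to scale the threshold.
        float totalWidth = 0f;
        int counted = 0;
        for (int i = 0; i < chars.length(); i++) {
            if (!Character.isWhitespace(chars.charAt(i))) {
                totalWidth += rects[i][2] - rects[i][0];
                counted++;
            }
        }
        float avgWidth = counted > 0 ? totalWidth / counted : 0f;

        StringBuilder out = new StringBuilder();
        int prev = -1; // index of the previous non-whitespace char
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            if (Character.isWhitespace(c)) {
                continue; // ignore extracted blanks; geometry decides instead
            }
            if (prev >= 0) {
                if (!sameLine(rects[prev], rects[i])) {
                    out.append('\n');
                } else if (rects[i][0] - rects[prev][2] > gapRatio * avgWidth) {
                    out.append(' ');
                }
            }
            out.append(c);
            prev = i;
        }
        return out.toString();
    }

    /** Two chars count as being on the same line if their vertical extents overlap. */
    private static boolean sameLine(float[] a, float[] b) {
        float aLo = Math.min(a[1], a[3]), aHi = Math.max(a[1], a[3]);
        float bLo = Math.min(b[1], b[3]), bHi = Math.max(b[1], b[3]);
        return aLo < bHi && bLo < aHi;
    }

    public static void main(String[] args) {
        // Made-up rectangles: chars are 10 wide, gaps are 2, 18 and 2.
        float[][] rects = {
            {0, 0, 10, 12}, {12, 0, 22, 12}, {40, 0, 50, 12}, {52, 0, 62, 12}
        };
        System.out.println(rebuild("ABCD", rects, 0.5f)); // prints "AB CD"
        System.out.println(rebuild("ABCD", rects, 2.0f)); // prints "ABCD"
    }
}
```

The same input gives different words depending on gapRatio, so the threshold would need tuning against the actual document.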


OnPDFSelectEnd return text without white spaces 2 years 7 months ago #15622

  • thiagopelikan
  • Topic Author
  • Junior Member
Could you give me an example of how to do this?

For example, in the attached file, the ObjsGetString method gives me:
"TORCIDA T U M U L T U A AMBIENTE"

As far as I can see, all 3 lines in the example have exactly the same spacing between the chars.

How can I use ObjsGetCharRect to extract a word from a RECT?

Tks,
Thiago

OnPDFSelectEnd return text without white spaces 2 years 7 months ago #15623

  • radaee
  • Moderator
Dear user,
I'm still not sure what you really want, so I'll share some code here.
For example, to get the char code and the rectangle of each char by index:
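// Text of the chars in [start, end), encoded as UTF-16.
String sval = page.ObjsGetString(start, end);
// Reused buffer for one char rectangle: [left, top, right, bottom].
float[] rect = new float[4];
for (int i = start; i < end; i++) {
    // sval begins at page index 'start', so shift the index.
    char code = sval.charAt(i - start);
    // Fills rect with the rectangle of the char at page index i.
    page.ObjsGetCharRect(i, rect);
    // TODO: process each char (code and rect).
}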
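One way the question about extracting text from a RECT might be approached, given the per-char loop above, is to keep only the chars whose rectangle centre falls inside the selection. The sketch below is hypothetical and not part of the SDK: RectSelect and textInside are made-up names, and plain arrays stand in for the values that the loop would collect from page.ObjsGetString() and page.ObjsGetCharRect(). Rectangles are assumed to be [left, top, right, bottom], and each char is assumed to be a single UTF-16 code unit.

```java
final class RectSelect {
    /**
     * Returns the chars whose rectangle centre lies inside the selection
     * rectangle (hypothetical helper). Both rects[i] and sel are assumed to
     * be [left, top, right, bottom]; min/max are used so that either
     * vertical axis direction works.
     */
    static String textInside(String chars, float[][] rects, float[] sel) {
        float minX = Math.min(sel[0], sel[2]);
        float maxX = Math.max(sel[0], sel[2]);
        float minY = Math.min(sel[1], sel[3]);
        float maxY = Math.max(sel[1], sel[3]);

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < chars.length(); i++) {
            // Centre point of the i-th char rectangle.
            float cx = (rects[i][0] + rects[i][2]) / 2f;
            float cy = (rects[i][1] + rects[i][3]) / 2f;
            if (cx >= minX && cx <= maxX && cy >= minY && cy <= maxY) {
                out.append(chars.charAt(i));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up rectangles for four chars on one line.
        float[][] rects = {
            {0, 0, 10, 12}, {12, 0, 22, 12}, {40, 0, 50, 12}, {52, 0, 62, 12}
        };
        float[] selection = {11, 0, 51, 12};
        System.out.println(textInside("ABCD", rects, selection)); // prints "BC"
    }
}
```

Testing the centre rather than the full rectangle means a char that is only clipped by the edge of the selection is still included when most of it lies inside.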


OnPDFSelectEnd return text without white spaces 2 years 7 months ago #15624

  • support
  • Administrator
Dear Thiago, your issue seems to be related to the way your test PDF file was generated and how the chars are spaced on the page.
Could you provide the file, please?


OnPDFSelectEnd return text without white spaces 2 years 7 months ago #15625

  • thiagopelikan
  • Topic Author
  • Junior Member
Hi, I had to upload the file to my repository because it is 6 MB.

itgames-my.sharepoint.com/:b:/g/personal...zre9kwNyt5w?e=3dSTOv

Tks,
Thiago


OnPDFSelectEnd return text without white spaces 2 years 7 months ago #15626

  • radaee
  • Moderator
Hi, we checked this PDF file and tested it in the latest version of Edge (Chromium core).
The extracted text is:
TORCIDA
T U M U LT U A
AMBIENTE
We also tested it in PDF XChange Pro, and the result is the same as in Edge.
So the result depends on how each PDF program decides where the blanks (spaces) are.
