I'm currently working on an app that searches through PDF files. Page.FindOpen works well in general, but I found it unreliable for phrases or strings that span more than one line in some PDF files. For example, searching for "Hello world" in the following text:
"Make sure to print Hello world!"
would highlight "Hello world" as expected, but in the next example:
Most professors use the program Hello
world to introduce programming
"Hello world" does not get highlighted. Looking at the raw page text returned by Page.ObjsGetString, I found that some documents have a carriage return and line feed (\r\n, for example) at the end of each line, so a phrase that wraps onto the next line no longer matches the search string. This makes Page.FindOpen unreliable for long strings that span multiple lines.
If these unexpected line breaks and carriage returns will not be fixed, is there existing documentation on how RadaeePdf generates the text strings for a PDF page? That would help immensely when working with Page.FindOpen to support long search strings and multi-line searching.
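To make the problem concrete, here is a minimal sketch of the kind of workaround I have in mind, assuming the raw page text is already available as a plain Java String. It collapses every run of whitespace (including \r\n) into a single space before matching, and keeps a map back to the original character positions. The class name MultiLineSearch and the method find are hypothetical names of my own, not part of the RadaeePDF API, and the sketch does not call the library at all.

```java
// Hypothetical helper, not part of RadaeePDF: whitespace-insensitive phrase search
// over raw page text that may contain \r\n at the end of each visual line.
final class MultiLineSearch {

    /**
     * Finds the first occurrence of query in raw, treating any run of whitespace
     * (spaces, \r, \n, tabs) as a single space.
     * Returns {start, endExclusive} as indices into raw, or null if not found.
     */
    static int[] find(String raw, String query) {
        StringBuilder norm = new StringBuilder(raw.length());
        int[] map = new int[raw.length()]; // map[i] = index in raw of normalized char i
        boolean prevSpace = false;

        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (Character.isWhitespace(c)) {
                if (!prevSpace) {          // keep only the first char of a whitespace run
                    map[norm.length()] = i;
                    norm.append(' ');
                }
                prevSpace = true;
            } else {
                map[norm.length()] = i;
                norm.append(c);
                prevSpace = false;
            }
        }

        // Normalize the query the same way so both sides use single spaces.
        String q = query.trim().replaceAll("\\s+", " ");
        if (q.isEmpty()) {
            return null;
        }

        int hit = norm.indexOf(q);
        if (hit < 0) {
            return null;
        }
        // Translate the normalized match back to positions in the raw text.
        return new int[] { map[hit], map[hit + q.length() - 1] + 1 };
    }
}
```

With the second example above, find(rawText, "Hello world") returns a range that covers "Hello\r\nworld" in the raw text. Turning that character range into highlight rectangles would still need the library's per-character calls, which I have left out because I am not sure of their exact behavior. This also would not help with words hyphenated across a line break.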
Just an initial comment: multi-line searching is enabled through the reader package (PDFLayoutView) rather than the pdf package (using Page.FindOpen). Is there no way to enable multi-line searching using only the pdf package?
Regardless, I will take a more thorough look at the code. Thanks!