Knowledge Base - Identify whether a PDF contains text

A searchable PDF is essentially an image PDF with an added text layer. Unlike static image formats such as TIFF, JPEG and BMP, a PDF document can contain several layers of information, i.e. an image layer and a text layer. The image layer carries the actual image along with its resolution, compression method, color depth, etc. Similarly, the text layer holds the actual text and the location of that text on the page. In simple terms, the text portions of the scanned document are stored in the text layer, allowing the user to search for and locate any keyword within the scanned document.

To check whether text can be extracted from a PDF, you can use the following sample code:

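public boolean isPDFSearchable(Document m_doc) {
    if (m_doc == null) {
        return false;
    }
    int totalPageCount = m_doc.GetPageCount();
    for (int pageIndex = 0; pageIndex < totalPageCount; pageIndex++) {
        Page page = m_doc.GetPage(pageIndex);
        if (page == null) {
            continue;
        }
        // Load the page's text objects before querying them.
        page.ObjsStart();
        int charCount = page.ObjsGetCharCount();
        // Skip extraction on pages with no characters, so a negative index is never passed.
        String text = charCount > 0 ? page.ObjsGetString(0, charCount - 1) : null;
        // Release the page once its text has been read.
        page.Close();
        if (text != null && text.trim().length() > 0) {
            return true; // At least one page has extractable text.
        }
    }
    return false;
}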
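
Note that the method above returns true as soon as a single page yields text, so a document that mixes searchable pages with image-only pages is still reported as searchable. If you need to know which pages lack text, a per-page check along the following lines may help. This is only a minimal sketch: PdfTextCoverage, hasText and pagesWithoutText are hypothetical names that are not part of the RadaeePDF SDK, and the sketch assumes the text of each page has already been extracted into a String (for example with page.ObjsGetString as above).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the RadaeePDF SDK. It works on text that
// has already been extracted from each page, so it runs without the SDK.
final class PdfTextCoverage {

    // True when the extracted text holds at least one non-whitespace character.
    static boolean hasText(String pageText) {
        return pageText != null && !pageText.trim().isEmpty();
    }

    // Returns the zero-based indexes of the pages that have no extractable text.
    static List<Integer> pagesWithoutText(List<String> pageTexts) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            if (!hasText(pageTexts.get(i))) {
                missing.add(i);
            }
        }
        return missing;
    }
}
```

An empty result means every page produced text; a non-empty result lists the pages that are probably image-only and may need OCR before they become searchable.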

Applies To

RadaeePDF SDK for Android

Details

Created : 2015-03-17 10:38:00, Last Modified : 2015-03-17 10:40:16