What is a "Searchable PDF"?
The PDF file format can be confusing, especially when it comes to understanding what makes a PDF file "searchable." Whether a PDF is searchable depends on how the file was created, so the place to start is its origin.
First, a PDF file can originate from a file on your computer, such as a Word document. Normally, you create the file in its own program and then "print" it to a PDF printer, which converts it to PDF format. These are text-based PDFs, meaning that they retain the text and formatting of the original. Text-based PDF files are searchable because they contain real text.
PDF files can also originate from a scan or a fax. These are image-based PDF files, meaning that they are simply a picture of the original. To your computer, these images are no different from digital photos or graphics, so it does not see any text in them.
To make these files searchable, it is necessary to "recognize" the text in the image using optical character recognition ("OCR"). This creates text from the "pictures" of the letters and then inserts the text invisibly behind the image. Without OCR, an image-based PDF file is not searchable.
Running OCR in FileCenter is easy. Just select or open the PDF, then click the OCR button, and FileCenter will take care of the rest.
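For the curious, here is roughly what an OCR text layer looks like inside a PDF page. The Python sketch below builds the kind of page content stream an OCR tool might write: the scanned image is painted first, then each recognized word is placed at its position using text render mode 3 (`3 Tr`), which draws nothing on screen but leaves the text selectable and searchable. This is an illustration only, not how FileCenter itself works. The helper names (`ocr_page_content`, `escape_pdf_string`), the image name `Im0`, the font resource `/F1` (assumed to be defined elsewhere in the page's resources), and the word coordinates are all hypothetical, and only plain ASCII text is handled.

```python
from typing import List, Tuple


def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    # Backslash must be escaped first so the later escapes are not doubled.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def ocr_page_content(
    image_name: str,
    words: List[Tuple[str, float, float]],
    width: float,
    height: float,
    font_size: float = 12,
) -> bytes:
    """Hypothetical helper: a scanned page image plus an invisible OCR text layer.

    Assumes the page resources define the image as /<image_name> and a font
    as /F1, and that every recognized word is plain ASCII.
    """
    ops = [
        # Scale the image to fill the page, then paint it.
        f"q {width:g} 0 0 {height:g} 0 0 cm /{image_name} Do Q",
        # Begin a text object; render mode 3 paints nothing, so the text is invisible.
        f"BT 3 Tr /F1 {font_size:g} Tf",
    ]
    for text, x, y in words:
        # Tm moves the text position to the word's location on the page.
        ops.append(f"1 0 0 1 {x:g} {y:g} Tm ({escape_pdf_string(text)}) Tj")
    ops.append("ET")  # end the text object
    return "\n".join(ops).encode("ascii")


# Example: one recognized word near the top of a US Letter page (612 x 792 points).
print(ocr_page_content("Im0", [("Invoice", 72, 700)], 612, 792).decode("ascii"))
# q 612 0 0 792 0 0 cm /Im0 Do Q
# BT 3 Tr /F1 12 Tf
# 1 0 0 1 72 700 Tm (Invoice) Tj
# ET
```

The word is never drawn visibly, yet a PDF viewer can still find, select, and copy it, which is exactly why an OCR'd scan becomes searchable.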
Test Whether a PDF Has Text
If you're in doubt, there's an easy way to see whether a PDF file is searchable or not:
- If you don't have the PDF open in FileCenter, select it, then click the Files button and choose "Open in FileCenter"
- With the PDF open in FileCenter, right-click on it and choose Select Tool
- Now click and drag the mouse across some of the text to see if anything gets selected
- If you can't select any text, the PDF contains no text and isn't searchable
Alternatively, open the PDF in Adobe Acrobat, then choose "Edit" > "Select All" to select all of the text in the file. If nothing is selected, there is no text and the file isn't searchable.
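If you have many files to check, the same test can be roughly automated. The Python sketch below is a heuristic, not a full PDF parser and not a FileCenter feature: it looks inside each stream in the file, decompresses it if it is Flate (zlib) compressed, and reports text if it finds a text object (`BT` ... `ET`) that uses the common text-showing operators `Tj` or `TJ`. Invisible OCR text counts, since it is drawn with the same operators. It can miss text in streams that use other compression filters or encryption, and binary data that happens to contain those letters could fool it. The function name `pdf_has_text` and the file name `scan.pdf` are hypothetical.

```python
import re
import zlib

# Each stream's raw bytes sit between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
# A text object (BT ... ET) that shows text with the Tj or TJ operator.
TEXT_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b.*?\bET\b", re.DOTALL)


def pdf_has_text(data: bytes) -> bool:
    """Heuristic: True if any stream in the PDF appears to draw text."""
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            # Content streams are usually Flate (zlib) compressed; trailing
            # bytes after the compressed data are ignored by decompressobj.
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            pass  # not Flate data; scan the raw bytes instead
        if TEXT_RE.search(body):
            return True
    return False


# Usage (hypothetical file name):
# with open("scan.pdf", "rb") as f:
#     print(pdf_has_text(f.read()))
```

A `False` result matches what you would see with the Select Tool: nothing to select, so the file likely needs OCR.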