Full Text Search for Beginners
Full text search is a way to search the body of your computer documents in much the same way that you search the web. It requires that your documents have actual text (not just scanned images) and that you use a search engine to index the files.
That's the quick and easy definition, at least. But what if you want to actually use full text search and you're starting from scratch? What software do you need? How do you implement it? How do you get started with it? In this article we'll walk you through everything you need to know to get going with text search and we'll recommend a top solution: FileCenter.
Text Search Summary
- Full text search: what is it?
- What kinds of documents can I search?
- Can I search scans? PDFs?
- What is a full text index?
- What kind of full text search software do I need?
- Text search software recommendation: Windows Search + FileCenter DMS
- How do I set up a full text search index?
- Can I search documents that are on the network?
- How do I perform a text search?
- What is a text search query?
What Is Full Text Search?
In a nutshell, full text search is a tool that lets you search through the body text of the documents on your computer. This is different from merely searching the file names. That is where the name comes from: the search goes through the full text of each document, not just its title.
What Kinds of Documents Can I Search?
You can typically search any kind of document that has text in it. That will obviously include any Office document, like word processing documents, spreadsheets, etc. The rule of thumb is this: If you can select and copy text from the document, you can search it.
Can I Search Scans? PDFs?
Now things get trickier. Scanned documents don't technically have text in them. Even though you can see words in the scan, to your computer it is nothing more than a picture, no different from any other digital photo. Remember the rule of thumb: if you can't select and copy text in the document, there's no text to search.
This, of course, raises the question: Can't we teach the computer to recognize text in a scanned document? The answer is yes, and the technology to do it has been around for decades. It's called Optical Character Recognition, or "OCR" for short. But realize that this requires an additional piece of software and an additional step.
What about PDF documents? Again, that's a tricky question. Some PDF documents have text in them. Some just have images. Scans, for example, are often saved as PDF documents. How can you know if a given PDF is searchable? Simple: try to select and copy some text in it. If it isn't searchable, then OCR is your solution to make it searchable. For an easy way to do this, see What Kind of Full Text Search Software Do I Need below.
What Is a Full Text Index?
This question is a bit more technical, so let's keep the answer very basic. If your computer had to crawl through every single document each time you ran a search, the search would take a very long time. We want search results in seconds, not minutes or hours. So the search engine does something clever. It goes through your documents once and makes a database of every word it finds and where it occurs. This full text search database or "FTS database" is usually referred to as an index or search index, and it works just like the index at the back of a book. It's a very quick way for the search engine to look up the text you're searching for.
So when you run a text search, the search tool doesn't actually look at the documents at all. It looks at its search index. Because of that, you need software that will keep the index current.
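To make the idea concrete, here is a minimal Python sketch of such an index. It is only an illustration of the general technique, not how Windows Search or any particular product is implemented, and the document names and contents are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Go through every document once and map each word to the documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


# Hypothetical sample documents (file name -> body text).
documents = {
    "invoice.txt": "Invoice for office chairs",
    "memo.txt": "Memo about the office move",
}

index = build_index(documents)

# A lookup reads the index only; the documents are not opened again.
print(sorted(index.get("office", set())))   # ['invoice.txt', 'memo.txt']
print(sorted(index.get("invoice", set())))  # ['invoice.txt']
```

Building the index is the slow part and happens once. Each search afterward is a single lookup, which is why results come back in seconds.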
What Kind of Full Text Search Software Do I Need?
Now the important question: what software do you need? There are three key tasks related to full text searching. You need to make sure that you have search software to cover each task:
First, OCR. As we mentioned above, not all documents are searchable. OCR software will make sure they are. It's important to note that this often requires converting the documents to PDF format. For example, a scan saved as a TIFF file can't be made searchable because the TIFF file format doesn't support text. The PDF format, on the other hand, does. So make sure that your OCR software will turn unsearchable files into searchable PDFs.
Second, indexing. You need a piece of software that will index your documents. And not just index them, but keep the indexes current. For example, if you create a new document, it needs to get added to the index. If you edit a document, it needs to be updated in the index. If you delete a document, it needs to be removed from the index. You get the idea. So your indexing software needs to be running behind the scenes all the time to keep things in sync.
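Here is a rough Python sketch of what "keeping things in sync" means for the index itself. The `SearchIndex` class and its method names are hypothetical, invented for this illustration. Real indexing software also has to detect file changes on disk, which this sketch leaves out.

```python
import re


class SearchIndex:
    """Toy index that can be updated as documents are created, edited, or deleted."""

    def __init__(self):
        self.words = {}      # word -> set of document names
        self.doc_words = {}  # document name -> set of words it contains

    def remove(self, name):
        """Drop a deleted document from the index."""
        for word in self.doc_words.pop(name, set()):
            names = self.words[word]
            names.discard(name)
            if not names:
                del self.words[word]

    def upsert(self, name, text):
        """Add a new document, or replace the entry for an edited one."""
        self.remove(name)  # clear stale words first (no-op for a new document)
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        self.doc_words[name] = words
        for word in words:
            self.words.setdefault(word, set()).add(name)

    def search(self, word):
        return sorted(self.words.get(word.lower(), set()))


index = SearchIndex()
index.upsert("report.txt", "quarterly budget")    # document created
index.upsert("report.txt", "quarterly forecast")  # document edited
print(index.search("budget"))    # []
print(index.search("forecast"))  # ['report.txt']
index.remove("report.txt")       # document deleted
print(index.search("forecast"))  # []
```

Note that an edit removes the old words before adding the new ones. Without that step, a search for "budget" would still return a document that no longer mentions it.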
Finally, you need software where you can perform your searches. This will often be the same software that does the indexing, but it doesn't have to be. It just has to be compatible with your indexes.
Text Search Software Recommendation: Windows Search + FileCenter DMS
These days Windows comes with search software built right in. It's called Windows Search, and it indexes your documents and lets you run searches from inside Windows Explorer (called File Explorer in recent versions of Windows). In the Explorer window, you'll notice a field in the upper right-hand corner where you can enter search queries.
What Windows Search does not do is OCR your documents or convert them to searchable PDF. It also doesn't give you any handholding. That's where FileCenter DMS comes in. FileCenter is a program that helps you organize your files, like Windows Explorer, but does it even better, and adds a lot of easy-to-use features as well. One of those is the ability to make any file searchable and convert it to PDF. Another is the ability to run searches across all of your indexed folders, or even limit searches to a very specific folder or branch of folders. It is the perfect companion to Windows Search.
How Do I Set Up a Full Text Search Index?
From this point on, we'll assume that you're using Windows Search together with FileCenter DMS.
You set up your indexing in Windows. Click the Windows icon at the bottom left-hand corner of your screen and either go to the Control Panel > Indexing Options, or simply start typing Indexing Options and you should see it show up on the list. Open it.
The first thing that you'll see is a list of locations that are being indexed. Usually the Users folder is included on the list. This includes your Desktop, Documents folder, and other personal folders. If you have documents in another place, you can add that folder or drive to the list:
- Click the Modify button
- You'll see a list of locations on your computer
- Expand drives and folders by clicking the arrow next to them
- Click any checkbox to include that location and all of its subfolders in the index
- Un-check any locations that you want to exclude from the index
- Click OK when you're done
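If it helps to picture how checked and un-checked locations combine, the following Python sketch models the rule described in the steps above: a checked folder covers everything beneath it unless a more specific folder is un-checked. This is a simplified model for illustration, not Windows' actual logic, and the folder paths and the `is_indexed` function are made up for the example.

```python
from pathlib import PureWindowsPath


def is_indexed(path, included, excluded):
    """Return True if the closest checked or un-checked ancestor of path is a checked one."""
    path = PureWindowsPath(path)
    best_depth, best_state = -1, False
    for roots, state in ((included, True), (excluded, False)):
        for root in roots:
            root = PureWindowsPath(root)
            if path == root or root in path.parents:
                depth = len(root.parts)
                if depth > best_depth:  # the more specific location wins
                    best_depth, best_state = depth, state
    return best_state


# Hypothetical choices made in the Modify dialog.
included = [r"C:\Users", r"D:\Scans"]
excluded = [r"C:\Users\Ana\Downloads"]

print(is_indexed(r"C:\Users\Ana\Documents\taxes.pdf", included, excluded))  # True
print(is_indexed(r"C:\Users\Ana\Downloads\setup.exe", included, excluded))  # False
print(is_indexed(r"E:\Backup\old.doc", included, excluded))                 # False
```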
Can I Search Documents that Are on the Network?
Yes, you can search network locations. The catch is that you have to set up the search indexing on the computer where the documents actually reside. So if, for example, your documents are on a file server, you need to go to the file server to set up the indexing.
How Do I Perform a Text Search?
Assuming that you'll be using FileCenter DMS to run your searches, go into FileCenter and click the Search tab. Here you can search the whole Windows Search index for your computer. If you've set up FileCenter to view your network files as well, you can search those too.
To limit your search to a specific folder of files, navigate to that folder in the main FileCenter interface, click the Search button (immediately above the list of files), choose Search In > Selected Location, and enter your search terms.
What Is a Text Search Query?
A text search query is nothing more than the terms you are searching for. It is no different than doing a Google search. Just enter the words you want to search for and optionally set other limits, like the modified date or file type. Windows Search has a very rich search query syntax that you can use to perform highly specialized searches, but it isn't necessary to know any of that to get started.
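As a rough illustration of what happens to a query behind the scenes, the Python sketch below looks up each search term in a small index, keeps only the documents that contain all of them, and then applies optional file type and folder limits. It does not use Windows Search's own query syntax. The sample files, the `run_query` function, and the choice to require every term to match are all assumptions made for this example.

```python
import re
from pathlib import PureWindowsPath


def words_in(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents):
    """Map each word to the set of file paths that contain it."""
    index = {}
    for path, text in documents.items():
        for word in words_in(text):
            index.setdefault(word, set()).add(path)
    return index


def run_query(index, query, file_type=None, folder=None):
    """Find files containing every query term, optionally limited by type or folder."""
    terms = words_in(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    results = []
    for path in matches:
        p = PureWindowsPath(path)
        if file_type and p.suffix.lower() != file_type.lower():
            continue
        if folder and PureWindowsPath(folder) not in p.parents:
            continue
        results.append(path)
    return sorted(results)


# Hypothetical files (path -> body text).
documents = {
    r"C:\Files\Clients\contract.pdf": "Service contract for Acme renewal",
    r"C:\Files\Clients\notes.docx": "Acme renewal call notes",
    r"C:\Files\Archive\old.pdf": "Acme contract from last year",
}
index = build_index(documents)

print(run_query(index, "acme renewal"))                    # both files in Clients
print(run_query(index, "acme", file_type=".pdf"))          # the two PDFs
print(run_query(index, "acme", folder=r"C:\Files\Clients"))  # only files under Clients
```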
Our Text Search Recommendation
FileCenter DMS combines with Windows Search to make full text searching as simple and powerful as possible. With the ability to scan and organize documents as well as make any file searchable through built-in OCR and PDF conversion tools, FileCenter will help make your entire document set searchable, including scans, not just Office files. Download a free trial today!