Make Scanned PDFs Text-Searchable

August 5th, 2010 Brian Herzog

I'm not entirely comfortable talking about software I haven't used myself, but I really like the idea behind WatchOCR: it automatically OCRs flat-scanned, image-only PDFs and creates text-searchable versions of them.

All right, some of you might be saying, "it does what now?" From their description (and more on Slashdot), this is software you install on your server. Then, whenever one of those horrible originally-scanned-as-one-big-image PDF files gets saved to a "watched" directory, the software automatically converts it into a proper PDF whose text you can search.
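For the curious, the "watched directory" idea boils down to a simple loop: keep checking a folder, and hand any new PDF to the OCR step. The sketch below is only an illustration of that general pattern, not how WatchOCR itself works (I haven't seen its code). The function names are made up, and `ocr_to_searchable_pdf` is a hypothetical placeholder where a real OCR engine would go.

```python
import time
from pathlib import Path
from typing import Callable, List, Set


def scan_once(watch_dir: Path, seen: Set[Path],
              convert: Callable[[Path], None]) -> List[Path]:
    """Hand every not-yet-seen PDF in watch_dir to `convert`, once each."""
    new_files = sorted(p for p in watch_dir.glob("*.pdf") if p not in seen)
    for pdf in new_files:
        convert(pdf)   # pass the image-only PDF to the OCR step
        seen.add(pdf)  # remember it so it is not processed twice
    return new_files


def watch(watch_dir: Path, convert: Callable[[Path], None],
          interval: float = 5.0) -> None:
    """Poll watch_dir forever, checking for new PDFs every `interval` seconds."""
    seen: Set[Path] = set()
    while True:
        scan_once(watch_dir, seen, convert)
        time.sleep(interval)


def ocr_to_searchable_pdf(pdf: Path) -> None:
    # Hypothetical placeholder: a real setup would run an OCR engine here
    # and write out a text-searchable copy of the PDF.
    print(f"would OCR {pdf.name}")
```

Calling `watch(Path("/srv/scans"), ocr_to_searchable_pdf)` (the path is just an example) would then process each new scan dropped into that folder. A real tool would also need to wait until the scanner has finished writing a file before touching it, which this sketch doesn't handle.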

Since I haven't tried it, I don't know how well it works. It sounds like it'll take a bit of tinkering to get it running, but that would be worth it to make those scanned PDFs more useful.

It would also be worth exploring hooking a scanner directly up to this software, to help speed the digitization of historical (and other) records. We looked at the Library Scan Station, which was pretty awesome in its own right, but it was just too expensive for us. This might prove to be a lower-cost solution.

