Friday, October 30, 2009

Nerdlets: Christianity/Culture/Computing


Get Real Text out of your Scanned Documents

OCR (optical character recognition) is the technology used to turn an image of text into plain (editable, searchable) text. If you’re like me (i.e., a nerd) you probably have a pile of scanned journal articles, books, and such meticulously sorted on your hard drive (as PDFs, for example). You can read them and print them, but you can’t search them or edit them. Wouldn’t it be nice if you could?

Well, there are a number of free options on the web, but they all have their problems. Google has some of the best OCR technology out there (they recently acquired reCAPTCHA to make it even better), and they have apparently been rolling this out into Google Docs. The Google Docs version is not as wonderful as you might like, but it works on high-resolution scans. Read about how to turn your images into text here.

Update: Surprisingly, I was not able to get this to work with PDFs. The web app only accepts PNG, JPEG, or GIF images right now. That is unfortunate, and I assume it will be “corrected” in the future. Has anyone tried this on an image yet?
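
If your pile of scans is a mix of formats, a quick script can tell you which files could be uploaded as-is and which (like PDFs) would need converting to images first. Here is a minimal Python sketch, assuming file extensions can be trusted. The function name and the extension list are my own illustration based on the formats mentioned above, not anything published by Google.

```python
from pathlib import Path

# Formats the web app accepted at the time of writing (PNG, JPEG, GIF).
# Treating ".jpg" as JPEG is an assumption about how files are named.
ACCEPTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}


def split_by_uploadable(paths):
    """Split file paths into (uploadable, needs_conversion) by extension.

    This only looks at the file extension; it does not inspect the
    file contents, so a mislabeled file will be sorted incorrectly.
    """
    uploadable, needs_conversion = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in ACCEPTED_EXTENSIONS:
            uploadable.append(path.name)
        else:
            needs_conversion.append(path.name)
    return uploadable, needs_conversion


if __name__ == "__main__":
    ready, todo = split_by_uploadable(["scans/page1.PNG", "article.pdf"])
    print("Ready to upload:", ready)
    print("Convert to an image first:", todo)
```

Anything that lands in the second list would have to be exported as PNG, JPEG, or GIF with whatever tool you prefer before the web app will take it.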

