Canon Scanners, Spotlight and OCR
When I scan receipts and invoices with my Canon CanoScan LiDE 600F, OS X's Spotlight can somehow, almost magically, find them by their text content. Yet I'm only scanning the documents; I'm not running any OCR software on them.
For example, this morning I scanned the receipt for my new MacBook so that I could send it in for a shipping rebate. Just for grins, I then ran a Spotlight search for "MacBook". There it was, near the top of the list.
Running mdls on the file showed that "MacBook" appeared nowhere in its metadata, and the filename, "File0001.PDF", certainly didn't match. Yet mdfind still returned the file as a match.
Searching the PDF itself for the string "MacBook" turned up nothing, either. (Granted, I used 'strings -a' to extract the strings, and that command only finds ASCII text.)
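One likely reason 'strings' came up empty, beyond the ASCII caveat: PDF page content is often stored in Flate-compressed streams, so even plain ASCII text doesn't appear verbatim in the file. The Python sketch here is a rough check under that assumption. The helper name `search_pdf_streams` is hypothetical, and its regex is a simplification of real PDF parsing (which would use each stream's /Length entry). It inflates each stream with zlib and searches the result. Even so, it can miss OCR text, because a PDF may encode text as font-specific glyph codes rather than readable characters.

```python
import re
import zlib

# Hypothetical helper: a sketch, not a real PDF parser.
# Captures the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)


def search_pdf_streams(pdf_bytes: bytes, term: str) -> bool:
    """Return True if `term` appears in any PDF stream, inflating
    Flate-compressed streams with zlib before searching."""
    needle = term.encode("latin-1")
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            data = zlib.decompress(data)
        except zlib.error:
            pass  # not Flate-compressed; search the raw bytes instead
        if needle in data:
            return True
    return False
```

To try it on a real scan, pass in the file's bytes, e.g. `search_pdf_streams(open("File0001.PDF", "rb").read(), "MacBook")`. A False result doesn't prove the text is absent, for the glyph-encoding reason above.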
Google and David Creemer provided the answer. Creemer also showed that the text in "CanoScanned" documents is selectable in Preview.app.
So what's going on? It seems that the CanoScan Toolbox application automatically runs OCR on each document when it finishes scanning. For more info, see zachary.com: "Easy and cheap PDF Document Management (with OCR) on Mac OS X."
Synergy kudos to Canon and Apple!