Available Options
There is great demand for extracting text from image files, and there are several ways to do it. Apart from commonly available OCR (optical character recognition) software, there are open source options and even free online services: you upload your image file, the site extracts the text, and it sends the result back to you in your chosen format. You can also write your own customized OCR in a programming language such as C or Perl. This is especially useful if you want to extract text from an image in a non-English language, since many of the available OCR tools are built primarily for English.
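To give a feel for what a hand-written recognizer involves, here is a minimal sketch in Python of template matching, one of the simplest recognition approaches. It is a toy under stated assumptions, not how any particular OCR product works: the 3x3 glyph bitmaps, the TEMPLATES table and the recognize function are all made up for illustration, and a real recognizer would also need to segment the page into characters first.

```python
# Hypothetical 3x3 bitmaps for three letters ("1" = ink, "0" = background).
# A custom OCR for another script would supply its own templates here.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def distance(glyph_a, glyph_b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(glyph_a, glyph_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template character whose bitmap is closest to the glyph."""
    return min(TEMPLATES, key=lambda char: distance(glyph, TEMPLATES[char]))


# A "T" with one stray ink pixel in the bottom-right corner.
noisy_t = ["111", "010", "011"]
print(recognize(noisy_t))
```

Because the match is by nearest template rather than exact equality, a glyph with a small amount of noise can still be assigned to the right character.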
How does it work?
To extract text from large numbers of images, you need a fast, high-quality scanner that can scan at high resolution and pass its output to your OCR software. The software first cleans the image of scratches, specks and other noise and straightens it into an upright portrait position. It then runs its recognition engine, which is often proprietary, to perform the extraction, and sends the result to the text editor you specified beforehand so that you can read the output and make any necessary corrections.
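The noise-cleaning step can be illustrated with a small Python sketch. It assumes the scan has already been converted to a binary image stored as a list of rows (1 = ink, 0 = background) and removes any ink pixel that has no ink among its eight neighbours. The despeckle function and its min_neighbors parameter are hypothetical names chosen for this example; real OCR software is likely to use more sophisticated filters.

```python
def despeckle(image, min_neighbors=1):
    """Return a copy of a binary image with isolated ink pixels removed.

    image is a list of rows of 0/1 values (1 = ink). An ink pixel is kept
    only if at least min_neighbors of its 8 surrounding pixels are ink.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # leave the input untouched

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Count ink pixels in the 3x3 neighbourhood, excluding the centre.
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbors += image[ny][nx]
            if neighbors < min_neighbors:
                cleaned[y][x] = 0
    return cleaned


# A small "T" shape plus one stray speck near the right edge (row 3, column 5).
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]

for row in despeckle(page):
    print("".join("#" if pixel else "." for pixel in row))
```

The counts are taken from the original image rather than the copy being edited, so the result does not depend on the order in which pixels are visited.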
Applications of text extraction
Extracting text from image files has applications in fields as diverse as medicine, law, proofreading and the redaction of public documents. Many doctors are used to writing out their prescriptions by hand; these prescriptions are then scanned in batch mode and processed for text extraction. Similarly, legal notes are routinely extracted, and compliance with government policies that require administrative documents to be published also relies on these services. There is also active research into extracting text from the small CAPTCHAs used on many websites, so that the verification process can be automated.
3. (Compressing scanned images)
Introduction
Scanners, digital cameras and other image capture