Monday, January 5, 2009

Java Optical Character Recognition

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text.

OCR is a field of research in pattern recognition, artificial intelligence and machine vision. Though academic research in the field continues, the focus on OCR has shifted to implementation of proven techniques. Optical character recognition (using optical techniques such as mirrors and lenses) and digital character recognition (using scanners and computer algorithms) were originally considered separate fields. Because very few applications survive that use true optical techniques, the OCR term has now been broadened to include digital image processing as well.

Early systems required training (the provision of known samples of each character) to read a specific font. "Intelligent" systems with a high degree of recognition accuracy for most fonts are now common. Some systems are even capable of reproducing formatted output that closely approximates the original scanned page, including images, columns and other non-textual components. (Excerpt from Wikipedia.)
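
To make the idea of "training" a bit more concrete, here is a toy sketch of template matching: the recognizer is given one known sample bitmap per character and then picks the sample that differs from an unknown glyph in the fewest pixels. The class name TemplateOcr, the '#'/'.' bitmap encoding and the 3x3 glyphs are my own assumptions for illustration; real OCR engines are far more elaborate than this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy template-matching recognizer: an illustrative sketch, not a real OCR engine. */
public class TemplateOcr {
    // Known samples ("training data"): one bitmap per character, rows of '#' and '.'.
    private final Map<Character, String[]> templates = new LinkedHashMap<>();

    /** Registers a known sample bitmap for a character. */
    public void train(char label, String... rows) {
        templates.put(label, rows);
    }

    /**
     * Returns the trained character whose bitmap differs from the glyph in the
     * fewest pixels, or '?' if no template has the same dimensions.
     */
    public char recognize(String... glyph) {
        char best = '?';
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<Character, String[]> entry : templates.entrySet()) {
            int d = distance(entry.getValue(), glyph);
            if (d < bestDistance) {
                bestDistance = d;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Counts differing pixels; bitmaps of different size are treated as incomparable.
    private static int distance(String[] a, String[] b) {
        if (a.length != b.length) {
            return Integer.MAX_VALUE;
        }
        int diff = 0;
        for (int r = 0; r < a.length; r++) {
            if (a[r].length() != b[r].length()) {
                return Integer.MAX_VALUE;
            }
            for (int c = 0; c < a[r].length(); c++) {
                if (a[r].charAt(c) != b[r].charAt(c)) {
                    diff++;
                }
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        TemplateOcr ocr = new TemplateOcr();
        ocr.train('I', ".#.", ".#.", ".#.");
        ocr.train('L', "#..", "#..", "###");
        ocr.train('T', "###", ".#.", ".#.");
        // A "T" with one noisy pixel in the bottom-left corner.
        System.out.println(ocr.recognize("###", ".#.", "##."));
    }
}
```

A recognizer like this only copes with glyphs that look like its samples, which is why such systems had to be trained for each specific font.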

My understanding so far is that there is no pure Java OCR implementation; most of the implementations available on the market are Java ports of, or wrappers around, existing implementations written in other languages (e.g. C / C++).

You can find a list of available software on the OCR page on Wikipedia.
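
One simple way such a wrapper could look is a Java class that starts a native OCR program as a separate process and reads back its output. This is only a sketch under assumptions of mine: the class name NativeOcrWrapper is made up, and the executable (called "ocr-tool" in the comment) is hypothetical, assumed to take an image path as its only argument and to print the recognized text on standard output. Real wrappers may use JNI or a different command line entirely.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

/** Sketch of a Java wrapper around a hypothetical native command-line OCR program. */
public class NativeOcrWrapper {
    // Name or path of the native program, e.g. the hypothetical "ocr-tool".
    private final String executable;

    public NativeOcrWrapper(String executable) {
        this.executable = executable;
    }

    /** Builds the command line; assumes the tool takes the image path as its only argument. */
    public List<String> buildCommand(Path image) {
        return List.of(executable, image.toString());
    }

    /** Runs the native tool and returns whatever it printed on standard output. */
    public String recognize(Path image) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // ignore diagnostics
                .start();
        String text = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("OCR process exited with code " + exitCode);
        }
        return text;
    }
}
```

Running a separate process keeps the Java side free of native code, at the cost of process start-up time for every image.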

Reference List:

Optical Character Recognition

Futurama: Using Java Technology to Build Robots That Can See, Hear Speak, and Move

Optical Character Recognition (OCR)

Java OCR

Java libraries to read and write PDF files

http://www.gnome.sk/

http://asprise.com/home/

http://www.sane-project.org/


JDesktop Integration Components (JDIC)

JDIC provides Java applications with access to functionality and facilities provided by the native desktop. It consists of a collection of Java packages and tools. JDIC supports a variety of features, such as embedding the native HTML browser, launching desktop applications, programmatically opening the native mail tool, using registered file-type viewers, creating tray icons on the desktop, registering file type associations, and packaging JNLP applications as RPM, SVR4, and MSI installer packages. As a bonus, an SDK for developing platform-independent screensavers is included.

Many new features are contributed as incubator projects from the community.
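
As a rough illustration of what "launching desktop applications" looks like in code, the sketch below opens a URL in the default browser. Note the assumption: it uses the standard java.awt.Desktop class from Java SE 6 and later rather than JDIC's own packages, so it compiles without any extra jar. The class name DesktopLauncher and the example URL are made up for the illustration.

```java
import java.awt.Desktop;
import java.awt.GraphicsEnvironment;
import java.io.IOException;
import java.net.URI;

/** Sketch: open a URI with the native browser via java.awt.Desktop (not the JDIC API itself). */
public class DesktopLauncher {

    /** True when a native desktop is present and supports opening URIs in a browser. */
    public static boolean canBrowse() {
        return !GraphicsEnvironment.isHeadless()
                && Desktop.isDesktopSupported()
                && Desktop.getDesktop().isSupported(Desktop.Action.BROWSE);
    }

    /** Opens the URI in the default browser; returns false if that is not possible here. */
    public static boolean browse(URI uri) throws IOException {
        if (!canBrowse()) {
            return false;
        }
        Desktop.getDesktop().browse(uri);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Example URL chosen only for illustration.
        boolean opened = browse(URI.create("https://www.example.com/"));
        System.out.println(opened ? "Browser launched" : "No desktop browser available");
    }
}
```

The check in canBrowse() matters on servers and other headless machines, where there is no native desktop to integrate with.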


Reference urls:

JDesktop Integration Components (JDIC) project
Introducing JDIC
Multiple JDIC browsers integrated into Processing sketch
