Reading Between the Lines and Beyond

An old saying goes, “a picture is worth a thousand words”. But have you ever wondered how a picture is read? Not metaphorically, but quite literally? No? Let us tell you…

In the early 20th century, between the 1910s and the 1920s, Fournier d'Albe's Optophone and Tauschek's Reading Machine sowed the seeds of Optical Character Recognition, or Optical Character Reader (OCR). Today, OCR can be described as the electronic or mechanical conversion of images of text (typed, handwritten, or printed, whether from a scanned document or a photo) into machine-encoded text. Its initial conception, however, can be traced to devices that could read text for blind people. Here’s a timeline to give you a perspective on how this technology, along with its idea and application, took shape:

1870–1931

Earliest ideas of optical character recognition (OCR) are conceived. Fournier d'Albe's[1] Optophone[2] and Tauschek's Reading Machine are developed as devices to help the blind read.

1931–1954

First OCR tools are invented and applied in industry, able to interpret Morse code[3] and read text out loud. The Intelligent Machines Research Corporation[4] is the first company created to sell such tools.

1954–1974

The Optacon[5], the first portable OCR device, is developed. Similar devices are used to digitise Reader's Digest[6] coupons and postal addresses. Special typefaces are designed to facilitate scanning.

1974–2000

Scanners are widely used to read price tags and passports. Companies such as Caere Corporation, ABBYY[7] and Kurzweil Computer Products, Inc. are founded. The last of these develops the first omni-font OCR software, capable of reading text printed in virtually any font.


With technological innovation opening up new avenues every day, OCR has found applications in a variety of fields. But before we explore those applications, let us look at the different types of OCR.

Broadly, OCR can be categorized into four types. The first is Optical Character Recognition, which targets typewritten text, one character at a time. The second is Optical Word Recognition. This also targets typewritten text, but one word at a time, and it is used for languages that use a space as a word divider. When we say OCR, this is the type generally referred to. The third is Intelligent Character Recognition, which goes a step beyond the previous two and targets handwritten print or cursive text, one character at a time. The fourth and last type is Intelligent Word Recognition, which targets handwritten print or cursive text, one word at a time.

In a paper titled The State of the Art in Online Handwriting Recognition, published back in 1990, C. C. Tappert, C. Y. Suen and T. Wakahara[8] explain that, among other inputs, the analysis of handwriting movement can be used for handwriting recognition. This technology is also known as dynamic character recognition.

Now that we know what OCR is and what it does, let us look at how it functions. OCR’s work can be broken down into four stages of processing: i) pre-processing, ii) text detection and text recognition, iii) post-processing, and iv) application-specific optimization. Each stage comprises many different techniques. The pre-processing stage, for instance, includes various techniques[9] that aid character recognition. It is at this stage that the image is checked for defects such as distorted text lines, skew, and noise. One can think of it as an image-cleaning stage in which noise is reduced as much as possible.

The second stage, text detection and recognition, assigns to each graphic symbol a probability for every character it might match. It relies either on traditional core algorithms[10] such as SVM (Support Vector Machine) and HMM (Hidden Markov Model), or on more recent neural networks such as CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks), which produce a list of candidate characters. Combining multiple models, for example CNN + HMM or CNN + LSTM (Long Short-Term Memory), significantly improves the quality of the OCR output.

The post-processing stage is aimed at correcting OCR mistakes and thereby increasing the accuracy of recognition[11].
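To make the pre-processing stage a little more concrete, here is a minimal sketch of one common cleaning step: binarization, which separates ink from background before recognition. It uses Otsu's method, chosen here purely for illustration; the article does not say which techniques a given OCR engine uses. The function names `otsu_threshold` and `binarize` and the tiny sample image are hypothetical.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level (0-255) that best splits dark and light pixels.

    Otsu's method picks the threshold that maximizes the between-class
    variance of the two groups it creates.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate threshold
    sum_dark = 0  # sum of their gray levels
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map each pixel to 1 (ink) or 0 (background) using Otsu's threshold."""
    threshold = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= threshold else 0 for p in row] for row in image]


# Hypothetical 4x4 grayscale scan: three dark "ink" pixels on light paper.
sample_image = [
    [250, 245, 240, 250],
    [248,  30,  25, 246],
    [251,  28, 243, 249],
    [247, 252, 244, 250],
]

for row in binarize(sample_image):
    print(row)
```

Real pipelines usually follow this with deskewing and noise removal, and typically rely on an image library rather than hand-written loops; the plain-Python version is only meant to show the idea.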

Usually, the results obtained from an OCR system do not exactly match the original document. To mitigate this, the obtained results go through the post-processing stage. Here, we replace the recognized sequence of characters with a graphically similar sequence that a language model rates as the most likely sentence. In other words, among the candidate corrections we pick the one that maximizes the probability of the intended text given the OCR output, which combines the language model's probability of the candidate with the likelihood of the OCR engine producing the observed output from it. The last stage, application-specific optimization, is, as the name suggests, where OCR is tweaked to deal with more specific inputs. This step brings contextual awareness (high-level lexicons) to our OCR based on each client's preferences and specifications.
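The post-processing idea described above can be sketched as a small noisy-channel corrector: each candidate word is scored by a language-model probability multiplied by the probability that the OCR engine would turn it into the observed string. Everything below is an illustrative assumption rather than a production method: the lexicon and its probabilities, the `CONFUSABLE` table of look-alike characters, the error rates, and the function names are all hypothetical, and real systems use far richer language models.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical unigram language model: word -> probability.
LEXICON: Dict[str, float] = {
    "invoice": 0.50,
    "total": 0.30,
    "date": 0.15,
    "tonal": 0.05,
}

# Hypothetical (true character, OCR output) pairs that look alike on paper.
CONFUSABLE: Set[Tuple[str, str]] = {
    ("l", "1"), ("i", "1"), ("o", "0"), ("s", "5"), ("b", "6"),
}


def channel_log_prob(observed: str, candidate: str) -> float:
    """log P(observed | candidate), assuming independent per-character errors."""
    log_p = 0.0
    for seen, true in zip(observed, candidate):
        if seen == true:
            log_p += math.log(0.90)    # character read correctly
        elif (true, seen) in CONFUSABLE:
            log_p += math.log(0.08)    # graphically similar mistake
        else:
            log_p += math.log(0.001)   # unrelated mistake, very unlikely
    return log_p


def correct(observed: str) -> str:
    """Return the same-length lexicon word maximizing P(word) * P(observed | word)."""
    candidates = [w for w in LEXICON if len(w) == len(observed)]
    if not candidates:
        return observed  # nothing comparable in the lexicon; keep the OCR output
    return max(
        candidates,
        key=lambda w: math.log(LEXICON[w]) + channel_log_prob(observed, w),
    )


for raw in ["tota1", "inv0ice", "date"]:
    print(raw, "->", correct(raw))
```

This sketch only compares words of equal length and always returns a lexicon word when one exists. A fuller version would also handle inserted or dropped characters and would keep the original output when no candidate scores well enough.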

 

Today, the average person uses OCR so frequently, often without realizing it, that it has quietly become an inseparable part of our day-to-day lives. Common applications include automatic number plate recognition, assistive technology for visually impaired users, and scanning documents to convert them into searchable PDFs, and there are many other fields[12] in which OCR can be used. Unfortunately, despite its immense potential, research and innovation in OCR technology seem to have stagnated over the last five years or so. And this is where we step in!

OCR is a technology that sits at the intersection of Computer Vision (or Machine Vision) and Natural Language Processing (NLP). Text detection is considered a computer vision task, whereas text recognition is closer to NLP in nature. We at EZ Works have been working to bring OCR up to date with the other technologies that shape translation work today. Our major work is in creating our own OCR algorithm that works seamlessly with languages such as Arabic, which poses the additional challenges of diacritics and right-to-left writing. Secondly, we are also working towards improving image pre-processing using Artificial Intelligence algorithms for Computer Vision.
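To illustrate why diacritics and writing direction matter for Arabic text, the sketch below uses Python's standard `unicodedata` module. Arabic diacritics are stored as separate combining marks after the letter they modify, so the same word can arrive with or without them, and Arabic letters carry a right-to-left directionality class. This is only a demonstration of the text-side properties, not the EZ Works algorithm; the helper names `strip_diacritics` and `is_right_to_left` are hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks (such as Arabic fatha, damma, kasra), keep base letters."""
    return "".join(ch for ch in text if unicodedata.combining(ch) == 0)


def is_right_to_left(text: str) -> bool:
    """True if any character has a right-to-left bidirectional class.

    "R" covers scripts such as Hebrew; "AL" is the class for Arabic letters.
    """
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


# The word "kataba" written with a fatha mark after each of its three letters.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
bare = strip_diacritics(vocalized)

print(len(vocalized), len(bare))   # six code points versus three
print(is_right_to_left(bare))      # Arabic letters are right-to-left
print(is_right_to_left("OCR"))     # Latin letters are not
```

Whether marks are kept or stripped is a design choice: comparing recognized text against a lexicon is often easier on the bare form, while faithful transcription needs the marks preserved.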

 

Our EZ Lens system can enhance a low-fidelity image into a high-quality one. We have set a quality threshold for the input our OCR works on: any image that falls below it is reworked until it meets that standard, so that the OCR produces great output even from low-quality input such as poor scans or washed-out paper records. The third innovation is that we have trained the OCR on our large proprietary dataset to build better character and word recognition than any other Arabic OCR in the world.

At EZ Works, these innovations have already brought appreciable improvements to the field, and we are working on further pathbreaking developments. Together, they will take the application of OCR technology to another level altogether.

References:

[1] Wikipedia contributors. "Edmund Edward Fournier d'Albe." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 12 Dec. 2019. Web. 27 Oct. 2020.

[2] Wikipedia contributors. "Optophone." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 26 Apr. 2020. Web. 27 Oct. 2020.

[3] Wikipedia contributors. "Morse code." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 21 Oct. 2020. Web. 27 Oct. 2020.

[4] Wikipedia contributors. "David H. Shepard." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 4 Jan. 2019. Web. 27 Oct. 2020.

[5] Wikipedia contributors. "Optacon." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 17 Mar. 2020. Web. 27 Oct. 2020.

[6] Wikipedia contributors. "Reader's Digest." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 22 Oct. 2020. Web. 27 Oct. 2020.

[7] Wikipedia contributors. "ABBYY FineReader." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 26 Oct. 2020. Web. 27 Oct. 2020.

[8] C. C. Tappert, C. Y. Suen and T. Wakahara, "The state of the art in online handwriting recognition," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 8, pp. 787-808, Aug. 1990, doi: 10.1109/34.57669.

[9] https://www.nicomsoft.com/optical-character-recognition-ocr-how-it-works/

[10] Liwei Wang, Xiao Wang and Jufu Feng, "On image matrix based feature extraction algorithms," in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 36, no. 1, pp. 194-197, Feb. 2006, doi: 10.1109/TSMCB.2005.852471.

[11] https://web.archive.org/web/20160322103356/https://community.havenondemand.com/t5/Wiki/How-to-optimize-results-from-the-OCR-API-when-extracting-text/ta-p/1656

[12] https://medium.com/swlh/applications-of-ocr-you-havent-thought-of-69a6a559874b
