I. Introduction

The earliest devices using Optical Character Recognition (OCR) technology were built for telegraphy and to help blind people read: one converted printed characters into telegraph code, while another turned them into tones that a blind reader could interpret. After decades of development, OCR is now commonly used to convert handwritten, typed, scanned, photographed or machine-printed text into machine-readable, editable and searchable documents in almost any format (e.g. txt, doc).

Fig. 1: OCR technology.

Traditional OCR: Before AI became a popular choice in OCR, traditional tools could not work properly without strict rules and templates. They were extremely inflexible and worked acceptably only on documents of excellent visual quality and with the limited number of templates pre-loaded into the system. Creating those templates sometimes took as much time as manual data entry.

AI to the rescue: The greatest strength of AI is its ability to recognize objects through sophisticated features that it learns to extract, loosely similar to the way the human brain works. Modern OCR systems can not only predict characters with high accuracy despite poor visual conditions, but also check dictionaries, take context into account and select the best-matching combination based on surrounding information.

Powered by neural networks and deep learning, modern OCR software is being pushed well beyond its earlier limits. Businesses that capitalize on that strength have a chance to stand out in an increasingly competitive market.
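
The dictionary check mentioned above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical word list and uses fuzzy string matching from Python's standard library as a stand-in; production OCR systems typically rely on learned language models rather than this simple lookup.

```python
import difflib

# Hypothetical mini-dictionary, chosen only for illustration.
DICTIONARY = ["invoice", "passport", "contract", "receipt", "address"]

def correct_word(word, dictionary=DICTIONARY, cutoff=0.8):
    """Replace an OCR-read word with its closest dictionary entry, if any.

    If no entry is similar enough (similarity below `cutoff`),
    the word is returned unchanged.
    """
    matches = difflib.get_close_matches(word.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# A zero misread for the letter "o" is repaired by the lookup.
print(correct_word("passp0rt"))
# A word with no close dictionary entry is left as it is.
print(correct_word("xyz"))
```

The cutoff is a design choice: a lower value repairs more misreadings but also risks replacing words that were read correctly and are simply missing from the dictionary.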

1. How can OCR recognize letters in an image?

The algorithms applied in OCR technology can be categorized into two classes: pattern recognition and feature extraction.

a) Pattern recognition:

With methods in this class, the software tries to recognize a whole letter by comparing the letter it “sees” with the letters it has already “remembered”. It identifies the letter when it finds a stored pattern that “fits” what it sees. The main disadvantage of these methods is that they struggle with a large variety of fonts. In the early days, people even designed special fonts, such as OCR-A, to increase the accuracy of these methods.

Fig. 2: Font OCR-A.
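
As a rough illustration of pattern recognition, the sketch below compares a scanned glyph pixel by pixel with stored templates and picks the best fit. The 5x5 bitmaps and the function names are hypothetical, invented for this example; real systems work on much larger images and normalize size and position first.

```python
# Hypothetical 5x5 glyph templates: '#' is ink, '.' is background.
TEMPLATES = {
    "I": ["..#..", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
}

def match_score(glyph, template):
    """Return the fraction of pixels on which glyph and template agree."""
    total = sum(len(row) for row in template)
    same = sum(
        g == t
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    )
    return same / total

def recognize(glyph, templates=TEMPLATES):
    """Return (letter, score) for the stored template that fits best."""
    scored = ((letter, match_score(glyph, tpl)) for letter, tpl in templates.items())
    return max(scored, key=lambda pair: pair[1])

# A "T" with one stray ink pixel still matches the T template best.
noisy_t = ["#####", "..#..", "..#..", ".##..", "..#.."]
letter, score = recognize(noisy_t)
print(letter, score)
```

Because the comparison is tied to exact pixel positions, a glyph drawn in a different font can disagree with its own template on many pixels, which is the weakness described above.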

b) Feature extraction:

Methods in this class are much more sophisticated. Instead of recognizing the whole letter at once, the software looks for features that help it identify the letter, such as points, angles and curves. Modern machine learning algorithms can learn and extract highly abstract features that are hard even for humans to interpret, and they can recognize letters across a wide variety of conditions and fonts.

Fig. 3: Feature extraction.
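
To make the idea concrete, here is a minimal sketch that computes two hand-crafted features from a small bitmap: the number of enclosed holes and the number of stroke endpoints. The bitmaps and function names are hypothetical, and modern systems learn their features from data rather than coding them by hand.

```python
def to_grid(rows):
    """Convert rows of '#'/'.' characters into a grid of booleans (True = ink)."""
    return [[ch == "#" for ch in row] for row in rows]

def count_holes(rows):
    """Count background regions that do not touch the image border."""
    grid = to_grid(rows)
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]

    def flood(r, c):
        # Flood-fill one background region; report whether it reaches the border.
        stack, touches_border = [(r, c)], False
        while stack:
            y, x = stack.pop()
            if not (0 <= y < h and 0 <= x < w) or seen[y][x] or grid[y][x]:
                continue
            seen[y][x] = True
            if y in (0, h - 1) or x in (0, w - 1):
                touches_border = True
            stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
        return touches_border

    holes = 0
    for r in range(h):
        for c in range(w):
            if not grid[r][c] and not seen[r][c] and not flood(r, c):
                holes += 1
    return holes

def count_endpoints(rows):
    """Count ink pixels that have exactly one ink neighbour (8-connectivity)."""
    grid = to_grid(rows)
    h, w = len(grid), len(grid[0])
    endpoints = 0
    for r in range(h):
        for c in range(w):
            if not grid[r][c]:
                continue
            neighbours = sum(
                grid[y][x]
                for y in range(max(r - 1, 0), min(r + 2, h))
                for x in range(max(c - 1, 0), min(c + 2, w))
                if (y, x) != (r, c)
            )
            endpoints += neighbours == 1
    return endpoints

def features(rows):
    """Bundle both features into a small feature vector."""
    return {"holes": count_holes(rows), "endpoints": count_endpoints(rows)}

ROUND_O = [".###.", "#...#", "#...#", "#...#", ".###."]
SQUARE_O = ["#####", "#...#", "#...#", "#...#", "#####"]
LETTER_L = ["#....", "#....", "#....", "#....", "#####"]

# Two different "fonts" of the letter O yield the same features.
print(features(ROUND_O), features(SQUARE_O), features(LETTER_L))
```

Unlike a pixel-by-pixel comparison, these features stay the same when the shape of the letter changes slightly, which is why feature-based methods cope better with varied fonts.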

2. Applications of OCR

Thanks to recent large improvements in accuracy, OCR is being applied in many areas of life and has proven its value in saving resources and avoiding human error.

  • Helping blind people: Modern devices that combine OCR with speech synthesis can read printed texts, books and magazines aloud to blind people, in many different fonts.
  • Automating processes: reading license plate numbers in car parks, or reading personal documents such as passports and ID cards in airports.
  • Sorting and classifying letters and documents in post offices, companies, etc.
  • Preserving historical books and texts.

II. How a business can benefit from OCR technology

At the business level, OCR technology can benefit many document-related processes.

Fig. 4: Benefits of OCR to business.

By applying OCR technology, businesses are:

  • Saving time
  • Improving work management
  • Reducing cost
  • Improving data access and searchability
  • Improving business processes
  • Securing data
  • Avoiding storage problems
  • Improving customer services
  • Helping the environment

Given these benefits, the use of OCR technology by enterprises in Vietnam has been increasing rapidly in recent years, especially in the banking and finance industry. Some popular applications are:

  • Automating customer identification by reading personal identification documents such as ID cards, driving licenses and passports.
  • Digitizing and managing documents: paper documents such as contracts, invoices and receipts are converted to digital form. These documents can then be archived, sorted, searched and freely transferred.
  • Increasing efficiency in logistics and transportation: the technology can greatly improve the speed and cost efficiency of processing, tracking and shipping parcels. Instead of manually typing long tracking numbers, names and addresses, users can simply scan labels to extract the necessary information in real time.

1. Modern enterprise OCR platform

With the development of deep learning, modern OCR platforms are increasingly meeting the needs of businesses. 

Fig. 5: OCR processing flow.

In general, today's OCR platforms usually include the following four features:

Fig. 6: Features of modern OCR platforms.

a) Document classification

The first main feature is automated document classification and separation using AI-based classifiers. Documents are put into specific categories or directed into different processing flows. For example, the system can tell apart structured forms, semi-structured documents such as invoices, ID cards and passports, and completely unstructured documents such as CVs or contracts.
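
The routing step can be sketched as follows. This is a deliberately simple keyword-scoring classifier used as a stand-in for the AI-based classifiers described above; the categories, keyword lists and function name are hypothetical, and a real platform would use a trained model instead.

```python
import re

# Hypothetical categories and keywords, chosen only for illustration.
KEYWORDS = {
    "invoice": {"invoice", "total", "amount", "due", "tax"},
    "passport": {"passport", "nationality", "surname", "expiry"},
    "contract": {"agreement", "party", "parties", "hereby", "terms"},
}

def classify(text, keywords=KEYWORDS, default="unknown"):
    """Return the category whose keywords appear most often in the OCR text.

    Falls back to `default` when no keyword of any category is found.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {label: len(words & vocab) for label, vocab in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

print(classify("INVOICE No. 1042 Total amount due: 350 USD"))
print(classify("This agreement is made between the parties"))
print(classify("Hello world"))
```

The returned label can then decide which processing flow a document enters, for example which extraction rules are applied to it.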

b) Data extraction

Customized text detectors integrated with Natural Language Processing (NLP) technology automate the identification and extraction of content from structured or semi-structured documents such as ID cards and passports, and from unstructured documents such as contracts and emails. By leveraging advances in new algorithms, the system helps customers accelerate transactions while significantly reducing operating costs and errors.
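
As a simplified illustration of extraction, the sketch below pulls a few fields out of OCR text with regular expressions. The field names, patterns and sample text are assumptions made for this example; the platforms described above use learned text detectors and NLP models rather than fixed patterns.

```python
import re

# Hypothetical field patterns for an invoice-like document.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:\s*([\d.,]+)", re.IGNORECASE),
}

def extract_fields(text):
    """Return a dict of field name -> extracted string (None if not found)."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result

sample = "Invoice No: A1042\nDate: 2023-05-17\nTotal: 1,250.00 USD"
print(extract_fields(sample))
```

Fields that cannot be found are returned as None, so that a later validation or review step can flag them instead of silently accepting incomplete data.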

c) Data validation and control

Important data fields and entities, such as ID numbers and passport numbers, are identified, validated and automatically processed according to business rules and requirements.

Fig. 7: Reviewing OCR results.
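
A minimal sketch of rule-based validation is shown below. The two rules (a 12-digit ID number and a passport number made of one capital letter followed by seven digits) are hypothetical business rules chosen for illustration; actual formats depend on the issuing country and the customer's requirements.

```python
import re

# Hypothetical business rules: one regular expression per field.
RULES = {
    "id_number": re.compile(r"\d{12}"),
    "passport_number": re.compile(r"[A-Z]\d{7}"),
}

def validate(fields, rules=RULES):
    """Return a dict of field name -> True if the value satisfies its rule.

    Missing or empty values are reported as invalid.
    """
    report = {}
    for name, pattern in rules.items():
        value = fields.get(name) or ""
        report[name] = bool(pattern.fullmatch(value))
    return report

extracted = {"id_number": "012345678901", "passport_number": "B123456"}
# The passport number is one digit short, so it fails validation.
print(validate(extracted))
```

Fields that fail validation can be routed to a human reviewer, while valid fields continue through automatic processing.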

d) Visibility into data and processes

This is a critical feature for every customer. All important resource, performance and accuracy metrics need to be monitored and reported in real time, giving administrators a more accurate view of opportunities for improvement.

Fig. 8: Monitoring training history.
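
One accuracy metric that such a dashboard could track is character-level accuracy, sketched below using the edit (Levenshtein) distance between the OCR output and a manually verified reference text. The function names are hypothetical, and this is only one of several possible metrics, not necessarily the one a given platform reports.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def character_accuracy(predicted, truth):
    """Return 1 minus the character error rate, clamped to the range [0, 1]."""
    if not truth:
        return 1.0 if not predicted else 0.0
    error_rate = edit_distance(predicted, truth) / len(truth)
    return max(0.0, 1.0 - error_rate)

# One wrong character out of seven.
print(character_accuracy("1nvoice", "Invoice"))
```

Averaging this value over a batch of reviewed documents gives a single figure that can be plotted over time or compared between model versions.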

2. Example of a continuous-learning classifier

Fig. 9: Continuous learning customized classifier.

By continually feeding new data into the core AI-based models, customers can easily see their classifier improve with each version. The whole process, from providing data to reviewing and monitoring test results, can be operated and monitored from the customer's side.

III. Conclusion

In this article, we have gone from the origins of OCR technology to how modern OCR algorithms are applied in business. In general, OCR technology offers many benefits, such as cost reduction and resource optimization. Businesses that take advantage of OCR technology sooner may be better positioned to outperform their competitors.

Nguyen Bao Trung – FPT Smart Cloud
