From Document to Insights
Harnessing the power of cognitive AI
December 2019
In a business landscape characterized by digital transformation, it is easy to see how data offers businesses an edge. The addition of intelligence to the data management process has infused speed into data movement and analysis. Increasingly, enterprises are finding new ways to gather, unify, clean, and analyze data at speed.
RPA has helped automate digital data movement across business tasks, but hits a roadblock with non-digital data in documents such as invoices, scanned paper forms, statements, claims, and receipts. How then can organizations increase business efficiency by bringing more data into the purview of their digital information systems, especially data that is locked away in scanned documents? Two words – Data Digitization.
Understanding the Importance of Intelligent Information Extraction
Most large companies have many business processes that deal with paper forms and scanned documents, usually requiring human agents to enter this information into an enterprise IT system. While organizations have relied, in varying capacities, on manual processing centers, optical character recognition (OCR), and handwriting recognition, each of these techniques has its share of challenges. The emergence of deep learning techniques in computer vision and NLP, coupled with the flexibility of provisioning resources in the cloud, is a game-changer that enables a new breed of text digitization solutions. These new techniques can help understand the relationships between field labels and values, as well as the structure and layout of data elements, from boxes and tables to specific details like checkboxes and signatures.
I already have OCR. How is this different?
OCR is the ability to detect text regions in a scanned image and convert those regions into the correct digital text. Handwriting recognition supplements this foundational capability so that pages featuring handwritten text can also be processed. Here are some of the issues with that approach:
Visualizing Data Digitization
A comprehensive data digitization solution should offer the following features:
Use Cases for Data Digitization
Form Digitization: Digitizing existing enrolment or other multi-page paper-based forms that involve a mix of typed text, handwriting, check boxes, and other fields and tables.
Dynamic Extraction or Touch-free Zero Template Extraction: Dealing with non-standard input documents that are not structured like forms, but usually contain the same information, albeit in varying layouts.
Content Classification and Extraction from Mixed-type Documents: Digitizing documents that include many different document types.
Information Consistency Checking: The most complex use case. It requires mature products that address all the previous use cases and also let users define verification rules that enforce domain-specific information consistency.
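To make the consistency-checking use case more concrete, here is a minimal sketch of how verification rules might be expressed over fields that have already been extracted from a document. Everything in it is an assumption for illustration: the field names (subtotal, tax, total, invoice_date, due_date), the Rule class, and the two sample rules are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical output of an extraction step: field label -> extracted text.
Fields = Dict[str, str]


@dataclass
class Rule:
    """A named, domain-specific consistency check (illustrative only)."""
    name: str
    check: Callable[[Fields], bool]


def to_amount(text: str) -> float:
    """Parse an amount such as '1,200.50' into a float."""
    return float(text.replace(",", ""))


# Assumed example rules for an invoice; real rules would come from the business domain.
RULES: List[Rule] = [
    Rule(
        "total equals subtotal plus tax",
        lambda f: abs(
            to_amount(f["subtotal"]) + to_amount(f["tax"]) - to_amount(f["total"])
        ) < 0.005,
    ),
    Rule(
        "due date is not before invoice date",
        # ISO-formatted dates (YYYY-MM-DD) compare correctly as plain strings.
        lambda f: f["due_date"] >= f["invoice_date"],
    ),
]


def failed_rules(fields: Fields) -> List[str]:
    """Return the names of all rules that the extracted fields violate."""
    return [rule.name for rule in RULES if not rule.check(fields)]
```

In a sketch like this, a document whose fields pass every rule could flow straight into downstream systems, while any document with failed rules could be routed to a human reviewer along with the names of the rules it violated.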
Digitization products today deliver enhanced value through these advanced features for extracting and interpreting information from your scanned documents and integrating results into existing business processes and applications. They are typically used in conjunction with modern scanning solutions to ensure a virtually touch-free deployment of the process in production, so your human resources can focus on improving and delivering stellar customer experiences.
By John Kuriakose
Principal Product Architect,
Infosys Nia, Infosys