XtractEdge Activity

The XtractEdge document extraction activity lets you extract data from a document by connecting to the XtractEdge Platform API. On the XtractEdge Platform, you define document types, including the fields required for extraction, and upload documents to have those fields and their values extracted. For more information, see the XtractEdge Platform Design Time User Guide.

 

With dynamic extraction, you can extract fields and values from different locations in an unstructured document. For example, use it to extract data from non-standard documents such as invoices, purchase orders, and payslips, where the structure varies and cannot be fixed to a single layout.


With templatized extraction, you can process documents that follow a particular template. For example, use it to extract data from standard documents such as application forms, survey forms, or any other forms where the document structure is consistent.


The XtractEdge activity provides the extracted output in JSON format. The JSON output contains the following types of data:


Templatized:

 

  • Fields: Each field includes the field name, value, and confidence score, and can also contain tabular data or a checkbox.
    • Tabular data: If the document contains a table, the corresponding field includes "is_table": "True". The tabular data includes the row count, column count, column headers, and rows. Each row includes cells with a value, field name, and confidence score.
    • Checkbox: If the document contains a checkbox, the corresponding field includes "is_checkbox": "True". The checkbox data includes the field name, checkbox option name, field confidence, and whether the option is selected.
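
As an illustration of how templatized output might be consumed, the following Python sketch sorts fields into plain, table, and checkbox groups. Only the "is_table" and "is_checkbox" flags are taken from the description above. The sample payload and every other key name ("fields", "field_name", "confidence", "option_name", "is_selected", and so on) are assumptions made for this example and may not match the actual XtractEdge output.

```python
import json

# Hypothetical sample payload. Only "is_table" and "is_checkbox" come from
# the description above; all other key names are assumed for illustration.
SAMPLE = """
{
  "fields": [
    {"field_name": "applicant_name", "value": "Jane Doe", "confidence": 0.97},
    {"field_name": "dependents", "is_table": "True",
     "row_count": 1, "column_count": 2,
     "column_headers": ["name", "age"],
     "rows": [{"cells": [
       {"field_name": "name", "value": "Sam", "confidence": 0.91},
       {"field_name": "age", "value": "7", "confidence": 0.88}]}]},
    {"field_name": "marital_status", "is_checkbox": "True",
     "option_name": "Single", "confidence": 0.93, "is_selected": "True"}
  ]
}
"""


def is_flag_set(field, key):
    """Return True when the flag holds the text "True".

    The flags are documented as strings, so compare the text instead of
    relying on truthiness (the string "False" would otherwise be truthy).
    """
    return str(field.get(key, "")).strip().lower() == "true"


def classify_fields(output):
    """Group field names by kind: plain value, table, or checkbox."""
    groups = {"plain": [], "table": [], "checkbox": []}
    for field in output.get("fields", []):
        if is_flag_set(field, "is_table"):
            groups["table"].append(field["field_name"])
        elif is_flag_set(field, "is_checkbox"):
            groups["checkbox"].append(field["field_name"])
        else:
            groups["plain"].append(field["field_name"])
    return groups


groups = classify_fields(json.loads(SAMPLE))
print(groups)
```

Comparing the flag as text is a deliberate choice in this sketch: because the value is the string "True" rather than a JSON boolean, a plain truthiness check would also treat "False" as set.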


Dynamic:

 

  • Tabular data: If the document contains a table, the row count, column count, column headers, and rows are retrieved. Each row includes a row number and cells with a value, field name, and confidence score.
  • Fields: Each field includes the field name, value, and confidence score.
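
As a rough sketch of working with dynamic output, the Python example below flattens table rows into one record per row and lists values whose confidence score falls below a threshold. The payload layout, the key names ("fields", "tables", "rows", "row_number", "cells", "field_name", "value", "confidence"), the 0 to 1 confidence scale, and the 0.8 threshold are all assumptions made for this example; check them against the JSON your activity actually returns.

```python
import json

# Hypothetical dynamic-extraction payload; every key name is assumed.
SAMPLE = """
{
  "fields": [
    {"field_name": "invoice_number", "value": "INV-001", "confidence": 0.98},
    {"field_name": "total", "value": "150.00", "confidence": 0.62}
  ],
  "tables": [
    {"row_count": 2, "column_count": 2,
     "column_headers": ["item", "amount"],
     "rows": [
       {"row_number": 1, "cells": [
         {"field_name": "item", "value": "Widget", "confidence": 0.95},
         {"field_name": "amount", "value": "100.00", "confidence": 0.90}]},
       {"row_number": 2, "cells": [
         {"field_name": "item", "value": "Gadget", "confidence": 0.93},
         {"field_name": "amount", "value": "50.00", "confidence": 0.55}]}
     ]}
  ]
}
"""


def table_to_records(table):
    """Turn each row into a dict keyed by the cell field names."""
    records = []
    for row in table.get("rows", []):
        record = {"row_number": row.get("row_number")}
        for cell in row.get("cells", []):
            record[cell["field_name"]] = cell["value"]
        records.append(record)
    return records


def low_confidence(output, threshold=0.8):
    """List fields and table cells scoring below the (assumed) threshold."""
    flagged = [f["field_name"] for f in output.get("fields", [])
               if f["confidence"] < threshold]
    for table in output.get("tables", []):
        for row in table.get("rows", []):
            for cell in row.get("cells", []):
                if cell["confidence"] < threshold:
                    flagged.append(f'row {row["row_number"]}: {cell["field_name"]}')
    return flagged


output = json.loads(SAMPLE)
records = table_to_records(output["tables"][0])
flagged = low_confidence(output)
print(records)
print(flagged)
```

Flagging low-confidence values separately makes it easier to route only the doubtful fields and cells for manual review while accepting the rest.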