PDF Extractor
The PDF Extractor activity lets you extract data from a PDF file using
an existing PDF template created in Automation Studio. To learn how to
create a PDF template, see the PDF Template Creator activity.
The dynamic PDF extraction capability allows you to automate
actions on complex PDF documents. Advanced PDF controls and OCR
capabilities extract information from PDF documents faster and
with improved data quality and accuracy.
Using PDF Extractor Activity
1. In the Canvas Tools pane, click File to expand the tool and view the associated activities.
2. Drag the PDF Extractor activity and drop it onto the Flowchart designer on the Canvas.

3. Click the PDF Location field and browse for the PDF file from which you want to extract data.
The selected PDF file must be similar to the PDF template; otherwise, an error occurs.
· Instead of passing the PDF file to the activity as a default file, you can pass it as a parameter.
Create a parameter in the Parameter pane and assign it the PDF file path (including the file name
and file extension). In the Properties grid, enter the parameter name in the FileName property.
4. Click the Template Location field and browse for the PDF template created in Automation Studio.
By default, the template is saved in the %localappdata% > EdgeVerve > AutomationStudio > ProtonFiles > PdfRepository
folder.
The PDF Extractor activity is created with the default name.
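Before assigning a path to the parameter or looking for a saved template, it can help to check the values outside Automation Studio. The following is a minimal Python sketch, not part of the product: the helper names default_template_dir and is_valid_pdf_path are hypothetical, the folder layout is taken from the default template location described in step 4, and the ".pdf" extension check is an assumption about what a valid input file looks like.

```python
import os
from pathlib import PureWindowsPath


def default_template_dir(local_appdata=None):
    """Build the default template folder path described in step 4.

    local_appdata overrides the LOCALAPPDATA environment variable,
    which is convenient when testing on a machine without it.
    """
    base = local_appdata or os.environ.get("LOCALAPPDATA")
    if not base:
        raise EnvironmentError("LOCALAPPDATA is not set")
    return (PureWindowsPath(base) / "EdgeVerve" / "AutomationStudio"
            / "ProtonFiles" / "PdfRepository")


def is_valid_pdf_path(path):
    """Return True if path is a full Windows path ending in a .pdf file name.

    Assumption of this sketch: the parameter value must be absolute and
    carry the file name and extension, as the bullet under step 3 asks.
    """
    p = PureWindowsPath(path)
    return p.is_absolute() and p.suffix.lower() == ".pdf"
```

For example, is_valid_pdf_path(r"C:\Invoices\invoice.pdf") returns True, while a bare file name such as "invoice.pdf" is rejected because it is not a full path.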
You can perform a test run to view the extracted data. A message
confirming successful data extraction is displayed in the Output
console of Automation Studio.

The extracted data is saved in a .CSV file in the
%localappdata% > EdgeVerve > AutomationStudio folder for further processing.
You can delete the file, if required.
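For further processing outside Automation Studio, the saved .CSV file can be read with any CSV library. The following is a minimal Python sketch using the standard csv module. It assumes that the first row of the file is a header row and that the file is UTF-8 encoded; neither is confirmed by this documentation. The function name load_extracted_rows is hypothetical.

```python
import csv


def load_extracted_rows(csv_path, encoding="utf-8-sig"):
    """Read an extracted-data CSV into a list of dicts keyed by the header row.

    Assumptions of this sketch: the first row holds column names and the
    file is UTF-8 (with or without a byte order mark).
    """
    with open(csv_path, newline="", encoding=encoding) as handle:
        return list(csv.DictReader(handle))
```

Each returned row is a dictionary mapping column names to string values, so numeric fields need explicit conversion before calculations.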
PDF Extractor Properties
The properties of the PDF Extractor activity are listed in the
following table and can be edited in the Properties grid in the
right pane.
Property Name | Usage |
Control Execution |
Ignore Error | When this option is set to Yes, the application ignores any error while executing the activity. If set to NA, it bypasses the exception (if any) to let the automation flow continue; however, it marks the automation status as failure in case of an exception. By default, this option is set to No. |
Delay |
Wait After | Specify the time delay that must occur after the activity is executed. The value must be in milliseconds. |
Wait Before | Specify the time delay that must occur before the activity is executed. The value must be in milliseconds. |
Misc |
Breakpoint | Select this option to mark this activity as a pause point while debugging the process. At this point, the process freezes during execution, allowing you to examine whether the process is functioning as expected. In large or complex processes, breakpoints help in identifying errors, if any. |
Compare Result | Compares the data extracted from the scanned PDF file with the original document in a comparison view in Automation Studio. By default, it is not selected. |
Commented | Select this option to mark this activity as inactive in the entire process. When an activity is commented, it is ignored during the process execution. |
DisplayName | The display name of the activity in the Flowchart designer area. By default, the name is set to PDF Extractor. You can change the name as required. |
FileName | The path of the PDF file that you want to use for data extraction. You can enter a predefined parameter in this field to pass the PDF file as a parameter instead of the default file. |
FolderPath | The location of the folder where the Excel file is created to save the extracted data. By default, the Excel file is created in the %localappdata% > EdgeVerve > AutomationStudio > ProtonFiles > PdfRepository folder. You can specify a folder location of your choice to override the default location. |
PageRange | The range of pages that you want to retrieve. To specify a single page, enter the page number in double quotes, for example, "5". To specify a range of pages, enter the range in double quotes, for example, "2-5", or enter "All" for all the pages. Only string types are supported. By default, it is cleared. |
TemplateName | The name of the configured PDF template used to extract data. Alternatively, you can select the required template in the Template Location field of the activity block. The template selected in the Properties grid is reflected in the activity block and vice versa. |
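The three string formats accepted by the PageRange property (a single page such as "5", a range such as "2-5", and "All") can be illustrated with a small parser. The following Python sketch only mirrors the formats documented in the table; it is not how Automation Studio interprets the value internally. The function name parse_page_range, the total_pages argument, the case-insensitive match on "All", and the rejection of an empty value are all assumptions of this sketch.

```python
def parse_page_range(value, total_pages):
    """Expand a PageRange string into a list of 1-based page numbers."""
    if not isinstance(value, str):
        # The property accepts only string types.
        raise TypeError("PageRange must be a string")
    text = value.strip()
    if text.lower() == "all":  # assumption: match "All" case-insensitively
        return list(range(1, total_pages + 1))
    first, sep, last = text.partition("-")
    first, last = first.strip(), last.strip()
    if not first.isdecimal() or (sep and not last.isdecimal()):
        raise ValueError(f"Unsupported PageRange: {value!r}")
    start = int(first)
    end = int(last) if sep else start  # a single page is a range of one
    if not 1 <= start <= end <= total_pages:
        raise ValueError(f"PageRange out of bounds: {value!r}")
    return list(range(start, end + 1))
```

For a 10-page document, parse_page_range("2-5", 10) yields pages 2 through 5, and parse_page_range("5", 10) yields page 5 only.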