Altova MapForce 2025 Enterprise Edition

The Find Lines or Edges method searches for lines or edges along which the region is split into snippets.

 

Properties

The table below summarizes the properties of the edge-finding method.

 

Property

Description

Fill Gaps

The Fill Gaps property specifies the maximum distance between adjacent high-contrast pixels at which they are merged. This can be particularly useful when, for example, a table row has dotted lines. Merging the dots into one continuous line enables the PDF Extractor to identify this line as an edge.

 

Minimum Edge Length

The Minimum Edge Length property is an advanced setting that specifies the percentage of the search-region width an object has to cover to be counted as an edge. This property can be useful when grid lines are inconsistent (e.g., when a grid line is shorter than the row). The default value is 60%. With enough space and consistent grid lines, the Minimum Edge Length property may not have a significant influence on the detection of split positions. However, you may want to adjust this parameter if grid lines are being missed. In this case, setting a lower percentage may help the splitter find the edge.

 

Resolution

The Resolution property allows scanning a document at a higher resolution in case the document contains very fine lines. You can choose between Standard, Fine (144 ppi), and Extra Fine (288 ppi) resolution.
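The following Python sketch illustrates how Fill Gaps and Minimum Edge Length could interact on a single row of pixels. It is only a simplified model and not the PDF Extractor's actual algorithm, which is not documented here. The function names (merge_runs, is_edge), the parameter names, and the assumption that the gap is measured in pixels are all hypothetical.

```python
def merge_runs(row, fill_gaps):
    """Return (start, end) runs of dark pixels in a row.

    Dark pixels separated by at most `fill_gaps` light pixels are
    merged into one run. `end` is exclusive. Hypothetical model only.
    """
    runs = []
    start = None
    last_dark = None
    for x, dark in enumerate(row):
        if not dark:
            continue
        if start is None:
            start = x
        elif x - last_dark - 1 > fill_gaps:
            # Gap is too wide to bridge: close the current run.
            runs.append((start, last_dark + 1))
            start = x
        last_dark = x
    if start is not None:
        runs.append((start, last_dark + 1))
    return runs


def is_edge(row, fill_gaps=0, min_edge_length=0.60):
    """Treat the row as an edge if its longest merged run covers at
    least `min_edge_length` (a fraction) of the row width."""
    if not row:
        return False
    longest = max((end - start for start, end in merge_runs(row, fill_gaps)),
                  default=0)
    return longest >= min_edge_length * len(row)


# A dotted line: every second pixel is dark.
dotted = [True, False] * 10
print(is_edge(dotted, fill_gaps=0))  # dots stay separate
print(is_edge(dotted, fill_gaps=1))  # dots merge into one long run

# A solid line that covers only half of the row width.
short_line = [True] * 10 + [False] * 10
print(is_edge(short_line, min_edge_length=0.60))
print(is_edge(short_line, min_edge_length=0.40))
```

In this model, a dotted line is detected only once the gap setting is large enough to bridge the dots, and a short grid line is detected only once the minimum length is lowered below the share of the width it covers.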

 

 

For an example that uses the Find Lines or Edges method, see Example below.

 

Example

This example shows how to configure the Find Lines or Edges method. The goals of this example are as follows:

 

To extract data from the table

To exclude the top part of the page (which contains the header, company, client, and invoice details), the header row of the table, and the bottom part of the page from processing

 

To achieve these goals, we have configured the Split object in the following way:

 

The Skip Initial property has been set to 2.

The Skip Final property has been set to 1.

The Method has been set to Find Lines or Edges.

No value has been set for the Region. Therefore, the whole page is treated as a region.

 

The algorithm has identified the first edge where the header row starts and the second edge where the header row ends. With Skip Initial set to 2, the upper part of the document, together with the header row of the table, has therefore been excluded from processing (grayed-out top part in the screenshot below).

 

The Skip Final value (1) has caused the algorithm to exclude the Subtotal, Sales Tax, and Total cells (grayed-out bottom part in the screenshot below), because the first edge from the bottom of the region has been identified on the line where the Fence repair row ends. The rest of the table is split into rows.

[Screenshot: pdfex_skipinitial2_zoom60]
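The effect of Skip Initial and Skip Final in this example can be modeled with a short Python sketch. It assumes that the detected edges cut the region into consecutive snippets from top to bottom and that the two properties drop snippets from the start and the end of that sequence. The function name select_snippets and all coordinates are hypothetical and are not taken from the invoice in the screenshot.

```python
def select_snippets(region_top, region_bottom, edges,
                    skip_initial=0, skip_final=0):
    """Split a region at the given edge positions and drop the first
    `skip_initial` and the last `skip_final` snippets.

    Returns a list of (top, bottom) pairs. Hypothetical model only.
    """
    inside = sorted(e for e in edges if region_top < e < region_bottom)
    bounds = [region_top] + inside + [region_bottom]
    snippets = list(zip(bounds, bounds[1:]))
    end = len(snippets) - skip_final
    return snippets[skip_initial:end]


# Invented vertical positions: the header row spans 300 to 340,
# three data rows follow, and the totals lie below the last edge at 460.
edges = [300, 340, 380, 420, 460]
rows = select_snippets(0, 1000, edges, skip_initial=2, skip_final=1)
print(rows)  # [(340, 380), (380, 420), (420, 460)]
```

With Skip Initial set to 2, the snippet above the header row and the header row itself are dropped. With Skip Final set to 1, everything below the last detected edge is dropped, which leaves only the data rows.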

 

© 2018-2024 Altova GmbH