PDF documents are used at many stages of modern business workflows, often serving as the format of choice for invoices, reports, legal contracts, and other critical documents. While PDFs are ideal for preserving content integrity and a particular visual layout, their structure makes automated data extraction challenging. For organizations engaged in data integration and ETL, unlocking information contained in PDFs is a necessity—and this is where the MapForce PDF Extractor comes in.
The MapForce PDF Extractor includes multiple tools for visually defining extraction rules to map PDF data to other formats. One that is particularly useful for zeroing in on specific content is text search. Here’s how it works – including a video demo.
A common requirement in data processing is batch data mapping, especially in the context of data transformation and integration. It involves converting data in batches rather than processing individual data points one at a time. Batch data mapping is often required in data integration scenarios where input from multiple sources needs to be aligned or transformed together. Two common scenarios are “batch to batch” and “batch to one.”
In our latest series of MapForce demo videos, we explore these common data mapping challenges.
BATCH TO BATCH DATA MAPPING
Batch to batch data mapping is helpful in scenarios where you have data updates or changes coming in batches, and you need to synchronize or transform these batches together. This could be to convert them to a different format, perform some type of sort or calculation, or a combination of these.
In this demo, we create a data mapping project that reads files from a directory and uses wildcards to set up a mapping that will process data from multiple files at once. Then, we explore another option for defining batch conversion using dynamic file names supplied by the mapping. This demo also shows how to add calculations and comments to your data mapping projects.
While this video highlights a batch to batch transformation of JSON files to XML files, MapForce supports conversion and transformation for any combination of XML, JSON, PDF, database, text, Protobuf, Excel, XBRL, and so on, for advanced data integration and ETL processes.
BATCH TO ONE DATA TRANSFORMATION
Batch to one data transformation is another common requirement, for example, when you want to merge or combine multiple files into a single consolidated document and perform some data transformation, conversion, or calculation operations in between.
This example also explores multiple approaches to defining the batch process, which will be applicable in different scenarios:
Specifying a collection of files in the same directory using a wildcard
Selecting batch files based on a list of file names stored in a different directory
This demo also shows how to sort the data merged from multiple files before writing it to the target.
After watching these quick demos, you can download a free trial of MapForce to try batch mapping, conversion, and transformation for yourself.
MORE MAPFORCE DEMOS
If you liked these videos, check out the rest of the MapForce demo series, which covers everything from mapping XML and JSON to databases to configuring data processing functions and extracting data from PDF documents.
MapForce, Altova’s award-winning data mapping tool, includes support for PDF input in data integration and ETL workflows. The MapForce PDF Extractor makes it easy to define rules for extracting PDF data in a structured format to make it available for mapping to other popular formats like Excel, XML, JSON, databases, and more.
Automated sentiment analysis of text, such as user reviews, has historically been a challenge. Because of the myriad intricacies of natural language, systems faced difficulties in analyzing context and nuances. This required an inordinate amount of manual work to overcome.
One of the many useful capabilities of modern AI systems that are based on large language models (LLMs) such as OpenAI’s GPT-4 is that they are very good at sentiment analysis of natural text inputs. We can use that capability to build a very efficient database solution in MapForce that, for example, goes through all the new incoming records in a support database and automatically determines whether a particular support request or other customer feedback is positive, negative, constitutes a bug report, or should be considered as a feature request.