ETL—extract, transform, load—is the backbone of modern data integration. While most technical professionals understand the basics, the real challenge lies in designing efficient, scalable ETL processes that handle complex data transformations while maintaining performance and accuracy.
In our latest video series, we break down how ETL works, common challenges with defining ETL workflows, and how graphical tools like Altova MapForce can help. We’ll walk through demos of real-world scenarios such as transforming and loading CSV reports to a SQL database, as well as implementing scalable automation.
Whether you’re optimizing an existing process or researching new ETL tools, this series covers all the bases.
ETL processes span a wide spectrum of complexity, from straightforward tasks like a one-to-one mapping of an API payload to a database, to highly intricate scenarios requiring extensive data filtering, transformation, and manipulation.
Altova MapForce can tackle this full range of ETL tasks.
This video tutorial explores a common ETL scenario:
Extract CSV data received in multiple reports
Transform and filter the data
Load the transformed data to a target SQL database
This particular transformation is somewhat complex because the CSV data is in a wide format, with separate columns corresponding to each of several years. Part of our transformation will be to melt or pivot the data to the long format more in line with how data is stored in a relational database.
This way, each year becomes a value in a single column, and its corresponding data is moved to a new column, resulting in more rows but fewer columns. This long format is also more readily consumed by common analytical and BI tools down the line.
In addition, we will filter unwanted data and round up long decimals before writing the data to the database.
Though making this type of transformation seems like a complicated problem, it’s easy using MapForce ETL tools that include drag-and-drop data mapping, dynamic node names, and built-in functions.
The example in this video a CSV to database ETL scenario, but MapForce supports a wide variety of additional data formats including XML, JSON, PDF, Excel, EDI, and XBRL. All popular SQL and NoSQL databases are also supported as the source or target of any data mapping.
MapForce is available for a free, 30-day trial. Now account or credit card are needed – so you can just get to work trying this ETL functionality for yourself.
Up next: Check out the previous video in our series, ETL Basics.
ETL processes are increasingly required in modern enterprises as organizations receive data in diverse formats that must be transformed and loaded into target databases or business systems. ETL projects range from simple to highly complex, depending on the specific requirements.
A common example of a straightforward ETL process involves extracting CSV data from incoming files, mapping the data structure, applying basic transformations to align with the target schema, deduping records, and then finally loading the processed data into a SQL database.
Whether an ETL project is basic with just a one-to-one mapping or more complex with sophisticated data processing requirements, developers need tools that can handle the scope of complexity without a huge learning curve – or price tag. That’s where MapForce comes in.
As part of our series on defining ETL pipelines, this video walks you through this process of extracting data in CSV documents, transforming it using data processing functions, and then configuring how it will be written to the target system.
Though this example focuses on CSV, it’s easy to define data mapping projects in MapForce for any combination of data formats. Benefits of MapForce as an ETL tool include:
Graphical, drag-and-drop data mapping
Extensible library of data processing functions
Support for all major SQL and NoSQL databases
Support for CSV, XML, JSON, PDF, XBRL, and other data sources
Instant output with affordable ETL automation
Watch the video now:
To continue learning about defining more complex ETL pipelines, watch the next video in the series.
PDF documents are used at many stages of modern business workflows, often serving as the format of choice for invoices, reports, legal contracts, and other critical documents. While PDFs are ideal for preserving content integrity and a particular visual layout, their structure makes automated data extraction challenging. For organizations engaged in data integration and ETL, unlocking information contained in PDFs is a necessity—and this is where the MapForce PDF Extractor comes in.
The MapForce PDF Extractor includes multiple tools for visually defining extraction rules to map PDF data to other formats. One that is particularly useful for zeroing in on specific content is text search. Here’s how it works – including a video demo.
A common requirement in data processing is batch data mapping, especially in the context of data transformation and integration. It involves converting data in batches rather than processing individual data points one at a time. Batch data mapping is often required in data integration scenarios where input from multiple sources needs to be aligned or transformed together. Two common scenarios are “batch to batch” and “batch to one.”
In our latest series of MapForce demo videos, we explore these common data mapping challenges.
BATCH TO BATCH DATA MAPPING
Batch to batch data mapping is helpful in scenarios where you have data updates or changes coming in batches, and you need to synchronize or transform these batches together. This could be to convert them to a different format, perform some type of sort or calculation, or a combination of these.
In this demo, we create a data mapping project that reads files from a directory and uses wildcards to set up a mapping that will process data from multiple files at once. Then, we explore another option for defining batch conversion using dynamic file names supplied by the mapping. This demo also shows how to add calculations and comments to your data mapping projects.
While this video highlights a batch to batch transformation of JSON files to XML files, MapForce supports conversion and transformation for any combination of XML, JSON, PDF, database, text, Protobuf, Excel, XBRL, and so on, for advanced data integration and ETL processes.
BATCH TO ONE DATA TRANSFORMATION
Batch to one data transformation is another common requirement, for example, when you want to merge or combine multiple files into a single consolidated document and perform some data transformation, conversion, or calculation operations in between.
This example also explores multiple approaches to defining the batch process, which will be applicable in different scenarios:
Specifying a collection of files in the same directory using a wildcard
Selecting batch files based on a list of file names stored in a different directory
This demo also shows how to sort the data merged from multiple files before writing it to the target.
After watching these quick demos, you can download a free trial of MapForce to try batch mapping, conversion, and transformation for yourself.
MORE MAPFORCE DEMOS
If you liked these videos, check out the rest of the MapForce demo series, which covers everything from mapping XML and JSON to databases to configuring data processing functions and extracting data from PDF documents.
MapForce, Altova’s award-winning data mapping tool, includes support for PDF input in data integration and ETL workflows. The MapForce PDF Extractor makes it easy to define rules for extracting PDF data in a structured format to make it available for mapping to other popular formats like Excel, XML, JSON, databases, and more.
Automated sentiment analysis of text, such as user reviews, has historically been a challenge. Because of the myriad intricacies of natural language, systems faced difficulties in analyzing context and nuances. This required an inordinate amount of manual work to overcome.
One of the many useful capabilities of modern AI systems that are based on large language models (LLMs) such as OpenAI’s GPT-4 is that they are very good at sentiment analysis of natural text inputs. We can use that capability to build a very efficient database solution in MapForce that, for example, goes through all the new incoming records in a support database and automatically determines whether a particular support request or other customer feedback is positive, negative, constitutes a bug report, or should be considered as a feature request.