Process Multiple Input Files in a Single Data Mapping
Typical data conversion tasks require processing numerous input files that arrive in batches. Altova MapForce includes features that let you handle groups of files with minimal intervention. For instance, we recently copied a set of files from the memory card of a digital camera with GPS support. Each .LOG file is a CSV containing GPS coordinates for a single route.
We quickly designed a mapping to convert the CSV data to XML-based .gpx format and processed all three files to generate three output files in a single execution:
First, we used a wildcard character in the Input file name in the Properties dialog for the input component of the mapping. This instructs MapForce to individually process every file in the working directory that matches the wildcard.
If you are designing a complex conversion, or if the input files are very large, you can use a single unique filename to develop the mapping, then change to a wildcard when you are satisfied with the mapping output.
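Outside MapForce, the same idea of running one conversion per wildcard match can be sketched in a few lines of Python. This is only an illustration of the pattern, not how MapForce implements it; the function names and the throwaway file names are invented for the example.

```python
from pathlib import Path
import tempfile


def matching_inputs(directory: Path, pattern: str) -> list[Path]:
    """Return the regular files in `directory` whose names match `pattern`."""
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def process_batch(directory: Path, pattern: str, convert) -> list[str]:
    """Run `convert` once per matching file and collect the results."""
    return [convert(path) for path in matching_inputs(directory, pattern)]


# Demo with throwaway files; the names are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("route1.LOG", "route2.LOG", "notes.txt"):
        (folder / name).write_text("placeholder\n")
    # The "conversion" here just reports the file name it was handed.
    processed = process_batch(folder, "*.LOG", lambda p: p.name)
    print(processed)
```

Developing against one file and then switching to the pattern, as suggested above, corresponds to passing an exact file name as `pattern` first and replacing it with `*.LOG` later.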
File Path Functions
The built-in MapForce Function Library includes file path functions we can use to manage output file names. If we define a single output file, it will be appended with new data when we process each successive input.
You can combine file path functions with other string functions for complete control of output file names and locations. We decided to leave the output in the same directory as the input files, but to create more descriptive filenames, and to use the .gpx file extension.
The portion of the mapping shown below uses the string concat function with file path functions to generate the output file 1211190converted.gpx from 1211190.LOG, and so on.
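For readers who think in code, the naming rule can be approximated in Python as follows. This is a rough equivalent of the result, not the MapForce functions themselves, and `output_name` is a hypothetical helper.

```python
from pathlib import PurePath


def output_name(input_path: str, suffix: str = "converted", ext: str = ".gpx") -> str:
    """Keep the folder and base name, append `suffix`, and swap the extension."""
    path = PurePath(input_path)
    # `stem` is the file name without its final extension, e.g. "1211190".
    return str(path.with_name(path.stem + suffix + ext))


print(output_name("1211190.LOG"))  # 1211190converted.gpx
```

Because only the file name is replaced, any folder in the input path carries over, which keeps the output next to the input.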
You can also use file path functions to generate strings and insert them as output. The XML Schema for .gpx files contains a metadata description element. We decided to insert the input file name into the metadata to explicitly link the output file to the original data. This strategy makes the output file self-documenting, and can help with debugging if you need to trace unexpected output back to the original source.
The portion of the mapping shown below inserts the source file name into a string and maps the string to the metadata <desc> element:
The resulting description is on line 4 of the mapping Output preview:
Filtering Input Data
The core of this data mapping required a filter on the input file. The camera GPS log files are recorded according to the National Marine Electronics Association (NMEA) specification. A portion of one of the input files is shown below:
After the first line, each recorded point is described by two NMEA sentences, where the sentence type is identified in the first field. Each GGA sentence includes the time, latitude, longitude, elevation, and additional data about the quality of the fix. Each RMC sentence contains the time, latitude, longitude, and date.
An RMC sentence contains the minimum data we need to generate a .gpx <trkpt> element, so we can use a filter to select only those lines from the input, as shown here:
If the message type in the first field of a row contains “$GPRMC” it is passed through for processing. If not, the row is ignored.
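The same row filter can be sketched in Python. The sample rows are shortened and made up for illustration; they are not taken from the camera files, and only the first field matters here.

```python
import csv


def select_rmc(lines):
    """Yield parsed CSV rows whose first field contains the RMC sentence type."""
    for row in csv.reader(lines):
        if row and "$GPRMC" in row[0]:
            yield row


# Shortened, invented rows: a header line followed by GGA/RMC pairs.
sample = [
    "header written by the device",
    "$GPGGA,204323,fix-data",
    "$GPRMC,204323,A,191112",
    "$GPGGA,204324,fix-data",
    "$GPRMC,204324,A,191112",
]

rmc_rows = list(select_rmc(sample))
print(len(rmc_rows))  # 2
```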
The actual data in the input file also required some manipulation. For each latitude and longitude, we had to combine multiple fields in the source that defined degrees, minutes, and seconds and convert to decimal degrees. We needed to combine the time and date fields and record the result in ISO 8601 format as required by .gpx, such as 2012-11-19T20:43:23Z. We defined each of those conversions as user functions to encapsulate their complexities and separate them from the main mapping.
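As a rough sketch of what those two conversions compute, here is a Python version. It assumes the degrees, minutes, seconds, and a hemisphere letter have already been split into separate values, and that the RMC time and date arrive as hhmmss and ddmmyy strings in UTC; the function names are hypothetical and the actual field layout in the camera files may differ.

```python
from datetime import datetime, timezone


def dms_to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    decimal = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative in decimal notation.
    return -decimal if hemisphere.upper() in ("S", "W") else decimal


def to_iso8601(date_ddmmyy: str, time_hhmmss: str) -> str:
    """Combine an RMC date and time into an ISO 8601 UTC timestamp."""
    # Any fractional seconds after the first six digits are dropped.
    stamp = datetime.strptime(date_ddmmyy + time_hhmmss[:6], "%d%m%y%H%M%S")
    return stamp.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(dms_to_decimal(71, 15, 0, "W"))  # -71.25
print(to_iso8601("191112", "204323"))  # 2012-11-19T20:43:23Z
```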
This mapping also provides an opportunity to reuse the getElevationUS user function we defined in the earlier post Expect the Unexpected – Altova MissionKit Solves a Number Format Mystery. This time we rounded the elevation data to three decimal places, representing the nearest millimeter.
The core section of the CamerlogToGPX data mapping with user functions looks like this:
And here is one of the output files showing a <trk>, <trkseg>, and several <trkpt> elements.
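To make the target structure concrete, the sketch below builds a small document of the same shape with Python's standard library. It assumes the GPX 1.1 namespace; the `creator` value, the wording of the description, and the sample point are invented for illustration and are not taken from the mapping.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"  # assumption: GPX 1.1


def build_gpx(source_name: str, points) -> str:
    """Build a GPX string with one track segment from (lat, lon, ele, time) tuples."""
    ET.register_namespace("", GPX_NS)  # serialize GPX as the default namespace

    def tag(name: str) -> str:
        return f"{{{GPX_NS}}}{name}"

    gpx = ET.Element(tag("gpx"), version="1.1", creator="example-script")
    # Record the source file name in the metadata description.
    metadata = ET.SubElement(gpx, tag("metadata"))
    ET.SubElement(metadata, tag("desc")).text = f"Converted from {source_name}"

    segment = ET.SubElement(ET.SubElement(gpx, tag("trk")), tag("trkseg"))
    for lat, lon, ele, time in points:
        point = ET.SubElement(segment, tag("trkpt"), lat=f"{lat:.6f}", lon=f"{lon:.6f}")
        ET.SubElement(point, tag("ele")).text = f"{ele:.3f}"  # three decimal places
        ET.SubElement(point, tag("time")).text = time
    return ET.tostring(gpx, encoding="unicode")


# Invented sample point; the timestamp reuses the format example from the text.
xml_text = build_gpx("1211190.LOG", [(42.5, -71.25, 35.2, "2012-11-19T20:43:23Z")])
print(xml_text)
```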
The MapForce Output menu provides a selection that lets us validate the output files against the .gpx XML Schema:
If you would like to use Altova MapForce to process input files in batches for your own data mappings, click here to download a free trial.
We followed up with a post describing how to deploy this mapping in a FlowForce Server job. Click here for the full story.
Great article and it saved me a lot of time learning about this software which I have never used. I'm wondering if you could help me solve an issue I am having.
I have a large number of XML files in one folder, and I want to output some of the data elements they contain to a single CSV file. I've successfully created a mapping of the schema to my CSV and the results in the “Output” are what I want and expect. As per your article, I put “*.xml” as the Input file, which it accepts. When I go back and hit Output, I can see that the program is running through the 30 XML file names in that folder, but the output only has the data for one XML file, and when I use “Save the output file” it only has the information from the last XML file in the folder. I wanted to create a comprehensive CSV file which contains all the data elements I want to extract from all the XML files. Is this possible?
Thanks a lot!
Thank you for your comment! The sample file named MergeMultipleFiles.mfd in the MapForce examples project has a sample mapping showing what you're trying to achieve. I hope this helps!