Data Mapping NoSQL Databases
NoSQL databases are non-tabular databases that store data differently than traditional databases made up of relational tables. Two of the most popular NoSQL databases, MongoDB and Apache CouchDB, store data as collections of BSON (binary JSON) and JSON documents. These databases leverage flexible JSON schemas and scale easily with large amounts of data and high user loads.
Altova MapForce has long supported data mapping all popular relational databases and now also includes native support for data mapping NoSQL databases. MapForce includes functionality for inserting, extracting, filtering, and ordering NoSQL data. Let’s look at an example.
Shown below is a complete data mapping from the NoSQL MongoDB sample_analytics database. This mapping extracts a subset of the binary JSON (BSON) data from multiple collections, combines, filters, transforms, and sorts the data, then exports the result as a single JSON document.
The sample_analytics NoSQL database is an example hosted on the publicly accessible MongoDB Atlas cloud server, storing training data for a mock financial services application. The database includes collections of BSON documents for 500 customers, 1,746 accounts, and 1,746 buy or sell transactions.
Data mapping NoSQL databases in MapForce starts by selecting the NoSQL database from the MapForce Insert Database Connection Wizard and entering the connection parameters. Next, we can select the collections to map:
Each collection requires a JSON schema for mapping its objects. The schema may be assigned in the database for validation, or it may be an external file.
Here is an expanded view of objects in the customers and transactions collections ready for NoSQL data mapping in MapForce:
After we drop in a JSON Schema or example JSON file for the output target, we’re ready to begin connecting objects in the database to the output. The MongoDB NoSQL database stores binary JSON data. The MapForce Function Library includes built-in functions to manipulate BSON data:
The database includes customers with gmail, hotmail, and yahoo email addresses in their profiles but our mapping project is tasked with retriving only results for customers with yahoo.com addresses. Obviously, there is no SQL Select / Where query in a NoSQL database to filter the email addresses! So instead we can combine a BSON to-regex function with a MapForce built-in Where / Order structure to achieve the same result:
The MapForce Where / Order Sort option also accomplishes another project requirement — sort the customers by birthdate in the output.
BSON format supports additional data formats than JSON string or numeric. MapForce automatically casts the birthdate from BSON date format to string for processing by the format-date function for the final output.
Combining MapForce Where/Order structures with other MapForce built-in functions filters the transactions collection to output only “buy” transactions, and joins the transactions in grouped and sorted format to profiles in the customers collection.
The output file consists of all customers with yahoo.com email addresses and reports their buy orders grouped by account for that customer. Further, the output file lists customers in order by birthdate from youngest to oldest. Here is a partial view of the JSON output in XMLSpy, the popular tool for editing, modeling, transforming, and debugging JSON, and XML technologies:
The the users appear in descending order by birthdate, and buy transactions for each account are in alphabetical order of the JSON “name” objects in the output.
If you need to periodically repeat processing to collect newer transactions, MapForce Server automates execution of data mappings designed in MapForce
MapForce is a powerful data mapping tool for data mapping NoSQL databases to or from any of: XML, JSON, databases, EDI, XBRL, flat files, Excel and/or Web services. To try it out for your own NoSQL data mapping application, download a fully functional free trial today!