How to Compare CSV Files or Compare a CSV File to a Database Table
CSV files are a quick and convenient way to record structured data in a generic format. Because CSV files are so easy to create, multiple similar versions of very large CSV files can quickly proliferate. Often it becomes necessary to compare CSV files to find the desired version. In an ETL scenario, a data analyst may want to compare a CSV file to a database table for validation or to update data.
DiffDog, the unique XML-aware diff / merge tool from Altova, supports CSV as a native file format for comparison and can compare and selectively merge data CSV to CSV, or between a CSV file and database table. Let’s look at an example.
Since CSV files contain structured data, a CSV comparison is launched using the DiffDog Compare Database Data selection, either from the File menu or the tool icon. This opens a new comparison window where the user selects the left and right comparison objects.
Note that DiffDog lets you compare CSV files with comma, tab, or semicolon characters as column separators and files with or without header rows.
In the screenshot below, the user has selected left and right comparison files, mapped the files at the top level to compare all columns, and executed the comparison by clicking the green Compare tool icon at the top right side of the toolbar.
Note the row count for each file near the object name and the not-equal icon in the connection link. These two CSV files have the same number of rows, but the contents are not identical. The column names are listed below each object name. You can deselect any individual connection to prevent comparison of that column.
Next, we want to see the exact differences between the CSV files. Right-clicking the unequal connection line opens a menu with an option to view detailed results.
The result window shows a detailed view of both files.
The navigation toolbar at the top lets you quickly locate the first difference between the files. Perhaps an order for customer number 544 was incorrectly entered and needed to be reclassified to a different region and sales rep.
You can select and merge individual differences between the CSV files within this window via a right-click context menu:
You can merge differences in either direction and save the results by updating the existing CSV files or saving new files with new names.
Compare CSV Files to Database Tables
A CSV to database comparison is opened simply by using the connection wizard to supply a database as the comparison object on either side. The illustration below shows our original CSV on the left side and a SQLite database on the right.
In this example the contents of the CSV and database table are equal. If there are differences, all the CSV-to-CSV comparison merge features described above also work for CSV-to-database comparisons. Any changes to the database table can be instantly committed to the database directly from the results window.
Supported databases are:
• Firebird
• IBM DB2 for iSeries®
• IBM DB2®
• Informix®
• MariaDB
• Microsoft Access™
• Microsoft® Azure SQL
• Microsoft® SQL Server®
• MySQL®
• Oracle®
• PostgreSQL
• Progress OpenEdge
• SQLite
• Sybase® ASE
• Teradata
Automate CSV Comparisons with DiffDog Server
You can automate diff report generation for CSV-to-CSV and CSV-to-database comparisons using DiffDog Server. Once installed on a machine in your network, DiffDog Server runs as a service. You can run comparisons by invoking DiffDog Server either from a command line interface or through executable scripts.
If you need to perform the same data comparison frequently, you can set it up once in DiffDog and save it to a database data comparison (.dbdif) file, which can be automatically executed by DiffDog Server.
DiffDog Server is available for Windows, Windows Server, Linux, and macOS platforms.
Download a free trial to find out for yourself how easily DiffDog can help you compare CSV files!