In this article we use stats from NFL.com and ESPN.com to show how easy it can be to process and analyze online data in new ways – even when it uses different metrics and is only available in textual format. We have seen in previous blog posts how easy it is to gather data from the Internet that is widely available in XML formats. But what about interesting data that is available online but not in an XML format, or data that is buried in legacy data processing systems and only available in textual report format? One such example involves quarterback ratings. The NFL has used a Passer Rating that rates quarterbacks solely based on a passer’s completions, attempts, touchdowns, and interceptions. ESPN introduced a new rating system this year called the Total QBR (Quarterback Rating). The Total QBR incorporates more data, including an expected points average and a clutch play index, that ESPN claims gives a more accurate measure of a quarterback’s performance. Let’s compare the rankings that these system produce to see if we can garner some useful information. For this example we’ll be using the data importing and analysis tools of the Altova MissionKit to compare the ratings. If you want to try this out yourself, the MissionKit is available to download for a 30 day free trial from the Altova web site. You can access the files used in this example here. The first thing we need is the raw data to analyze. Let’s use the entire 2010 season as a data source. We can get the table with Passer Ratings from NFL.com and then copy and paste it as a new text file. We can access a similar table of Total Quarterback Ratings from the ESPN web site and create a second text file. We now have two text files with tables of data in different orders. The next step is to combine the tables into one file and generate charts. First, we need a schema file for the destination of the data. In XMLSpy, we can create an XSD file quickly, and graphically, to contain a series of QB nodes with child nodes of first and last name, team, passer rating and rank, and total QBR and rank. Now, in MapForce, we open the text documents and use FlexText to parse the text and change it into a list of categories. We then build a mapping file in MapForce to map the data from the text files to the destination XML file. Built-in functions make it easy to extract the first and last names from the Player string, and a value-map will change the team abbreviation to a string (ARI is changed to Arizona Cardinals, ATL to Atlanta Falcons, etc.). We set the Priority Context in the test of our filters to make sure we get the correct set of data for each unique quarterback. Once we execute the mapping, we can save the resulting XML data file and use it as the source file in StyleVision to design a stylesheet. In this stylesheet, we create a table of the top ten ranked passers and charts showing the Passer Rating and the Total QBR graphically. Now that we have a visual representation of the rankings of the two rating systems, we can examine their differences and try to see which works better. For example, Peyton Manning was tenth in passer rating, but was second in Total QBR. This can be explained by the Total QBR taking clutch points into account and knowing that Peyton Manning had a few late game comebacks in the 2010 season. Since we now have a collection of files (the XSD file built in XMLSpy, the FlexText and mapping files from MapForce, and the stylesheet design created in StyleVision), we can update the text data files easily to analyze new sets of quarterback data. Later in the season, we can update the text tables with 2011 data, and allow the data to flow through the mappings and into the stylesheet to update the charts and see the rankings for the current season. This example focuses on numbers from the NFL, but this method can easily be adapted to other data sets and data sources that are accessed as text files as well as in other formats. You can learn more about how to use the products in the Altova MissionKit by taking our free online training courses.