SPARQL/RDF Is a Cost-Effective Way to Search XBRL Data
Ashu Bhatnagar is CEO of Good Morning Research, a Softpark company that specializes in building Semantic XBRL technology. The Good Morning Research machine automates XBRL tagging of Excel data in RDF format with one-click Save As XBRL functionality. Mr. Bhatnagar also moderates the Semantic XBRL group on LinkedIn.
Serious financial research analytics rarely start and finish by querying a single database source, no matter how comprehensive that database source may be. Instead, the most common practice involves accessing data from multiple databases and information sources; aggregating, normalizing, and making appropriate adjustments and calculations to the data; and co-mingling it to make it ready for meaningful queries and analytics.
The increasing use of XBRL and standardized taxonomies is not only making source data more normalized and more comparable than before; it is also rendering it amenable to far more efficient and error-free data manipulation, as data re-keying is eliminated.
However, these aspects of the XBRL transformation lie just at the beginning of the analyst’s workflow, which demands greater automation and improved cost-effectiveness. In other words, there is still more work ahead before this data is ready for analytical queries and alpha mining.
For example, a query to find fourth-quarter revenue data of several large-cap metal, mineral, and mining companies around the globe for a particular year from more than one data source would require the analyst to (a) normalize currencies to a common currency, (b) make reasonable adjustments to account for hyper-inflationary country data, and, finally, (c) apply business rules to account for common differences in companies’ fiscal years (e.g., December versus March year-ends). Such differences make apples-to-apples comparisons on raw data less meaningful and useful.
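To make steps (a) and (c) concrete, here is a minimal Python sketch. The company names, exchange rates, and revenue figures are hypothetical placeholders, not real data, and step (b), the hyper-inflation adjustment, is left out because it depends on country-specific rules. Fiscal Q4 is assumed to be the three months ending with the fiscal year-end month.

```python
from dataclasses import dataclass

# Hypothetical exchange rates to USD; illustrative values only, not market data.
USD_PER_UNIT = {"USD": 1.0, "AUD": 0.5, "GBP": 2.0}


@dataclass(frozen=True)
class ReportedRevenue:
    company: str
    currency: str
    fiscal_year_end_month: int  # 12 = December year-end, 3 = March year-end
    q4_revenue: float           # as reported, in the local currency


def to_usd(amount: float, currency: str) -> float:
    """Step (a): convert an as-reported amount to the common currency."""
    return amount * USD_PER_UNIT[currency]


def fiscal_q4_months(fiscal_year_end_month: int) -> list[int]:
    """Step (c): calendar months (1-12) covered by fiscal Q4.

    Assumes fiscal Q4 is the three months ending with the year-end month.
    """
    return [(fiscal_year_end_month - offset - 1) % 12 + 1 for offset in (2, 1, 0)]


# Hypothetical as-reported figures for two companies with different year-ends.
reports = [
    ReportedRevenue("MinerA", "USD", 12, 1500.0),
    ReportedRevenue("MinerB", "AUD", 3, 2000.0),
]

comparable = [
    (r.company, to_usd(r.q4_revenue, r.currency), fiscal_q4_months(r.fiscal_year_end_month))
    for r in reports
]

for row in comparable:
    print(row)
```

Even in this toy form, the two companies’ “Q4” figures turn out to cover different calendar months, which is exactly why raw figures cannot be compared directly.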
Enter the Semantic Web technologies of RDF and SPARQL, which create a pathway for cost-effective, efficient, and powerful data analysis:
1. The XBRL file set, including the instance file, schema, and linkbases, comprises XML files and is therefore highly suited for automated and error-free transformation into RDF/XML format. At this point, we’re still in the “raw data” stage.
2. Automated machine processing aggregates these RDF files into an RDF data store that is exposed as a SPARQL endpoint.
3. SPARQL, a powerful query language designed specifically for RDF data, is then used to search and query the RDF data store interactively.
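The first two steps can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual transformation used by any product: the XML below is a heavily simplified, hypothetical XBRL-like instance, the `ex:` names and example.com URIs are invented, and plain tuples stand in for RDF triples.

```python
import xml.etree.ElementTree as ET

# A heavily simplified, hypothetical XBRL-like instance. Real filings carry
# context and unit definitions plus linkbase references, all omitted here.
INSTANCE_XML = """
<xbrl xmlns:ex="http://example.com/taxonomy#">
  <ex:Revenues contextRef="FY2009Q4" unitRef="USD">1500</ex:Revenues>
  <ex:NetIncome contextRef="FY2009Q4" unitRef="USD">200</ex:NetIncome>
</xbrl>
"""


def facts_to_triples(xml_text, filing_uri):
    """Step 1: turn each fact element into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for index, fact in enumerate(root):
        # ElementTree renders a namespaced tag as "{namespace}localname".
        namespace, _, local_name = fact.tag[1:].partition("}")
        fact_uri = f"{filing_uri}/fact/{index}"
        triples.append((fact_uri, "rdf:type", namespace + local_name))
        triples.append((fact_uri, "ex:value", fact.text.strip()))
        for attribute, value in fact.attrib.items():
            triples.append((fact_uri, f"ex:{attribute}", value))
    return triples


# Step 2: aggregate the triples from several filings into one store.
store = set()
for filing_uri in ("http://example.com/filing/1", "http://example.com/filing/2"):
    store.update(facts_to_triples(INSTANCE_XML, filing_uri))

print(len(store), "triples in the store")
```

Because every fact becomes a handful of uniform triples, adding another filing is just another `update` call; no table definitions need to change.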
SPARQL queries enable selection by binding variables (for example, mapping fiscal Q4 to the corresponding calendar months) and by applying other pre-defined business rules, thereby returning a more comparable result set from an underlying raw RDF data store.
Simply put, normalization and financial adjustments can now be performed interactively and at runtime, which allows more advanced, customized, and meaningful queries to be run against an underlying as-reported raw data set.
Additionally, unlike a traditional SQL database with its well-designed and controlled schema, this RDF data store is highly flexible and dynamic in nature, allowing other datasets, such as Excel data transformed into RDF format, to be co-mingled with it.
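A small Python sketch shows why this co-mingling is cheap. It assumes a spreadsheet exported as CSV text and again uses plain tuples as stand-ins for RDF triples; the column names and values are hypothetical. Columns the store has never seen before simply become new predicates, with no schema migration.

```python
import csv
import io

# Triples already in the store, e.g. derived from XBRL filings (hypothetical).
store = {
    ("MinerA", "revenue", "1500"),
}

# Hypothetical spreadsheet export with columns the store has never seen.
SHEET_CSV = """company,analystRating,targetPrice
MinerA,Buy,42
MinerB,Hold,17
"""


def rows_to_triples(csv_text, key_column):
    """Emit one triple per non-key cell: (row key, column name, cell value)."""
    triples = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row[key_column]
        for column, cell in row.items():
            if column != key_column and cell != "":
                triples.add((subject, column, cell))
    return triples


# Merging is a plain set union; no table has to be altered first.
store |= rows_to_triples(SHEET_CSV, "company")

for triple in sorted(store):
    print(triple)
```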
In summary, combining XBRL with RDF and SPARQL yields a more flexible and cost-efficient data store with advanced query tools, in contrast to running simpler queries against more complex and more expensive databases.
Editor’s note: Mr. Bhatnagar will be speaking on the topic of XBRL and SPARQL/RDF at the 2010 Semantic Technology Conference session on June 23 in San Francisco. His earlier posts on the Semantic Web and XBRL are Introducing Semantic XBRL, Semantic XBRL Data Search Using SPARQL, and Semantic XBRL Transparency, Verification, and ‘Raw Data Now’.