Ashu Bhatnagar is CEO of Good Morning Research, a Softpark company that specializes in building Semantic XBRL technology. The technology automates XBRL tagging of Excel data in RDF format with one-click Save As XBRL functionality. Mr. Bhatnagar moderates the Semantic XBRL group on LinkedIn.
Consensus on the high-level vision of XBRL, and on the benefits of XBRL-tagged financial data, is widespread and almost uniform. However, when it comes to implementing this vision, the execution processes, methods, and available software tools are as diverse as the players involved, and there are no clear front-runners.
For example, let’s look at the choice of methods for one of the key steps of the XBRLization process that involves tagging of financial data from a source document with appropriate XBRL tags as predefined in an accepted taxonomy document.
Broadly categorized, the XBRL data tagging processes fall under the following four generic methods:
1. One-Click Automated Data Tagging Method
In an ideal situation, filers would continue to use their existing in-house financial reporting tools, including Microsoft Office Excel, and simply click a Save As XBRL button. This would resemble saving a Word document with the Save As HTML function for publishing in Web-ready format, or with Save As PDF for publishing in printer-ready format. Such a button or menu option in an existing software tool would be nirvana from a usability point of view, because it hides all the complex mark-up details from the user and shifts the arduous and error-prone burden of data tagging from humans to the machine.
Unfortunately, such automated tagging of financial data with taxonomy-defined XBRL tags requires semantic interpretation of data. In other words, it requires computers to recognize financial data almost as well as an accountant trained to interpret matching taxonomy-defined financial metrics. This would be a feat that borders on artificial intelligence and expert systems.
Almost all current knowledge management tools lack the sophistication required to deliver on such a capability. Promising work is being done by researchers in the Semantic Web domain who are actively pursuing knowledge automation tools and technologies. A handful of new patents have also emerged that are pushing the boundaries in this area, but we’re still some time away from converting these "bleeding edge" technologies into fully ready-to-use expert systems.
2. Template/Form Based Data Tagging Method
This method involves creating a small set of pre-approved templates that look like online forms. The data fields in these forms have XBRL tags pre-attached and start out blank; filers fill them in manually by rekeying data or cutting and pasting it from other in-house reporting tools.
This form-based method has been adopted by a few regulators around the world. The Securities and Exchange Board of India and the Bombay Stock Exchange are among them, having mandated that one hundred listed companies file in XBRL using the CorpFiling system, which uses pre-tagged blank template forms.
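As a rough illustration of what "pre-attached" tags mean, the sketch below models a template as a fixed lookup from form field to taxonomy concept and emits one XBRL-style fact per filled-in field. It is a minimal sketch under stated assumptions: the `ex:` concept names, the `FY2010` context id, the `USD` unit id, and the `fill_template` helper are all hypothetical, and the output is a fragment rather than a complete, valid XBRL instance document.

```python
from xml.sax.saxutils import escape

# Hypothetical pre-tagged template: form field label -> concept name.
# The concept names are illustrative and not checked against a real taxonomy.
TEMPLATE = {
    "Revenue": "ex:Revenues",
    "Net income": "ex:NetIncomeLoss",
}

def fill_template(values, context_id="FY2010", unit_id="USD"):
    """Return one XBRL-style fact element per template field the filer filled in."""
    facts = []
    for label, concept in TEMPLATE.items():
        if label not in values:
            continue  # blank field: no fact is emitted
        facts.append(
            f'<{concept} contextRef="{context_id}" unitRef="{unit_id}" decimals="0">'
            f"{escape(str(values[label]))}</{concept}>"
        )
    return facts

# The filer only supplies values; the tags come from the template designer.
facts = fill_template({"Revenue": 1200000, "Net income": 150000, "Goodwill": 5000})
for fact in facts:
    print(fact)
```

Note that the value keyed under "Goodwill" has no field in this hypothetical template, so it is never tagged: the filer cannot add a concept the template designer did not anticipate.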
The advantage of this form-based system is that the learning curve associated with XBRL taxonomy, selection of appropriate tags, and validation rules is restricted to a select few professionals who design the templates and forms. In theory, this method should be easier for end-user adoption, because it almost entirely hides the complexity and learning curve of XBRL from the filers who actually fill the forms with their data.
In practice, the disadvantages of this one-size-fits-many method are several, including:
• These generic template-based forms rarely map neatly to any individual company’s filing. To keep the form simple, a minimalist approach is taken whereby fewer than one hundred financial metrics are embedded, out of the several thousand defined by US GAAP or IFRS. Taxonomy extension is, by design, strongly discouraged, and company data is forced to fit a generic form.
• With growing use, the number of templates or forms inevitably has to increase. When the template count grows beyond a few dozen, the fragile nature of these templates takes over, and significant IT resources and costs must be expended to manage the environment as a document management system with version controls, access privileges, maintenance, and support.
• Because only a limited set of metrics is tagged, detailed drill-down is not possible, and the resulting loss of transparency leaves the data of very limited use to a more sophisticated institutional investor or a sell-side/buy-side analyst.
3. Drag-And-Tag Data Tagging Method
This is a more practical approach: a wizard-based method in which filers bite the bullet and commit to educating themselves about XBRL to a reasonable level of understanding. Software tools from several vendors now provide a step-by-step, wizard-like process in which appropriate metrics are selected from a pre-selected taxonomy list, generally organized in a collapsible tree structure for easy navigation. The selected metrics are dragged and dropped by hand, one at a time, onto the appropriate data in a source document (such as Excel or Word), thereby creating the XBRL mapping, that is, tagging the data.
While the manual process does sound onerous, it gets better with time: subsequent tagging exercises can reuse the mappings from the previous exercise, which greatly simplifies and speeds up the process. The most important criticism of this process remains the time-consuming, error-prone manual tagging effort, which can only be outsourced to resources that have skills in both accounting and IT. Ensuring a high degree of QA and validation checking becomes a significant part of the effort in this tagging method.
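The reuse of earlier mappings can be pictured with a short sketch. It assumes the tool saves each hand-made mapping as a simple lookup from source-document row label to taxonomy concept; the labels, the `ex:` concept names, and the `apply_previous_mapping` helper are hypothetical and do not describe any particular vendor's product.

```python
# Hypothetical mapping saved from the previous tagging exercise:
# source-document row label -> taxonomy concept (names are illustrative).
previous_mapping = {
    "Total revenue": "ex:Revenues",
    "Cost of sales": "ex:CostOfRevenue",
}

def apply_previous_mapping(rows, mapping):
    """Split rows into those tagged from the saved mapping and those left for manual tagging."""
    tagged, needs_review = [], []
    for label, value in rows:
        concept = mapping.get(label.strip())
        if concept is None:
            needs_review.append((label, value))  # new line item: tag by hand
        else:
            tagged.append((concept, value))
    return tagged, needs_review

current_rows = [
    ("Total revenue", 980),
    ("Cost of sales", 610),
    ("Restructuring charge", 45),  # did not appear in the previous period
]
tagged, needs_review = apply_previous_mapping(current_rows, previous_mapping)
print("tagged automatically:", tagged)
print("needs manual tagging:", needs_review)
```

Only the line items that are new in the current period fall through to the manual drag-and-drop step, which is where the time saving comes from. The reused tags would still need the same QA and validation checks, since a row label can stay the same while its meaning changes.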
4. Automated Semantic Extraction and Man-Machine Expert System Methods
This method involves machine-automated tagging: the system semantically extracts meaningful and relevant tags, drawing on prior training with expert knowledge.
One of the many new and emerging examples of this method is OpenCalais from Thomson Reuters, which automatically creates rich semantic metadata for submitted content in less than a second. Using natural language processing, machine learning, and other methods, Calais analyzes the document and finds the entities within it. Calais goes well beyond classic entity identification and returns the facts and events hidden within the text as well. However, it is not clear whether Calais will remain focused on tagging unstructured text, such as business news, or will also be used to produce XBRL tags for company filings.
Similarly, the World Wide Web Consortium (W3C) defines the Semantic Web as a vision for the future of the Web, in which information is given explicit meaning, making it easier for machines to automatically process and integrate the information available. The Semantic Web will build on XML’s ability to define customized tagging schemes and RDF’s flexible approach to representing data. The first level above RDF required for the Semantic Web is an ontology language, named OWL, to formally describe the meaning of terminology used in Web documents. While machine-automated ontology development tools are an area of advanced research activity and not yet ready for commercial production, early results show sufficient promise that manual XBRL tagging is likely to be a short-lived practice.
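To make the idea of a system primed with expert knowledge concrete, here is a deliberately naive sketch. It stands in for semantic extraction with plain string similarity from Python's standard `difflib` module, which is far weaker than the natural language processing and ontology-based approaches discussed in this section. The expert-tagged examples, the `ex:` concept names, the `suggest_tag` helper, and the 0.6 cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical expert-tagged examples: line-item label -> taxonomy concept.
EXPERT_EXAMPLES = {
    "total revenues": "ex:Revenues",
    "net income": "ex:NetIncomeLoss",
    "cash and cash equivalents": "ex:CashAndCashEquivalents",
}

def suggest_tag(label, cutoff=0.6):
    """Suggest a concept by closest string match to an expert example, or None."""
    candidates = difflib.get_close_matches(
        label.lower().strip(), list(EXPERT_EXAMPLES), n=1, cutoff=cutoff
    )
    return EXPERT_EXAMPLES[candidates[0]] if candidates else None

for label in ["Total Revenue", "Net Income", "Goodwill"]:
    # A suggestion of None means the item is handed to a human expert.
    print(label, "->", suggest_tag(label))
```

The design point is the man-machine split: the machine proposes a tag when it finds a close precedent and defers to a person otherwise. A production system would need to interpret meaning rather than spelling, since two differently worded line items can denote the same taxonomy concept and two similar labels can denote different ones.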
In summary, as to which tagging method is best, it seems that over the next few years all of the above methods will be in use at the same time, serving the different needs of different users. In the long run, however, machine-automated tagging technology should mature enough to become the primary XBRL tagging method.