We all use Google for Web searches on a daily basis and admire the simplicity of its front-end user interface. It’s nice, clean, fast, and simple.
Behind this simplicity lie sophisticated index databases and advanced search technologies, but we as users don’t need to know or understand them. All we need are smart keywords that help steer our searches through hundreds of billions of marked-up HTML pages scattered across the global Internet.
When we try to search using regular SQL database technologies, though, we run into difficulties. Why? Because most web content lives in distributed HTML flat files and isn’t organized in any centralized database with well-defined data structures and schema. It’s like a world full of roads with no roadmaps. Go discover!
Search engines like Google, Ask, and others find the content that matches our queries by building and using centralized databases that contain metadata, where every keyword acts as a tag with fast, efficient links to the corresponding websites. In other words, a search engine acts like a very knowledgeable guide, answering our queries with found/not-found results based on the Internet roads it has access to and has crawled before.
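The keyword-to-link structure described above is commonly called an inverted index. The sketch below is a toy version, assuming whitespace tokenization and made-up example.com pages; real search engines add ranking, stemming, and distributed storage on top of this basic idea.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"

def build_index(pages):
    """Map each lowercase keyword to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in text.lower().split():
            word = token.strip(PUNCTUATION)
            if word:  # skip tokens that were pure punctuation
                index[word].add(url)
    return index

def search(index, query):
    """Return the URLs containing every keyword in the query (AND semantics)."""
    words = [w.strip(PUNCTUATION) for w in query.lower().split()]
    words = [w for w in words if w]
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages standing in for crawled web content.
pages = {
    "https://example.com/a": "Quarterly revenue grew strongly.",
    "https://example.com/b": "Revenue and net income both declined.",
    "https://example.com/c": "New product launch announced.",
}
index = build_index(pages)
print(sorted(search(index, "revenue")))         # pages a and b
print(sorted(search(index, "revenue income")))  # page b only
```

Each lookup touches only the URL sets for the query’s keywords rather than rescanning every page, which is what makes the “found/not found” answer fast.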
Why not use such a powerful search front-end to query financial research data? In my experience working with both sell-side and buy-side research analysts, there has been a long-standing request to build such a tool, but until recently, the short answer to this request has been “No!”
No, because it’s technically too difficult or it’s too expensive.
No, because Google deals with text and not data, which has both context and meaning. Data is far more challenging to search, because even when it’s on the Web, it is marked up with HTML as text, not as data, thereby losing its context for meaningful search.
No, because there are no generally accepted standard financial dictionaries, or taxonomies, that define terms such as revenue, sales, or income as synonyms.
Until recently, this list of No’s was long. The good news is that the list is now shrinking quickly with the increasing adoption of XBRL and EDGAR standard taxonomies and the release of several XBRL tools.
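To see why a shared taxonomy matters for search, consider companies that report the same item under different labels. The sketch below maps labels to one standard concept so a single query finds all of them. The label map, the `std:Revenues` concept name, and the figures are hypothetical; a real taxonomy such as US-GAAP defines its own concept names and labels.

```python
# Hypothetical label-to-concept map standing in for a standard taxonomy.
SYNONYMS = {
    "revenue": "std:Revenues",
    "revenues": "std:Revenues",
    "sales": "std:Revenues",
    "net sales": "std:Revenues",
}

def normalize(label):
    """Return the standard concept for a company-specific label, or None if unknown."""
    key = " ".join(label.lower().split())  # case-fold and collapse extra whitespace
    return SYNONYMS.get(key)

def facts_for(concept, facts):
    """Collect every reported value whose label maps to the given concept."""
    return {company: value for company, label, value in facts if normalize(label) == concept}

# Made-up reported items: (company, label used in its report, value).
filings = [
    ("Company A", "Revenue", 1200),
    ("Company B", "Net  Sales", 950),
    ("Company C", "Sales", 400),
    ("Company D", "Backlog", 75),
]
print(facts_for("std:Revenues", filings))
# {'Company A': 1200, 'Company B': 950, 'Company C': 400}
```

Without the mapping, a search for “revenue” would miss Companies B and C entirely; with it, the three differently labeled items become one searchable concept.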
All that is needed to accomplish powerful search of financial research data is to subscribe to the SEC’s XBRL filings as free RSS feeds, extract XBRL data into our own relational or Google-like index databases, and use SQL to find answers to our queries. As an alternative, we could subscribe to third-party data services firms like Bloomberg, Thomson Reuters, Factset and others that would add XBRL data to their current aggregate data and continue to offer this as a service.
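The extract-and-query step can be sketched with Python’s standard library alone. This is a simplified illustration, not a production loader: it skips downloading filings from the RSS feed, builds two tiny XBRL-style instance documents in memory with made-up values, and uses a hypothetical `ex` taxonomy namespace. Only the `xbrli` instance namespace is the real XBRL 2.1 one.

```python
import sqlite3
import xml.etree.ElementTree as ET

XBRLI_NS = "http://www.xbrl.org/2003/instance"

def make_instance(revenues, net_income):
    """Build a tiny XBRL-style instance; the "ex" taxonomy namespace is hypothetical."""
    return f"""<xbrli:xbrl xmlns:xbrli="{XBRLI_NS}" xmlns:ex="http://example.com/hypothetical-taxonomy">
  <xbrli:context id="FY2009">
    <xbrli:entity><xbrli:identifier scheme="http://www.sec.gov/CIK">0000000000</xbrli:identifier></xbrli:entity>
    <xbrli:period><xbrli:startDate>2009-01-01</xbrli:startDate><xbrli:endDate>2009-12-31</xbrli:endDate></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Revenues contextRef="FY2009" unitRef="USD" decimals="0">{revenues}</ex:Revenues>
  <ex:NetIncome contextRef="FY2009" unitRef="USD" decimals="0">{net_income}</ex:NetIncome>
</xbrli:xbrl>"""

def load_facts(conn, company, xml_text):
    """Insert every fact (a top-level element carrying a contextRef) into the facts table."""
    root = ET.fromstring(xml_text)
    for elem in root:
        namespace, _, concept = elem.tag[1:].partition("}")  # tag looks like "{ns}Local"
        if namespace == XBRLI_NS or elem.get("contextRef") is None:
            continue  # contexts and units describe facts; they are not facts themselves
        conn.execute(
            "INSERT INTO facts (company, concept, context, unit, value) VALUES (?, ?, ?, ?, ?)",
            (company, concept, elem.get("contextRef"), elem.get("unitRef"), float(elem.text)),
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (company TEXT, concept TEXT, context TEXT, unit TEXT, value REAL)")
load_facts(conn, "Company A", make_instance(1200000, 150000))
load_facts(conn, "Company B", make_instance(950000, 80000))

# Once the facts are relational rows, an ordinary SQL query answers the research question.
rows = conn.execute(
    "SELECT company, value FROM facts "
    "WHERE concept = 'Revenues' AND value > 1000000 ORDER BY company"
).fetchall()
print(rows)  # [('Company A', 1200000.0)]
```

Real filings carry many more contexts, dimensions, and units, so a real loader would also store the context’s period and entity rather than just its id.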
The news gets even better when we add SPARQL, a W3C specified query language for RDF, to XBRL and Linked Data.
Jim Rapoza, Chief Technology Analyst of eWeek, explains:
Called SPARQL this standard brings about a standardized SQL-like query language for the Semantic Web. And, like most Semantic Web standards, it is heavily based on RDF, although it also makes use of many Web services standards, such as WSDL.
SPARQL essentially consists of a standard query language, a data access protocol and a data model (which is basically RDF).
Some people out there are probably thinking, So what? Sounds like just another search tool—big whoop. But there’s a big difference between blindly searching the entire Web and querying actual data models.
The ability of database queries to pull data from giant databases is pretty much the basis of a large number of enterprise applications. No one argues about the value of being able to write a query in an application that can pull relevant customer and product data.
Now, imagine writing a similarly small application that does the same thing—only with data stored across the entire World Wide Web.
That would include all the companies who not only file in XBRL but also, in conformance to SEC requirements, will be posting XBRL data on their own company websites.
In essence, with SPARQL we can choose to build centralized databases to query XBRL data, but we don’t have to. We can simply point our queries at so-called SPARQL endpoints. Unlike traditional database requests, which must stay under one administrative control, these queries can span thousands of company websites with XBRL data and return results as if they came from one centralized database. Imagine the cost savings of not having to build and maintain a huge and growing centralized database.
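The core of a SPARQL query is a basic graph pattern: a set of triple patterns with `?variables` that must all match. The toy sketch below evaluates such a pattern over several in-memory “endpoints” and merges the answers. It is only an illustration of the idea: real SPARQL endpoints are queried over HTTP via the SPARQL Protocol, and the endpoint URLs, `ex:` names, and figures here are hypothetical. The sketch also runs the whole query separately at each endpoint and concatenates the results, so it cannot join data that is split across endpoints.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple; return None if they conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):            # variable position
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:                     # constant position must match exactly
            return None
    return new

def query(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern (all patterns must match) over one triple set."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [m for b in bindings for t in triples
                    if (m := match_pattern(pattern, t, b)) is not None]
    return bindings

def federated_query(endpoints: Dict[str, List[Triple]], patterns: List[Triple]) -> List[Binding]:
    """Run the same query at every endpoint and concatenate the solutions."""
    results: List[Binding] = []
    for triples in endpoints.values():
        results.extend(query(triples, patterns))
    return results

# Hypothetical company endpoints, each publishing its own XBRL-derived triples.
endpoints = {
    "https://company-a.example/sparql": [
        ("ex:CompanyA", "ex:name", "Company A"),
        ("ex:CompanyA", "ex:revenues2009", "1200000"),
    ],
    "https://company-b.example/sparql": [
        ("ex:CompanyB", "ex:name", "Company B"),
        ("ex:CompanyB", "ex:revenues2009", "950000"),
    ],
}
# Roughly: SELECT ?name ?rev WHERE { ?c ex:name ?name . ?c ex:revenues2009 ?rev . }
# (PREFIX declarations omitted)
patterns = [("?c", "ex:name", "?name"), ("?c", "ex:revenues2009", "?rev")]
for row in federated_query(endpoints, patterns):
    print(row["?name"], row["?rev"])
```

The caller writes one query and gets one combined result list, without knowing where each company’s data lives; that is the user-facing effect the paragraph above describes. The walrus operator requires Python 3.8 or later.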
Applications for publishing XBRL as Linked Open Data are limited at this time, but they are emerging. As one example, Roberto García and Rosa Gil describe their work undertaken at a Research Group at Universitat de Lleida, Spain, which extracted 1.34 million triples from 612 XBRL filings. (Triples are semantic data elements in RDF format.) The process of extraction is machine automated and results in transforming XBRL data into Semantic Web formatted RDF data.
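To make “triples” concrete, the sketch below turns two XBRL-style facts into RDF triples and prints them in N-Triples syntax. It is not the Lleida group’s mapping, which is not described here: the base IRI and predicate names are hypothetical, and only the XML Schema `decimal` datatype IRI is a real standard identifier. Each fact becomes a resource, and its concept, entity, period, unit, and value each become one triple.

```python
BASE = "http://example.com/xbrl/"  # hypothetical base IRI for generated resources
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

def fact_to_triples(fact_id, fact):
    """Turn one XBRL fact (a dict) into RDF triples describing a new fact resource."""
    s = f"<{BASE}fact/{fact_id}>"
    return [
        (s, f"<{BASE}concept>", f"<{BASE}concept/{fact['concept']}>"),
        (s, f"<{BASE}entity>", f"<{BASE}entity/{fact['entity']}>"),
        (s, f"<{BASE}period>", f'"{fact["period"]}"'),
        (s, f"<{BASE}unit>", f'"{fact["unit"]}"'),
        (s, f"<{BASE}value>", f'"{fact["value"]}"^^<{XSD_DECIMAL}>'),  # typed literal
    ]

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

# Made-up facts, as they might come out of an XBRL instance document.
facts = [
    {"concept": "Revenues", "entity": "CompanyA", "period": "FY2009", "unit": "USD", "value": "1200000"},
    {"concept": "NetIncome", "entity": "CompanyA", "period": "FY2009", "unit": "USD", "value": "150000"},
]
triples = [t for i, fact in enumerate(facts) for t in fact_to_triples(i, fact)]
print(len(triples))  # 10
print(to_ntriples(triples[:1]))
```

Because every fact decomposes into several triples, even a modest number of filings yields a large triple count, which is consistent with the scale of the figures quoted above.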
In addition, sufficient examples in the current Web exist to give us insight into how the user experience might look when Semantic XBRL applications go into production use. Next time you search for the best flight for your air travel on sites such as Orbitz, Kayak, or FareCompare, take a pause and observe that the flight schedules, prices and airline details are being pulled not from any one centralized database but from a variety of airline databases, in real time, to match your exact itinerary requirements, thanks to some very specialized and complex technologies.
In summary, SPARQL makes Semantic XBRL searches possible on demand across a distributed web space, while simplifying front-end design and keeping the complexity of the technology hidden from end users.