Official website of the XBRL regional organisation


XBRL Technology: Interpreting the True Meaning of Information

2009-02-19

Source: HITACHI


Andy Greener is a software architect at Her Majesty’s Revenue & Customs with responsibility for, among other things, interoperability and the use of standards. He is also strategy architect for HMRC’s Company Tax online service, which has mandated the use of XBRL for the accounts and tax computation components of all company tax returns filed after March 31, 2011. He is a Chartered Engineer and Chartered Information Technology Professional with over 27 years’ experience as a software engineer. You can contact Andy by email.

This is the first of a three-part post on XBRL and the Semantic Web.

What is the meaning of information?

We’re all used to absorbing information from documents and drawing our own conclusions based on the apparent facts or patterns of information we see there. Indeed, many professions are founded on this useful ability of the human brain, honed to near perfection by years of education or training. But what are we actually seeing and understanding when we look at a document designed for human consumption? How do we actually get at the meaning? Before we examine this last question, let’s take a small diversion back to some "first principles."

Fundamentally, when we look at a document we’re looking at an organised collection of symbols laid out in two-dimensional form. We can discern some meaning just from the style, size, or position of these symbols. We hope at least some of the symbols are in an alphabet we recognise, and we make an assumption or educated guess about the direction of travel of our eyes in absorbing symbol sequences. There should be collections of symbols (words, numbers, or emoticons), collections of collections of symbols (sentences and paragraphs), punctuation marks, and so on. Some sentences may stand on their own in a larger or bolder font (titles or headings), some (numeric) symbols may sit alone, denoting a page or chapter number. The absence of symbols ("white space") may be as significant as some of the symbols themselves. All of this visual information provides cues for our attention and draws our eye (and therefore our brain) to the deeper and more meaningful information that the document is trying to convey.

Speaking of deeper meaning, the collections of alphabetic symbols probably represent words (which stand for concepts) in a familiar human language: words that are defined in a dictionary so that you can learn to associate meaning with those ordered collections of symbols. Some words may appear in a different dictionary altogether, and they may be subtly different visually as a result; we are all familiar with the italicised Latin phrases that pepper the text of the erudite (or pretentious!). Human beings are innately capable of organising collections of concepts (at least verbally) into structures that are governed by a set of rules (i.e., a grammar) and which can convey complex and subtle layers of deeper meaning as a result.

For a human being, then, understanding the meaning of a document involves many layers of (sometimes unconscious) information analysis, both visual and conceptual.

Imagine now that you are rendered blind. The two-dimensional nature of written documents is no longer apparent to you; those parts of your brain that unconsciously or consciously manage all the visual cues, from font size to paragraph layout, from section titles to numbers, are now bereft of input and, as a result, defunct. Instead, a colleague is going to read the document to you from start to finish, in serial form. What you are going to hear is a one-dimensional stream of concepts (words) and some supporting descriptions, thus: "page 1" - "start paragraph" - "What" - "is" - "the" - "meaning" - "of" - "information" - "question mark" - "end paragraph" - "start paragraph" … and so on.
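The spoken stream described above can be mimicked in a few lines of Python. This is only an illustrative sketch: the function name `serialise`, the `PUNCTUATION_NAMES` table, and the decision to model a document as a list of pages, each holding a list of paragraph strings, are assumptions made for the example and do not come from the article.

```python
import re

# Hypothetical spoken names for punctuation marks (an assumption for this sketch).
PUNCTUATION_NAMES = {"?": "question mark", ".": "full stop", ",": "comma"}


def serialise(pages):
    """Flatten pages (lists of paragraph strings) into a one-dimensional token stream."""
    tokens = []
    for page_number, paragraphs in enumerate(pages, start=1):
        tokens.append(f"page {page_number}")
        for paragraph in paragraphs:
            tokens.append("start paragraph")
            # Split the paragraph into words and individual punctuation marks.
            for piece in re.findall(r"[\w'’]+|[^\w\s]", paragraph):
                # Punctuation is read aloud by name; words pass through unchanged.
                tokens.append(PUNCTUATION_NAMES.get(piece, piece))
            tokens.append("end paragraph")
    return tokens


stream = serialise([["What is the meaning of information?"]])
print(" - ".join(stream))
```

Note that the explicit "start paragraph" and "end paragraph" tokens carry structure that the page layout used to supply visually.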

Of course, you no longer need to discern some things from visual cues: the order of the words and punctuation is now self-evident, whether you’re listening to Chinese, Arabic, or English, and you probably no longer care about the physical page structure of the document. But you do need to know, for instance, which words need emphasis, how the words have been collected together into sentences and paragraphs, and which sentences are actually headings, sub-headings, or quotes. Your colleague may “adorn” the text with explicit instructions, such as new paragraph, and may use audible cues (such as raising or changing his voice) to imply emphasised words or quotations.

Run-of-the-mill computers are of course devoid of human senses (particularly the one we call "common") and need an especially pedantic form of the document "serialisation" illustrated above to make any sense of prose, even if only to re-create the two-dimensional visual form we humans take for granted. It is still beyond the capabilities of most computers to divine any kind of meaning from a stream of words, let alone the deeper meaning we regularly infer.

Our thought experiment illustrates what early type-setters in the printing industry referred to generically as "mark-up", a term that has found its way into the world of computer-based document rendering, most prominently as the ‘M’ in "XML" and "HTML." Interspersed in a stream of words are "instructions" that put the two-dimensional information back into the document, allowing a computer to "render" the serialised document onto a screen or a piece of paper in a form that, visually at least, we humans are familiar with.

I’m not going to delve into the minutiae of mark-up here, but suffice it to say that when you take a peek at the source of a web page, or any XML or XBRL document, all that stuff inside the angle-brackets is mark-up: instructions for a computer that make some sense of the document content. But, just like the layers of meaning that we perceive when we read a document, there are different kinds of mark-up, each with a different job to do.
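To make the split between mark-up and content concrete, the sketch below uses `HTMLParser` from Python's standard library to separate what sits inside the angle-brackets from the words around it. The class name `MarkupSplitter` and the sample HTML fragment are invented for illustration; attributes are ignored to keep the example short.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collect mark-up (tags) and document content (text) in separate lists."""

    def __init__(self):
        super().__init__()
        self.markup = []
        self.content = []

    def handle_starttag(self, tag, attrs):
        # Opening instruction, e.g. "<p>"; attributes are dropped in this sketch.
        self.markup.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # Closing instruction, e.g. "</p>".
        self.markup.append(f"</{tag}>")

    def handle_data(self, data):
        # Everything outside the angle-brackets is content.
        text = data.strip()
        if text:
            self.content.append(text)


splitter = MarkupSplitter()
splitter.feed(
    "<h1>What is the meaning of information?</h1>"
    "<p>We are all used to <em>absorbing</em> information.</p>"
)
splitter.close()

print(splitter.markup)   # the instructions
print(splitter.content)  # the words those instructions apply to
```

Here the tags play the role of the colleague's spoken instructions ("start paragraph", a raised voice for emphasis), while the content list holds the words themselves.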

In Part 2 of this post, to be published next week, Andy will discuss how display- and meaning-oriented markup languages culminate in the Semantic Web, which opens the information content in web pages to intelligent applications.