Authoritative Definitions of the Semantic Web

Revised: Oct 4 2007

News Flash: Wikipedia's and W3C's Definitions Of Semantic Web Have Changed !

Wikipedia has a new definition of the Semantic Web.

The semantic web is an evolving extension of the World Wide Web in which web content can be expressed not only in natural language, but also in a format that can be read and used by software agents, thus permitting them to find, share and integrate information more easily. [Footnote 1] It derives from W3C director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.

The vision of the Semantic Web is to extend principles of the Web from documents to data. This extension will allow to fulfill more of the Web’s potential, in that it will allow data to be shared effectively by wider communities, and to be processed automatically by tools as well as manually.

Note the mention of "software agents" - that is new, as is "wider communities". "Knowledge exchange" is also new, although it seems to be presented as Berners-Lee's vision.

[Footnote 1] is a link to "W3C Semantic Web FAQS - What is the Semantic Web". It has also changed in a significant way ( see next section ).

Not that the old definition is to be despised. The old definition of the Semantic Web was oriented more toward answering what the parts and activities of the SemWeb are than toward the who and why of it.

The Semantic Web, also known loosely as Web 3, is a project that intends to create a universal medium for information exchange by putting documents with computer-processable meaning (semantics) on the World Wide Web. Currently under the direction of the Web's creator, Tim Berners-Lee of the World Wide Web Consortium, the Semantic Web extends the Web through the use of standards, markup languages and related processing tools.

In the old definition, markup languages were especially important and, in fact, that has not changed. Markup languages are still the "body" of the SemWeb, but the new definition is more focused on 'who and why' rather than on 'what'. In other words, it shows the signs of a maturing technology.

The W3C Definition of the Semantic Web

The W3C has revised its definition of the Semantic Web in "W3C Semantic Web FAQS - What is the Semantic Web". Again the drift is away from "parts" and toward what the parts will be doing: interlinking communities and solving problems in specific domains. The new definition states:

The vision of the Semantic Web is to extend principles of the Web from documents to data. This extension will allow to fulfill more of the Web’s potential, in that it will allow data to be shared effectively by wider communities, and to be processed automatically by tools as well as manually.

Note the mention of "wider communities" again. It continues ( slightly reformatted ):

Semantic Web technologies can be used in a variety of application areas; for example:

  • in data integration, whereby data in various locations and various formats can be integrated in one, seamless application;
  • in resource discovery and classification to provide better, domain specific search engine capabilities;
  • in cataloging for describing the content and content relationships available at a particular Web site, page, or digital library;
  • by intelligent software agents to facilitate knowledge sharing and exchange;
  • in content rating;
  • in describing collections of pages that represent a single logical “document”;
  • for describing intellectual property rights of Web pages ...

There are several other interesting FAQ questions on the new page, such as "Artificial Intelligence?".  Oh oh.

The old definition stressed that the Semantic Web is a "web of data" and that the important features were common formats for interchange of data and a language for recording how the data relates to real world objects - generally the subjects of markup languages and standards.  At the foundations, this is still true despite the shift toward asking 'why is it' rather than just asking 'what is it?'.

Standards

There are several layers of base standards that define the "body" of the Semantic Web. Some focus on describing Web resources, such as the Resource Description Framework ( RDF ); others on marking up data and metadata, such as the eXtensible Markup Language ( XML ). Still others concentrate on expressing query logic ( SPARQL ) or defining rules and rule interchange ( Rule Interchange Format ). Some specialize in business-specific rule logic, such as the Business Rules Markup Language ( BRML ).

An Unending Web of Data or Metadata ? Or Both ?

Another important point in the original ( not the new ! ) W3C definition was the facility that "allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing".
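To make the idea of databases "connected not by wires but by being about the same thing" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three dictionaries stand in for independent databases, and the example.org identifiers are invented. The only thing that connects the databases is that they use the same identifier for the same subject.

```python
# Hypothetical illustration: three independent "databases", each a plain dict.
# All identifiers below are invented for this example.
HOG = "http://example.org/commodity/hog-bellies"
IOWA = "http://example.org/region/iowa"

PRICES_DB = {HOG: {"last_price": 2.34}}
PRODUCERS_DB = {HOG: {"top_producer": IOWA}}
REGIONS_DB = {IOWA: {"label": "Iowa"}}

DATABASES = {"prices": PRICES_DB, "producers": PRODUCERS_DB, "regions": REGIONS_DB}


def facts_about(subject):
    """Collect every record about `subject`, whichever database holds it."""
    return {name: db[subject] for name, db in DATABASES.items() if subject in db}


def walk(start):
    """Follow identifiers from database to database, starting at `start`."""
    seen, queue, trail = set(), [start], []
    while queue:
        subject = queue.pop(0)
        if subject in seen:
            continue
        seen.add(subject)
        found = facts_about(subject)
        trail.append((subject, found))
        # Any value that is itself a known identifier is a link to follow.
        for record in found.values():
            for value in record.values():
                if isinstance(value, str) and any(value in db for db in DATABASES.values()):
                    queue.append(value)
    return trail


for subject, found in walk(HOG):
    print(subject, found)
```

Starting from the hog bellies identifier, the walk picks up records from the prices and producers databases, then moves on to the regions database because one record mentions an identifier that the regions database also knows about. A real Semantic Web application would do this over the network with URIs and RDF rather than in-memory dictionaries.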

From Wikipedia on the subject of data: "In general, data consist of propositions that reflect reality. A large class of practically important propositions are measurements or observations of a variable. Such propositions may comprise numbers, words, or images".

Clearly, most of the 'data' in the Semantic Web will be a combination of 'observations' and data about data, that is, meta-data such as the domain of the data ( a price series for hog bellies ), the type ( decimal numbers ) and format ( currency ), as well as the data itself ( 2.34, 2.13, etc. ). So, in effect, the W3C definition blurs the distinction between data and meta-data that is usual in information systems. There are no separate data structures for data and meta-data. It is all data, even the metadata describing the data. This is implemented with XML, which serves as both the interchange format and the data definition language.
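As a rough sketch of what "it is all data" can look like, the hog bellies example can be written as a single list of subject-predicate-object statements, in the spirit of RDF. The identifier and the property names ( "domain", "datatype", "format", "observation" ) are invented for this illustration and do not come from any standard vocabulary.

```python
# Illustrative only: the identifier and property names are invented for this sketch.
SERIES = "http://example.org/series/hog-bellies"

triples = [
    # Metadata: statements about the series itself.
    (SERIES, "domain", "price series for hog bellies"),
    (SERIES, "datatype", "decimal"),
    (SERIES, "format", "currency"),
    # Data: the observations, stored in the very same structure.
    (SERIES, "observation", 2.34),
    (SERIES, "observation", 2.13),
]


def values(subject, predicate):
    """Return every object recorded for (subject, predicate), in stored order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# The same function answers a "metadata" question and a "data" question.
print(values(SERIES, "datatype"))     # ['decimal']
print(values(SERIES, "observation"))  # [2.34, 2.13]
```

Nothing in the structure marks one statement as data and another as metadata; the distinction lives only in how a reader interprets the predicate.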

The second interesting thing is the implication of an unending journey of the person or machine through many distributed databases connected by a common query or subject, something like a highly distributed subject database.

In fact, there is a third item of interest. The W3C definition uses the term 'objects', which may not capture the growing technical infrastructure of services, via SOA, as the foundation of the Semantic Web.

And the fourth item of interest ( out of two ) is what is described as "the language for recording how the data relates to real world objects", that is, the representation language. The primary means of communication between programs and programs or people and programs is XML, including the RDF and OWL standards, which are commonly serialized as XML.
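For readers who have not seen RDF written as XML, the following Python sketch builds a tiny RDF/XML document with the standard library's xml.etree.ElementTree. The rdf namespace URI is the real one; the "ex" vocabulary, the subject identifier and the property names are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms#"  # hypothetical vocabulary for this sketch

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)


def describe(subject_uri, properties):
    """Build an rdf:RDF element holding one rdf:Description of `subject_uri`."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": subject_uri},
    )
    # Each property becomes a child element whose text is the value.
    for name, value in properties.items():
        ET.SubElement(desc, f"{{{EX_NS}}}{name}").text = str(value)
    return root


doc = describe(
    "http://example.org/series/hog-bellies",
    {"datatype": "decimal", "lastPrice": 2.34},
)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

The printed document is ordinary XML that any XML parser can read, while the rdf:about attribute and the namespaced child elements carry the statements about the subject. XML is only one of several ways to write RDF down; it is used here because it is the one the text discusses.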