The Semantic Web

How did the Semantic Web begin ?

As most things do, the Semantic Web started small. An article by Tim Berners-Lee, James Hendler and Ora Lassila in the May 2001 issue of Scientific American described a futuristic world where software agents automatically schedule an entire series of medical treatments via the Semantic Web.

The article states that:

The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users

Software agents have been a hot topic for about the last 10 years. As a simple definition, they are discrete pieces of software that do useful things in flexible ways. There are several W3C and commercial variations of an agent standard. Agents often use rules and other powerful conceptual structures to implement complex tasks.

The Scientific American article continues:

The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available. The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data come with semantics

 

An Aside on Software Agents

The Wikipedia defines a software agent as: "... an abstraction, a logical model that describes software that acts for a user or other program in a relationship of agency. Such 'action on behalf of' implies the authority to decide when (and if) action is appropriate. The idea is that agents are not strictly invoked for a task, but activate themselves".

Despite the mention of 'abstraction' in the definition, agents are generally credited with making communication between people and machines easier and more natural than specialized rule languages or knowledge 'templates'.

The definition continues with the different types of agents including ( slightly reformatted for clarity ):

  • Intelligent agents (in particular exhibiting some aspect of Artificial Intelligence, such as learning and reasoning),
  • Multi-agent systems (distributed agents that do not have the capabilities to achieve an objective alone and thus must communicate),
  • Autonomous agents (capable of modifying the way in which they achieve their objectives),
  • Distributed agents (being executed on physically distinct machines),
  • Mobile agents (agents that can relocate their execution onto different processors).

Two types of agent, distributed and mobile agents, are classified according to the operating environments where they run.

However, the other three types - intelligent, multi-agent and autonomous agents - are quite different. They have the ability to communicate, reason and learn. In other words, their effectiveness is enhanced by their ability to employ language and knowledge. In particular, intelligent agents must have extensive rule-processing capabilities in order to drive their inferencing capabilities. They see the world from a rule-based perspective.

There may be an easier way of representing the types of agents listed above: recognize that the operating environment and the degree of agent intelligence are independent dimensions, as in the table below.

                      Distributed                                            Mobile
  Intelligent agent   Intelligent, lives on a server                         Intelligent, moves between servers
  Multi-agent         Many simple agents cooperate, live on a server         Many simple agents cooperate, move between servers
  Autonomous          Agent behavior not pre-determined, lives on a server   Agent behavior not pre-determined, moves between servers

 

Later, it may be useful to return to the idea of agents as a packaging of semantic services in a form appropriate to different types of knowledge-intensive tasks. For now, what is important is that Semantic Web services often use software agents to accomplish specific tasks.

 

The Vision

There are three important elements to the vision described above.

There also seem to be four criteria for measuring the success of the Semantic Web initiative.

 

The Fuzzy Vision

To what degree has the article's vision of the future been realized in the five years since it was published ? It's difficult to say. To some extent, the term "Semantic Web" is defined so loosely that it may be futile to try to pin down with certainty whether a particular application is part of the 'Semantic Web'. The answer may be in the eye of the beholder.

Rather than definitions, the next few sections will focus on different, sometimes competing visions of the Semantic Web.

 

Early Definitions / Visions of the Semantic Web

The Object Web

The ideas of a "Semantic Web" had been around in various guises for several years prior to the Scientific American article. One precursor of the Semantic Web was the idea of "Web Objects" or the "Object Web" kicking around in the mid-1990s. The Object Management Group ( OMG ) was a big part of that phase of development of the technical infrastructure.

It really began to take off as the standards matured: from the OMG came CORBA, and from Sun came Java, together enabling something like a web of communicating objects running on different servers. The terms "distributed computing" or "distributed systems" were often used in this context.

While the technology can be quite complex, the basic vision is simple enough.

 

[ Figure: The Object Web ]

 

Different chunks of software running on different servers send each other messages; not too complex in principle, however devilish the details might be. Much of the technology of "Service Oriented Architecture" is built upon the foundation of the original OMG object standards of the 1990s. The difference is the shift from objects to services: the focus is on the interface between the pieces of software rather than on whether the software is an object or not.

 

Authoritative Definitions of the Semantic Web

Revised: Oct 4 2007

News Flash: Wikipedia's and W3C's Definitions Of Semantic Web Have Changed !

Wikipedia has a new definition of the Semantic Web.

The semantic web is an evolving extension of the World Wide Web in which web content can be expressed not only in natural language, but also in a format that can be read and used by software agents, thus permitting them to find, share and integrate information more easily. [ Footnote 1 ] It derives from W3C director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.

The vision of the Semantic Web is to extend principles of the Web from documents to data. This extension will allow to fulfill more of the Web’s potential, in that it will allow data to be shared effectively by wider communities, and to be processed automatically by tools as well as manually.

Note the mention of "software agents" - that is new, as is "wider communities". "Knowledge exchange" is also new, although it seems to be presented as Berners-Lee's vision.

[ Footnote 1 ] is a link to "W3C Semantic Web FAQ - What is the Semantic Web". It has also changed in a significant way ( see next section ).

Not that the old definition is to be despised. The old definition of the Semantic Web was more oriented toward answering what the parts and activities of the SemWeb are than toward the who and why of it.

The Semantic Web, also known loosely as Web 3, is a project that intends to create a universal medium for information exchange by putting documents with computer-processable meaning (semantics) on the World Wide Web. Currently under the direction of the Web's creator, Tim Berners-Lee of the World Wide Web Consortium, the Semantic Web extends the Web through the use of standards, markup languages and related processing tools.

In the old definition, markup languages were especially important and, in fact, that has not changed. Markup languages are still the "body" of the SemWeb, but the new definition is more focused on 'who and why' rather than on 'what'. In other words, it shows the signs of a maturing technology.

The W3C Definition of the Semantic Web

The W3C has revised its definition of the Semantic Web in the W3C Semantic Web FAQ, "What is the Semantic Web ?". Again, the drift is away from "parts" and toward what the parts will be doing: interlinking communities and solving problems in specific domains. The new definition states:

The vision of the Semantic Web is to extend principles of the Web from documents to data. This extension will allow to fulfill more of the Web’s potential, in that it will allow data to be shared effectively by wider communities, and to be processed automatically by tools as well as manually.

Note the mention of "wider communities". again. It continues ( slightly re-formated ):

Semantic Web technologies can be used in a variety of application areas; for example:

  • in data integration, whereby data in various locations and various formats can be integrated in one, seamless application;
  • in resource discovery and classification to provide better, domain specific search engine capabilities;
  • in cataloging for describing the content and content relationships available at a particular Web site, page, or digital library;
  • by intelligent software agents to facilitate knowledge sharing and exchange;
  • in content rating;
  • in describing collections of pages that represent a single logical “document”;
  • for describing intellectual property rights of Web pages ...

There are several other interesting FAQ questions on the new page, such as "Artificial Intelligence?". Oh oh.

The old definition stressed that the Semantic Web is a "web of data" and that the important features were common formats for interchange of data and a language for recording how the data relates to real world objects - generally the subjects of markup languages and standards.  At the foundations, this is still true despite the shift toward asking 'why is it' rather than just asking 'what is it?'.

Standards

There are several layers of base standards that define the "body" of the Semantic Web. Some focus on describing Web resources, such as the Resource Description Framework ( RDF ); others on the underlying syntax, such as the eXtensible Markup Language ( XML ). Still others concentrate on expressing query logic ( SPARQL ) or defining rules and rule interchange ( the Rule Interchange Format ). Some specialize in business-specific rule logic, such as the Business Rules Markup Language ( BRML ).
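
To make the layering concrete, here is a minimal sketch using the Python rdflib library ( an assumption; any RDF toolkit would do, and the ex: vocabulary is invented for illustration ). A few RDF triples describe a Web resource, and a SPARQL query expresses the retrieval logic:

    # A minimal sketch of the layers cooperating: RDF states facts about a
    # resource, and SPARQL expresses the query logic. The ex: vocabulary
    # below is invented for illustration.
    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/terms/")   # hypothetical vocabulary

    g = Graph()
    g.bind("ex", EX)

    # Describe a Web resource as RDF triples: subject, predicate, object.
    article = URIRef("http://example.org/articles/semantic-web-2001")
    g.add((article, EX.title, Literal("The Semantic Web")))
    g.add((article, EX.publishedIn, Literal("Scientific American")))

    # Express the query logic in SPARQL rather than in program code.
    results = g.query("""
        PREFIX ex: <http://example.org/terms/>
        SELECT ?title
        WHERE { ?doc ex:publishedIn "Scientific American" ;
                     ex:title ?title . }
    """)
    for row in results:
        print(row.title)   # -> The Semantic Web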

An Unending Web of Data or Metadata ? Or Both ?

Another important point in the original ( not the new ! ) W3C definition was the facility that "allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing".

From the Wikipedia on the subject of data: "In general, data consist of propositions that reflect reality. A large class of practically important propositions are measurements or observations of a variable. Such propositions may comprise numbers, words, or images".

Clearly, most of the 'data' in the Semantic Web will be a combination of 'observations' and data about data - that is, meta-data about the data, such as the domain of the data ( a price series for hog bellies ), the type ( decimal numbers ) and the format ( currency ), as well as the data itself ( 2.34, 2.13, etc. ). So, in effect, the W3C definition blurs the distinction between data and meta-data that is usual in information systems. There are no separate data structures for data and meta-data. It is all data, even the metadata describing the data. This is implemented with a sophisticated interface and data definition language, XML.
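
A hedged sketch of that point, again using rdflib with invented ex: URIs: the hog-belly observations and the meta-data describing them are all just triples in one graph, with no separate schema structure.

    # Data and meta-data live in the same RDF graph; "it is all data".
    # All ex: URIs are invented for illustration.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, RDFS, XSD

    EX = Namespace("http://example.org/markets/")
    g = Graph()
    series = EX.hogBellyPrices

    # Meta-data about the series, expressed as ordinary triples ...
    g.add((series, RDF.type, EX.PriceSeries))
    g.add((series, RDFS.label, Literal("Hog belly price series")))
    g.add((series, EX.currency, Literal("USD")))

    # ... and the observations themselves, expressed exactly the same way.
    g.add((series, EX.observation, Literal("2.34", datatype=XSD.decimal)))
    g.add((series, EX.observation, Literal("2.13", datatype=XSD.decimal)))

    # One structure holds both; serializing shows them together.
    print(g.serialize(format="turtle"))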

The second interesting thing is the implication of an unending journey of a person or machine through many distributed databases connected by a common query or subject, something like a highly distributed subject database.
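
Here is a minimal sketch of what that journey might look like, assuming each database publishes RDF at dereferenceable URIs and links equivalent subjects with owl:sameAs; the hop limit and starting URI are illustrative only:

    # Hop from database to database by following links that say the
    # resources are "about the same thing". A sketch, not a crawler.
    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL

    def follow_the_subject(start_uri, max_hops=3):
        """Traverse owl:sameAs links from one dataset into the next."""
        combined, seen = Graph(), set()
        frontier = [URIRef(start_uri)]
        for _ in range(max_hops):
            next_frontier = []
            for uri in frontier:
                if uri in seen:
                    continue
                seen.add(uri)
                try:
                    combined.parse(uri)   # fetch the RDF published there
                except Exception:
                    continue              # unreachable or not RDF: skip it
                # Each owl:sameAs link leads into another database that is
                # "about the same thing".
                next_frontier.extend(combined.objects(uri, OWL.sameAs))
            frontier = next_frontier
        return combined

Notice that nothing in the traversal is specific to any one database; the links themselves carry the connection, "not wires".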

In fact, there is a third item of interest: the OMG still speaks in terms of 'objects', which may obscure the growing technical infrastructure of services ( via SOA ) that is emerging as the foundation of the Semantic Web.

And, the fourth item of interest ( out of two ) is what is described as "the language for recording how the data relates to real world objects", that is, the representation language. The primary means of communication, whether between programs or between people and programs, is XML, including the RDF and OWL standards, which are layered on XML.
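
As a small illustration of that last point ( same invented ex: vocabulary as before ), the very same triples can be exchanged in Turtle or in RDF/XML, the XML-based serialization the standards define:

    # The triples are unchanged; only the surface syntax becomes XML.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix ex: <http://example.org/terms/> .
        ex:hogBellies ex:price "2.34" .
    """, format="turtle")

    print(g.serialize(format="xml"))   # the same statements, as RDF/XML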

 

Vision #1 - The Semantic Web as Advanced Search Engines

Swoogle

A very interesting phenomenon is Swoogle, a sort of Google for the Semantic Web.

There is also an interesting comment on the Swoogle Blog, probably belonging more properly to the previous section.

One vision that many of us have is that the Web is evolving into a collective brain for human society, complete with long term memory (web pages), active behaviors (web services and agents), a stream of consciousness (the Blogosphere) and a nervous system (Internet protocols).

SHOE

An ambitious example from several years ago ( from before the time of the W3C Semantic Web project, and no longer maintained ) is SHOE, which may still have a few lessons for the SW five years later.

SHOE is a small extension to HTML which allows web page authors to annotate their web documents with machine-readable knowledge. SHOE makes real intelligent agent software on the web possible.

HTML was never meant for computer consumption; its function is for displaying data for humans to read. The "knowledge" on a web page is in a human-readable language (usually English), laid out with tables and graphics and frames in ways that we as humans comprehend visually.

Unfortunately, intelligent agents aren't human. Even with state-of-the-art natural language technology, getting a computer to read and understand web documents is very difficult. This makes it very difficult to create an intelligent agent that can wander the web on its own, reading and comprehending web pages as it goes.

SHOE eliminates this problem by making it possible for web pages to include knowledge that intelligent agents can actually read.

This is a very straightforward and concise description of the need to apply Semantic Web technology to the difficulties of processing HTML, for both people and computers.

Vision #2 - The Semantic Web as a Wiki on Steroids

The End of Google ?

On June 26, 2006 at 5:20am EST, the Evolving Trends web site published an article entitled "Wikipedia 3.0: The End of Google?". By June 28th, two days later, the article had reached 650,000 people - by July 1st, it was being referenced by over 6,000 other sites and had been read by close to 2,000,000 people.

This phenomenon demonstrated two things. First, it demonstrated the ability of the Web to generate a tremendous surge of interest in a fairly specialized subject at short notice by selecting a pithy, controversial title.

 

Google as a Knowledge Bottleneck

Secondly, and more importantly, it seems to demonstrate a growing dissatisfaction with Google's approach to classifying knowledge via search engines and indexing. Certainly anyone who has studied the efficacy of the Google indexing paradigm knows that a well-formed Google search may reveal no more than 10% of the interesting sites on a given subject, depending on circumstances. While a Web of a hundred million or so pages was readily accessible to Google, a Web of ten billion pages has apparently overwhelmed the basic indexing and search technology. The information you are looking for is probably out there somewhere, but it may take a long struggle and good luck to find it.

Semantic Wikis may provide an alternative to the Google 'knowledge bottleneck'. There is also a certain political dimension to the restlessness with Google's near monopoly on web search combined with an emerging role as global censor of inconvenient truths. This may be fueling the dissatisfaction.

On the other hand, Google seems to be well aware of the bottleneck problem and is struggling mightily to bring simple semantic functions to the desktop that will be usable in the Semantic Web.

There is a well-written and funny parody of a future Google titled "August 2009: How Google beat Amazon and Ebay to the Semantic Web". In addition to presenting an excellent vision of the Semantic Web in the year 2009 ( written in 2002, but not that far away now ), it also expresses some of the deeper concerns about excessive centralization and control of information sources in a free society.

 

Vision #3 - The Semantic Web as a Semantic Network

An old concept from AI, the semantic network, may have a second life in the Semantic Web. In a semantic network, ontologies of distinct types are interpreted within evaluation networks that get their meaning from the semantic relations in which they participate. In a sense, the subjects of the ontologies discover their roles as a consequence of their relationships rather than by declaration or assignment. This can be seen as a direct result of RDF 'entailment rules' and the consequent 'entailment nets'.
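
To make 'entailment' concrete, here is a tiny sketch in Python with rdflib: one RDFS entailment rule ( if x has type A, and A is a subclass of B, then x has type B ) applied to a fixed point, with invented ontology terms borrowed from the biological domain discussed below:

    # Forward-chain one RDFS entailment rule until nothing new is inferred.
    # The ex: ontology terms are invented for illustration.
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/onto/")
    g = Graph()
    g.add((EX.Hemoglobin, RDF.type, EX.Protein))
    g.add((EX.Protein, RDFS.subClassOf, EX.Macromolecule))

    changed = True
    while changed:
        changed = False
        for x, _, cls in list(g.triples((None, RDF.type, None))):
            for _, _, sup in list(g.triples((cls, RDFS.subClassOf, None))):
                if (x, RDF.type, sup) not in g:
                    g.add((x, RDF.type, sup))
                    changed = True

    # ex:Hemoglobin's role as a Macromolecule was discovered from its
    # relationships, not declared directly.
    print((EX.Hemoglobin, RDF.type, EX.Macromolecule) in g)   # True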

In the biological sciences, there is a huge movement afoot to create a workable set of medical ontologies. According to the definition of 'semantic web' provided by Genomics & Proteomics ( nestled between 'self-organization' and 'semiochemical' ), the goal is the "unification of all scientific content by computer languages and technologies that permit the interrelationships between scientific concepts to be communicated between machines".

Later sections will cover the subject of semantic networks and the Knowledge Web in great detail.

 

The Semantic Web is NOT Web 2.0 ... Well, Not Exactly

Another tier of terminology involves the "Web 2.0", or sometimes "Web 3.0". It's not entirely clear what either term means. One presumes that the earlier advances of "Web 1.0" and "Web 1.5" technology were restricted to improved content, database integration, graphical widgets, etc. However, far from clarifying definitions, using the term "Web X.0" seems to generate yet another level of debate on its own.

The Wikipedia has a fairly decent definition of "Web 2.0", much improved since the beginning of 2007.

Web 2.0, a phrase coined by O'Reilly Media in 2003 and popularized by the first Web 2.0 conference in 2004, refers to a perceived second generation of web-based communities and hosted services — such as social-networking sites, wikis and folksonomies — that facilitate collaboration and sharing between users.

[ skipping a few paragraphs ]

As used by its supporters, the phrase "Web 2.0" can also refer to one or more of the following:

  • The transition of web sites from isolated information silos to sources of content and functionality, thus becoming computing platforms serving web applications to end-users
  • A social phenomenon embracing an approach to generating and distributing Web content itself, characterized by open communication, decentralization of authority, freedom to share and re-use, and "the market as a conversation"
  • Enhanced organization and categorization of content, emphasizing deep linking
  • A rise in the economic value of the Web, possibly surpassing the impact of the dot-com boom of the late 1990s

Earlier users of the phrase "Web 2.0" employed it as a synonym for "Semantic Web". The combination of social-networking systems such as FOAF and XFN with the development of tag-based folksonomies, delivered through blogs and wikis, sets up a basis for a semantic web environment.

It is fairly clear from the definition that Web 2.0 may represent a step in the direction of the Semantic Web. However, it looks as if Web 2.0 as envisioned lacks the capability for machines to understand the meaning of things and to communicate that meaning to people or other machines.

On the other hand, I think it is becoming clear that the Web 2.0 initiative will represent a major step in that direction, and perhaps far more than a small step in terms of the way the Semantic Web is used by ordinary, non-technical people in their everyday activities. It will require powerful and sophisticated user interfaces to make the new semantic universe accessible to the non-technical 95% of people in the world rather than just the technically inclined 5%. Web 2.0 technology seems to be playing a major role in constructing these interfaces.

 

Issues

A More Humane Machine-Readable Language ?

The proponents of these markup languages represent their creations as an improvement over implementing rules in programming logic, and that is true from the standpoint of flexibility, but I'm not sure they are any more readable than programming logic to ordinary human beings. This makes the end user completely reliant on ontology editors to interact with the final representation of the knowledge. Are markup languages the strength or the soft underbelly of the Semantic Web ?

It's a critical factor and, in my opinion, there doesn't seem to be a simple way to express rules in both machine-readable and human-readable form at this point in time, although there are some interesting efforts toward Semantic Web editors of various sorts.

 

Does It Translate ?

There is also a strong international flavor to the Semantic Web, even at this early stage. Unlike Web 1.0, and perhaps Web 2.0, the Semantic Web "3.0" is not going to be an English-only affair ( or nearly so ) as the previous Web x.x initiatives were. Translation to and from some sort of structured natural language form ( English or German or Spanish or whatever ) is probably still an absolute requirement for a multi-lingual rule language that is also usable by the majority of ordinary human beings.