Stepping towards a Semantic Wikipedia

(Update: check out our publication in Database for a full-length, peer-reviewed version of this article.)
It is now possible to specify the nature of the relationships between things described by Wikipedia articles directly in the context of the article. The image below is a screenshot, taken a few moments ago, of the Phospholamban article on Wikipedia (with excited arrows added). The infobox at the top right is generated dynamically from semantic markup in the article by a Wikipedia user script written by my colleague Sal, which is accessible from his user page.

Semantic markup now live in Wikipedia

How it works
Wikilinks in the article have been annotated with the kind of relationship they indicate, using the Semantic Wiki Link (SWL) template. The template allows any Wikipedia editor (including you!) to specify the type of connection between the article where the link is placed and the target of the link. This information is encoded following the microformat pattern: essentially, the meaning of each link is stored in the class attributes of the elements that wrap it.
This works as follows:
  1. Editor inserts a SWL into a Wikipedia article with this syntax:
    • {{SWL | target=protein kinase A | label=PKA | type=substrate_for}}
    • This means: "the concept where you see this link is related to protein kinase A (labeled PKA) by the relationship type 'substrate for'." So, in the example above, it says: "Phospholamban is a substrate for PKA."
    • The {{ }} braces denote a Wikipedia template. Templates can take parameters (here separated by pipe characters, |) and use them to produce new wikitext dynamically, which in turn is rendered as HTML when the page is loaded.
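  2. When the page is rendered, the template generates semi-semantic HTML markup: the link is wrapped in an element with the class "SWL", and the first child of that element carries the relationship type (here "substrate_for") as its own class.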

  3. Programs, like the script that generated that infobox and added the green highlighting, can look for elements with the SWL class attribute and then extract the meaning of each SWL link from the class of its first child element, here "substrate_for".
  4. In addition, when the template is processed, it adds to the article a category that corresponds to the relationship type (see, for example, the category for substrate). This category provides a logical grouping (e.g., all things that serve as a biochemical substrate) but, perhaps more importantly, it provides a place to record the meaning of the relationship. This meaning can be defined as text, but it can also be defined by reference to external sources such as ontologies on the semantic web.
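To make step 1 concrete, here is a minimal JavaScript sketch (not part of the SWL template or of Sal's script) that pulls SWL calls out of raw wikitext and turns each one into a subject/relation/object record. It assumes the simple syntax shown above, with no nested templates or links inside the parameters, and it treats the page title as the subject, as in the Phospholamban example. The function name `parseSWLs` is hypothetical.

```javascript
// Hypothetical helper: extract {{SWL | target=... | label=... | type=...}} calls
// from raw wikitext. Assumes no nested templates or "}" inside parameters.
function parseSWLs(wikitext, pageTitle) {
  const results = [];
  // "{{", optional spaces, "SWL", optional spaces, "|", then everything up to "}}".
  const templateRe = /\{\{\s*SWL\s*\|([^}]*)\}\}/g;
  let match;
  while ((match = templateRe.exec(wikitext)) !== null) {
    const params = {};
    // Parameters are separated by "|"; each named one is "name=value".
    for (const part of match[1].split("|")) {
      const eq = part.indexOf("=");
      if (eq === -1) continue; // skip unnamed parameters
      params[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
    }
    // A relationship needs at least a target and a type.
    if (params.target && params.type) {
      results.push({
        subject: pageTitle,            // the article containing the link
        relation: params.type,         // e.g. "substrate_for"
        object: params.target,         // the linked article
        label: params.label || params.target,
      });
    }
  }
  return results;
}

const text = "Phospholamban is phosphorylated by " +
  "{{SWL | target=protein kinase A | label=PKA | type =substrate_for}}.";
console.log(parseSWLs(text, "Phospholamban"));
```

Run against the example text, this yields a single record saying that Phospholamban is `substrate_for` protein kinase A, labeled PKA. A regular expression is enough for the flat syntax above; wikitext with nested templates would need a real parser.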


Why it's awesome
This pattern makes it possible for the vast number of Wikipedia users to simply and easily contribute machine readable content to the Web. This enormous user community collaboratively created the world's largest encyclopedia and one of the most valuable websites on the planet. Who better to help build the semantic Web? While, technically, the microformat-like implementation leaves much to be desired in terms of its robustness and its precision, it is a solution that can work. This is demonstrated by the success of projects like Google's recent recipe search that are based entirely on simple microformats.
How you can help
This is a new idea that not everyone in Wikipedia will be thrilled to see. Some will argue that SWLs clutter the markup and, because Wikipedia itself does not support semantic links, do not add enough value to justify it. You can help by:
  • Using the template to enhance articles.
  • Writing code that makes use of the added meaning, such as user scripts, aggregators, or scripts that import the relationships into other structured repositories like Freebase or DBpedia.
  • Helping define the nature of the semantic links (at their associated category pages) and mapping them to properties defined in ontologies.
  • Discussing (and voting for) the idea on the various 'talk pages' on Wikipedia.
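For the import bullet, here is a sketch of one possible output: N-Triples, a plain-text RDF serialization. Everything about the mapping is an assumption for illustration: subjects and objects become DBpedia-style resource IRIs built from article titles, and relation types go into a made-up `http://example.org/swl/` namespace. A real import would map each relation type to a property in an existing ontology and follow whatever format the target repository actually expects. `titleToIri` and `toNTriples` are hypothetical names.

```javascript
// Hypothetical namespace for SWL relation types (placeholder, not a real vocabulary).
const SWL_NS = "http://example.org/swl/";

// Assumed mapping from a Wikipedia article title to a DBpedia-style IRI:
// collapse whitespace to underscores and capitalize the first letter.
function titleToIri(title) {
  const t = title.trim().replace(/\s+/g, "_");
  return "http://dbpedia.org/resource/" + encodeURI(t.charAt(0).toUpperCase() + t.slice(1));
}

// Serialize {subject, relation, object} records as N-Triples lines.
function toNTriples(relations) {
  return relations
    .map(r => `<${titleToIri(r.subject)}> <${SWL_NS}${encodeURIComponent(r.relation)}> <${titleToIri(r.object)}> .`)
    .join("\n");
}

console.log(toNTriples([
  { subject: "Phospholamban", relation: "substrate_for", object: "protein kinase A" },
]));
// <http://dbpedia.org/resource/Phospholamban> <http://example.org/swl/substrate_for> <http://dbpedia.org/resource/Protein_kinase_A> .
```

N-Triples is used here only because it is one statement per line and easy to generate; any RDF serialization would carry the same information.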

Why it's awesome again
Did I mention that this pattern makes it possible for the vast number of Wikipedia users to simply and easily contribute machine-readable content to the Web? That's pretty cool if you think about it...

Update: To make the user script work for you so you can see the infobox, do this:

  • Create a Wikipedia user account if you don't have one already
  • Go to (or create) your user page (e.g., my user name there is i9606, and my user page is at http://en.wikipedia.org/wiki/User:I9606).
  • Edit your user page and add this link to it:
[[/common.js]]
  • Visit your new common.js page and add this line to it:
importScript('User:Sal9000/SWLinfobox.js');
  • Once that is saved, you should be all set. Now visit an enhanced page like Phospholamban and look for the green box in the upper right corner.
The script will run whenever you access a Wikipedia page while you are logged in to your account. It's a lot like Greasemonkey (which I've had some fun with in the past), but it's not tied to your browser and will only work on Wikipedia. If enough people like a user script, it can be added to the default set of Wikipedia user preferences... which would be pretty cool ;). The best part? You (or a programmer friend of yours) can write your own script and make it do whatever you want with the data!
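As a starting point for your own script, here is a hypothetical sketch of a user script that could go in your common.js. It is not Sal's SWLinfobox.js. It assumes the rendered markup described in step 3 (an element with class `SWL` whose first child element carries the relationship type as its class), and it further assumes the actual link is an `<a>` somewhere inside that wrapper. It only logs each relationship to the browser console; `collectSWLs` is a made-up name.

```javascript
// Hypothetical: collect SWL relationships from rendered page markup.
// Assumes <... class="SWL"><... class="substrate_for">...<a href="...">PKA</a>...</...></...>.
function collectSWLs(root, pageTitle) {
  const relations = [];
  for (const wrapper of root.querySelectorAll(".SWL")) {
    const typed = wrapper.firstElementChild; // its class names the relation type
    if (!typed) continue;
    const relation = typed.className.trim().split(/\s+/)[0];
    const link = wrapper.querySelector("a"); // assumed location of the wikilink
    if (!relation || !link) continue;
    relations.push({
      subject: pageTitle,
      relation: relation,
      href: link.getAttribute("href"),
      label: link.textContent,
    });
  }
  return relations;
}

// Only run on a Wikipedia page, where MediaWiki's mw object and jQuery are loaded.
if (typeof mw !== "undefined" && typeof $ !== "undefined") {
  $(function () {
    const found = collectSWLs(document, mw.config.get("wgTitle"));
    found.forEach(r => console.log(`${r.subject} --${r.relation}--> ${r.label} (${r.href})`));
  });
}
```

Passing the root element and page title in as arguments, rather than reading globals inside the function, keeps the extraction logic independent of the page, so the same function could feed an infobox, a highlighter, or an exporter.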