a life of coding

Friday, June 12, 2009

Improving Common Tag: Worse is Better

Common Tag is an "open tagging format developed to make content more connected, discoverable and engaging" [commontag.org]. It mixes RDFa into XHTML to attach metadata to a content block, most importantly a link to a common database entry that represents the topic, much like the one-word tags used in many places. This is an improvement over word tags, which can be nondescript or ambiguous: does "apple" refer to a fruit, a computer company, a record label, or someone's name? Tags that are acronyms may have no meaning to the user. RDF is commonly associated with the Semantic Web, because it helps computers link concepts together. Everyone wants the Semantic Web, but somehow it never happens... maybe it's because RDFa looks like this:

<body xmlns:ctag="http://commontag.org/ns#" rel="ctag:tagged">
<span typeof="ctag:Tag" rel="ctag:means"

This is a very explicit piece of data. Much of its content is XML support structure. The semantic knowledge contained in there is:
  • the tag for this span is "en.u2" in Freebase

The structure contained in there (removing any content) is:
  • xmlns:ctag="http://commontag.org/ns#"
  • rel="ctag:tagged"
  • typeof="ctag:Tag"
  • rel="ctag:means"
  • resource="http://rdf..com/ns/"

There are a lot of things that can break without actually removing any semantic meaning. If there were a typo anywhere in the structure above, your tag would be hopelessly borked - all that work (and bandwidth) for naught. More importantly, this format says that if I want people to understand my tags I have to embrace XML, and there are few things that I dislike as much as XML. Look at all the structure required because of XML, and the complicated tools that are required to manipulate XML!

Let's take that content and put it in something sexier: simple HTML. Over at Hacker News, someone suggested
  • <p ctag="wikipedia/The_Beatles">We're talking about The Beatles here</p>

I like this direction, but it's lacking in a gruesome way: ctag is not a valid HTML attribute. Browsers may not like it, and they certainly won't be able to read it, so it isn't as clean as the RDFa. How about this:
  • <p class="-ctag-wikipedia-The_Beatles">We're talking about The Beatles here</p>

This is valid HTML, browsers can operate on this content, and you can even style it! This is much better than the previous suggestion (which was quite good, and spurred me to write this article), but is it better than RDFa?

The semantic knowledge contained in there is:
  • the tag for this paragraph is "The_Beatles" at Wikipedia

The structure contained in there is:
  • class="-ctag-"
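To show how little machinery a consumer of this format needs, here is a minimal sketch in Python that pulls the tags out of a page using only the standard library. The names CtagExtractor and extract_ctags are made up for illustration, and the rule that the first dash after the "-ctag-" prefix separates the source from the topic is my assumption; the format itself doesn't pin that down.

```python
from html.parser import HTMLParser

CTAG_PREFIX = "-ctag-"  # the prefix used in the class example above


class CtagExtractor(HTMLParser):
    """Collect (source, topic) pairs from classes like -ctag-wikipedia-The_Beatles."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name != "class" or not value:
                continue
            # A class attribute may hold several space-separated class names.
            for cls in value.split():
                if not cls.startswith(CTAG_PREFIX):
                    continue
                # Assumption: the first dash after the prefix separates the
                # source from the topic, so the topic may itself contain dashes.
                source, sep, topic = cls[len(CTAG_PREFIX):].partition("-")
                if sep and source and topic:
                    self.tags.append((source, topic))


def extract_ctags(html):
    """Return every (source, topic) pair found in the given HTML string."""
    parser = CtagExtractor()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Feeding it the paragraph above yields a single ("wikipedia", "The_Beatles") pair, and ordinary styling classes on the same element are simply ignored.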

It is far more concise than RDFa, but it has limitations: the tag content has to be valid inside a CSS class name, which means alphanumerics, dashes, and underscores. There is additional flexibility if you escape characters with a backslash, but this is uncommon in CSS classes and may not play nicely everywhere. Most significantly, it doesn't include a link to Wikipedia, only the name, and Semantic Web people really dislike that. I suspect that most people will link to Wikipedia, and if not, a search engine can figure out the most likely host. I mean, how "smart" are your tools when they can't deduce the meaning of "wikipedia"? If you're using an internal host or a very specific database, you can always fall back to RDFa.
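As a rough sketch of how a tool might check those character limits and recover the missing link, the Python below validates a class name and maps the source to a URL. Everything named here is hypothetical: the KNOWN_SOURCES table, the resolve_ctag function, and the choice of the English Wikipedia URL prefix are my assumptions, not part of any spec.

```python
import re

# Assumption: a tag class is "-ctag-", a source name, a dash, then a topic,
# built only from letters, digits, dashes, and underscores.
CTAG_CLASS = re.compile(r"-ctag-([A-Za-z0-9_]+)-([A-Za-z0-9_-]+)")

# Hypothetical lookup table from source name to URL prefix.
KNOWN_SOURCES = {
    "wikipedia": "https://en.wikipedia.org/wiki/",
}


def resolve_ctag(class_name):
    """Return a URL for a ctag class name, or None if it is malformed
    or names a source this table doesn't know."""
    match = CTAG_CLASS.fullmatch(class_name)
    if match is None:
        return None
    source, topic = match.groups()
    prefix = KNOWN_SOURCES.get(source.lower())
    if prefix is None:
        return None
    return prefix + topic
```

With that table, "-ctag-wikipedia-The_Beatles" resolves to https://en.wikipedia.org/wiki/The_Beatles, while a class containing a slash or an unknown source gives None, which is the point where you would fall back to RDFa.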

Worse is better. The CSS tag format I propose is not as specific as RDFa, but it is easier to implement, harder to mess up, works with non-XHTML, and is easy for humans to verify. These generally overlooked and undervalued qualities make adoption easier for people, which in the end is all that really matters.

