Semantic Markup & Microformats

July 14, 2005

Filed under: CSS | Tim @ 8:26 pm

Beginning with Standards

During the past year I’ve become interested in using web standards and semantic markup. The idea is to separate content from presentation. For example, I have followed a variety of holy wars over the use of the <i> (italics) and <em> (emphasis) elements. Italics are presentation; emphasis describes the content. The <i> element is widely discouraged in favor of <em> (though HTML 4.01 does not formally deprecate it), but that leaves a few things hanging. For instance, how should I code a book or movie title?

I follow the convention that book titles should be italicized, e.g. The Stars My Destination. But if <i> is out of favor, that leaves <em>, which isn’t the correct choice: I’m not emphasizing the title, I just want to note that it is a book title.

So far I’ve just been talking about a visual distinction. But, as I mentioned at the beginning, I should be separating content from presentation, as well as adding meaning to that content. So, how is a machine to distinguish a book title from the rest of the content? Presumably with the <cite> element, which is vaguely defined as one that "[c]ontains a citation or a reference to other sources." While I’m not convinced that a passing reference to a book constitutes a citation, it is better than using <em> or hacking up a meaningless <span>.
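To make the three options concrete, here is a minimal sketch of the same sentence marked up each way. The sentence itself is only an illustration; nothing here goes beyond standard HTML elements.

```html
<!-- Presentational: says only "draw this in italics". -->
<p>I just finished <i>The Stars My Destination</i>.</p>

<!-- Emphasis: implies the title is being stressed, which is not the intent. -->
<p>I just finished <em>The Stars My Destination</em>.</p>

<!-- Citation: marks the text as a reference to another work. -->
<p>I just finished <cite>The Stars My Destination</cite>.</p>
```

Most browsers will draw all three in italics by default, so the difference is in what the markup says, not in how it looks.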

The next problem is that, by default, <cite> renders its content in italics. I prefer following MLA style, which calls for books, movies, plays, etc. to be italicized (or underlined), while articles, short stories, poems, etc. are quoted: The Stars My Destination (a novel), Fences (a play), Close Encounters of the Third Kind (a movie), "The Tyger" (a poem), "Bastille Day" (a song), etc. Clearly a basic <cite> element won’t work for all of these variations.

Additionally, at work I have to use AP style for marking up works. AP has a number of quirks, beginning with no italics. Books, movies, plays, poems, etc. are capitalized and quoted; reference works are not quoted and receive no other distinguishing marks.

So, with different types of works and different style guidelines, the <cite> element as-is simply won’t suffice. We’ll need to add some classes in order to distinguish types of work. This will give us hooks for styling: <cite class="book"> can be italicized or underlined, while <cite class="poem"> would be rendered in normal type and could even have quotes added around it automatically (at least in CSS 2-compliant browsers). Of course, if you need to follow AP or another style, changing just a couple of properties in the style sheet will take care of that. In the end, we’ve managed to style elements as we like and added a bit of meaning.
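As a rough sketch of what those hooks might look like, the style sheet below approximates the MLA conventions described earlier. The classes "book" and "poem" come from the paragraph above; "play", "movie" and "song" are assumed names added for illustration, and the quoting relies on CSS 2 generated content, which not every browser supports.

```html
<style type="text/css">
  /* Cancel the browser's default italics so each class decides its own look. */
  cite { font-style: normal; }

  /* MLA-like: long works are italicized. */
  cite.book, cite.play, cite.movie { font-style: italic; }

  /* MLA-like: short works are quoted, using CSS 2 generated content. */
  cite.poem, cite.song { quotes: "\201C" "\201D"; }
  cite.poem:before, cite.song:before { content: open-quote; }
  cite.poem:after, cite.song:after { content: close-quote; }
</style>

<p>Last week I reread <cite class="book">The Stars My Destination</cite>
and <cite class="poem">The Tyger</cite>.</p>
```

Switching to something closer to AP style would leave the markup alone and change only the rules. In this variant, "reference" is again an assumed class name; it gets no rule of its own, so reference works stay unmarked.

```html
<style type="text/css">
  /* AP-like: no italics anywhere. */
  cite { font-style: normal; quotes: "\201C" "\201D"; }

  /* AP-like: books, movies, plays, poems and songs are all quoted. */
  cite.book:before, cite.play:before, cite.movie:before,
  cite.poem:before, cite.song:before { content: open-quote; }
  cite.book:after, cite.play:after, cite.movie:after,
  cite.poem:after, cite.song:after { content: close-quote; }
</style>
```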

Enter Microformats

With the advent of microformats, I now see a way to add more meaning to the classes I would have used simply for styling purposes. We can add meaningful values to the <cite> (or other) element that indexers and others can make use of, all without adding to, changing or otherwise hacking existing (X)HTML.

Dougal Campbell brought up the idea of a microformat for music tracks and other media on the microformats mailing list. I gather that there are others interested in this as well.

His idea goes far beyond what I was thinking (he mentions track name, running time, etc., à la ID3 tags) and that’s a good thing. Mention has been made of coming up with a format that would capture ISBN, author, editor, pages, etc. for books, magazines and the like. I’d like to see this microformat get created, as long as it is done in a modular way.

In other words, Amazon.com could use the full format for marking up books, but in a blog entry, I can use just a minimum of markup to indicate that I’m referring to a book, e.g. <cite class="scific novel">The Stars My Destination</cite>.

I’d also like to keep it element-independent. In other words, I may implement this using the <cite> element, but if you feel strongly about using <em> or definition lists, you can do that. In fact, a definition list is probably a good way for Amazon.com to mark up titles (a block of title-related information) versus a passing reference (an inline mention).
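To illustrate the contrast between an inline mention and a block of title-related information, here is a sketch of both. No such microformat has been defined yet, so the class names "title", "author" and "isbn" are placeholders rather than a proposal, and the ISBN value is deliberately left blank.

```html
<!-- Passing reference in a blog entry: one inline element, minimal classes. -->
<p>I just reread <cite class="book">The Stars My Destination</cite>.</p>

<!-- Fuller record, as a retailer might publish it.
     All class names below are hypothetical. -->
<dl class="book">
  <dt>Title</dt>
  <dd class="title">The Stars My Destination</dd>
  <dt>Author</dt>
  <dd class="author">Alfred Bester</dd>
  <dt>ISBN</dt>
  <dd class="isbn">(ISBN goes here)</dd>
</dl>
```

Both fragments carry the same "book" class on different elements, which is the sense in which the format would stay element-independent.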

I think that this is an exciting proposition and am interested to hear from others. What are your thoughts?
