Making the Web Searchable: The Story of SearchMonkey

This article is certainly worth reading.

Last week at the SemTech 2008 Conference in San Jose, Yahoo! researcher Peter Mika spoke in detail about the company’s new SearchMonkey search platform. Mika talked broadly about his work studying metadata on the web and how that work led to the birth of SearchMonkey. This post is based on notes from that talk.

History of Web Page Annotations

The motivating question for Mika’s presentation was: How can we make web search better by leveraging web annotation? There are many kinds of annotations, but Mika focused on simple data and lightweight semantics, and began by reviewing the history and evolution of annotations to explain how we got to where we are today.

One of the first methods of annotating HTML was Simple HTML Ontology Extensions (SHOE). This method allowed for the declaration of ontologies as well as relationships between the entities on HTML pages. The problem with it was that it introduced new tags that were not part of standard HTML and were not recognized by most browsers.

In 2003, Tantek Celik started work on Microformats, a way to embed lightweight semantics in XHTML. Microformats are now driven by a community of developers that evangelizes existing formats and works on new ones. The major focus of this effort is to build on existing standards, but Microformats are limited because they don’t share a common syntax: every microformat looks different, and there are no ontologies and no schemas.

Things get particularly complicated when you start combining different Microformats, for example, to describe that a person wrote a review at a particular event. In addition, Microformats have no concept of unique identity, and for this reason are largely incompatible with other Semantic Web efforts. Yet Microformats took off and have become fairly widespread. The takeaway here is that simple things can quickly gain adoption.
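To make the idea of class-based markup concrete, here is a minimal sketch of how a consumer might pull data out of a microformat, assuming hCard (the microformat for people and organizations) as the example. The `HCardExtractor` class and the small property set are hypothetical and cover only a tiny fraction of the hCard vocabulary; the sample markup is illustrative. Note that the parser has to hard-code the vocabulary itself, since there is no shared syntax or schema to consult, and nothing in the markup gives the person a unique identifier.

```python
from html.parser import HTMLParser

# Hypothetical subset of hCard property class names, for illustration only.
HCARD_PROPS = {"fn", "org", "title", "locality"}


class HCardExtractor(HTMLParser):
    """Collects text of hCard properties found inside class="vcard" elements.

    Simplified sketch: assumes well-formed markup without unclosed void
    elements (e.g. a bare <br>), and only reads element text, not attributes.
    """

    def __init__(self):
        super().__init__()
        self.cards = []          # one dict of properties per vcard found
        self._stack = []         # class lists of the currently open elements
        self._vcard_depth = None # stack depth of the open vcard root, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(classes)
        if "vcard" in classes and self._vcard_depth is None:
            self._vcard_depth = len(self._stack)
            self.cards.append({})

    def handle_endtag(self, tag):
        if self._stack:
            if self._vcard_depth == len(self._stack):
                self._vcard_depth = None  # leaving the vcard root
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or self._vcard_depth is None or not self._stack:
            return
        # Attribute the text to any property named on the innermost element.
        for prop in self._stack[-1]:
            if prop in HCARD_PROPS:
                card = self.cards[-1]
                card[prop] = (card.get(prop, "") + " " + text).strip()


if __name__ == "__main__":
    html = """
    <div class="vcard">
      <span class="fn">Peter Mika</span>,
      <span class="title">Researcher</span> at
      <span class="org">Yahoo!</span>
    </div>
    """
    parser = HCardExtractor()
    parser.feed(html)
    print(parser.cards)
    # [{'fn': 'Peter Mika', 'title': 'Researcher', 'org': 'Yahoo!'}]
```

Describing that this person wrote a review at an event would mean nesting a second and third microformat, each with its own class names, and the consumer would need yet more hand-written rules to connect them.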

Another way of providing metadata that has emerged recently is tagging. Flickr, for example, lets its users annotate and describe photos with tags. The problem with tags is that there is no agreement on meaning: the same tag on Flickr and del.icio.us can mean different things, and there’s no way to be sure which tag means what. Tags are a much more personal, subjective way of annotating information.

Published by Reinout te Brake

Reinout is a games investor and strategic business consultant specializing in the games industry. Reinout established his credentials through his own successful investments, start-ups, consulting and (advisory) board positions, which over time led to strong bonds with key stakeholders in this fast-paced industry. He is known for his outstanding results in the gaming industry. He has worked with many game studios around the globe and is therefore well known in the international gaming industry. Check out his games podcast: https://www.game-consultant.com
