The last day of SES New York kicked off with Andrew Tomkins, Chief Scientist at Yahoo! Research. In his keynote, Tomkins outlined the next generation of Web search. The historic trends he covered include the Internet’s movement from curiosity to mainstream, the fragmentation of content, value migrating into the ecosystem and the unlocking of content semantics.
As an example, he walked through the current process for booking a trip to Tuscany, which involves using search engines to select flights, book hotels and research restaurants. The best part of the example was Tomkins’ consistent use of Google as the default search engine (points for realism). He then carried it through to purchasing an espresso machine (which requires product and price info, then retailer reviews). The most significant change in search behavior is the ubiquity of payment options and the ability to validate the credibility of manufacturers and sellers. Another observation he made is that Web services are not integrated into search tasks.
Moving on to future trends, Tomkins segmented the content produced each day:
- Published content: 3-4 GB – advertising or subscription-based content
- Professional Web content: 2 GB – corporate content
- User generated content: 8-10 GB – reviews, ratings, forums
- Private text content: 3 TB (terabytes) – text messages, email, etc.
- Upper bound on typed content: 700 TB – if everyone on the planet typed on a PC all day
In terms of metadata trends (amount of content produced daily across the Web):
- Reviews: 10 MB
- Tags: 40 MB
- Anchortext: 100 MB
- Pageviews: 180 GB
Fragmentation is a significant trend today, in terms of both sites and areas of interest. Currently, no Web site owns more than 10 percent of page views (Yahoo! being the largest), and value is transitioning to the ecosystem as Web 2.0 continues to bloom. A related trend is that content access is fragmenting (e.g., Tomkins’ cell phone number is visible to MIT alumni), which will put additional stress on the existing infrastructure. With evolving technologies like AJAX, sites are now more of a “Choose Your Own Adventure” experience.
Tomkins then transitioned to the search engine interface. Until 2005, not much changed with search results. Since then, the engines have had to address rich media, aggregation, task analysis and personalization. For example, if you search for “The game plan” on Yahoo!, you’ll see a search assistance blog with suggested terms, followed by information about the movie (including reviews and theaters), before getting to the organic search results. Another example was a search for the Apple iPod Nano on MSN Live. Lastly, he conducted a travel-related search where flight information appears based on a city and flight query.
In the next section, Tomkins outlined the opening ecosystem, including the structured Web, which has met with limited success. Engines are looking for a killer app: essentially something that supports the semantic Web. Yahoo! believes search is the killer app: abstracts that provide enough information to help you make a decision, whether it involves additional clicks or not. For example, a search for a restaurant includes Yelp content in one box: photos, address, phone number, ratings and reviews. In a news search, an abstract may include photos, statistics and a link to the article. The critical components of abstract-based presentations include microformats (hCard, hEvent, hReview, hAtom, XFN, etc.), RDFa and eRDF markup, OpenSearch extensions (pull) and Atom/RSS feeds that embed structured data (push).
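To make the microformat idea a little more concrete, here is a rough sketch of how a page marked up with hCard class names could be turned into a one-line abstract. This is my own illustration, not anything Tomkins showed or how Yahoo! actually does it: the sample markup, the restaurant details, the `HCardParser` class and the `extract_hcards` and `render_abstract` helpers are all hypothetical, and only a small subset of hCard properties is handled.

```python
from html.parser import HTMLParser

# Assumption: a small subset of hCard property class names, for illustration only.
HCARD_PROPS = {"fn", "tel", "street-address", "locality", "region"}
# Tags that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
CARD = object()  # stack marker for the element that opens a vcard


class HCardParser(HTMLParser):
    """Collects text from hCard property elements (assumes well-nested HTML)."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard found
        self._open = []   # one entry per open element: CARD, a property name, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if "vcard" in classes:
            self.cards.append({})
            self._open.append(CARD)
        elif CARD in self._open:
            props = sorted(classes & HCARD_PROPS)
            self._open.append(props[0] if props else None)
        else:
            self._open.append(None)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Walk outward to the nearest enclosing property element, if any.
        for entry in reversed(self._open):
            if entry is CARD:
                return  # text sits directly in the card, not in a property
            if entry is not None:
                card = self.cards[-1]
                card[entry] = (card.get(entry, "") + " " + text).strip()
                return


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


def render_abstract(card):
    """Joins whichever known fields are present into a one-line abstract."""
    order = ("fn", "street-address", "locality", "tel")
    return " | ".join(card[key] for key in order if key in card)


# Hypothetical markup; the business name, address and phone number are made up.
SAMPLE = """
<div class="vcard">
  <span class="fn org">Trattoria Example</span>
  <div class="adr">
    <span class="street-address">1 Via Ipotetica</span>,
    <span class="locality">Florence</span>
  </div>
  <span class="tel">+39 000 000 000</span>
</div>
"""

if __name__ == "__main__":
    for card in extract_hcards(SAMPLE):
        print(render_abstract(card))
```

The point is simply that once the name, address and phone number are labeled in the markup, an engine (or anyone else) can pull them out without guessing at the page layout.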
New vocabulary is required: dataRSS, Atom, Dublin Core, Creative Commons, FOAF, GeoRSS, MediaRSS, RDF, RDFS, etc. Yahoo! and other engines are looking to standardize the vocabulary and technology. Yahoo!’s OpenSearch platform does not modify ranking, but does affect presentation (and thus click-throughs and possibly conversions). Better abstracts create a better user experience, so Yahoo! (and searchers) will reward well-designed abstracts. To sum up: as user needs become more complex, content grows and fragments, and value migrates to the ecosystem, the opportunity is to expose semantics by enabling interoperability through OpenSearch.
If you weren’t able to attend this session, it’s difficult to explain the impact it had (on me at least). I see the insights Tomkins provided as the next generation of search and a huge opportunity for everyone who jumps on board.