Transcript Slide 1

Suhas Suhas (ss3474)
COMS E6125 Web Enhanced Information Management
Professor Gail Kaiser
Department of Computer Science
Columbia University
Spring 2009
04/07/2009
- Jack of all trades, master of none!
Survey affirms the need for deeper, more specific, more relevant search results
Traditional search engines often do not know what the user wants based on the
search query
What's the point of returning half a million less relevant search results, knowing
the user will not look beyond the top 30 results?
Irrelevant ads!
Welcome to a relatively new tier of Internet search: an industry consisting of
search engines that focus on specific slices of content
If a user decides to use a search engine which says it's a "health information search
engine for doctors", instead of one that is labeled a "health information search
engine for patients", the user has already helped to reduce the ambiguity of his/her
search queries even before he/she types in his/her query
Users searching vertical search engines (VSEs) are typically closer to purchase or
looking for particular information. In other words, if users have gotten as far as a
vertical search, they’ve essentially classified themselves as interested consumers
Also called a focused crawler or topical crawler: a web crawler that attempts to
download only web pages that are relevant to a pre-defined topic or set of topics
(the area of the vertical search)
Topical crawling generally assumes that only the topic is given, while focused
crawling also assumes that some labeled examples of relevant and not relevant pages
are available
A focused crawler ideally would like to download only web pages that are relevant to a
particular topic and avoid downloading all others. Therefore a focused crawler may
predict the probability that a link to a particular page is relevant before actually
downloading the page
A possible predictor is the anchor text of links
Can also use the complete content of the pages already visited to infer the similarity
between the driving query and the pages that have not been visited yet.
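The two predictors above (anchor text and the content of already visited pages) can be sketched as a single link score. This is only an illustrative sketch, not the method of any particular crawler: the bag-of-words cosine similarity, the function names, and the `anchor_weight` parameter are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_link(topic, anchor_text, parent_page_text, anchor_weight=0.7):
    """Estimate the relevance of an unvisited page before downloading it.

    Assumption: the score is a weighted mix of (a) how similar the link's
    anchor text is to the topic and (b) how similar the already visited
    parent page is to the topic. The 0.7 weight is arbitrary.
    """
    topic_vec = Counter(tokenize(topic))
    anchor_score = cosine_similarity(topic_vec, Counter(tokenize(anchor_text)))
    parent_score = cosine_similarity(topic_vec, Counter(tokenize(parent_page_text)))
    return anchor_weight * anchor_score + (1 - anchor_weight) * parent_score


topic = "industrial compression springs"
parent = "We manufacture industrial springs and fasteners"
on_topic = score_link(topic, "compression springs catalog", parent)
off_topic = score_link(topic, "celebrity gossip", parent)
print(on_topic > off_topic)
```

A crawler could sort its pending links by such a score and fetch the highest-scoring ones first, so that likely irrelevant pages are never downloaded at all.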
In another approach, the relevance of a page is determined after downloading its
content. Relevant pages are sent to content indexing and their contained URLs are
added to the crawl frontier; pages that fall below a relevance threshold are discarded
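The download-then-judge approach described above can be sketched as a small crawl loop. This is a minimal sketch under stated assumptions: `TOY_WEB` is a made-up in-memory stand-in for real HTTP fetching, and `relevance` is a crude term-frequency measure chosen only to keep the example self-contained.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/springs": ("compression springs and torsion springs supplier",
                          ["a.example/coils", "a.example/news"]),
    "a.example/coils": ("coil springs for industrial machines",
                        ["a.example/springs"]),
    "a.example/news": ("celebrity gossip and sports headlines",
                       ["a.example/hidden"]),
    "a.example/hidden": ("springs springs springs", []),
}


def relevance(text, topic_terms):
    """Fraction of the page's words that are topic terms (an assumed measure)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)


def focused_crawl(seeds, topic_terms, threshold=0.2, fetch=TOY_WEB.get):
    """Index relevant pages and expand the crawl frontier only through them."""
    frontier = deque(seeds)
    seen = set(seeds)
    indexed = []
    while frontier:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:
            continue  # URL could not be fetched
        text, links = page
        if relevance(text, topic_terms) < threshold:
            continue  # below threshold: discard page, do not follow its links
        indexed.append(url)  # relevant: send to content indexing
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return indexed


result = focused_crawl(["a.example/springs"], {"springs", "coil"})
print(result)
```

In this toy graph the page `a.example/hidden` is relevant but is never reached, because the only link to it sits on a discarded page. That is the trade-off of a hard relevance threshold.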
Current vertical search engine spiders explore different ways of combining content-
and link-based Web analyses and integrating them with graph search algorithms
Query          | Traditional search engine | Vertical search engine | Time taken by traditional (seconds) | Time taken by vertical (seconds) | Precision@10 (traditional) | Precision@10 (vertical)
Springs        | www.google.com            | www.thomasnet.com      | 0.24                                | 0.14                             | 0.3                        | 0.9
Dryer          | www.google.com            | www.chemindustry.com   | 0.26                                | 0.20                             | 0.1                        | 1.0
Legal Software | www.google.com            | www.findlaw.com        | 0.23                                | 0.10                             | 0.3                        | 0.8
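Precision@10 in the comparison above is the fraction of the top 10 results that are relevant to the query. A short sketch of the computation follows. The judgment lists are invented for illustration and are not the data behind the slide's figures.

```python
def precision_at_k(judgments, k=10):
    """Fraction of the top-k results judged relevant.

    `judgments` is a ranked list of booleans (True = relevant), best first.
    Missing results beyond the end of the list count as not relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for relevant in judgments[:k] if relevant) / k


# Hypothetical relevance judgments for the top 10 results of one query.
traditional = [True, False, False, True, False, False, True, False, False, False]
vertical = [True, True, True, True, False, True, True, True, True, True]

print(precision_at_k(traditional))  # 3 of 10 relevant -> 0.3
print(precision_at_k(vertical))     # 9 of 10 relevant -> 0.9
```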
• Professional users can save time
• Easier to search
• Superior quality of results and greater sorting
• Spam Free
• Provide access to more of the surface web
• Provide access to the deep web
• Advertisers can generate highly relevant leads through the combination of
focused demographics that specialist publishers provide and search keyword
targeting
Google loves vertical too
Getting people to know about it
Getting them to try it. And after they try it,
Getting them to come back and use it again
The biggest challenge, however, is just getting people to know that there is a
vertical search offering, and getting them to look at it!