Hyper-Searching the Web

Search Engines
• Basic search (index)
• Cluster search (themes)
• Meta-search (outsource)
• “Smarter” meta-search (themes + outsource)
Basic search engine
• Examples: AltaVista, InfoSeek, HotBot,
Lycos, Excite, Google, etc.
• Maintains an index for every word found
• Works by crawling, indexing, and
returning results
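The "index for every word" idea can be sketched as an inverted index. This is only a minimal illustration, not how any of the engines named above is actually built; the function names `build_index` and `search` and the sample pages are made up for the example, and crawling is skipped by starting from text already in memory.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the sorted ids of pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Hypothetical sample pages standing in for crawled documents.
pages = {
    "a": "jaguar is a big cat",
    "b": "jaguar makes a fast car",
    "c": "the cat sat",
}
index = build_index(pages)
print(search(index, "jaguar cat"))
```

Looking up a word is a single dictionary access, which is why an engine can answer a query without rereading every page.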
Basic search engine
• Different ranking systems used
-most use heuristics (easiest solution),
such as counting how many times the keywords appear
-Google uses PageRank
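A keyword-count heuristic of the kind mentioned above might look like the sketch below. The names `keyword_score` and `rank` and the sample pages are hypothetical, and real engines combine many more signals; PageRank is not shown here.

```python
def keyword_score(text, query):
    """Count how many times the query's words occur in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def rank(pages, query):
    """Order pages by keyword count, dropping pages with no match."""
    scored = [(keyword_score(text, query), page_id)
              for page_id, text in pages.items()]
    # Highest count first; ties are broken by page id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [page_id for score, page_id in scored if score > 0]

# Hypothetical sample pages.
pages = {"a": "cat cat dog", "b": "cat", "c": "dog"}
print(rank(pages, "cat"))
```

A page can climb this ranking simply by repeating a keyword, which is one reason counting alone is a weak signal.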
Basic search engine
• No idea of searcher’s intent so “best”
result hard to achieve
• Problems with synonymy and polysemy
ex. car and automobile
ex. jaguar
• One solution: store semantic relations
-can only help w/ synonymy
• Can’t identify concepts/author intent
ex. IBM site does not say “computer”
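Storing semantic relations can be pictured as a small synonym table used to expand the query. The table `SYNONYMS` and the helpers `expand` and `matches` are assumptions made up for this sketch; it also shows why the approach does nothing for polysemy.

```python
# Hypothetical synonym table; a real system would use a much larger thesaurus.
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}

def expand(query_word):
    """Return the query word together with its stored synonyms."""
    return {query_word} | SYNONYMS.get(query_word, set())

def matches(text, query_word):
    """True if the text contains the query word or one of its synonyms."""
    return bool(expand(query_word) & set(text.lower().split()))

# Synonymy: a search for "car" now finds a page that only says "automobile".
print(matches("used automobile for sale", "car"))

# Polysemy: both senses of "jaguar" still match, so nothing is disambiguated.
print(matches("jaguar habitat in the rainforest", "jaguar"))
print(matches("jaguar dealership opening soon", "jaguar"))
```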
Cluster search engine
• Example: Clusty
• Clusters results into categories/themes
• Can show results that would be ranked
lower in another search engine
-because words have different meanings,
can surface the less searched-for meanings
Meta-search engine
• Examples: Dogpile, Surfwax, Copernic, etc.
• Sends searcher’s query to a database of
search engines
• Claimed to be no better than the engines in
its database; the referenced search engines
are often small, free, commercial ones
• Users can create their own on Google, using
up to 5,000 URLs as the “database”
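Outsourcing a query can be sketched as calling each engine and merging the answers. The two engine functions and the `.example` URLs below are stand-ins invented for the illustration, and the merge rule (keep each URL's best position) is just one simple assumption, not what Dogpile or the others actually do.

```python
def meta_search(query, engines):
    """Send the query to every engine and merge results by best position."""
    best_rank = {}
    for engine in engines:
        for position, url in enumerate(engine(query)):
            if url not in best_rank or position < best_rank[url]:
                best_rank[url] = position
    # Duplicates collapse to one entry; ties are ordered by URL.
    return sorted(best_rank, key=lambda url: (best_rank[url], url))

# Hypothetical engines returning ranked URL lists.
def engine_a(query):
    return ["x.example", "y.example"]

def engine_b(query):
    return ["y.example", "z.example"]

print(meta_search("cat", [engine_a, engine_b]))
```

The merged list can only contain what the underlying engines return, which is the point behind the claim that a meta-search engine is no better than its database.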
“Smarter” meta-search engine
• Example: Clever project (not yet available online)
• Includes clustering and linguistic analysis
[Diagram: Clever passes the query “cat” to Google, AltaVista, and Yahoo, and groups the results by theme: Cat – feline, Cat – power, Cat – equipment, Cat – scans, etc.]
The Clever Project
• Uses hyperlinks to locate hubs and
authorities
“a respected authority is a page that is
referred to by many good hubs; a useful
hub is a location that points to many
valuable authorities”
The Clever Project
• Obtains a list of webpages from a
standard index & follows hyperlinks
to increase own database
-resulting collection = “root set”
-each page gets numerical hub &
authority score
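The mutually reinforcing hub and authority scores can be computed with a short iterative loop. This is a minimal sketch under stated assumptions: the function name `hits`, the fixed iteration count, the normalization step, and the four-page link graph are all chosen for the example and are not taken from the slides.

```python
import math

def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a dict of page -> linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}  # initial guess: every page is an equal hub
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # A page's hub score is the sum of the authority scores it links to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalize so the scores stay bounded from one round to the next.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical root set: two hub-like pages pointing at two authorities.
links = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1"],
    "auth1": [],
    "auth2": [],
}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy graph the page linked to by both hubs ends up as the top authority, and the page pointing at both authorities ends up as the top hub.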
The Clever Project
• Similar to PageRank in how scores are
determined: initial guesses & repeated
calculations
-useful by-product: clusters sites
• Adds to competition because competitors
don’t have to acknowledge their
competition through hyperlinks
Clever vs. Google
GOOGLE
- gives initial rankings
- keeps pages independent
of queries
- faster
- looks forward
“link to link”
CLEVER
- root sets per keyword
- page priority through
query context
- forwards & backwards
“hub and authority”
- sometimes too broad
ex. Fallingwater