Web Search Uis

Challenges in Web Search
Amit Singhal
Web Search
• Crawl, Index, Search
– Crawl and Index
• freshness
• coverage (page selection, deep web)
– Search
• adversarial IR, trust
• evaluation
• partitioning the query space
Crawl and Index
• Freshness
– pages are deleted, created, changed
– How to keep the index fresh?
• Coverage
– which 2.5B pages to index?
– a lot of useful information lives in databases
– How to index “hidden” content?
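As a small illustration of the freshness bullets above (pages are deleted, created, changed), the sketch below compares two crawl snapshots by content hash. It is not taken from the talk: the snapshot format (a dict from URL to page body) and the names `fingerprint` and `diff_snapshots` are assumptions made for this example, and a real crawler would also need to handle near-duplicates and boilerplate changes.

```python
import hashlib
from typing import Dict, Set


def fingerprint(content: str) -> str:
    """Return a stable hash of the page body (hypothetical helper)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Set[str]]:
    """Classify URLs as created, deleted, or changed between two crawls.

    Assumes each snapshot maps URL -> page body; this layout is an
    illustration, not something specified in the slides.
    """
    old_fp = {url: fingerprint(body) for url, body in old.items()}
    new_fp = {url: fingerprint(body) for url, body in new.items()}
    common = old_fp.keys() & new_fp.keys()
    return {
        "created": set(new_fp.keys() - old_fp.keys()),
        "deleted": set(old_fp.keys() - new_fp.keys()),
        "changed": {url for url in common if old_fp[url] != new_fp[url]},
    }
```

Exact hashing flags any byte-level change, including ones that do not matter for search (a rotating ad, a timestamp), which is one reason the freshness question is harder than this sketch suggests.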
Search
• Adversarial IR
– all useful signals are spammed
Search
• Trust
– how much can we trust a site
• an article hosted at BBC is much more
trustworthy than the same article hosted at
yet-another-news-company.com
– How trustworthy is a site, and how to
use this information in ranking?
Search
• Evaluation
– the collection changes continuously
• relevant pages become non-relevant, and vice versa
– can’t easily freeze a copy
• relevance is a function of rendering
– need all images, all redirects, CSS, …
• linkage characteristics change over time
– query space is huge (over 150M/day)
• most popular query: 0.037% of queries, 10th most popular: 0.011%
• need a very large query set, expensive
– How to evaluate given changing collection and a
very big query space?
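To make the query-space figures above concrete, the arithmetic below converts the quoted shares into daily counts. It assumes the percentages are shares of daily query volume and uses 150M as the daily total; the slide says "over 150M/day", so the counts are lower bounds. The names `DAILY_QUERIES` and `daily_count` are made up for this example.

```python
# Assumed daily volume; the slide gives "over 150M/day", so this is a floor.
DAILY_QUERIES = 150_000_000


def daily_count(share_percent: float, total: int = DAILY_QUERIES) -> int:
    """Convert a share of query volume (in percent) into queries per day."""
    return round(total * share_percent / 100)


print(daily_count(0.037))  # most popular query: 55500 per day
print(daily_count(0.011))  # 10th most popular query: 16500 per day
```

Even the single most popular query is only about 55,500 of 150 million daily queries, so a handful of head queries covers a negligible part of the traffic. This is why the slide calls for a very large, and therefore expensive, query set.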
Search
• Ranking in a huge query space
– specific methods work well for specific
query types
• e.g., strong term proximity helps for people's names
– identify query type and use type-specific
ranking algorithms
– How to partition the query space into
meaningful and useful partitions?
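The idea of identifying a query type and dispatching to a type-specific ranker can be sketched as follows. Everything here is a toy assumption for illustration and not the method described in the talk: the capitalization heuristic in `classify_query`, the two scoring functions, and the `RANKERS` table are all hypothetical. The proximity scorer rewards documents where all query terms appear within a small window, which is the kind of signal the slide says helps for people's names.

```python
from collections import Counter
from typing import Callable, Dict, List


def classify_query(query: str) -> str:
    """Toy heuristic: 2-3 capitalized alphabetic tokens look like a name."""
    tokens = query.split()
    if 2 <= len(tokens) <= 3 and all(t.isalpha() and t[0].isupper() for t in tokens):
        return "person_name"
    return "general"


def overlap_score(terms: List[str], doc_tokens: List[str]) -> float:
    """Fraction of distinct query terms that occur anywhere in the document."""
    needed = set(terms)
    return len(needed & set(doc_tokens)) / len(needed) if needed else 0.0


def proximity_score(terms: List[str], doc_tokens: List[str]) -> float:
    """Distinct term count divided by the smallest window holding all terms."""
    needed = set(terms)
    if not needed:
        return 0.0
    counts: Counter = Counter()
    covered, left, best = 0, 0, None
    for right, tok in enumerate(doc_tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers every term.
        while covered == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            left_tok = doc_tokens[left]
            if left_tok in needed:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    covered -= 1
            left += 1
    return 0.0 if best is None else len(needed) / best


# Hypothetical mapping from query type to its ranking function.
RANKERS: Dict[str, Callable[[List[str], List[str]], float]] = {
    "person_name": proximity_score,
    "general": overlap_score,
}


def score(query: str, doc_text: str) -> float:
    """Classify the query, then apply the ranker registered for its type."""
    ranker = RANKERS[classify_query(query)]
    return ranker(query.lower().split(), doc_text.lower().split())
```

The capitalization rule would also label a query such as "Web Search" as a person name, which hints at why finding meaningful and useful partitions of the query space is posed as an open question.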
Web Search
– How to keep the index fresh?
– How to index “hidden” content?
– How trustworthy is a site, and how to use this
information in ranking?
– How to evaluate given changing collection and a
very big query space?
– How to partition the query space into
meaningful and useful partitions?
It is a capital mistake to theorize before one
has data. Insensibly one begins to twist facts
to suit theories, instead of theories to suit
facts.
Sir Arthur Conan Doyle (1859 - 1930)