B. Tan, Predicting Users' Site Preferences in Web Search

Predicting User’s Site Preference in Web Search
Bin Tan
Motivation
We trust some websites more than others.
In web search, we’d like to see results from these preferred websites.
The project is to learn a user’s site preferences from past web search interactions and to rerank the search results to reflect these preferences.
Examples
I go to wunderground.com for weather information, but in Google weather.com is top-ranked.
For me DAIS means dais.cs.uiuc.edu, and I don’t care about sites with other meanings of DAIS.
When looking for papers, I would prefer results from portal.acm.org.
The k-NN Approach
Supervised learning problem: Given a query q and a search result r, predict whether r’s site s is preferred by the user.
k-nearest neighbor approach: Find the k past queries most similar to q and the preferred sites for these queries. Then determine whether s is preferred using, e.g., a weighted majority vote.
For each web search, we need a log record containing:
Time of search
Keyword terms and their weights
Preferred sites and their confidence
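The log record and the weighted vote described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names SearchRecord, similarity and site_scores are hypothetical, the default k=5 is arbitrary, and weighting each vote by similarity times confidence is one assumed way to realize a "weighted majority vote".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchRecord:
    """One past search, holding the three fields listed above (hypothetical layout)."""
    time: float                # time of search, e.g. a Unix timestamp
    terms: Dict[str, float]    # keyword term -> weight
    sites: Dict[str, float]    # preferred site -> confidence in [0, 1]


def similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two sparse term vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def site_scores(query_terms: Dict[str, float],
                records: List[SearchRecord],
                k: int = 5) -> Dict[str, float]:
    """Let the k most similar past searches vote for their preferred sites."""
    scored = [(similarity(query_terms, r.terms), r) for r in records]
    # Keep only searches that share at least one term with the query.
    scored = [pair for pair in scored if pair[0] > 0]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes: Dict[str, float] = {}
    for sim, rec in neighbours:
        for site, conf in rec.sites.items():
            # Assumed weighting: similarity of the past query times confidence.
            votes[site] = votes.get(site, 0.0) + sim * conf
    return votes
```

A site s would then be treated as preferred when its accumulated vote is high enough; the decision threshold is left open here.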
More Details
Each query is characterized by a set of terms (query keywords + frequent terms in the search results).
I only consider terms appearing in the clicked results.
I use TF-IDF weighting.
Similarity between two queries is computed as the dot product of their term vectors (a larger value means the queries are closer).
It’s hard to determine whether a site is
preferred by a user!
clicked result ≠ relevant result ≠ preferred site
I associate each clicked result with a value in [0, 1] representing confidence of site preference.
Confidence value is determined heuristically: time spent on a webpage, # results clicked …
Preferences are mostly topic-dependent!
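The TF-IDF weighting and dot-product comparison of queries mentioned above can be illustrated with a small sketch. The function names and the smoothed IDF formula log((1 + N) / (1 + df)) are assumptions chosen for the example; the slides do not say which TF-IDF variant is used.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def tfidf_vector(tokens: Iterable[str],
                 doc_freq: Dict[str, int],
                 num_docs: int) -> Dict[str, float]:
    """Build a sparse {term: tf * idf} vector from a token list.

    tokens   : query keywords plus frequent terms from the clicked results
    doc_freq : term -> number of documents containing it (assumed to be available)
    num_docs : total number of documents behind doc_freq
    """
    tf = Counter(tokens)
    return {
        term: count * math.log((1 + num_docs) / (1 + doc_freq.get(term, 0)))
        for term, count in tf.items()
    }


def dot(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two sparse term vectors; larger means more similar."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())
```

With this variant, a term that occurs in every document gets weight zero and so contributes nothing to the similarity between two queries.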
Implementation
Efficiency Issues
Limit the number of search records, terms and sites: use FIFO or LRU to evict obsolete items.
Build an inverted index from terms to search records.
Also an inverted index from sites to search records.
Given a query q with characterizing terms T and its top 50 results’ sites S, use the indices to find all past searches whose terms intersect with T and whose preferred sites intersect with S.
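One way the two inverted indices and the eviction policy could fit together is sketched below. The class SearchLog and its methods are hypothetical, and for brevity the sketch bounds only the number of search records, with FIFO eviction; the slides also mention limiting terms and sites, and LRU as an alternative.

```python
from collections import OrderedDict, defaultdict


class SearchLog:
    """Bounded store of past searches with two inverted indices (illustrative)."""

    def __init__(self, max_records=1000):
        self.max_records = max_records
        self.records = OrderedDict()      # record id -> (terms, sites), oldest first
        self.by_term = defaultdict(set)   # term -> ids of records containing it
        self.by_site = defaultdict(set)   # site -> ids of records preferring it
        self._next_id = 0

    def add(self, terms, sites):
        """Store one search and index it by its terms and preferred sites."""
        rid = self._next_id
        self._next_id += 1
        self.records[rid] = (set(terms), set(sites))
        for t in terms:
            self.by_term[t].add(rid)
        for s in sites:
            self.by_site[s].add(rid)
        if len(self.records) > self.max_records:
            self._evict_oldest()
        return rid

    def _evict_oldest(self):
        """FIFO eviction: drop the oldest record and clean up both indices."""
        rid, (terms, sites) = self.records.popitem(last=False)
        for index, keys in ((self.by_term, terms), (self.by_site, sites)):
            for key in keys:
                index[key].discard(rid)
                if not index[key]:
                    del index[key]

    def candidates(self, query_terms, result_sites):
        """Ids of past searches sharing a term with T and a preferred site with S."""
        term_hits = set()
        for t in query_terms:
            term_hits |= self.by_term.get(t, set())
        site_hits = set()
        for s in result_sites:
            site_hits |= self.by_site.get(s, set())
        return term_hits & site_hits
```

Only the records returned by candidates need to be scored against the new query, which keeps the per-search cost independent of the total log size in the common case.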
Implemented on top of UCAIR Toolbar, a Google Toolbar-like browser plug-in that enables search result personalization.
Only one result is pulled up to the top: the one whose site is most likely to be preferred by the user.
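The "pull up one result" step can be sketched as below. The function promote_preferred, its threshold parameter, and the use of the URL's host name as the "site" are assumptions made for this example; the scores would come from whatever preference predictor is in use.

```python
from urllib.parse import urlparse


def promote_preferred(results, site_scores, threshold=0.0):
    """Return the result list with the best-scoring preferred result moved to the top.

    results     : result URLs in their original rank order
    site_scores : site (host name) -> predicted preference score
    threshold   : a result is promoted only if its score exceeds this value
    """
    results = list(results)
    best_idx, best_score = None, threshold
    for i, url in enumerate(results):
        score = site_scores.get(urlparse(url).netloc, 0.0)
        if score > best_score:  # strict '>' keeps the higher-ranked result on ties
            best_idx, best_score = i, score
    if best_idx is None or best_idx == 0:
        return results  # nothing to promote, or it is already on top
    return [results[best_idx]] + results[:best_idx] + results[best_idx + 1:]
```

All other results keep their original relative order, so the reranking stays conservative when the prediction is wrong.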
Probabilistic Approach
Maximize the odds O(L=1|w,q):
\[
O(L=1 \mid w, q) = \frac{P(L=1 \mid q, w)}{P(L=0 \mid q, w)} \propto \frac{P(q \mid w, L=1)\, P(w \mid L=1)}{P(q \mid w, L=0)\, P(w \mid L=0)}
\]
(Still exploring)
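For reference, the step from the posterior odds to the likelihood ratio follows from Bayes' rule. The slides do not define the symbols, so the derivation below only assumes that L is a binary label and that q and w are the two observed variables.

```latex
% Bayes' rule applied to each posterior, for l in {0, 1}:
P(L=l \mid q, w) = \frac{P(q \mid w, L=l)\, P(w \mid L=l)\, P(L=l)}{P(q, w)}

% Taking the ratio cancels P(q, w):
\frac{P(L=1 \mid q, w)}{P(L=0 \mid q, w)}
  = \frac{P(q \mid w, L=1)\, P(w \mid L=1)}{P(q \mid w, L=0)\, P(w \mid L=0)}
    \cdot \frac{P(L=1)}{P(L=0)}
```

The prior odds P(L=1)/P(L=0) do not depend on q or w, so dropping that factor does not change which candidate maximizes the odds.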
After the user searched for “dais” and clicked
on the “dais.cs.uiuc.edu” result, we respond
to queries “dais” and “database uiuc” by
putting “dais.cs.uiuc.edu” at the top.
Original top results