Transcript of PowerPoint slides
CS276A
Text Information Retrieval, Mining, and
Exploitation
Lecture 8
31 Oct 2002
Recap: IR based on Language
Model
(Diagram: the information need generates the query; each document d1 … dn in the document collection has its own language model Md1 … Mdn, and documents are ranked by the query generation probability P(Q | Md).)
One night in a hotel, I saw this late night talk show where Sergey Brin popped on suggesting the web search tip that you should think of some words that would likely appear on pages that would answer your question and use those as your search terms – let's exploit that idea!
Recap: Query generation
probability (unigrams)
Ranking formula
p(Q, d) = p(d) · p(Q | d) ≈ p(d) · p(Q | Md)
The probability of producing the query given the
language model of document d using MLE is:
p̂(Q | Md) = ∏_{t ∈ Q} p̂ml(t | Md) = ∏_{t ∈ Q} tf(t,d) / dld
Unigram assumption:
Given a particular language model,
the query terms occur independently
Md : language model of document d
tf(t,d) : raw tf of term t in document d
dld : total number of tokens in document d
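A minimal sketch of this unigram MLE scoring in Python (the function and variable names are illustrative, not from the lecture):

```python
from collections import Counter

def query_likelihood_mle(query_terms, doc_tokens):
    """P(Q | Md) under the unigram MLE model: product over query terms of tf(t,d) / dld."""
    tf = Counter(doc_tokens)   # raw term frequencies in the document
    dld = len(doc_tokens)      # total number of tokens in the document
    p = 1.0
    for t in query_terms:
        p *= tf[t] / dld       # zero whenever a query term is missing from the document
    return p

# Any query term absent from the document drives the score to zero,
# which is why the mixture model on the next slide smooths with the collection model.
print(query_likelihood_mle(["web", "search"], ["web", "search", "tips", "web"]))
```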
Recap: Query generation
probability (mixture model)
P(w|d) = λ Pmle(w|Md) + (1 – λ) Pmle(w|Mc)
Mixes the probability from the document
with the general collection frequency of the
word.
Correctly setting λ is very important
A high value of lambda makes the search
“conjunctive-like” – suitable for short queries
A low value is more suitable for long queries
Can tune to optimize performance
Perhaps make it dependent on document size
(cf. Dirichlet prior or Witten-Bell smoothing)
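A hedged sketch of this mixture scoring (Jelinek-Mercer style smoothing), assuming a precomputed collection term-frequency table; all names are illustrative:

```python
from collections import Counter

def mixture_query_likelihood(query_terms, doc_tokens, collection_tf, collection_len, lam=0.5):
    """P(Q|d) = product over query terms of [ lam*Pmle(t|Md) + (1-lam)*Pmle(t|Mc) ]."""
    tf = Counter(doc_tokens)
    dld = len(doc_tokens)
    score = 1.0
    for t in query_terms:
        p_doc = tf[t] / dld if dld else 0.0               # Pmle(t | Md)
        p_col = collection_tf.get(t, 0) / collection_len  # Pmle(t | Mc)
        score *= lam * p_doc + (1 - lam) * p_col
    return score
```

With a high lam, a document missing a query term is penalized heavily (the conjunctive-like behavior above); with a low lam, the collection model dominates, which suits longer queries.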
Today’s topics
Relevance Feedback
Query Expansion
Relevance Feedback
Relevance feedback = user feedback on
relevance of initial set of results
The user marks returned documents as
relevant or non-relevant.
The system computes a better representation
of the information need based on feedback.
Relevance feedback can go through one or
more iterations.
Relevance Feedback: Example
Image search engine (couldn’t find relevance
feedback engine for text!)
url: http://nayana.ece.ucsb.edu/imsearch/imsearch.html
Initial Query
Results for Initial Query
Relevance Feedback
Results after Relevance
Feedback
Rocchio Algorithm
The Rocchio algorithm incorporates relevance
feedback information into the vector space
model.
The optimal query vector for separating relevant and non-relevant documents is the centroid of the relevant documents minus the centroid of the non-relevant documents.
Unrealistic: we don't know the full set of relevant documents.
Rocchio Algorithm
Used in practice: the modified query is a weighted combination of the original query vector, the centroid of the judged-relevant documents, and (subtracted) the centroid of the judged-non-relevant documents.
Typical weights: alpha = 8, beta = 64, gamma = 64
Tradeoff alpha vs beta/gamma: If we have a lot of
judged documents, we want a higher beta/gamma.
But we usually don’t …
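A minimal sketch of the practical Rocchio update on dense term-weight vectors, using the weights from this slide (numpy; a sketch under the usual centroid formulation, not necessarily the lecture's exact notation):

```python
import numpy as np

def rocchio(q0, rel_vecs, nonrel_vecs, alpha=8.0, beta=64.0, gamma=64.0):
    """New query = alpha*q0 + beta*centroid(relevant docs) - gamma*centroid(non-relevant docs)."""
    q = alpha * np.asarray(q0, dtype=float)
    if rel_vecs:                                      # lists of document term-weight vectors
        q = q + beta * np.mean(rel_vecs, axis=0)
    if nonrel_vecs:
        q = q - gamma * np.mean(nonrel_vecs, axis=0)
    return np.maximum(q, 0.0)                         # negative term weights are commonly clipped to zero
```

With many judged documents the centroids are reliable, so weighting beta/gamma highly relative to alpha makes sense; with few judgments, the original query should dominate.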
Relevance Feedback in
Probabilistic Information Retrieval
How?
Relevance Feedback in
Probabilistic Information Retrieval
We can modify the query based on relevance
feedback and apply standard model.
Examples:
Binary independence model
Language model
Binary Independence Model
O(R | q, d) = O(R | q) · ∏_{i=1..n} [ p(xi | R, q) / p(xi | NR, q) ]
Since xi is either 0 or 1:
O(R | q, d) = O(R | q) · ∏_{xi=1} [ p(xi=1 | R, q) / p(xi=1 | NR, q) ] · ∏_{xi=0} [ p(xi=0 | R, q) / p(xi=0 | NR, q) ]
Binary Independence Model
O(R | q, d) = O(R | q) · ∏_{xi=1} [ p(xi=1 | R, q) / p(xi=1 | NR, q) ] · ∏_{xi=0} [ p(xi=0 | R, q) / p(xi=0 | NR, q) ]
The product over terms present in the document (xi = 1) is used as before.
The product over terms absent from the document (xi = 0) is assumed to be = 1 in simple retrieval; in relevance feedback, we have data to estimate the probability of non-occurrence.
Binary Independence Model
O(R | q, d) = O(R | q) · ∏_{xi=qi=1} [ p(xi=1 | R, q) / p(xi=1 | NR, q) ] · ∏_{xi=0, qi=1} [ p(xi=0 | R, q) / p(xi=0 | NR, q) ]
Note that we have 3 relevance “states” now:
relevant, non-relevant and unjudged.
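A sketch of how judged documents can be turned into per-term BIM weights. It uses the common add-0.5 smoothed estimates of the occurrence probabilities; the exact estimator is an assumption, not something stated on the slide:

```python
import math

def bim_term_weight(rel_with_term, rel_total, docs_with_term, docs_total):
    """Log odds-ratio weight for one query term, estimated from relevance judgments."""
    # p: probability the term occurs in a relevant document (add-0.5 smoothing avoids zeros)
    p = (rel_with_term + 0.5) / (rel_total + 1.0)
    # u: probability the term occurs in a non-relevant document
    u = (docs_with_term - rel_with_term + 0.5) / (docs_total - rel_total + 1.0)
    return math.log(p * (1.0 - u) / (u * (1.0 - p)))
```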
Positive vs Negative Feedback
Positive feedback is more valuable than
negative feedback.
Many systems only allow positive feedback.
Relevance Feedback: Assumptions
A1: User has sufficient knowledge for initial
query.
A2: Relevance prototypes are “well-behaved”.
Either: All relevant documents are similar to a
single prototype.
Or: There are different prototypes, but they
have significant vocabulary overlap.
Violation of A1
User does not have sufficient initial
knowledge.
Examples:
Misspellings (Brittany Speers)
Cross-language information retrieval (hígado)
Mismatch of searcher’s vocabulary vs collection
vocabulary (e.g., regional differences, different
fields of scientific study: genetics vs medicine,
bernoulli naïve bayes vs binary independence
model)
Violation of A2
There are several relevance prototypes.
Examples:
Burma/Myanmar
Contradictory government policies
Pop stars that worked at Burger King
Often: instances of a general concept
Good editorial content can address problem
Report on contradictory government policies
Relevance Feedback on the Web
Some search engines offer a similar/related pages
feature (simplest form of relevance feedback)
But some don't, because it's hard to explain to the average user:
Google (link-based)
Altavista
Stanford web
Alltheweb
msn
Yahoo
Excite initially had true relevance feedback, but
abandoned it due to lack of use.
Relevance Feedback: Cost
Long queries, as produced by relevance feedback, are inefficient for a typical IR engine.
Long response times for user.
High cost for retrieval system.
Other Uses of Relevance Feedback
Following a changing information need
Maintaining an information filter (e.g., for a
news feed)
Active learning
Topics for next quarter
Active Learning
Goal: create a training set for a text classifier
(or some other classification problem)
One approach: uncertainty sampling
At any point in learning, present the document with the highest uncertainty to the user and get a relevant/non-relevant judgment (the document closest to p(R) = 0.5).
Scarce resource is user’s time: maximize
benefit of each decision.
Active learning significantly reduces the number of labeled documents needed to learn an accurate classifier.
Active Learning
(Diagram of the active learning loop: build an initial classifier somehow; the classifier selects the most uncertain document from the collection of documents; the user labels that document; the classifier is retrained; repeat. A sketch of this loop follows.)
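A minimal sketch of that loop, assuming a classifier with a scikit-learn style fit/predict_proba interface; ask_user stands in for the human judgment and is a hypothetical callback:

```python
def uncertainty_sampling(classifier, seed_X, seed_y, pool, ask_user, rounds=20):
    """Label the pool document whose predicted P(relevant) is closest to 0.5, retrain, repeat."""
    X, y = list(seed_X), list(seed_y)
    pool = list(pool)
    for _ in range(min(rounds, len(pool))):
        classifier.fit(X, y)                          # retrain on all labels so far
        probs = classifier.predict_proba(pool)[:, 1]  # P(relevant) for each unlabeled doc
        i = min(range(len(pool)), key=lambda j: abs(probs[j] - 0.5))
        doc = pool.pop(i)                             # most uncertain document
        X.append(doc)
        y.append(ask_user(doc))                       # user supplies relevant / non-relevant
    return classifier
```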
Active Learning
Could we use active learning for relevance
feedback?
Relevance Feedback
Summary
Relevance feedback has been shown to be
effective at improving relevance of results.
Full relevance feedback is painful for the user.
Full relevance feedback is not very efficient in
most IR systems.
Other types of interactive retrieval may
improve relevance by as much with less work.
Forward Pointer: DirectHit
DirectHit uses indirect relevance feedback.
DirectHit gives a higher ranking to documents that users look at more often.
Not user or query specific.
Pseudo Feedback
Pseudo feedback attempts to automate the
manual part of relevance feedback.
Retrieve an initial set of documents for the query.
Assume that the top-ranked documents are relevant.
Pseudo Feedback
(Diagram of the pseudo feedback loop: issue the initial query, retrieve documents, label the top k of the highest-ranked documents as relevant, apply relevance feedback, and retrieve again. A sketch of one iteration follows.)
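A hedged sketch of one pseudo-feedback iteration; search and expand are hypothetical helpers (expand could be, e.g., the Rocchio update above with gamma = 0, since no documents are judged non-relevant):

```python
def pseudo_feedback(query_vec, search, expand, k=10):
    """Retrieve, assume the top k ranked documents are relevant, expand the query, retrieve again."""
    ranked = search(query_vec)                      # list of (doc_id, doc_vec), best first
    assumed_relevant = [vec for _, vec in ranked[:k]]
    expanded = expand(query_vec, assumed_relevant)  # relevance feedback with no non-relevant set
    return search(expanded)
```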
Pseudo Feedback
Not surprisingly, the success of pseudo feedback
depends on whether relevance assumption is true.
If it is true, pseudo feedback can improve precision
and recall dramatically.
If it is not true, then the results will become less
relevant in each iteration. (concept drift)
Example: ambiguous words (“jaguar”)
Bimodal distribution: depending on query,
performance is dramatically better or dramatically
worse.
Unfortunately, hard to predict which will be the case.
Pseudo-Feedback: Performance
Query Expansion
In relevance feedback, users give additional
input (relevant/non-relevant) on documents.
In query expansion, users give additional input
(good/bad search term) on words or phrases.
Query Expansion: Example
Also: see altavista, teoma
Types of Query Expansion
Refinements based on query log mining
Common on the web
Global Analysis: Thesaurus-based
Controlled vocabulary, maintained by editors (e.g., medline)
Automatically derived thesaurus (co-occurrence statistics)
Local Analysis:
Analysis of documents in result set
Controlled Vocabulary
Automatic Thesaurus
Generation
High cost of manually producing a thesaurus
Attempt to generate a thesaurus automatically
by analyzing a collection of documents
Two main approaches
Co-occurrence based (co-occurring words are
more likely to be similar)
Shallow analysis of grammatical relations
Entities that are grown, cooked, eaten, and digested
are more likely to be food items.
Co-occurrence-based thesauri are more robust; grammatical relations are more accurate.
Why?
Co-occurrence Thesaurus
Simplest way to compute one is based on term-term similarities in C = AAᵀ, where A is the term-document matrix.
w_{i,j} = (normalized) weighted count of term ti in document dj
(Diagram: A has one row per term ti and one column per document dj.)
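A toy numpy sketch of this computation; the four terms and the matrix entries are invented for illustration:

```python
import numpy as np

terms = ["jaguar", "car", "animal", "speed"]
# A: term-document matrix, one row per term ti, one column per document dj
A = np.array([[0.8, 0.0, 0.6, 0.1],
              [0.7, 0.1, 0.0, 0.5],
              [0.2, 0.9, 0.7, 0.0],
              [0.5, 0.0, 0.1, 0.6]])

C = A @ A.T   # term-term similarity matrix C = A Aᵀ
for i, t in enumerate(terms):
    j = max((k for k in range(len(terms)) if k != i), key=lambda k: C[i, k])
    print(t, "-> most similar term:", terms[j])
```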
Automatic Thesaurus Generation
Example
Automatic Thesaurus Generation
Discussion
Quality of associations is usually a problem.
Very similar to LSI (why?)
Problems:
“False positives”
Words deemed similar that are not
“False negatives”
Words deemed dissimilar that are similar
Query Expansion: Summary
Query expansion is very effective in increasing
recall.
In most cases, precision is decreased, often
significantly.
Sense-Based Retrieval
In query expansion, new words are added to
the query (disjunctively). Increase of matches.
In sense-based retrieval, term matches are
only counted if the same sense is used in
query and document. Decrease of matches.
Example: In sense-based retrieval, “jaguar” is
only a match if it’s used in the “animal” sense
in both query and document.
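A toy sketch contrasting the two counting rules, assuming (term, sense) pairs are already available for both query and document (a hypothetical representation; real sense tagging is the hard part):

```python
def term_match_count(query, doc):
    """Query-expansion style: a match needs only the surface term to agree."""
    doc_terms = {t for t, _ in doc}
    return sum(1 for t, _ in query if t in doc_terms)

def sense_match_count(query, doc):
    """Sense-based retrieval: a match needs both term and sense to agree."""
    doc_pairs = set(doc)
    return sum(1 for pair in query if pair in doc_pairs)

query = [("jaguar", "animal")]
doc = [("jaguar", "car"), ("speed", "attribute")]
print(term_match_count(query, doc), sense_match_count(query, doc))  # prints: 1 0
```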
Sense-Based Retrieval: Results
Expansion vs. Sense-Based
Retrieval
Same type of information is used in pseudo
relevance feedback and sense-based retrieval.
But: disambiguation is expensive
Indexing with senses is complicated
Automatic sense-based retrieval only makes sense for long queries. Why?
If senses are supplied in an interactive loop, then it's easier to add words rather than senses.
Exercise
By what factor does the index size increase?
How could you integrate confidence numbers
for senses?
Interactive Retrieval
Query expansion and relevance feedback are
examples of interactive retrieval.
Others:
Query “editing” (adding and removing terms)
Visualization-based, e.g., interaction with visual
map
Parametric search
Resources
MG Ch. 4.7
MIR Ch. 5.2 – 5.4
Singhal, Mitra, Buckley: Learning routing queries in a query zone. ACM SIGIR, 1997.
Yonggang Qiu, Hans-Peter Frei: Concept based query expansion. ACM SIGIR, 1993.
Schuetze: Automatic Word Sense Discrimination. Computational Linguistics, 1998.
Buckley, Singhal, Mitra, Salton: New retrieval approaches using SMART: TREC 4. NIST, 1996.
Gerard Salton and Chris Buckley: Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41(4):288-297, 1990.