Web Image Retrieval Re-Ranking with
Relevance Model
Wei-Hao Lin, Rong Jin, Alexander Hauptmann
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
International Conference on Web Intelligence (WIC’03)
Presented by Chu Huei-Ming
2005/02/24
Reference
• Relevance Models in Information Retrieval
  – Victor Lavrenko and W. Bruce Croft
  – Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts (UMass)
  – pp. 11-56, 2003, Kluwer Academic Publishers, Printed in the Netherlands
Outline
• Introduction
• Relevance Model
• Web image retrieval re-ranking
• Estimating a Relevance Model
  – Estimation from a set of examples
  – Estimation without examples
• Ranking Criterion
• Experiment
• Conclusion
Introduction (1/2)
• Most current large-scale web image search engines exploit text and link structure to “understand” the content of the web images.
• This paper proposes a re-ranking method to improve web image retrieval by reordering the images retrieved from an image search engine.
• The re-ranking process is based on a relevance model.
Introduction (2/2)
• Figure: the web image retrieval process with relevance-model re-ranking (system diagram).
Relevance Model (1/2)
• Mathematical Formalism
– V is a vocabulary in some language
  – C is a large collection of documents
  – Define the relevant class R to be the subset of documents in C which are relevant to some particular information need
  – Define the relevance model to be the probability distribution P(w|R)
  – For every word w, the relevance model gives the probability that we would observe w if we randomly selected some document D from the relevant class R and then picked a random word from D
Relevance Model (2/2)
• The important issue in IR is capturing the topic discussed in a sample of text, and to that end unigram models fare quite well.
• The choice of estimation technique has a particularly strong influence on the quality of relevance models.
Web image retrieval re-ranking (1/2)
• For each image I in the ranked list returned from a web image search engine, there is one associated HTML document D.
• Can we estimate the probability that the image is relevant given the text of document D, i.e. Pr(R|D)?
• By Bayes’ Theorem,

  Pr(R|D) = Pr(D|R) Pr(R) / Pr(D)    (1)

  – Pr(D) is equal for all documents, since every document is assumed to be equally probable
  – Pr(D|R) is what needs to be estimated if we want to know the relevance of the document
Web image retrieval re-ranking (2/2)
• Suppose the document D consists of words w1, w2, ..., wn.
• Applying the common word-independence assumption,

  Pr(D|R) = ∏_{i=1..n} P(wi|R)    (2)

• Pr(w|R) can be estimated without training data.
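As a rough illustration (not part of the slides), equation (2) can be evaluated in log space to avoid numerical underflow; here the relevance model is assumed to be a plain mapping from words to probabilities, and the function name and the unseen_prob floor are illustrative choices:

```python
import math

def log_prob_doc_given_relevance(doc_words, relevance_model, unseen_prob=1e-9):
    """Eq. (2): Pr(D|R) = prod_i P(wi|R), computed as a sum of logs.

    doc_words:       list of tokens from the HTML document D
    relevance_model: dict mapping word -> P(w|R)
    unseen_prob:     illustrative floor for words the model gives no mass to
                     (the slides handle this properly via smoothing, eq. (7))
    """
    return sum(math.log(relevance_model.get(w, unseen_prob)) for w in doc_words)
```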
Estimating a Relevance Model
• Estimation from a set of examples
  – We have full information about the set R of relevant documents.
• Estimation without examples
  – We have no examples from which we could estimate P(w|R) directly.
Estimation from a set of examples (1/2)
• Assume perfect knowledge of the entire relevant class R.
• P(w|R) is the probability that a word picked at random from a randomly chosen document D ∈ R will be the word w.
• Let p(D|R) denote the probability of randomly picking document D from the relevant set R. Assume each relevant document is equally likely to be picked at random:

  p(D|R) = 1/|R| if D ∈ R, 0 otherwise    (3)

• |R| is the total number of documents in R.
Estimation from a set of examples (2/2)
• The probability of randomly picking a document D and then observing the word w is

  P(w, D|R) = Pml(w|D) p(D|R)    (4)
  Pml(w|D) = #(w, D) / |D|    (5)

• It is assumed that the document model of D completely determines word probabilities: once D is fixed, the probability of observing w is independent of the relevant class R and depends only on D:

  P(w|R) = Σ_{D∈C} Pml(w|D) p(D|R)    (6)

• Smoothing is achieved by interpolating the maximum-likelihood probability from (5) with some background distribution P(w) over the vocabulary, using a document-dependent weight λD:

  Psmooth(w|D) = λD Pml(w|D) + (1 − λD) P(w)    (7)
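A minimal sketch, assuming the relevant set R is available as tokenized documents and a background distribution P(w) has already been computed; the single interpolation weight lam stands in for the per-document λD of equation (7), and all names are illustrative rather than taken from the paper:

```python
from collections import Counter

def relevance_model_from_examples(relevant_docs, background, lam=0.6):
    """Sketch of equations (3)-(7): estimate P(w|R) from a known relevant set R.

    relevant_docs: list of documents, each a list of tokens (the set R)
    background:    dict word -> P(w), a background distribution over the vocabulary
    lam:           interpolation weight (the lambda_D of eq. (7), one value for all D)
    """
    p_d_given_r = 1.0 / len(relevant_docs)                  # eq. (3): uniform over R
    ml_part = Counter()
    for doc in relevant_docs:
        counts = Counter(doc)
        for w, c in counts.items():
            ml_part[w] += (c / len(doc)) * p_d_given_r      # eqs. (4)-(6)
    # eq. (7) applied to the aggregate: interpolate with the background distribution
    vocab = set(ml_part) | set(background)
    return {w: lam * ml_part[w] + (1 - lam) * background.get(w, 0.0) for w in vocab}
```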
Estimation without examples (1/6)
• In ad-hoc information retrieval we have only a short 2-3 word query, indicative of the user’s information need, and no examples of relevant documents.
Estimation without examples (2/6)
• Assume that for every information need there exists an underlying relevance model R.
• It assigns the probabilities P(w|R) to the word occurrences in the relevant documents:

  P(w|R) ≈ P(w|Q)    (8)

• Given a large collection of documents and a user query Q = q1, q2, ..., qk,

  P(w|Q) = P(w, q1, q2, ..., qk) / P(q1, q2, ..., qk)    (9)
  P(w|R) ≈ P(w|Q) ∝ P(w, q1, q2, ..., qk)    (10)
Estimation without examples (3/6)
• Method 1: i.i.d. (random) sampling
• Assume that the query words q1, q2, ..., qk and the word w in relevant documents are sampled identically.
• Pick a distribution D ∈ C with probability p(D) and sample from it k+1 times. Then the total probability of observing w together with q1, q2, ..., qk is

  P(w, q1, q2, ..., qk) = Σ_{D∈C} p(D) P(w, q1, q2, ..., qk | D)    (11)

• Assume w and all qi are sampled independently and identically of each other:

  P(w, q1, q2, ..., qk | D) = P(w|D) ∏_{i=1..k} P(qi|D)    (12)

• Final estimate:

  P(w, q1, q2, ..., qk) = Σ_{D∈C} p(D) P(w|D) ∏_{i=1..k} P(qi|D)    (13)
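A sketch of how equation (13) could be computed, assuming each document in C has already been turned into a smoothed unigram model stored as a dict; normalizing the joint probabilities over the vocabulary then yields P(w|R) as in (10). The function names and the uniform default prior are assumptions of this sketch, not details from the paper:

```python
def joint_prob_iid(word, query, doc_models, doc_prior=None):
    """Eq. (13): P(w, q1..qk) = sum_{D in C} p(D) P(w|D) prod_i P(qi|D).

    word:       the candidate word w
    query:      list of query words q1..qk
    doc_models: list of dicts, each mapping word -> P(word|D) for one smoothed
                document model in the collection C
    doc_prior:  list of p(D); uniform over C if omitted
    """
    if doc_prior is None:
        doc_prior = [1.0 / len(doc_models)] * len(doc_models)
    total = 0.0
    for p_d, model in zip(doc_prior, doc_models):
        contrib = p_d * model.get(word, 0.0)          # p(D) * P(w|D)
        for q in query:
            contrib *= model.get(q, 0.0)              # prod_i P(qi|D)
        total += contrib
    return total

def relevance_model_iid(vocabulary, query, doc_models):
    """Eq. (10): normalize the joint probabilities over w to obtain P(w|R)."""
    joint = {w: joint_prob_iid(w, query, doc_models) for w in vocabulary}
    z = sum(joint.values()) or 1.0
    return {w: p / z for w, p in joint.items()}
```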
Estimation without examples (4/6)
• Method 2: conditional sampling
• Using the chain rule and assuming that the query words are independent given word w,

  P(w, q1, q2, ..., qk) = P(w) ∏_{i=1..k} P(qi | w, qi-1, qi-2, ..., q1)    (14)
                        = P(w) ∏_{i=1..k} P(qi | w)    (15)

• To estimate the conditional probabilities P(qi|w), we compute the expectation over the universe C of our unigram models:

  P(qi|w) = Σ_{Di∈C} P(qi|Di) P(Di|w)    (16)
Estimation without examples (5/6)
• Make the additional assumption that qi is independent of w once we have picked a distribution Di.
• The final estimate for the joint probability of w and the query is then

  P(w, q1, q2, ..., qk) = P(w) ∏_{i=1..k} P(qi|w)    (17)
                        = P(w) ∏_{i=1..k} Σ_{Di∈C} P(qi|Di) P(Di|w)    (18)
Estimation without examples (6/6)
• The word prior probability is

  P(w) = Σ_{D∈C} P(w|D) P(D)    (19)

• The probability of picking a distribution Di given w is

  P(Di|w) = P(w|Di) P(Di) / P(w)    (20)

• P(Di) is kept uniform over all the documents in C.
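For comparison, a sketch of Method 2 under the same assumptions as before (documents represented as dict-based unigram models, P(Di) kept uniform); it strings together equations (16)-(20) and normalizes at the end. Names and structure are illustrative only:

```python
def relevance_model_conditional(vocabulary, query, doc_models, doc_prior=None):
    """Sketch of eqs. (16)-(20): conditional-sampling estimate of P(w|R).

    doc_models: list of dicts mapping word -> P(word|D_i), one per document in C
    doc_prior:  list of P(D_i); kept uniform over C if omitted (as on the slide)
    """
    n = len(doc_models)
    if doc_prior is None:
        doc_prior = [1.0 / n] * n

    def p_word(w):                                   # eq. (19): P(w) = sum_D P(w|D) P(D)
        return sum(m.get(w, 0.0) * p for m, p in zip(doc_models, doc_prior))

    def p_doc_given_word(i, w, pw):                  # eq. (20): Bayes' rule
        return doc_models[i].get(w, 0.0) * doc_prior[i] / pw if pw > 0 else 0.0

    joint = {}
    for w in vocabulary:
        pw = p_word(w)
        value = pw
        for q in query:                              # eq. (16) inside eqs. (17)-(18)
            p_q_given_w = sum(
                doc_models[i].get(q, 0.0) * p_doc_given_word(i, w, pw)
                for i in range(n)
            )
            value *= p_q_given_w
        joint[w] = value                             # eq. (18): P(w, q1..qk)
    z = sum(joint.values()) or 1.0
    return {w: v / z for w, v in joint.items()}      # normalize to get P(w|R)
```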
Comparison of Estimation Methods 1 and 2
• Probability Ranking Principle
  – Documents are ranked by the decreasing probability ratio (N denotes the non-relevant class):

  P(d1, ..., dn | R) / P(d1, ..., dn | N) = ∏_{i=1..n} P(di|R) / P(di|N)

• Cross-Entropy

  H(R || D) = − Σ_w P(w|R) log P(w|D)
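One step the slide leaves implicit: for a fixed relevance model R, ranking documents by this cross-entropy is equivalent to ranking them by the KL divergence from R to the document model, since the two differ only by the entropy of R, which is constant across documents:

```latex
D\bigl(P(\cdot\mid R)\,\big\|\,P(\cdot\mid D)\bigr)
  = \sum_{w} P(w\mid R)\log\frac{P(w\mid R)}{P(w\mid D)}
  = \underbrace{-\sum_{w} P(w\mid R)\log P(w\mid D)}_{H(R\,\|\,D)}
  - \underbrace{\Bigl(-\sum_{w} P(w\mid R)\log P(w\mid R)\Bigr)}_{H(R),\ \text{constant over }D}
```

Note that the ranking criterion on the next slide measures the divergence in the opposite direction, from the document model Pr(·|Di) to the relevance model Pr(·|R).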
Ranking Criterion (1/2)
• Ranking by (2) will favor short documents.
• Use the Kullback-Leibler (KL) divergence to avoid the short-document bias:

  D( Pr(·|Di) || Pr(·|R) ) = Σ_v Pr(v|Di) log [ Pr(v|Di) / Pr(v|R) ]

• Pr(w|Di) is the unigram model of the document associated with the rank-i image in the list.
• Pr(w|R) is the aforementioned relevance model.
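A minimal re-ranking sketch under the same dict-based assumptions as before: kl_divergence follows the divergence on this slide, and rerank sorts images by ascending divergence so the document model closest to the relevance model comes first; the eps floor and all names are illustrative:

```python
import math

def kl_divergence(doc_model, relevance_model, eps=1e-12):
    """D( Pr(.|Di) || Pr(.|R) ) = sum_v Pr(v|Di) * log( Pr(v|Di) / Pr(v|R) ).

    Both arguments are dicts mapping word -> probability; eps is an illustrative
    floor that keeps the log finite when the relevance model misses a word.
    """
    return sum(
        p * math.log(p / max(relevance_model.get(v, 0.0), eps))
        for v, p in doc_model.items() if p > 0
    )

def rerank(images, doc_models, relevance_model):
    """Re-rank images in ascending order of divergence: the image whose document
    model is closest to the relevance model is ranked first."""
    scored = [(kl_divergence(m, relevance_model), img)
              for img, m in zip(images, doc_models)]
    return [img for _, img in sorted(scored, key=lambda t: t[0])]
```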
Ranking Criterion (2/2)
Experiment (1/4)
• Test the idea of re-ranking on six text queries to a large-scale web image search engine, Google Image Search, which indexed 425 million images between July 2001 and March 2003.
• The six queries are chosen from image categories in the Corel Image Database.
• Each text query is typed into Google Image Search and the top 200 entries are saved for evaluation.
• The 1200 images for the six queries are fetched and manually labeled into three categories: relevant, ambiguous, irrelevant.
Experiment (2/4)
Experiment (3/4)
• For each query, send the same keywords to Google Web Search and obtain a list of relevant documents via the Google Web APIs.
• For the top-ranked 200 web documents, all HTML tags are removed, words appearing in the INQUERY stop-word list are filtered out, and the remaining words are stemmed with the Porter algorithm.
• The smoothing parameter in (7) is set to 0.6.
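A rough preprocessing sketch for these steps; the real INQUERY stop-word list is not reproduced here (a tiny placeholder set stands in for it), NLTK's PorterStemmer is used as one available implementation of the Porter algorithm, and the regex-based tag stripping is only for illustration:

```python
import re
from nltk.stem.porter import PorterStemmer   # one implementation of the Porter algorithm

# Placeholder: the actual experiment uses the INQUERY stop-word list (not reproduced here).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it"}

def preprocess(html, stop_words=STOP_WORDS):
    """Sketch of the document preprocessing: strip HTML tags, drop stop words,
    and stem the remaining words with the Porter stemmer."""
    stemmer = PorterStemmer()
    text = re.sub(r"<[^>]+>", " ", html)              # crude tag removal for illustration
    tokens = re.findall(r"[a-z]+", text.lower())      # simple lowercase word tokenizer
    return [stemmer.stem(t) for t in tokens if t not in stop_words]
```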
Experiment (4/4)
• Figure: the average precision at each document cut-off point (DCP), averaged over the six queries.
Conclusion
• At the top 50 documents, re-ranking improves the average precision from the original 30-35% to 45%.
• Internet users usually have limited time and patience; high precision among the top-ranked documents will save users a lot of effort and help them find relevant images more easily and quickly.