Relevance Feedback
• Limitations
– Must yield a result within 3-4 iterations at most
– Users will likely terminate the process sooner
– Users may get irritated at seeing the same
documents repeated after every iteration
• Relevance feedback has been shown to increase
the effectiveness of retrieval
Designing a Relevance Feedback
System
• Use positive or negative relevance judgments
• Where to apply relevance judgments (query,
profile, document, retrieval algorithm)
• Term weight modification. E.g.,
– Increase the weight for terms that appear in
relevant docs
– Add new terms found in relevant docs that are
frequently mentioned in connection with query terms
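The two term-weight modifications above can be sketched in a few lines. This is a simplified illustration only, not a method taken from the slides: the function name `reweight_query`, the parameters `boost`, `new_term_weight`, and `max_new_terms`, and their default values are all assumptions.

```python
from collections import Counter

def reweight_query(query, relevant_docs, boost=0.5, new_term_weight=0.25, max_new_terms=2):
    """Return an updated {term: weight} query from user-judged relevant docs.

    query: dict of term -> weight; relevant_docs: list of token lists.
    All parameter names and default values are illustrative assumptions.
    """
    query_terms = set(query)
    # Count in how many relevant documents each term occurs.
    doc_freq = Counter(term for doc in relevant_docs for term in set(doc))
    updated = dict(query)
    # Step 1: increase the weight of query terms that appear in relevant docs.
    for term in query:
        updated[term] += boost * doc_freq.get(term, 0)
    # Step 2: add new terms that co-occur with a query term in relevant docs.
    cooccur = Counter()
    for doc in relevant_docs:
        terms = set(doc)
        if terms & query_terms:
            cooccur.update(terms - query_terms)
    ranked = sorted(cooccur.items(), key=lambda kv: (-kv[1], kv[0]))
    for term, _ in ranked[:max_new_terms]:
        updated[term] = new_term_weight
    return updated

query = {"genetic": 1.0, "algorithm": 1.0}
relevant = [
    ["genetic", "algorithm", "crossover", "mutation"],
    ["genetic", "crossover", "population"],
]
new_query = reweight_query(query, relevant)
```

Here the existing terms gain weight in proportion to how many relevant documents contain them, and the most frequent co-occurring terms are added at a low starting weight.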
Genetic Algorithms
• Several possible solutions are generated in
parallel
• The best few of these solutions are chosen and
replicated, while the poor ones are eliminated
• The replicated solutions create a breeding
population, from which new solutions arise
• The breeding is accomplished by an
exchange of some of the characteristics of the
chosen solutions in a crossover operation
Genetic Algorithms (cont.)
• Getting stuck on a local hill (the hill-climbing
problem) is avoided by
– Pursuing multiple solutions in parallel, and
discarding the low hills
– Introducing new characteristic values at a low rate
through a mutation process (random exchange)
• Relevance Feedback
– Relieves the user of the burden of assigning
term weights
• Begins with no weights. Generates query variants by
assigning term weights randomly
Genetic Algorithms (cont.)
• Each query variant is a vector of query term weights
• Each query variant is used to search the
documents in the database
• Each variant is evaluated with the equation on pg. 226
• The variants with the highest values produce the
most replications
• The resulting breeding population is developed
to the same size as the original population
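The loop described on the last three slides (random initial weights, evaluation, replication, crossover, mutation) can be sketched as follows. The evaluation equation on pg. 226 is not reproduced here, so `fitness` below is a hypothetical stand-in (the share of top-ranked documents the user judged relevant); `evolve`, its parameters, and the toy data are likewise assumptions made for illustration.

```python
import random

def fitness(weights, docs, relevant_ids):
    """Hypothetical stand-in score: share of top-ranked docs that are relevant."""
    scores = {
        doc_id: sum(w * tf.get(term, 0) for term, w in weights.items())
        for doc_id, tf in docs.items()
    }
    top = sorted(scores, key=lambda d: (-scores[d], d))[: len(relevant_ids)]
    return len(set(top) & relevant_ids) / len(relevant_ids)

def evolve(terms, docs, relevant_ids, pop_size=8, generations=20,
           mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Begin with no user-supplied weights: assign them randomly.
    population = [{t: rng.random() for t in terms} for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(q, docs, relevant_ids) for q in population]
        # Replication: fitter variants get more copies; the breeding
        # population keeps the same size as the original population.
        breeding = rng.choices(population, weights=[f + 1e-6 for f in fits], k=pop_size)
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(breeding, 2)
            # Crossover: each weight is taken from one of the two parents.
            child = {t: (a[t] if rng.random() < 0.5 else b[t]) for t in terms}
            # Mutation: occasionally introduce a brand-new weight value.
            for t in terms:
                if rng.random() < mutation_rate:
                    child[t] = rng.random()
            children.append(child)
        population = children
    return max(population, key=lambda q: fitness(q, docs, relevant_ids))

# Toy collection: doc id -> term frequencies; the user marked d1 and d3 relevant.
docs = {
    "d1": {"genetic": 3, "query": 1},
    "d2": {"music": 4},
    "d3": {"genetic": 1, "weights": 2},
    "d4": {"music": 1, "query": 1},
}
relevant_ids = {"d1", "d3"}
terms = ["genetic", "music", "query"]
best = evolve(terms, docs, relevant_ids)
```

The small constant added to each fitness value only keeps the random selection well defined when every variant scores zero.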
Natural Language Processing
• Focus on structure more than meaning,
consequently problems are
– Syntactic ambiguity e.g., they are visiting relatives
– Deep structure of a sentence e.g., grace
– May or may not be semantically correct e.g.,
Colorless green ideas sleep furiously
– Syntactic rules do not apply to, e.g., Boolean queries
Natural Language Processing (cont.)
• Semantic Analysis
– Even more elusive, e.g., red herring, carrying coals
to Newcastle
• Techniques for Semantic Analysis
– Latent semantic indexing uses multidimensional
scaling methods to identify concepts
– Dialogue Analysis involves interaction that each
time clarifies further what is to be retrieved
Citation Processing
• Use of cited documents to enhance the
description of a primary document
• Some use co-citation as a measure of document
similarity, i.e., the number of papers that cite both
• Bibliographic coupling: two documents
cite the same document
• Design problems: locating citations,
interpreting them, eliminating duplicate/useless ones
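The two citation measures on this slide differ only in the direction of the link, which a short sketch makes concrete. The function names and the toy citation table are assumptions for illustration; `citations` maps each document to the set of documents it cites.

```python
def cocitation(citations, a, b):
    """Number of papers that cite both a and b."""
    return sum(1 for refs in citations.values() if a in refs and b in refs)

def bibliographic_coupling(citations, a, b):
    """Number of documents cited by both a and b."""
    return len(citations.get(a, set()) & citations.get(b, set()))

# Hypothetical citation table: document -> set of documents it cites.
citations = {
    "P1": {"A", "B"},
    "P2": {"A", "B", "C"},
    "P3": {"A"},
    "A": {"X", "Y"},
    "B": {"Y", "Z"},
}

both_cited = cocitation(citations, "A", "B")            # P1 and P2 cite both
shared_refs = bibliographic_coupling(citations, "A", "B")  # both cite Y
```

Co-citation can grow over time as new papers cite the pair, while bibliographic coupling is fixed once the two documents are published.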
Hypertext Links
• Means of connecting 2 distinct pieces of text
• Consists of an identifier and a pointer
• Possibly aid retrieval by suggesting hyperlinks
given in top ranked document retrieved
• Do not follow links from linked documents
• Information Filtering: Eliminate large
segments of database from consideration
• Passage Retrieval: Identifying relevant
sections within a large document, e.g., an encyclopedia
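As a rough illustration of passage retrieval, the sketch below slides a fixed-size word window over a document and returns the window containing the most query terms. The slides do not specify a method; `best_passage`, the `window` and `step` parameters, and the sample text are assumptions.

```python
def best_passage(text, query_terms, window=8, step=4):
    """Return (passage, score) for the word window with the most query-term hits."""
    words = text.split()
    query = {t.lower() for t in query_terms}
    best_score, best_start = -1, 0
    for start in range(0, len(words), step):
        passage = words[start:start + window]
        # Count words in this window that match a query term.
        score = sum(1 for w in passage if w.lower().strip(".,;:") in query)
        if score > best_score:
            best_score, best_start = score, start
    return " ".join(words[best_start:best_start + window]), best_score

text = (
    "Cats are small mammals kept as pets. "
    "Genetic algorithms breed query variants through crossover and mutation. "
    "Rivers carry water to the sea."
)
passage, score = best_passage(text, ["crossover", "mutation"])
```

Overlapping windows (a step smaller than the window) reduce the chance that a relevant passage is split across two windows.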
Image and Sound Processing
• Techniques for evaluating and manipulating
images directly
• Voice recognition
• Animation and sound: compare to those in
libraries
• Music retrieval can use style and then pattern
matching