Topic: Relevance Feedback Example Questions

Lecture 13: Midterm Review
SIMS 202:
Information Organization
and Retrieval
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2004
http://www.sims.berkeley.edu/academics/courses/is202/f04/
IS 202 – FALL 2004
2004.10.12 - SLIDE 1
Lecture Overview
• Midterm Review
– The administrative details
– The “Rules” for the exam
– We will go through the sample questions and
discuss them
– Open question/answer period
Midterm Exam Details
• Date: 10/14/2004 Time: 10:30-12:00
• The exam is open-book, open-note AND
open-computer
• There will be 8-10 questions on the exam
• You may use your own laptop, or one of the
computers in the lab. The results of your work
are to be printed
• The exam can be hand-written if you wish; if so,
be sure to bring:
– Pens/Pencils
– Calculator
– (Paper will be provided on the exam itself, but you
may want to bring scratch paper)
Midterm Exam Details
• The exam will cover the first half of the course;
that is, it will primarily be on the topics covered
concerning Information Retrieval
• Questions will be worth a specific number of
points and these will be stated on the exam itself
• Partial credit will be awarded for partial answers
• In your answers, please balance conciseness
with illustration of all of the requested
information
– In other words, don't write a lot of things that aren't
asked for, but try to address all of what is asked for
Rules
• Do your own work
• No discussion during the exam
– Yes, IM counts as discussion!
– Yes, email counts as discussion!
• You are on your honor not to look at other
students’ work (you may want to review the
University policies on academic dishonesty)
• PROVIDE PROPER ATTRIBUTION for ideas
taken from other sources (online or printed)
Rules
• Questions CAN and SHOULD be asked of
me or the TAs
• Issues/Corrections/Answers for details will
be put up on the screens in 202
• We will also put these up on a web page
for those in the Lab
Study Guide
• To study for the exam:
• Be sure you understand the material that was
covered in lectures and have read and absorbed
the corresponding material in the readings
• Be sure you can do activities similar to what was
done in the homework assignments
• We will have questions that require you to
generalize from what you've learned and
synthesize ideas
– So be sure you have thought about the ideas covered
in lecture, readings, and homework assignments
Study Guide
• Alison suggests that you might want to
bookmark online or printed resources so
that you can quickly find the topics that
you need
Example Questions
• These are available on the Class Web site
• Note that these examples are NOT the
exact questions that will be on the exam
but are similar to questions that have been
used in the past
• There will be questions that ask you to do
something with supplied data
– For example, given some data, design an ER
diagram describing the data elements and
their relationships
Example Questions
• The example questions on the web site are
organized (approximately) in the order that
the topics were presented during the course:
– Information
– The Search process
– Documents and Statistics of Text
– Queries, Ranking, and the Vector Space Model
– IR Systems and Implementation
– Relevance Feedback
– Evaluation of IR Systems
– Database Design
(Approximate) Course Schedule
• Retrieval
– Overview
– Introduction to the Search Process
– Boolean Queries and Text Processing
– Web Search Issues and Architecture
– Statistical Properties of Text and Vector Representation
– Probabilistic Ranking & Relevance Feedback
– Evaluation
– Interfaces for Information Retrieval
– Database Design
• Organization
– Phone Project Introduction
– Categorization
– Knowledge Representation
– Lexical Relations and WordNet
– Metadata Introduction
– Controlled Vocabularies Introduction
– Facetted Classification
– Thesaurus Design and Construction
– Semantic Web
– Multimedia Information Organization and Retrieval
– Metadata for Media
– Phone Project Presentations
Review of Course Content
• We can draw on:
– 14 sets of Slides (including this one and the
Math Review slides)
– Handout papers
– The Reader
– Textbooks
– Assignments
– Discussion questions and issues
Example Questions
• Topic: Information
• Example Questions:
– What is the information life cycle?
– What are different ways of measuring
information? What are different ways of
defining information?
Example Questions
• Topic: Document Representation and
Statistical Properties of Text
• Example Questions:
– What is the significance of Zipf's law for
weighting of terms in information retrieval?
– What kinds of errors can a stemming
algorithm produce?
Example Questions
• Topic: Queries, Ranking, and the Vector Space Model
• Example Questions:
– What is the difference between a search engine that uses the
vector space ranking algorithm on natural language queries and
a system that uses Boolean queries?
– What is the role of coordination level ranking in a faceted
Boolean system?
– Describe the following information need in terms of a faceted
Boolean query. What kinds of weighting algorithms can be
applied to a faceted query like this?
“I would like to find articles about the effects of the passage of
the independent investigator statute by Congress on how the
U.S. president chooses an attorney general.”
– Why do different web search engines return different sets of
documents for the same query?
– Redo the computations of Assignment 3 part 3 using different
values for TF.
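As a rough illustration of why the choice of TF value matters, the sketch below scores one document against one query with cosine similarity under three term-frequency variants. The Assignment 3 data is not reproduced here: the toy vectors, the three variants (raw, binary, log), and the function names are all assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def tf_variants(raw_tf):
    """Three common ways to turn raw term counts into TF weights (illustrative)."""
    return {
        "raw": [float(f) for f in raw_tf],
        "binary": [1.0 if f > 0 else 0.0 for f in raw_tf],
        "log": [1.0 + math.log(f) if f > 0 else 0.0 for f in raw_tf],
    }

def score_with_variants(query, raw_doc_tf):
    """Score the same document against the query under each TF variant."""
    return {name: cosine(query, vec) for name, vec in tf_variants(raw_doc_tf).items()}

# Hypothetical toy data: a 3-term vocabulary, a query on terms 0 and 1,
# and a document in which term 0 occurs 5 times and term 1 once.
scores = score_with_variants([1.0, 1.0, 0.0], [5, 1, 0])
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these toy numbers the raw counts give the lowest score and the binary weights the highest, because damping the dominant term brings the document vector closer to the direction of the query.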
Example Questions
• Topic: IR systems and Implementation
• Example Questions:
– Draw and label a diagram that shows the major
components of an IR system.
– What are the special features of the Cheshire II
information access system?
– What is the purpose of an inverted index? How is it
used to generate answers to Boolean queries?
– Convert the contents of a set of documents (short
texts) into an inverted index representation.
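The last two questions can be practiced with a small sketch like the one below, which builds an inverted index from a few short texts and answers a Boolean AND query by intersecting postings lists. The documents, the whitespace tokenizer, and the function names are assumptions for illustration; a real system would also handle stemming, stop words, and punctuation.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenization
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def boolean_and(index, *terms):
    """Answer a conjunctive query by intersecting the terms' postings lists."""
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical short texts standing in for a document collection.
docs = {
    1: "information retrieval systems",
    2: "database systems design",
    3: "information organization and retrieval",
}
index = build_inverted_index(docs)
print(index["systems"])                                # [1, 2]
print(boolean_and(index, "information", "retrieval"))  # [1, 3]
```

The index is organized by term rather than by document, so a Boolean query only touches the postings lists of its own terms instead of scanning every text.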
Example Questions
• Topic: Evaluation of IR Systems
• Example Questions:
– Define precision. Define recall. Define
relevance. How are the three interrelated?
– Under what circumstances is high recall
desirable? Under what circumstances is high
precision desirable?
– What is the main purpose of TREC? How
does it differ from earlier evaluation efforts?
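For the precision and recall definitions, a minimal sketch using the usual set-based formulation may help: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The document IDs and the function name are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given retrieved and relevant IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: 4 documents retrieved, 5 relevant overall, 2 in common.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

Both measures depend on the relevance judgments, which is why the three concepts are interrelated: changing which documents count as relevant changes both numbers without changing what the system returned.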
Example Questions
• Topic: The Search Process
• Example Questions:
– Search and retrieval is part of a larger
process. Name some other components of
that process.
– How/why doesn't the Bates berry-picking
model fit with the standard information
retrieval model?
– How (fundamentally) does search on a
directory system like Yahoo differ from search
on Altavista or Google?
Example Questions
• Topic: Relevance Feedback
• Example Questions:
– What is the main difference between relevance feedback as
defined in the literature and the more current web-based notion
of "more like this"?
– Given a query, three documents marked as relevant, and the
Rocchio formula for relevance feedback given in class, compute
the vector for the new query that results.
– The Koenemann & Belkin study found results in three conditions
for relevance feedback: opaque, transparent, and penetrable.
Consider the different ways people have implemented systems
for predicting which web page to show the user next. How do the
differences in these systems correspond to the different
relevance feedback conditions?
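The Rocchio computation in the second question can be rehearsed with a sketch like the following. The formula given in class is not reproduced in these slides, so this assumes the commonly cited form q' = alpha*q + beta*(centroid of relevant docs) - gamma*(centroid of non-relevant docs). The weights alpha=1.0, beta=0.75, gamma=0.25, the clipping of negative weights, and the toy vectors are all assumptions; substitute the values from the class formula when practicing.

```python
def rocchio(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.25):
    """Return a modified query vector (assumed standard Rocchio form)."""
    def centroid(vectors):
        # Mean vector of a set of documents; all zeros if the set is empty.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    new_query = [alpha * q + beta * r - gamma * n
                 for q, r, n in zip(query, rel_c, non_c)]
    # Negative term weights are often clipped to zero (an assumption here).
    return [max(0.0, w) for w in new_query]

# Hypothetical 3-term vocabulary: one query, three documents marked relevant.
query = [1.0, 0.0, 1.0]
relevant_docs = [
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
]
new_query = rocchio(query, relevant_docs)
print([round(w, 3) for w in new_query])
```

With no non-relevant documents the gamma term drops out, so the new query is the original query pulled toward the centroid of the three relevant documents. Note that the second term, absent from the original query, now receives a positive weight.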
Example Questions
• Topic: Database Design
• Example Questions:
– How is a database different than a file system?
– What are the benefits of a database system?
– What do we mean by data independence?
– What are the benefits/drawbacks of the primary
database models?
– Entity-Relationship Diagrams -- what are they for, how
do you create them?
– How do you normalize a relational model database?
– What is a join?
Your Questions
• What other topics would you like more
explanation for?
Be prepared, and good luck!