Text Based Information Retrieval H02C8A H02C8B

Text Based Information Retrieval
H02C8A
H02C8B
Marie-Francine Moens
Karl Gyllstrom
Katholieke Universiteit Leuven
Study points: 4 or 6
Language: English
Periodicity: Taught in the second semester
e-mail: [email protected]
[email protected]
2010-2011
Text Based Information Retrieval 2010-2011
Information retrieval
• Crawling: discovering documents on the web
• Indexing: storing documents so they can be quickly
retrieved when users search
• Clustering: finding similar documents so they can be
retrieved together or stored on the same servers
• Retrieval: finding good documents to answer users' queries
• Ranking: deciding which of the relevant documents are
the best
E.g., text categorization, information extraction,
text clustering, summarization,
cross-language and cross-media retrieval, ...
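To make the indexing, retrieval, and ranking steps concrete, here is a minimal sketch in Python. It is not taken from the course material: the toy documents, the whitespace tokenizer, and the scoring by summed term frequency are all assumptions chosen for illustration, and real systems use more refined weighting schemes.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Indexing: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, query):
    """Retrieval and ranking: score documents by summed query-term frequency."""
    scores = Counter()
    for term in tokenize(query):
        # .get avoids creating empty entries for unknown terms.
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Sort by descending score, then by doc_id to break ties predictably.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical toy collection, standing in for crawled documents.
docs = {
    "d1": "text retrieval and text mining",
    "d2": "image retrieval",
    "d3": "linear algebra",
}
index = build_index(docs)
ranked = search(index, "text retrieval")
print(ranked)
```

The index is built once and then reused for every query, which is why stored documents "can be quickly retrieved when users search": a query only touches the postings of its own terms rather than scanning every document.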
Aims of the course
• Acquire the fundamental techniques for text
based information retrieval and text mining
• Learn to design, partially implement, and
evaluate a text based information retrieval
system
• Acquire insights into current research
questions
• Illustrate with commercial applications
– 1 lesson: speaker of an international company
(e.g., Microsoft, Yahoo)
Prerequisites
• Basic knowledge of:
– Probability theory and statistics
– Information theory
– Linear algebra
– (Machine learning)
Course material
• Course slides and exercise questions/solutions can be
downloaded from the Toledo platform
– http://toledo.kuleuven.be
• Background literature
Evaluation
• An assignment (grading: 33.3%): At the start of the course
(week 7) the student chooses an assignment (paper or
programming exercise) that addresses a specific problem in
information retrieval. The assignment is due during week
21. A score of 50% or more on this assignment is
transferred to the second exam session.
– Students taking the course for 6 study points do a large
programming assignment; the paper option is available only
for 4 study points.
• Theory exam (grading: 33.3%): Oral with written
preparation, closed book.
• Exercise exam (grading: 33.3%): Written, open book.