CSM06 Information Retrieval

Lecture 1c – Module Overview
Dr Andrew Salway
[email protected]
AIM
• The module introduces a range of data
structures and algorithms for information
retrieval, as well as some of the underlying
theory and approaches. The emphasis will be
on information retrieval for the web, but
organisation-wide archives and personal
media collections will also be considered.
• Students are encouraged to develop practical
experience through case studies of current
information retrieval systems and applications,
and through the development and evaluation
of part of an information retrieval system.
LEARNING OUTCOMES
By the end of the module you should be
able to:
• Compare and contrast the theory, techniques and
applicability of the Boolean Model of IR, the Vector Space
Model of IR and the ranking techniques used by web search
engines
• Apply a range of techniques for information retrieval and
justify the selection of techniques for a particular
information retrieval application
• Demonstrate an appreciation of the challenges currently
facing the developers of information retrieval systems
for the web, and the extent to which current web search
engines address these challenges
• Explain how visual data presents different challenges from
text data, and compare and contrast approaches and
systems for image and video retrieval
• Discuss current research themes in the field of information
retrieval, and research and evaluate advanced IR
techniques
How the module runs…
• Lectures: Tuesdays 10-1. Reading
the slides is not a substitute
for attending
• Assessment: 60% unseen written
examination; 40% one piece of
coursework (set next week)
• Coursework tutorials: some
of the scheduled lecture time
will be used to give feedback
on coursework – extra
coursework tutorials will be
arranged if required
Outline of Module Content
Text Information Retrieval
Lecture 2
· Tokenization, stemming, stop lists → inverted
index; STAIRS data structure
· Boolean Model: underlying set theory
· The challenges of synonymy and polysemy
· Vector space model: texts and queries as
vectors; cosine distance
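The Lecture 2 topics can be previewed with a minimal sketch. It is not the module's own code: the stop list, the sample documents and all function names are assumptions made for illustration, and stemming is left out. It builds an inverted index, answers a Boolean AND query by set intersection, and compares a query with a text using the cosine of their term-count vectors.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stop list (an assumption, not taken from the module).
STOP_WORDS = {"the", "a", "of", "and", "for", "on"}

def tokenize(text):
    """Lowercase, split on non-letters, drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Boolean Model: AND is the intersection of the terms' posting sets."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def cosine(text_a, text_b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical three-document collection.
docs = {
    1: "Information retrieval for the web",
    2: "Image and video retrieval",
    3: "Ranking for web search engines",
}
index = build_inverted_index(docs)
matches = boolean_and(index, "retrieval", "web")   # ids of documents containing both terms
score = cosine("web retrieval", docs[1])           # similarity between a query and document 1
```

The Boolean query returns an unranked set of matching documents, whereas the cosine score allows documents to be ranked against the query.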
Lecture 3
Enhancing the vector space model:
· Term Frequency – Inverse Document
Frequency
· Relevance Feedback
· Latent Semantic Indexing
· Generating clusters for query expansion
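As a rough illustration of Term Frequency – Inverse Document Frequency weighting, the sketch below uses one common variant: raw term count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. The choice of variant, the function name and the sample data are assumptions; the lecture may use a different formulation.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: dict of doc id -> list of tokens. Returns doc id -> {term: weight}."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    weights = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)  # raw term counts within this document
        weights[doc_id] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return weights

# Hypothetical pre-tokenized documents.
sample = {
    1: ["web", "search", "web"],
    2: ["image", "search"],
}
weights = tf_idf(sample)
```

With this variant, a term that occurs in every document (here "search") gets weight zero, while a term concentrated in one document (here "web") gets a high weight.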
Outline of Module Content
Web Search
Lecture 4
· Characteristics of information on the web
· Ranking techniques for web search engines
· Link analysis - PageRank algorithm
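The idea behind link analysis can be sketched with a simplified power-iteration version of PageRank. This is an illustrative approximation, not the production algorithm of any search engine: the damping factor of 0.85, the fixed iteration count, the uniform handling of pages with no out-links and the toy link graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict of page -> list of pages it links to.

    Every link target must also appear as a key. Returns page -> rank,
    with the ranks summing to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank in equal shares to the pages it links to.
        for p in pages:
            for q in links[p]:
                new_rank[q] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank

# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
```

In this toy graph page c ends up ranked above page b, because c receives links from both a and b while b receives only a share of a's rank.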
Lecture 5
· Finding similar pages: companion algorithm
and co-citation algorithms
· Enhancing users’ queries: geographic
queries; questions into queries - TRITUS
Lecture 6
· Visualising the results set: InfoCrystal,
TileBars, Vivisimo, KartOO.
· Further features of web search engines
Outline of Module Content
Visual Information Retrieval
Lecture 7
· Kinds of image metadata
· The challenge of the semantic gap and the
sensory gap
· Content-based Image Retrieval – QBIC,
BlobWorld
· Web image retrieval using collateral text –
WebSEEK, Google, Munson, Yanai
Lecture 8
· Video retrieval: Informedia, Google Video,
Blinkx TV
· Learning associations between visual
features and words using web data
Outline of Module Content
Lecture 9
· Group coursework presentations
Lecture 10
· REVISION
How the module runs…
“Resource based, independent learning”
• Consider spending about 3 hours per
week on reading and doing exercises, in
addition to the coursework
• Resources include:
– Lectures and lecture slides, and some
additional notes
– Exercises
– Set reading and further reading
– Your lecturer ([email protected])
– Module web-page: bookmark this and check
it regularly. Lecture slides and other
important information will normally be
available in advance of the lecture.
• Any problems? Let me know!
What is ‘Set Reading’?
• Set Reading is considered essential for
the module
• You are expected to make your own
notes from the set reading to give
yourself a broader and deeper
understanding of the material covered
in lectures
• Key sections / pages / ideas will be
pointed out when the reading is set
• All the Set Reading will be available
either online or from the Library Article
Collection service
What is ‘Further Reading’?
• It is NOT essential to do Further
Reading
• The Further Reading is given if you
want more details, and / or alternative
explanations for the topics covered in
the lecture
• The alternative explanations may be
useful to help you understand the
lecture better
• The extra detail may be appropriate if
you are doing coursework in the area,
or if you are just interested!
• BUT, be careful not to overload
yourself with ‘Further Reading’ – it is
NOT essential for the module
Is there a set textbook?
• There is NOT a textbook that you
have to buy for this module.
• Some reading will be set from the
books listed on the next slide, but
copies will be available in the
library article collection.
Books Referred To During the
Module…
• Baeza-Yates and Ribeiro-Neto (1999), Modern Information Retrieval,
Addison Wesley.
• Baldi, Frasconi and Smyth (2003), Modeling the Internet and the
Web, Wiley.
• Belew, R. K. (2000), Finding Out About, Cambridge University Press.
• Weiss et al. (2005), Text Mining: predictive methods for analysing
unstructured information, Springer.
• Hock (2001), The Extreme Searcher’s Guide to Web Search Engines,
2nd Edition, CyberAge Books.
• Del Bimbo (1999), Visual Information Retrieval, Morgan Kaufmann.
• Kowalski and Maybury (2000), Information Storage and
Retrieval Systems, Kluwer Academic Publishers.
• Sparck Jones and Willett (1997), Readings in Information
Retrieval, Morgan Kaufmann.