Transcript notes

Information Retrieval
First lessons
Basic ideas

User needs information
Information sources
Distinguish data, information, knowledge
Very well organized, indexed, controlled
Totally unorganized, uncharacterized, uncontrolled
Something in between
Connect the two in a way that matches information needs to information available.
The role of databases

Databases hold specific data items
Organization is explicit
Keys relate items to each other
Queries are constrained, but effective in retrieving the data that is there
Databases generally respond to specific queries with specific results
Browsing is difficult
Searching for items not anticipated by the designers can be difficult
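A minimal sketch of the database points above, using Python's standard sqlite3 module. The schema, the table names (author, book), the sample rows, and both helper functions are invented for illustration and are not from the lecture. The first query follows a key the designers planned for. The second asks about a topic, which the schema has no column for, so it can only fall back on a substring match over titles.

```python
import sqlite3

# Hypothetical schema and sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Author A'), (2, 'Author B');
    INSERT INTO book VALUES
        (10, 'Database Design', 1),
        (11, 'Indexing Basics', 1),
        (12, 'Web Search', 2);
""")


def titles_by_author(name):
    """Anticipated query: the author_id key relates each book to its author."""
    rows = conn.execute(
        "SELECT book.title FROM book "
        "JOIN author ON book.author_id = author.id "
        "WHERE author.name = ? ORDER BY book.title",
        (name,),
    ).fetchall()
    return [title for (title,) in rows]


def titles_mentioning(word):
    """Unanticipated query: there is no subject column, so match on title text."""
    rows = conn.execute(
        "SELECT title FROM book WHERE title LIKE ? ORDER BY title",
        (f"%{word}%",),
    ).fetchall()
    return [title for (title,) in rows]


by_author = titles_by_author("Author A")
about_search = titles_mentioning("search")
about_retrieval = titles_mentioning("retrieval")
```

The keyed query returns exactly the rows that are there. The topic query only works when the word happens to appear in a title, which is one way to read "searching for items not anticipated by the designers can be difficult".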
The Web

Extreme opposite of a database
No organization, no overall structure, no index or key to the content
Searching and browsing are supported, but generally are not complete. (You will not know if you got every good response to your request. You may be able to tell that you got the response that meets your need, but may not know if you got the best response available.)
Digital Library

Something in between the very structured database and the unstructured Web.
Content is controlled. Someone makes the entries. (Maybe a lot of people make the entries, but there are rules for admission.)
Searching and browsing are somewhat open, not controlled by fixed keys and anticipated queries.
Nature of the collection regulates indexing somewhat.
How do we know the response is good?

Precision
Of the results returned, what percentage are meaningful to the goal of the query?
Recall
Of the materials available that match the query, what percentage were returned?
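The two measures can be worked through on a small example. This is a sketch under stated assumptions: the function name precision_recall and the document ids are made up, and the relevant set is assumed to be known in advance, which in practice it usually is not. The function returns fractions; multiply by 100 for the percentages the notes describe.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) as fractions between 0 and 1.

    returned: ids the system gave back for the query
    relevant: ids in the collection that actually match the need
    """
    returned = set(returned)
    relevant = set(relevant)
    hits = returned & relevant  # returned items that are also relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ids: 4 results returned, 5 relevant items exist, 2 overlap.
returned = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]
precision, recall = precision_recall(returned, relevant)
# precision = 2 of 4 returned = 0.5; recall = 2 of 5 relevant = 0.4
```

Here half of what came back was useful (precision 50%), but more than half of the useful material was missed (recall 40%).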
Text: Retrieval process
The process sequence
Query entered
Query interpreted
Index searched
Items retrieved
Results ranked
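The process sequence can be sketched as one small function per stage. Everything here is an assumption made for illustration: the toy collection DOCS, the hand-built INDEX, the function names, and especially the ranking rule (count of matching query terms), since the notes do not say how results are ranked.

```python
from collections import Counter

# Hypothetical toy collection and a hand-built term index over it.
DOCS = {
    1: "digital library search",
    2: "database query keys",
    3: "web search and library browsing",
}
INDEX = {
    "digital": [1], "library": [1, 3], "search": [1, 3],
    "database": [2], "query": [2], "keys": [2],
    "web": [3], "and": [3], "browsing": [3],
}


def interpret(query):
    """Query interpreted: lowercase and split into terms."""
    return query.lower().split()


def search_index(terms):
    """Index searched: count how many query terms each document matches."""
    matches = Counter()
    for term in terms:
        for doc_id in INDEX.get(term, []):
            matches[doc_id] += 1
    return matches


def retrieve(matches):
    """Items retrieved: fetch the stored text for every matching id."""
    return {doc_id: DOCS[doc_id] for doc_id in matches}


def rank(matches):
    """Results ranked: most matching terms first, ties broken by id."""
    return sorted(matches, key=lambda doc_id: (-matches[doc_id], doc_id))


def run_query(query):
    """Query entered: run the four later stages in order."""
    terms = interpret(query)
    matches = search_index(terms)
    items = retrieve(matches)
    return [(doc_id, items[doc_id]) for doc_id in rank(matches)]


results = run_query("Web search")
```

For the query "Web search", document 3 matches both terms and document 1 matches one, so document 3 is ranked first.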
The collection

Where does the collection come from?
How is the index created?
Those are important distinguishing characteristics
Inverted Index -- Ordered list of terms related to the collected materials. Each term has an associated pointer to the related material(s).
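A minimal sketch of building an inverted index as defined above, assuming whitespace tokenization and lowercasing (real systems do considerably more). The function name build_inverted_index and the three sample documents are hypothetical. Document ids stand in for the "pointers" to the related materials.

```python
from collections import defaultdict


def build_inverted_index(collection):
    """Map each term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in collection.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Emit terms in sorted order so the index is an ordered list of terms.
    return {term: sorted(postings[term]) for term in sorted(postings)}


# Hypothetical three-document collection.
collection = {
    "doc1": "digital library",
    "doc2": "library index",
    "doc3": "web index",
}
index = build_inverted_index(collection)
terms_in_order = list(index)
```

Looking up a term is then a single dictionary access, for example index["library"] gives the ids of every document that mentions it.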
CITIDEL

An example Digital Library
All items are relevant to computing education
Visit at http://www.citidel.org
Part of the National Science Digital Library, http://www.nsdl.org