SIMS 296a-3:
Current Topics in Information
Access
Marti Hearst
Fall ’98
Today
Introductions
Goals and Course Requirements
Administrivia
Topics
What is Information Access
Current Topics (an outline)
Intro to IA
Goals
Become expert on the state-of-the-art in
timely topics related to information access
Begin getting research results.
Course Requirements
To get S/U credit for the class
Lead two discussions
Do the readings
Attend the meetings
Course Requirements
To get a grade in the class
Do the above
Do one of the following (optionally with the help of
a faculty member and/or another student):
Write a publishable survey paper on an
emerging area of information access.
Do research that should lead to a publishable
research paper on a new idea, method,
analysis, or vision statement for an emerging
area of information access.
Implement and/or evaluate code to further an
information access research project.
Administrivia
Sign up sheet
Readings
Other questions?
Outline
What is Information Access?
Goals, Tasks, Types of data
Standard Information Retrieval
Assumptions, Techniques, Evaluation
Current Topics
Candidate topics
What is Information Access?
Information Access:
The process by which users use
information technology to seek, organize,
and understand information.
Focus: information expressed as text.
Information Retrieval
Task Statement
Build a system that retrieves documents that
users are likely to find relevant to their queries.
This set of assumptions underlies the field of
Information Retrieval.
Information Retrieval
Assumptions
The system has available only preexisting, “canned” text passages.
Its response is limited to selecting from
these passages and presenting them to
the user.
It must select, say, 10 or 20 passages out
of millions or billions!
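Here is a minimal sketch of that selection step. It assumes the simplest possible relevance score: the number of distinct query words a passage contains. The function names `tokenize` and `top_k_passages` and the sample passages are hypothetical. `heapq.nlargest` holds only k candidates while it scans, so the collection never has to be fully sorted.

```python
import heapq
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_passages(query: str, passages: list[str], k: int = 10) -> list[tuple[int, str]]:
    """Return up to k (score, passage) pairs, best first, skipping zero scores."""
    query_terms = tokenize(query)
    # Assumed score: number of distinct query words that also occur in the passage.
    scored = ((len(query_terms & tokenize(p)), p) for p in passages)
    # nlargest holds at most k candidates while scanning the generator;
    # ties keep the original passage order.
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [(score, p) for score, p in best if score > 0]


passages = [
    "Tide tables for Maui",
    "A recipe for potato latkes",
    "Maui hotel deals",
    "All-time NHL scoring leaders",
]
print(top_k_passages("tide tables Maui", passages, k=2))
```

With k=2, the passage containing all three query words ranks first and "Maui hotel deals" ranks second. Real systems replace the overlap count with weighted scores, but the "pick a few out of many" shape stays the same.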
Top 10 Research Issues for IR
What do people want from IR?
By Bruce Croft, DLIB Magazine, Nov 95
Based on observations from work on
public-domain systems, including:
THOMAS
American Memory Project (Library of
Congress)
The order of importance does not correspond
to many IR researchers’ priorities.
The same can be said for AI researchers.
Top 10 Research Issues for IR
Bruce Croft, DLIB Magazine, Nov 95. In descending order of
importance.
Integrated Solutions
Distributed IR
Efficient, Flexible Indexing and Retrieval
“Magic” (Effective Vocabulary Expansion)
Interfaces and Browsing
Routing and Filtering
Effective Retrieval
Multimedia Retrieval
Information Extraction
Relevance Feedback
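The list names relevance feedback but does not prescribe a method. Rocchio's formulation is swapped in here purely as an illustration. It moves the query vector toward the centroid of documents the user marked relevant and away from the centroid of those marked nonrelevant. The weights alpha=1.0, beta=0.75, gamma=0.15 are assumed defaults. Queries and documents are assumed to be sparse term-weight dictionaries.

```python
from collections import defaultdict

Vector = dict[str, float]  # sparse term -> weight (assumed representation)


def centroid(docs: list[Vector]) -> Vector:
    """Average a list of sparse term-weight vectors (empty list -> empty vector)."""
    total: dict[str, float] = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio(query: Vector, relevant: list[Vector], nonrelevant: list[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Return alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    rel, non = centroid(relevant), centroid(nonrelevant)
    new_query: Vector = {}
    for term in set(query) | set(rel) | set(non):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel.get(term, 0.0)
                  - gamma * non.get(term, 0.0))
        if weight > 0:  # negative weights are commonly clipped to zero
            new_query[term] = weight
    return new_query


# Hypothetical feedback: the user marked one result relevant and one not.
q = rocchio({"ufo": 1.0},
            relevant=[{"ufo": 1.0, "saucer": 1.0}],
            nonrelevant=[{"ufo": 1.0, "figment": 1.0}])
print(sorted(q))  # ['saucer', 'ufo']
```

The expanded query picks up "saucer" from the relevant document. The term "figment", which appears only in the rejected document, ends up with a negative weight and is dropped.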
Other Issues
Mundane issues are important
Spelling Correction
Fast display of initial results
Less important, but more interesting from
many researchers’ points of view (Bruce Croft,
DLIB Magazine, Nov 95):
Multilingual IR
Data Mining (in text databases)
Text Categorization
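Spelling correction is listed above as a mundane but important issue. One simple approximation suggests the vocabulary word with the smallest Levenshtein edit distance to what the user typed. This is only one of many possible approaches. `edit_distance`, `suggest`, and the `max_distance=2` cutoff are illustrative assumptions.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(word: str, vocabulary: list[str], max_distance: int = 2) -> Optional[str]:
    """Return the closest vocabulary word within max_distance, else None."""
    if not vocabulary:
        return None
    best = min(vocabulary, key=lambda v: (edit_distance(word, v), v))
    return best if edit_distance(word, best) <= max_distance else None


vocab = ["osteoporosis", "overcrowding", "recycling"]
print(suggest("osteoporsis", vocab))  # osteoporosis
```

The table-filling loop keeps only two rows. Its memory use is therefore proportional to the length of one word rather than to the product of both lengths.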
Matching Tasks, Collections, and
Search Systems
Typical WWW search is not the whole
picture.
Different information needs require:
different collections
different search systems and strategies
Compare:
general WWW
newswire and magazines
medical journal articles
Match Task and Search Type
WWW Tasks: (from www.cnet.com/Content/Reviews/Compare/Seach/ss1a.html)
Find how-to pages for Doom.
Purchase plane tickets and hotel for a trip to Java.
Find the top five all-time scoring leaders in the National
Hockey League.
Find a recipe for potato latkes.
Find the tide tables for Maui.
Characteristics:
Timely, specific, found via help from human
agents and in well-known resources before the
WWW.
Match Task and Search Type
Newswire & Magazine Tasks: (from the TREC
collection)
Find articles on research into cures for osteoporosis.
Find articles on the effects of recycling of tires on the
environment.
Find information on jail and prison overcrowding and how
inmates are forced to cope with those conditions.
Find discussion of an existing or proposed insurance plan
(governmental, commercial or individual) and the coverage it
provides for long term care confinements in an institution.
Characteristics:
Complex combinations of topics.
Research-oriented
Either timely or retrospective
Match Task and Search Type
MEDLINE Tasks: (from OHSUMED, medir.ohsu.edu/pub/ohsumed)
Are there adverse effects on lipids when progesterone is
given with estrogen replacement therapy?
Pathophysiology and treatment of disseminated
intravascular coagulation.
Reviews on subdurals in the elderly.
Effectiveness of etidronate in treating hypercalcemia of
malignancy.
Characteristics:
Research-oriented
Technical
Cause and Effect, Implications
The Problem of Information Access
Main problem:
Computers can’t understand natural
language.
Therefore:
Information access systems must guide
users to information of interest by
approximate methods.
General common methods:
word match
topic directories
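Word match is usually implemented with an inverted index that maps each word to the documents containing it. The sketch below uses hypothetical `build_index` and `match_all` helpers and assumes boolean AND semantics. It also shows why the method is only approximate: with no stemming, a query for "recipe" does not match a document that says "recipes".

```python
import re
from collections import defaultdict


def words(text: str) -> list[str]:
    """Lowercased alphanumeric words; no stemming or stop-word removal."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in words(text):
            index[word].add(doc_id)
    return index


def match_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Boolean AND: ids of documents containing every query word."""
    query_words = words(query)
    if not query_words:
        return set()
    result = set(index.get(query_words[0], set()))  # copy, so the index is not mutated
    for word in query_words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    "d1": "Find a recipe for potato latkes.",
    "d2": "Find the tide tables for Maui.",
    "d3": "Potato recipes from Maui",
}
index = build_index(docs)
print(match_all(index, "potato recipe"))  # {'d1'}; "recipes" in d3 is a different word
```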
Why Text is Tough
Abstract concepts are difficult to represent
(AI-Complete)
“Countless” combinations of subtle,
abstract relationships among concepts
Many ways to represent similar concepts
space ship, flying saucer, UFO, figment of imagination
Concepts are difficult to visualize
High dimensionality
Tens or hundreds of thousands of features
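The vector-space representation is a common way to make these points concrete. Each distinct word is one dimension, which is why a real collection needs tens of thousands of features. The sketch below assumes raw term counts and cosine similarity. Under those choices, "flying saucer" and "UFO" get a similarity of zero because they share no words, even though they can refer to the same concept.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Term-count vector: one dimension per distinct lowercased word."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def vocabulary_size(texts: list[str]) -> int:
    """Number of dimensions needed to represent all the texts together."""
    return len(set().union(*(vectorize(t) for t in texts)))


print(cosine(vectorize("flying saucer"), vectorize("UFO")))  # 0.0
print(vocabulary_size(["space ship", "flying saucer", "UFO", "figment of imagination"]))  # 8
```

Four short phrases about one concept already need eight dimensions, and none of them overlap.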
Why Text is Tough
I saw Pathfinder on Mars with a telescope.
Pathfinder photographed Mars.
The Pathfinder photograph mars our
perception of a lifeless planet.
The Pathfinder photograph from Ford has
arrived.
The Pathfinder forded the river without
marring its paint job.
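A naive word match over the five sentences above shows the problem. The sketch assumes a query is satisfied when every query word appears in the sentence, using a hypothetical `word_match` helper. Under that rule, "Pathfinder Mars" also retrieves the sentence where "mars" is a verb, while "Ford" misses the sentence that says "forded".

```python
import re

SENTENCES = [
    "I saw Pathfinder on Mars with a telescope.",
    "Pathfinder photographed Mars.",
    "The Pathfinder photograph mars our perception of a lifeless planet.",
    "The Pathfinder photograph from Ford has arrived.",
    "The Pathfinder forded the river without marring its paint job.",
]


def word_set(text: str) -> set[str]:
    """Lowercased words; case and part of speech are discarded."""
    return set(re.findall(r"[a-z]+", text.lower()))


def word_match(query: str, sentences: list[str]) -> list[str]:
    """Sentences that contain every word of the query."""
    query_words = word_set(query)
    return [s for s in sentences if query_words <= word_set(s)]


for s in word_match("Pathfinder Mars", SENTENCES):
    print(s)  # the first three sentences, including the one where "mars" is a verb
print(word_match("Ford", SENTENCES))  # only the fourth sentence; "forded" is missed
```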
Outline
What is Information Access?
Goals, Tasks, Types of data
Standard Information Retrieval
Assumptions, Techniques, Evaluation
Current Topics
Candidate topics
User Interfaces
Quality Assessment
Text Data Mining
Student suggestions
Tools for Information Access
User Interfaces
(information visualization)
Information Access
(information retrieval)
Language and Content Analysis
Task Analysis
Current Topics
User Interfaces
Incorporating “personal” information
Automated “Agents” vs. User-Initiated Steps
Support for the dynamic process of
information access
How to organize large search results
Categories, clusters, combinations of these
Question Answering
Others?
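The sketch below illustrates one way to organize results into clusters. It applies greedy single-pass ("leader") clustering to result titles and uses Jaccard word overlap as the similarity measure. Both the algorithm and the 0.3 threshold are assumptions chosen for brevity, not a recommended configuration. The sample titles are hypothetical and play on the ambiguous "Java" from the WWW task list.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercased alphanumeric words of a result title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by total distinct words (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(titles: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy single pass: join the first cluster whose leader is similar enough,
    otherwise start a new cluster led by this title."""
    clusters: list[tuple[set[str], list[str]]] = []  # (leader words, members)
    for title in titles:
        words = word_set(title)
        for leader_words, members in clusters:
            if jaccard(words, leader_words) >= threshold:
                members.append(title)
                break
        else:  # no existing cluster was close enough
            clusters.append((words, [title]))
    return [members for _, members in clusters]


results = [
    "Java island hotel",
    "Java island tours",
    "Java programming tutorial",
    "Java programming language",
]
print(leader_cluster(results))
```

The two island titles share two of four distinct words (similarity 0.5) and form one group. The programming titles form another group. Across groups the overlap is only "java" (0.2), which falls below the threshold. Single-pass clustering depends on input order, which is one reason practical systems often use other methods.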
Current Topics
Quality Assessment
Issues:
How to define quality
Rating methods
Different fields (medicine, business)
Techniques
Visitation patterns and times
“Social” techniques
Link structure (co-citation patterns)
Link structure + content
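Co-citation treats two pages as related when other pages link to both of them. The minimal sketch below assumes the link structure is available as a mapping from each page to the set of pages it links to. The link graph is hypothetical data. The sketch counts how many pages cite each pair; how such counts feed into a quality rating is left open here, as it is on the slide.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(links: dict[str, set[str]]) -> Counter:
    """For every unordered pair of pages, count how many pages link to both."""
    counts: Counter = Counter()
    for targets in links.values():
        # sorted() gives each pair a canonical (alphabetical) order.
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return counts


# Hypothetical link graph: page -> pages it links to.
links = {
    "p1": {"A", "B", "C"},
    "p2": {"A", "B"},
    "p3": {"B", "C"},
}
counts = cocitation_counts(links)
print(counts[("A", "B")], counts[("A", "C")], counts[("B", "C")])  # 2 1 2
```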
Current Topics
Text Data Mining
Visualizing the contents of large text
collections
Automatically discovering associations
within text collections
Discovering useful patterns
Spotting anomalies
*Finding chains of associated information
*I have a proposal for this
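The sketch below is a toy illustration of discovering associations and following chains of them. It is not the proposal mentioned on the slide, whose details are not given here. It links two terms when they co-occur in at least `min_count` documents, then uses breadth-first search to find a shortest chain between two terms. The helper names, the threshold, and the sample documents are hypothetical.

```python
import re
from collections import Counter, defaultdict, deque
from itertools import combinations
from typing import Optional


def associations(docs: list[str], min_count: int = 2) -> dict[str, set[str]]:
    """Link two terms if they co-occur in at least min_count documents."""
    pair_counts: Counter = Counter()
    for text in docs:
        terms = sorted(set(re.findall(r"[a-z]+", text.lower())))
        pair_counts.update(combinations(terms, 2))  # each unordered pair once per doc
    graph: dict[str, set[str]] = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def chain(graph: dict[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search for a shortest chain of associated terms, or None."""
    previous: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        if term == goal:
            path = []
            node: Optional[str] = term
            while node is not None:  # walk back to the start term
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(term, set())):
            if neighbor not in previous:
                previous[neighbor] = term
                queue.append(neighbor)
    return None


# Hypothetical toy documents; "alpha" and "delta" never appear together.
docs = ["alpha beta", "alpha beta gamma", "beta gamma", "gamma delta", "gamma delta"]
graph = associations(docs)
print(chain(graph, "alpha", "delta"))  # ['alpha', 'beta', 'gamma', 'delta']
```

The chain connects two terms that never co-occur directly. Note that "alpha" and "gamma" appear together only once, so the `min_count=2` threshold leaves them unlinked.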
Current Topics
Cognitive modeling/AI techniques
Your idea goes here:
For Next Time
Do background reading
Think about which topics to pursue
I will present more background
information