Presentation - Conference Sites hosted by Acadia University Library


Integrating User Context
into the Digital Library
Elaine Toms
Associate Professor &
Canada Research Chair (Tier II) in
Management Informatics
Faculty of Management
Dalhousie University
Access 2004, Halifax, NS, October 15, 2004
What is being sought?
A known needle in a known haystack
A known needle in an unknown haystack
An unknown needle in an unknown haystack
Any needle in a haystack
The sharpest needle in the haystack
Most of the sharpest needles in the haystack
All the needles in the haystack
Affirmation of no needles in the haystack
Things like needles in any haystack
Let me know whenever a new needle shows up
Where are the haystacks?
Needles, haystacks – whatever
Source: Matt Koll, ASIDIC 1999
At the ATM...
I need cash for the weekend
User: withdraw
Machine: how much
User: $200
Machine checks balance
Machine: insufficient funds
At the Search Engine
I want to buy a palm -- latest model
User: where can I buy a palm?
Computer checks index and produces a
list of relevant items:
1. Gardening places that supply palms
2. Electronic stores that sell the tungsten
3. Tarot card suppliers
4. Medical trials on limb replacement
User: hmm, no palms? limb replacement?
Never heard of that, but my son is doing a
project on medical discoveries. I’ll look at #4
In the Library
I have to do a
term paper
In the Library
I still have to
do a term
paper!!
In the Library
Selects business here
Nothing about doing a term paper
The Problem

Our assumptions about search






That the right query exists
That we need to maximize precision & recall
That a person’s information need is stable and
remains static
That value to the user is in the resulting
document set
That users can articulate what they really want
and that they really know what they want
That the system knows what the user really
wants
Search
box
Lists
Progress
in search interface design
Search circa 1970
?
Search circa 2000
?-
Search
How people find information
[Bar chart; vertical axis from 0% to 70%; three categories:]
direct navigation: enter a URL into a browser or select from bookmarks
search engines: enter keywords in a search engine
‘surfing’: arbitrary selection from current page
Source: http://www.statmarket.com
information
void
Uh huh!
The
Information
Quest Problem
What is he talking about?
Let me see what matches.
I need a….
Here is what you asked for
Success!
Huh?
The Situation

USER





Perceive information
void
Materialize that
perception into
something operational
Translate into a query
Interpret system
response
Revise the query

SYSTEM



Must intuit from words
what the problem is
Must make a match
with the database
Must display for the
user a suitable response
Classic Search Model
User created: Problem, Representation, Query
System created: Text, Representation, Surrogate
Match: Query against Surrogate
System solution: content matching!
Topical
relevance
is no longer
sufficient
Source: Adam
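The content matching in the classic model can be sketched in a few lines. This is only an illustration, not the system described in the talk: the surrogate texts, the stopword list and the function names are all assumptions made up for the example. It shows why a purely topical match returns every sense of "palm" for the shopper's query.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "i", "can", "where", "and", "on", "at", "for", "of"}

# Hypothetical document surrogates, one per sense of "palm".
SURROGATES = {
    "garden": "Palm trees and tropical plants for sale at garden centres",
    "handheld": "Buy the latest Palm Tungsten handheld organizer",
    "tarot": "Palm reading and tarot card supplies",
    "medical": "Medical trial on hand and palm limb replacement",
}


def terms(text: str) -> set[str]:
    """Lowercase the text, split it into word tokens and drop stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def match(query: str, surrogates: dict[str, str]) -> list[tuple[str, int]]:
    """Rank surrogates by the number of terms they share with the query."""
    query_terms = terms(query)
    scored = []
    for doc_id, text in surrogates.items():
        overlap = len(query_terms & terms(text))
        if overlap > 0:
            scored.append((doc_id, overlap))
    # Highest overlap first; ties broken alphabetically by identifier.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


results = match("where can I buy a palm?", SURROGATES)
for doc_id, score in results:
    print(doc_id, score)
```

Every surrogate matches on the word "palm", so the matcher has no way to tell which sense the user meant. Only the user's context (wanting the latest handheld model) can separate them.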
Challenge for Search

Searching in heterogeneous systems is a
complex task


Huge number of pages/documents
Variable in genre, format, language



Now cope with dynamic content


No longer just text; now images, video, code,
statistics, biological data, etc.
Both structured and unstructured information
Temporal and conditional elements
Relationships beyond the usual



Hyperlinks
Collections by communities, by individuals
Where it has been plus who has used it
Search Engine Use
• Full text – the author’s
• Expert indexing
• User paths, e.g., recommenders
• Word/phrase/feature patterns
• Link analyses, e.g., reputation
lacks context
Context in Search
Environment
Situation
Work Task
Knowledge
Motivation
Experience
Physical ability
Cognitive ability
Learning style
Language
Mental Model
Member of a
group or
community
Information
Task
?
Age/Date(s)
Information Design
Format
Language
Genre
Authenticity
Topic
Domain
Created for a
purpose or
group
Medium
Resources
User
Object
Desired Outcome
Characteristics
not included
plus actions
over time
Orientation
Overview
Evidence
Explanation
Instructions
Definition
and so on
that is…
Novel
Accurate
Interesting
Reliable
Different
Nothing
and so on
Context in Search
Environment
Situation
Work Task
Information
Task
?
Created for a
purpose or
group
Medium
User
Resources
Desired Outcome
Object
Member of a
group or
community
Orientation
Overview
Evidence
Explanation
Instructions
Definition
and so on
that is…
Novel
Accurate
Interesting
Reliable
Different
Nothing
and so on
Our Studies
Research Question &
Design

Question:



What factors affect how people find
information?
What role does task domain have on the
search process?
Design



Exploratory and Experimental
Holistic approach to search process
Mixed paradigm: Quantitative and Qualitative
Study Context: TREC



Framework for
cooperative
research
Sponsored by
NIST, ARPA, etc.
Use common




data set
search tasks
time frame
Annual quasi ‘Molson race’
http://trec.nist.gov
Participants

48 participants:
19 males & 29 females
Median age: 26-35 yrs (50% of participants)
Non-experts: no search training
Highly educated:
80% had an undergraduate degree
68% in humanities or social science
Long-term, but moderate web use:
Over 90% have used the web for > 2 years
65% spend from 1 to 10 hours per week on the web
Tasks

Search Tasks




16 questions
4 representing each domain: Consumer
health, Travel, Shopping & Research
Each task completed by 12 people
Each participant completed 4 tasks:


One from each domain
Two as Questions and two as Keywords
(or phrases)
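The numbers above are consistent: 48 participants times 4 tasks gives 192 task sessions, and 192 divided by 16 questions gives 12 people per task. The sketch below shows one way such a balanced assignment could be generated. It is not the assignment scheme actually used in the study; the rotation rule, the even split of formats within each task, and all names are assumptions for illustration.

```python
from collections import Counter

DOMAINS = ["Consumer health", "Travel", "Shopping", "Research"]
TASKS_PER_DOMAIN = 4
PARTICIPANTS = 48


def assign(participant: int) -> list[tuple[str, int, str]]:
    """Return (domain, task index, format) for one participant's four tasks.

    Hypothetical rotation: the task index shifts with the participant number,
    and the query format alternates so that two tasks are Questions and two
    are Keywords.
    """
    block = participant // TASKS_PER_DOMAIN
    plan = []
    for d, domain in enumerate(DOMAINS):
        task = (participant + d) % TASKS_PER_DOMAIN
        fmt = "Question" if (block + d) % 2 == 0 else "Keywords"
        plan.append((domain, task, fmt))
    return plan


plans = [assign(p) for p in range(PARTICIPANTS)]

# Count how many participants complete each of the 16 tasks.
per_task = Counter((domain, task) for plan in plans for domain, task, _ in plan)

print(len(per_task), set(per_task.values()))
```

With this rotation each of the 16 tasks is completed by exactly 12 people, and every participant sees one task per domain.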
Sample Tasks




Tell me the name of a website where you can
find material on global warming.
Identify three interesting things to do during a
weekend in Kyoto, Japan.
List two of the generally recommended
treatments for _________________ (Fill in the
blank with a health-related matter that
interests you).
Find two websites that will let you buy a(n)
_________ online. (Fill in the blank with a
product that interests you)
Modified
Google
Interface
Procedures
Demographic &
Experience Survey
1. Pre-Task
Questionnaire
2. Search
Task
3. Post-Task
Questionnaire
4. Talk-after
Interview
Post-session
interview
Repeated
four times
Data
Results: Time in each
Search State by Task
Domain
Shopping
Task: Functional
Informational
Caveats:
Product names did
not match
Store names could
not be isolated
Restrict to
“purchasing”
Limiting to Specs
Goal: Find a Product type or Brand or Store
Hitlist: Identifying a place to purchase
Results:
Identifying specifications
Doing product comparison
Caveats:
Can I buy?
Is this
Canadian?
Caveats:
Compare 5
models…
Does this have an
X?
Conclusions from Study

Domain of the task – its context
– is a differentiator among
search processes

Need a new approach to the
search interface – one that
captures the rich interactions
that take place in the quest for
information
Need a task-based approach –
specialized information
appliances

What people need
Search
Specialized Shopping
Engine
Froogle is still too frugal!
Research
What stage in doing research?
Looking for a topic
Looking for background
material to understand topic
Looking for evidence and
original sources
Almost finished
Source: Vakkari
Contextual Factor:
Work Task
Environment: Bioinformatics Research
Situation: Problem solving
Work Task: How to do a functional analysis of a gene

Work Task contains 12 processes and 2 key decision points.
Each process uses unique data and tools, and mostly requires human decision making.
[Workflow diagram, nodes numbered 1 to 14; the first steps shown are:]
Novel gene with unknown function
DNA sequence fragments
(1) Assemble contigs
Longer/complete DNA sequence
(2) Identify open reading frame
List of ORF sequences
(3) Translate DNA sequence
Amino acid sequence
(4) Search for homologous sequences
List of homologous sequences
(5) Multiple alignment vs. domain/motif path
To multiple alignment path
To domain/motif path
Source: Bartlett, 2004
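To make two of the processes in the workflow concrete, here is a minimal sketch of step (2), identifying open reading frames, and step (3), translating a DNA sequence into an amino acid sequence. It is not the tooling used in the study cited on the slide. It assumes the standard genetic code, scans only the three forward reading frames, and uses a made-up input sequence; the function names are hypothetical. Steps (1) and (4), assembling contigs and searching for homologous sequences, need specialized tools and are not shown.

```python
from itertools import product

# Standard genetic code, listed in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[str]:
    """Return ORFs (ATG through an in-frame stop) in the three forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it has enough coding codons.
                if (i - start) // 3 >= min_codons:
                    orfs.append(dna[start:i + 3])
                start = None
    return orfs


def translate(dna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


orfs = find_orfs("CCATGGCCTAAGG")  # made-up sequence for illustration
proteins = [translate(orf) for orf in orfs]
print(orfs, proteins)
```

Even in this toy form, the analyst still has to decide which ORF to carry forward and how short an ORF is worth keeping, which is the kind of human decision point the slide refers to.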
Model Detail (Step 14)
14 PROTEIN PROFILE
Definition
• identifies characteristics of the putative protein
Rationale
• identifies chemical characteristics of the putative protein
• identifies possible structural and functional regions of the protein
• one-step approach - completes multiple analyses in one step
Tools
• PredictProtein; DART; InterPro
Input Data
• amino acid sequence
Output Data
• listing of chemical characteristics of the putative protein
• diagram of sequence identifying characteristics of the regions of sequence
Interpretation
• ideally will find very high (>=80%) to complete homology over the functional domain
• results will suggest a putative function or activity for the protein, give a “big picture” profile of the protein, and provide direction for further detailed analysis
Next Steps
• go to a specialized database to further investigate features of interest
• laboratory verification
Caveats
• since multiple analyses are done at once, the analysis cannot be refined to optimize each one
Contextual Factor: Work
Task
Conclusions
• Work (and likely play) tasks are complex, somewhat procedural but require many instances of human decision making
• Highlights need to integrate information retrieval processes into the work task
• And highlights the need to understand work tasks so that IR systems can be designed for that task or community
Mixed Contextual Factors



Environment: Software Engineering Consultants
Situation: Problem solving in software
engineering
Work Task:

Example: “Is there an equivalent of a nanny process in
WAS version 5.0. In other words, how can I run server
processes as monitored processes so that they can be
restarted if they go down?”

Is part of an ‘Engagement’ – identified ‘sets of
information context’: task, software, platform
uses multiple types of Information Tasks: learning
about, looking things up, finding tools


Relationship among task type, information task and the document type
Task Type =
Configuration
Deployment
Design
Develop
Performance
Programming
Installation
Migration
Security
Information Task (Desired Outcome) =
Instruction
learning about
how to do it
finding advice
looking up facts
finding a solution
finding a tool
Document Type =
Hints&Tips
Infotopic
Presentation
Redbook/redpiece
Technical article
Tutorial
White paper
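One way a search system might use this relationship is to re-rank topically matched documents by how well their document type suits the user's information task. The sketch below is an assumption-heavy illustration, not the approach taken in the study. The vocabulary comes from the lists above, but the associations in PREFERRED_TYPES, the boost value, the scores and the document titles are all invented for the example.

```python
from dataclasses import dataclass

# HYPOTHETICAL associations between information task and document type.
# The slide names the categories; it does not give these pairings.
PREFERRED_TYPES = {
    "learning about": ["Tutorial", "White paper", "Presentation"],
    "how to do it": ["Tutorial", "Redbook/redpiece", "Hints&Tips"],
    "looking up facts": ["Infotopic", "Technical article"],
}


@dataclass
class Document:
    title: str
    doc_type: str
    topical_score: float  # score from ordinary content matching


def rerank(docs: list[Document], information_task: str, boost: float = 0.5) -> list[Document]:
    """Sort documents by topical score plus a boost for a suitable type."""
    preferred = PREFERRED_TYPES.get(information_task, [])

    def score(doc: Document) -> float:
        return doc.topical_score + (boost if doc.doc_type in preferred else 0.0)

    return sorted(docs, key=score, reverse=True)


docs = [
    Document("Overview of monitored processes", "White paper", 0.9),
    Document("Restarting server processes", "Hints&Tips", 0.7),
]

for doc in rerank(docs, "how to do it"):
    print(doc.title, doc.doc_type)
```

With no known information task the order falls back to the topical score alone, which is the classic content-matching behaviour.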
Digital Libraries

Key question – still outstanding


How can individuals access the information
they need at a moment in time, given their
persona at that time, and the resources
available at that time?
Leverage user context
Digital Libraries
What stage in doing research?
How can we
integrate our
understanding
of the research
process into
DLs?
Looking for a topic
Looking for background
material to understand topic
Looking for evidence and
original sources
Almost finished