Haystack - Boston KM Forum
Haystack:
Per-User Information Environments
David Karger
Motivation
Web Search Tools
Indices: search by keyword
Taxonomies: classify by subject
Cool site of the day
A lot like libraries...
Library catalogues
Dewey decimal
New book shelf, suggested reading
Is a universal library enough?
Library/web Limitations
Huge
Too many answers, mostly irrelevant
Only published material
Miss info known to few, leading-edge content
Rigid
All get same search results
Even if come back and try again
The library is the last place we look
Start with Bookshelf
I try solving problems using my data:
Information gathered personally
High quality, easy for me to understand
Not limited to publicly available content
My organization:
Personal annotations and metadata
Choose own subject arrangement
Optimize for my kind of searching
Adapts to my needs
Then Turn to a Friend
Leverage
They organize information for their own use
Let them find things for me too
Shared vocabulary
They know me and what I want
Personal expertise
They know things not in any library
Trust
Their recommendations are good
Last to Library/web
Answer usually there
But hard to find
Wish: rearrange to suit my needs
Wish: help from my friends in looking
E.g. NY public library catalogue
Lessons
Individualized access: The best tools adapt to individual ways of organizing and seeking data.
Individualized knowledge: People know much more than they publish. That knowledge is useful to them and others.
End user: understands their own data best, so should control its organization and presentation.
Problems with Current Tools
Applications designed by few for use by many
Developers decide what information is important
Provide model to hold that information
Provide interfaces to view/manipulate that info
Users discover uses/needs for other info
Tool cannot store, cannot support interaction
Users discover connections between info
If connected info is in different applications, neither app can record connection
People could do a lot more with information, if
environment let them record/use what they know
Haystack Approach
Data Model
Define rich data model that lets user represent all interesting info
Rich search capabilities
Machine readable so agents can augment/share/exchange info
User Interface
Strengthen UI tools to show rich data model to user
And let them navigate/manipulate/share it
Adaptation
People are lazy, unwilling to “waste time” telling system what to do, even if it could help them later
System must introspect about user actions, deduce user needs and preferences, and self-adjust to provide better behavior
Collaboration
As system gathers information from one user, share with others
Rich data model maximizes useful knowledge transfer
Data Model
A semantic web of information
Motivation
Tremendous amount of information is relational
Named relationships
Written by, married to, traveling to, owned by…
Collections
Directories, bookmarks, menus, albums
Families, workgroups
Web links
People can take huge advantage of navigating
relationships
Network of relationships much more “structured”
than a textual description, but much less regular
than a spreadsheet/database
The Haystack Data Model
W3C RDF/DAML standard
Arbitrary objects, connected by named links
A semantic web
Links can be linked
No fixed schema
User extensible
Add annotations
Create brand new attributes
[Diagram: example graph with node labels Doc, HTML, Haystack, D. Karger, Outstanding]
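The data model above can be sketched as a tiny in-memory set of triples. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not Haystack's API, the identifiers are made up, and the attribute names attached to the diagram labels (type, title, author, quality) are a guess at what the diagram showed.

```python
from collections import namedtuple

# Hypothetical minimal triple store; names are illustrative, not Haystack's API.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        """Record a named link between two objects; no schema is checked."""
        triple = Triple(subject, predicate, obj)
        self.triples.add(triple)
        return triple

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the given fields; None acts as a wildcard."""
        return [
            t for t in self.triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        ]


store = TripleStore()
doc = "urn:example:doc1"  # made-up identifier
store.add(doc, "type", "HTML")
store.add(doc, "title", "Haystack")
authored = store.add(doc, "author", "D. Karger")
# A user-invented attribute: nothing had to be declared beforehand.
store.add(doc, "quality", "Outstanding")
# "Links can be linked": a statement whose subject is the author link itself.
store.add(authored, "assertedBy", "urn:example:me")
```

Because every fact has the same three-part shape, adding a brand new attribute is just another call to `add`, and searching by any attribute is a single `match`.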
RDF Lowers Barriers
Location Independent
Universal Locators, even for local data (as may become non-local)
Application Independent
Simple, common language suitable for variety of information types
Enables interlinking and exchange of information from all apps
Extensible
Can add attributes as needed, leave them out if unimportant
Enables powerful search
Based on broad variety of attributes
Support for data agents
Extract information from raw data
Make available for search and other forms of navigation
Where does data come from?
Pull from outside sources
Web, databases, news feeds…
Active user input
Interfaces let user add data, note relationships
Passive observation of user
Plug-ins to other interfaces record user actions
Mining data from prior data
Plug-in agents opportunistically extract data
Other Users
[Architecture diagram components: Data Extraction Services, Machine Learning Services, Spider, RDF Store, Web Observer, Mail Observer, Haystack UI, Web Viewer]
User Interface
Uniform Access to All Information
Current Barriers to Information Flow
Partitions by Location
Some data on this computer, some on that
Remote access always noticeable, distracting
Partitions by Application
Mail reader for this, web browser for that, text editor for those
To-do list, but without needed elements
Invisibility
Where did I put that file?
Tendency for objects to have single (inappropriate) location (folder)
Missing attributes
Too lazy to add keywords that would aid searching later
Goal: Task-Based Interface
When working on X, all information relevant to X
(and no other) should be at my fingertips
Planning the day: to-do list, news articles, urgent email,
seminars
Editing a paper: relevant citations, email from coauthors,
prior versions
Hacking: code modules, documentation, working notes,
email threads
Location, source and format of data irrelevant
Sign of Need: Email Usage
Email as to-do list
Anything not yet “done” kept there
Reminder email to ourselves
Single interface containing numerous document types
Overflowing Inboxes
Navigate only by brute-force scanning
Unsafe to file/categorize anything: out of sight, out of mind
Interface Options
Folders
Out of sight, out of mind
Still need applications to see data
Which is the right folder?
Desktops
Allow arbitrary data types
But coupling between applications & data types too tight
A smear of many tasks, so hard to focus
Hundreds of icons, tens of windows, huge menus
No partitioning
Databases
OK if you have a degree in database administration
Interface is impoverished: long lists of tuples
The Big Picture
User Interface Architecture
Views: Data about how to display data
Views are persistent, manipulable data
[Diagram labels: View, View 2, UI data, Mapping, Mapping 2, Data to be displayed, Underlying information]
Semantic User Interface
Present information by assembling different views together
Information manipulation decoupled from presentation
New views can be added without mucking with data types
New data types can be added without designing new UIs
Uniform support for features like context menus
Actions apply to objects on screen in various “roles”
E.g. as word, as title of mail message, as member of collection
View for Favorites collection
View for cnn.com
View for yahoo.com
View for ~/documents/thesis.pdf
Persistence of Views
Views are data like all other data
Stored persistently, manipulated by user
User can customize a view
View for particular task can be cloned from another
Can evolve over time to need of task
To an extent previously limited to sophisticated UI
designer
Views can be shared
Once someone determines “right” way to look at data,
others can benefit
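The idea that views are ordinary data, which a user can clone and customize, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the dictionary layout, the `render` and `clone_view` helpers, and the sample mail items are hypothetical and do not describe Haystack's actual view representation.

```python
import copy

# Hypothetical view descriptions: plain data, not code.
views = {
    "mail-list": {"shows": "mail", "fields": ["from", "subject"], "sort_by": "from"},
}


def render(view, items):
    """Turn items into text rows according to a view description."""
    rows = sorted(items, key=lambda item: item.get(view["sort_by"], ""))
    return [" | ".join(str(item.get(f, "")) for f in view["fields"]) for item in rows]


def clone_view(name, new_name, **changes):
    """Copy an existing view and apply the user's customizations to the copy."""
    view = copy.deepcopy(views[name])
    view.update(changes)
    views[new_name] = view
    return view


# Made-up sample data.
inbox = [
    {"from": "tommi", "subject": "Bayes paper", "date": "2002-03-02"},
    {"from": "daphne", "subject": "Seminar", "date": "2002-03-01"},
]
# A task-specific view cloned from the generic one, then adjusted.
clone_view("mail-list", "paper-task", fields=["date", "subject"], sort_by="date")
```

Since a view is just a stored record, sharing it with another user amounts to copying that record; no new UI code has to be written.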
Role of Schemata
Benefits
Help people look at information the right way
Help creators avoid creation mistakes
Risks of Enforcement
Deters lazy users from entering data
Prevents creative users from stretching the boundaries
Is there a middle ground?
Can schemata be “advisory”?
One or many?
If each user makes own schema, how translate?
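One possible reading of an “advisory” schema is a check that suggests but never rejects. The sketch below assumes that reading; the schema contents and the `advise` function are hypothetical, and the slides do not say how Haystack would implement this.

```python
# Hypothetical advisory schema: attributes usually expected for each object type.
ADVISORY_SCHEMA = {
    "paper": {"title", "author", "year"},
}


def advise(obj_type, attributes):
    """Return suggestions for a record without ever rejecting it."""
    expected = ADVISORY_SCHEMA.get(obj_type, set())
    present = set(attributes)
    notes = [f"consider adding '{name}'" for name in sorted(expected - present)]
    notes += [f"new attribute '{name}'" for name in sorted(present - expected)]
    return notes


# An incomplete record with a user-invented attribute is still accepted.
record = {"title": "Haystack", "mood": "optimistic"}
suggestions = advise("paper", record)
```

The lazy user can ignore the suggestions, and the creative user's extra attribute is noted rather than refused.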
Brief look
Adaptation
Learning from the User over Time
Approach
Haystack is ideally positioned to adapt to user
RDF data model provides rich attribute set for learning
In particular, can record user actions with information
(the flexible UI can capture easily)
Extensive record can be built up over time
Introspect on that information
Make Haystack adapt to needs, skills, and preferences of
that user
Observe User
Instrument all interfaces, report user actions to Haystack
Mail sent, files edited, web pages browsed
Discover quality
What does the user visit often?
Discover semantic relationships
What gets used at the same time?
Discover search intent
Which results were actually used?
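Two of the questions above, what the user visits often and what gets used at the same time, reduce to simple counting over a log of observed actions. The sketch below assumes a log of (session, item) pairs; that log format and the sample entries are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical event log: (session id, item) pairs reported by instrumented interfaces.
events = [
    (1, "thesis.pdf"), (1, "mail:coauthor"), (1, "cnn.com"),
    (2, "thesis.pdf"), (2, "mail:coauthor"),
    (3, "thesis.pdf"),
]

# "What does the user visit often?"
visits = Counter(item for _, item in events)

# Group items by the session in which they were used.
sessions = {}
for session, item in events:
    sessions.setdefault(session, set()).add(item)

# "What gets used at the same time?" Count each pair of items sharing a session.
co_use = Counter()
for items in sessions.values():
    for pair in combinations(sorted(items), 2):
        co_use[pair] += 1
```

Pairs with high counts in `co_use` are candidates for a recorded relationship between the two items.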
Learning from Queries
Searching involves a dialogue
First query doesn’t work
So look at the results, change the query
Iterate till home in on desired results
Haystack remembers the dialogue
instead of first query attempt, use last one
record items user picked as good matches
on future, similar searches, have better query plus
examples to compare to candidate results
Use data to modify queries to big search engines, filter
results coming back
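Remembering a search dialogue can be sketched as a lookup from the first query attempt to the final query and the items the user picked. The class below is hypothetical, and it matches repeat searches by exact text only, which is a simplification of the “similar searches” the slide describes.

```python
class QueryMemory:
    """Hypothetical store of search dialogues, keyed by the first query tried."""

    def __init__(self):
        self.dialogues = {}

    def record(self, queries, picked):
        """Remember the refinement chain and the results the user chose."""
        self.dialogues[queries[0].lower()] = {
            "final": queries[-1],
            "picked": list(picked),
        }

    def improve(self, query):
        """Swap in the remembered final query if this first attempt was seen before."""
        past = self.dialogues.get(query.lower())
        if past is None:
            return query, []
        return past["final"], past["picked"]


memory = QueryMemory()
# Made-up dialogue: three attempts, then one result chosen.
memory.record(
    ["bayes", "bayesian data mining", "bayesian networks data mining"],
    ["paper-17"],
)
better_query, examples = memory.improve("Bayes")
```

The returned examples are the previously chosen items, which could then be compared against new candidate results.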
Mediation
Haystack can be a lens for viewing data from the
rest of the world
Stored content shows what user knows/finds useful
Selectively spider “good” sites
Filter results coming back
Compare to objects user has found useful in the past
Can learn over time
Example - personalized news service
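Filtering incoming results against objects the user has found useful can be sketched with a simple similarity score. The slides do not name a method; word-set overlap (Jaccard similarity) is swapped in here purely for illustration, and the function names, threshold, and sample texts are assumptions.

```python
def words(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Overlap between two word sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_results(results, liked, threshold=0.2):
    """Keep results that resemble something the user found useful, best first."""
    liked_sets = [words(text) for text in liked]
    scored = []
    for result in results:
        score = max((jaccard(words(result), s) for s in liked_sets), default=0.0)
        if score >= threshold:
            scored.append((score, result))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in scored]


# Made-up sample data.
liked = ["bayesian methods in data mining"]
results = [
    "celebrity gossip roundup",
    "bayesian networks for data mining",
    "data mining with bayesian methods",
]
kept = filter_results(results, liked)
```

As the store of liked objects grows, the same filter has more to compare against, which is one way the lens could “learn over time”.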
Collaboration
Haystack’s Ulterior Motive
Hidden Knowledge
People know a lot that they are
Willing to share
But too lazy to publish
Haystack passively collects that knowledge
Without interfering with user
Once there, share it!
RDF: uniform language for data exchange
Challenges
As people individualize systems, semantics diverge
Who is the “expert” on a topic? (collaborative filtering)
Example
I want info on probabilistic models in data mining
My Haystack doesn’t know, but “probability” is in lots of email I got from Tommi Jaakkola
Tommi told his haystack that “Bayesian” refers to
“probability models”
Tommi has read several papers on Bayesian methods in
data mining
Some are by Daphne Koller
I read/liked other work by Koller
My Haystack queries “Daphne Koller Bayes” on Yahoo
Tommi’s haystack can rank the results for me…
Summary
Rich Data Model
Lets user represent all interesting info
Supports sophisticated searches
Accessible to information agents
User Interface
Extensibly shows rich data model to user
Lets them navigate/manipulate it
Adaptability
System may introspect about user actions, deduce user needs and preferences, and self-adjust to provide better behavior
Collaboration
As system gathers information from one user, share with others
Rich data model maximizes useful knowledge transfer
More Info
http://haystack.lcs.mit.edu/
(initial release available for download)
[email protected]