Transcript lcs35
Individualized
Knowledge Access
David Karger
Lynn Andrea Stein
Web Search Tools
Indices
search by keyword
Taxonomies
classify by subject
Cool site of the day
A lot like libraries...
Library catalogues
Dewey Digital
New book shelf, suggested reading
Is a universal library enough?
Library/Web Limitations
Huge:
too many answers, mostly irrelevant
Only published material
miss info known to few, leading-edge content
Rigid:
all get same search results
even if you come back and try again
The library is the last place we look
Bookshelves First
My data:
information gathered personally
high quality, easy for me to understand
not limited to publicly available content
annotations
My organization:
choose own subject arrangement
optimize for my kind of searching
Adapts to my needs
Then a Friend
Leverage
they organize information for their access
so quickly find things for me
Personal expertise
they know things not in any library
Trust
their recommendations are good
Shared vocabulary
they know me and what I want
Last the Library
Answer usually there
but hard to find
would be nice to rearrange to my needs
For hardest problems, need librarian
they have broad knowledge of library
but not as deep as an expert on question
Lessons
Individualized access: The best tools adapt
to individual ways of organizing and
seeking data.
Individualized knowledge: People know
much more than they publish. That
knowledge is useful.
Haystack:
a Tool for Oxygen
Independent but interacting repositories
that adapt to their individual users
Individualize access
My data collection, organization
My search tools, with answers for me
Leverage individual knowledge
Collaborative retrieval with others
Motivate people to organize their data for
their own benefit and thus for others’
Example
Have probabilistic models been used in
data mining?
My haystack doesn’t know, but “probability” is
in lots of mail I got from Tommi Jaakkola
Tommi told his haystack that “Bayesian”
refers to “probability models”
Tommi has read several papers on Bayesian
methods in data mining
His haystack suggests them to mine
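The exchange in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Haystack's actual design: the `Haystack` class, its `synonyms`, `mail`, and `friends` fields, and the `ask` method are hypothetical names, and "who might know" is approximated by counting received mail that mentions the query term.

```python
from collections import Counter


class Haystack:
    """Toy personal repository (hypothetical; not the real Haystack API)."""

    def __init__(self, owner):
        self.owner = owner
        self.documents = []  # (title, text) pairs the owner has stored
        self.mail = []       # (sender, text) pairs of received mail
        self.synonyms = {}   # lowercase term -> related terms the owner taught it
        self.friends = {}    # name -> that person's Haystack

    def search(self, term):
        """Local search, expanded with the owner's own vocabulary."""
        key = term.lower()
        terms = {key} | self.synonyms.get(key, set())
        return [title for title, text in self.documents
                if any(t in text.lower() for t in terms)]

    def likely_experts(self, term):
        """Rank friends by how much of their mail mentions the term."""
        counts = Counter(sender for sender, text in self.mail
                         if term.lower() in text.lower())
        return [name for name, _ in counts.most_common()
                if name in self.friends]

    def ask(self, term):
        """Answer locally if possible, otherwise consult likely experts."""
        hits = self.search(term)
        if hits:
            return hits
        for name in self.likely_experts(term):
            hits = self.friends[name].search(term)
            if hits:
                return hits
        return []


# Tommi's haystack knows that "probability" relates to "bayesian".
tommi = Haystack("Tommi")
tommi.synonyms["probability"] = {"bayesian"}
tommi.documents.append(("Bayesian methods in data mining",
                        "A paper on Bayesian methods applied to data mining."))

# My haystack has no matching documents, only mail from Tommi.
mine = Haystack("me")
mine.friends["Tommi"] = tommi
mine.mail.append(("Tommi", "Some notes on probability for next week."))

result = mine.ask("probability")
```

Here `result` holds the title from Tommi's collection: my haystack finds nothing locally, picks Tommi as a likely expert from the mail, and his haystack matches the paper through the synonym he taught it.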
Research Threads
Heterogeneous data and metadata
archive whatever user wants
Human-Computer Interaction
let user express/use own organizational rules
observe user to detect unexpressed knowledge
Machine learning
use gathered data to improve performance
Collaborative filtering
use others’ decisions to help me
My data
Haystack archives anything
web pages browsed, email sent and received,
documents written, scanned images, home
directory, people known, projects worked on
And any properties, relationships
text of object (if know how)
author, title, color, citations, quotations,
annotations, quality, last usage
Users freely add types, relationships
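One way to picture such an open-ended archive is a store of items with free-form properties plus named links between them. The sketch below is an assumption for illustration only: `Item`, `Archive`, and their methods are hypothetical names, and the sample authors and titles are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One archived object; kind and properties are free-form (assumed schema)."""
    ident: str
    kind: str
    properties: dict = field(default_factory=dict)


class Archive:
    def __init__(self):
        self.items = {}      # ident -> Item
        self.links = set()   # (source ident, relationship name, target ident)

    def add(self, ident, kind, **properties):
        """Store an object of any kind with any properties."""
        self.items[ident] = Item(ident, kind, dict(properties))
        return self.items[ident]

    def relate(self, source, relationship, target):
        # Relationship names are not fixed in advance; the user may coin new ones.
        self.links.add((source, relationship, target))

    def related(self, source, relationship):
        """Targets reachable from source through the named relationship."""
        return sorted(t for s, r, t in self.links
                      if s == source and r == relationship)

    def find(self, **criteria):
        """Identifiers of items whose properties match every criterion."""
        return sorted(i.ident for i in self.items.values()
                      if all(i.properties.get(k) == v
                             for k, v in criteria.items()))


archive = Archive()
archive.add("paper1", "document", author="A. Author", title="First paper",
            quality="high")
archive.add("paper2", "document", author="B. Author", title="Second paper")
archive.add("photo1", "scanned image", color="sepia")
archive.relate("paper1", "cites", "paper2")
archive.relate("photo1", "illustrates", "paper1")  # a user-invented relationship

cited = archive.related("paper1", "cites")
good = archive.find(quality="high")
```

Nothing in the store restricts which kinds, properties, or relationship names may appear, which is the point of the slide: the user decides what is worth recording.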
Gathering My Data
Active user input
interfaces let user add data, note relationships
Mining data from haystack
plug-in services opportunistically extract data
e.g., find author/title/text in MSWord document
or, detect that one document quotes another
Observing user
plug-ins to other interfaces report user actions
web pages browsed, mail sent, queries made
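The plug-in idea above can be sketched as a registry of small extraction services that each look at a stored document and report whatever facts they can find. Everything here is a hypothetical illustration: the `extractor` decorator, the fact triples, and the two sample services are assumptions, plain text stands in for real file formats such as MSWord, and the quotation check is deliberately naive.

```python
import re

EXTRACTORS = []


def extractor(func):
    """Register a plug-in service (hypothetical registration mechanism)."""
    EXTRACTORS.append(func)
    return func


@extractor
def title_from_first_line(doc_id, text, store):
    """Guess the title as the first non-blank line of the document."""
    lines = text.strip().splitlines()
    first = lines[0].strip() if lines else ""
    return [(doc_id, "title", first)] if first else []


@extractor
def quotes_other_document(doc_id, text, store):
    """Naive detector: a double-quoted passage of 20+ characters that
    appears verbatim in another stored document."""
    facts = []
    for passage in re.findall(r'"([^"]{20,})"', text):
        for other_id, other_text in store.items():
            if other_id != doc_id and passage in other_text:
                facts.append((doc_id, "quotes", other_id))
    return facts


def mine(store):
    """Run every registered plug-in over every stored document."""
    facts = set()
    for doc_id, text in store.items():
        for plug_in in EXTRACTORS:
            facts.update(plug_in(doc_id, text, store))
    return facts


store = {
    "memo": 'Design memo\nAs the report says, '
            '"personal repositories adapt to their owners" over time.',
    "report": "Annual report\nWe argue that personal repositories adapt "
              "to their owners and to their habits.",
}
facts = mine(store)
```

New services can be added without touching `mine`, so extraction stays opportunistic: a plug-in that finds nothing simply returns an empty list.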
Adaptation
Remember user’s attempts to tune a query
instead of first query attempt, use last one
record items user picked as good matches
future similar queries do better right away
Stored content shows what user knows/likes
modify queries to big search engines
filter results coming back
personalized “cool site of the day”
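The first group of bullets can be illustrated with a small query memory. This is a sketch under simplifying assumptions, not the prototype's mechanism: `QueryMemory` and its methods are hypothetical names, and a "similar" future query is reduced to an exact repeat of an earlier first attempt.

```python
class QueryMemory:
    """Remembers how the user tuned queries and which results they picked."""

    def __init__(self):
        self.refined = {}  # first attempt -> last query of that session
        self.picked = {}   # final query -> set of item ids the user chose

    def record_session(self, attempts, chosen):
        """attempts: queries in the order tried; chosen: items picked at the end."""
        first, last = attempts[0], attempts[-1]
        self.refined[first] = last
        self.picked.setdefault(last, set()).update(chosen)

    def rewrite(self, query):
        """Replace a known first attempt with the query the user ended on."""
        return self.refined.get(query, query)

    def rerank(self, query, results):
        """Move previously picked items to the front of the result list."""
        liked = self.picked.get(self.rewrite(query), set())
        # sorted() is stable, so items keep their original order within each group.
        return sorted(results, key=lambda item: item not in liked)


memory = QueryMemory()
memory.record_session(
    ["jaguar", "jaguar cat", "jaguar cat habitat"],  # user's tuning attempts
    ["doc7"],                                         # what they finally picked
)

rewritten = memory.rewrite("jaguar")
reranked = memory.rerank("jaguar", ["doc1", "doc7", "doc3"])
```

The next time the user types the same first attempt, the system can start from the refined query and put the previously chosen item first, rather than making the user repeat the tuning.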
Collaborative Access
Leverage others’ work organizing data
no need to “publish” expertise
exposed automatically
self interest helps others
Privacy/permission concerns
allowing exposure easier than publishing
much public info: mailing lists, papers read
Whose opinions matter?
people I mail, w/shared data, referrals
collaborative filtering techniques
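The "whose opinions matter" question can be sketched as a trust-weighted vote. The slides do not specify a formula, so everything below is an assumption for illustration: the function names are hypothetical, trust is the simple sum of mail exchanged and items held in common, and ratings are numbers between 0 and 1.

```python
def trust_weights(mail_counts, shared_items):
    """Combine two signals named above: people I mail and shared data.
    Adding the two counts and normalizing is an assumed, simplistic choice."""
    people = set(mail_counts) | set(shared_items)
    raw = {p: mail_counts.get(p, 0) + shared_items.get(p, 0) for p in people}
    total = sum(raw.values())
    return {p: (w / total if total else 0.0) for p, w in raw.items()}


def recommend(weights, opinions, already_have):
    """Rank unseen items by the trust-weighted sum of others' ratings."""
    scores = {}
    for person, rated in opinions.items():
        for item, rating in rated.items():
            if item not in already_have:
                scores[item] = (scores.get(item, 0.0)
                                + weights.get(person, 0.0) * rating)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


weights = trust_weights(
    mail_counts={"alice": 6, "bob": 1},    # messages exchanged with each person
    shared_items={"alice": 2, "bob": 1},   # items we both keep
)
opinions = {
    "alice": {"paperA": 1.0, "paperB": 0.0},
    "bob": {"paperB": 1.0, "paperC": 1.0},
}
suggestions = recommend(weights, opinions, already_have={"paperC"})
```

With these placeholder numbers, the item favored by the more trusted contact ranks first, and anything already in my own collection is left out.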
Conclusion
Libraries are not enough
Haystack teases out individual knowledge
Individualizes information access for user
Exposes individual knowledge to benefit
community
Current status: individual-user prototype.
Some data extraction, observation, adaptation.
Collaborative version in future.