Individualized Knowledge Access

David Karger
Lynn Andrea Stein
Mark Ackerman
Ralph Swick
Information Access
A key task in Oxygen: help people manage and retrieve information
Three overlapping projects:

Haystack:
 information storage and retrieval
 application clients
Semantic Web: next-generation metadata
 Volt: collaborative access

Presentation Overview
Motivation

Information access behavior and goals
System Design & Architecture
Data Model
 Interacting data and UI components

Working applications
Base haystack
 Frontpage
 Volt

Motivation
Problem Scenario
I try solving problems using my data:
Information gathered personally
 High quality, easy for me to understand
 Not limited to publicly available content

My organization:
Personal annotations and meta-data
 Choose own subject arrangement
 Optimize for my kind of searching

Adapts to my needs
Then Turn to a Friend
Leverage
They organize information for their own use
 Let them find things for me too

Shared vocabulary

They know me and what I want
Personal expertise

They know things not in any library
Trust

Their recommendations are good
Last to Library/web
Answer usually there
But hard to find
 Wish: rearrange to suit my needs
 Wish: help from my friends in looking

Lessons
Individualized access

Best tools adapt to individual ways of organizing and seeking data
Individualized knowledge
People know more than they publish
 That knowledge is useful to them and others

Collaborative use

Right incentives lead to sharing and joint use
Haystack
Individualized access


My data collection, organization
Search tools tuned for me
Collaborate to leverage individual knowledge


Access unpublished information in others’ haystacks
Self interest
public benefit
Lens to personalize access to the world library

Rearrange presentation to suit my personal needs
Example
Info on probabilistic models in data mining

My haystack doesn’t know, but “probability” is in lots of email I got from Tommi Jaakkola
Tommi told his haystack that “Bayesian” refers to “probability models”
Tommi has read several papers on Bayesian methods in data mining
Some are by Daphne Koller
I read/liked other work by Koller
My Haystack queries “Daphne Koller Bayes” on Yahoo
Tommi’s haystack can rank the results for me…
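
The vocabulary step in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `expand_query` and the dictionary layout (a friend's term mapped to the concept it refers to) are hypothetical and not taken from Haystack itself.

```python
def expand_query(terms, vocabularies):
    """Add terms that a trusted friend's vocabulary maps to one of the query terms."""
    expanded = list(terms)
    for vocabulary in vocabularies:
        for term, meaning in vocabulary.items():
            # If a friend says `term` refers to something already in the query,
            # include `term` so that documents using it are also found.
            if meaning in terms and term not in expanded:
                expanded.append(term)
    return expanded


# Hypothetical stand-in for what Tommi told his haystack.
tommi_vocabulary = {"Bayesian": "probability models"}
query = expand_query(["probability models", "data mining"], [tommi_vocabulary])
```

Here `query` gains the term "Bayesian" even though the original query never mentioned it, which is the kind of leverage the example describes.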
System Design
Gathering Data
Haystack archives anything

Web pages browsed, email sent and received, address book, documents written
And any properties, relationships
Text of object (for text search)
 Author, title, color, citations, quotations, annotations, quality, last usage

Users freely add types, relationships
Semantic Web
Arbitrary objects, connected by named links
No fixed schema

User extensible
Sharable by any application

A new “file system”?
[Figure: example object graph; node labels: HTML, Doc, Haystack, D. Karger, Outstanding]
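
A minimal Python sketch of the data model described on this slide: objects connected by named links, with no fixed schema. The class `TripleStore`, the identifier `doc:haystack`, and the link names (`type`, `author`, `quality`, `cited-by`) are assumptions made for illustration; the slides do not give the actual names.

```python
class TripleStore:
    """Toy store of (subject, link, object) triples with no fixed schema."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, link, obj):
        # Any link name is accepted, so users can extend the vocabulary freely.
        self._triples.add((subject, link, obj))

    def query(self, subject=None, link=None, obj=None):
        # None acts as a wildcard for that position.
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (link is None or t[1] == link)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("doc:haystack", "type", "HTML")
store.add("doc:haystack", "author", "D. Karger")
store.add("doc:haystack", "quality", "Outstanding")
store.add("doc:haystack", "cited-by", "doc:other")  # a link name added ad hoc

by_author = store.query(link="author", obj="D. Karger")
```

Because nothing constrains the link names, a new relationship such as `cited-by` needs no schema change, which is the property the slide calls "user extensible".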
Gathering Data
Active user input

Interfaces let user add data, note relationships
Mining data from prior data

Plug-in services opportunistically extract data
Passive observation of user

Plug-ins to other interfaces record user actions
[Architecture diagram; component labels: Other Users, Data Extraction Services, Machine Learning Services, Spider, Triple Store, Web Observer Proxy, Mail Observer Proxy, Volt Viewer/Editor, Web Viewer]
Sample Applications
Because everything uses the Semantic Web constructions, a variety of application clients can share information
Web Browser---data viewer
 FrontPage---personalized information filter
 Volt---collaboration tool

Haystack via Web
Web server interface
Basic operations:

Insert objects
View objects
Queries
Haystack via Web
Viewer shows one node and its associated arrows
A service notices that we have archived a directory, so it archives the objects the directory contains (and so on…)
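
The cascading behavior described above can be sketched as follows. This is a hypothetical illustration, not Haystack's actual service API: the class `Haystack`, the function `directory_service`, and the use of a nested dict to stand in for a directory are all assumptions.

```python
class Haystack:
    """Toy archive that notifies registered services about each new object."""

    def __init__(self):
        self.archived = []
        self.services = []

    def archive(self, name, obj):
        if name in self.archived:
            return  # already archived; avoids repeated work
        self.archived.append(name)
        for service in self.services:
            service(self, name, obj)


def directory_service(haystack, name, obj):
    # A dict stands in for a directory: keys are entry names, values are contents.
    if isinstance(obj, dict):
        for child_name, child in obj.items():
            # Archiving a child notifies the services again, so nested
            # directories are handled "and so on".
            haystack.archive(f"{name}/{child_name}", child)


hs = Haystack()
hs.services.append(directory_service)
hs.archive("papers", {"a.txt": "text", "drafts": {"b.txt": "more text"}})
```

The user archives only `papers`; the service reacts to that event and pulls in everything beneath it without further input.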
Haystack via Web
Services detect document type, extract relevant metadata
Output can specialize by type of object
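
One way such a service could look, sketched with Python's standard library. The extractor functions, the `EXTRACTORS` table, and the metadata keys are assumptions for illustration; the slides do not say how Haystack detects types or which fields it extracts.

```python
import mimetypes
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_html(content):
    parser = TitleParser()
    parser.feed(content)
    return {"title": parser.title.strip()}


def extract_text(content):
    return {"word_count": len(content.split())}


# Hypothetical registry: one extractor per detected document type.
EXTRACTORS = {"text/html": extract_html, "text/plain": extract_text}


def extract_metadata(filename, content):
    doc_type, _ = mimetypes.guess_type(filename)  # detection by file extension
    metadata = {"type": doc_type}
    extractor = EXTRACTORS.get(doc_type)
    if extractor:
        metadata.update(extractor(content))
    return metadata


page = extract_metadata("page.html", "<html><head><title>Haystack</title></head></html>")
```

The same registry idea could drive type-specific output: a viewer would look up a renderer by `metadata["type"]` in the same way.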
Mediation
Haystack can be a lens for viewing data from the rest of the world
Stored content shows what user knows/likes
 Selectively spider “good” sites
 Filter results coming back

 Compare to objects user has liked in the past

Can learn over time
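
The filtering step ("compare to objects user has liked in the past") could be approximated with bag-of-words cosine similarity, sketched below. The slides do not name the comparison method, so cosine similarity is a swapped-in assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    # Bag of words: lowercase word -> count.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_results(results, liked):
    # Build one profile vector from everything the user has liked.
    profile = Counter()
    for text in liked:
        profile.update(vectorize(text))
    return sorted(results, key=lambda r: cosine(vectorize(r), profile), reverse=True)


liked = ["bayesian models for data mining"]
results = ["celebrity gossip today", "data mining with bayesian networks"]
ranked = rank_results(results, liked)
```

Results that share vocabulary with previously liked objects move to the top; results with no overlap score zero and sink.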
Example - personalized news service
News Service
Scavenges articles from your favorite news sources

HTML parsing/extracting services
Over time, learns types of articles that interest you

Prioritizes those for display
Content provider no longer controls viewing experience

No more ads
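
A minimal sketch of "learns over time, then prioritizes", assuming explicit like/dislike feedback and a simple additive word-weight update. Both the feedback signal and the update rule are assumptions; the slides do not describe the learning method the news service uses.

```python
from collections import defaultdict


class InterestModel:
    """Keeps one weight per word, nudged up or down by user feedback."""

    def __init__(self, rate=1.0):
        self.weights = defaultdict(float)
        self.rate = rate

    def score(self, article):
        return sum(self.weights.get(w, 0.0) for w in set(article.lower().split()))

    def feedback(self, article, liked):
        # Each reaction shifts the weight of every word in the article.
        delta = self.rate if liked else -self.rate
        for w in set(article.lower().split()):
            self.weights[w] += delta

    def prioritize(self, articles):
        return sorted(articles, key=self.score, reverse=True)


model = InterestModel()
model.feedback("new results in machine learning", liked=True)
model.feedback("celebrity gossip roundup", liked=False)
front_page = model.prioritize(["celebrity wedding photos", "machine learning for search"])
```

After two reactions the model already ranks the machine-learning article first; more feedback refines the weights further.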
Personalized News Service
Collaborative Access
Want to leverage others’ work in organizing information
No need to “publish” expertise
 Exposed automatically---without effort
 Self interest helps others

Volt
Volt is about collaboration between people

The Haystack architecture allows easy collaboration among individuals
 semantic web references to Haystack objects
Individuals share parts of their Haystack
 Group spaces and shared notebooks

Volt
Collaborators
Those I interact with
Frequent mail contact
 Frequent visits to their home page

Those with shared content
And who have the same opinions about content
 Collaborative filtering techniques
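
A small sketch of the collaborative-filtering idea: people who hold the same opinions about shared content are likely collaborators. The agreement measure, the `min_shared` threshold, and the user and item names are hypothetical choices made for illustration.

```python
def agreement(mine, theirs):
    """Fraction of shared items on which two users hold the same opinion."""
    shared = set(mine) & set(theirs)
    if not shared:
        return 0.0
    same = sum(1 for item in shared if mine[item] == theirs[item])
    return same / len(shared)


def likely_collaborators(me, others, min_shared=2):
    # Only consider people with enough shared content to compare opinions.
    scores = {
        name: agreement(me, ratings)
        for name, ratings in others.items()
        if len(set(me) & set(ratings)) >= min_shared
    }
    return sorted(scores, key=scores.get, reverse=True)


me = {"paper-a": True, "paper-b": False, "paper-c": True}
others = {
    "alice": {"paper-a": True, "paper-b": False},
    "bob": {"paper-a": False, "paper-b": True, "paper-c": True},
    "carol": {"paper-z": True},
}
candidates = likely_collaborators(me, others)
```

In this data, alice agrees on everything shared, bob on one item in three, and carol shares no content, so she is not considered at all.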

Referrals
Expertise search engine
Expertise Beacon
Volt Expertise Beacons
Group spaces and shared notebooks

Create individual and group profiles
Profiles can be used to find other people
Allows targeted search
 “Who else is working on this project?”

User controls visibility/privacy
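
A targeted search over profiles that respects owner-controlled visibility might look like the sketch below. The `Profile` fields, the `"*"` convention for public profiles, and the function `who_works_on` are assumptions for illustration, not Volt's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    owner: str
    topics: set = field(default_factory=set)
    # Who may see this profile; "*" means everyone, an empty set means private.
    visible_to: set = field(default_factory=set)


def who_works_on(topic, profiles, asker):
    """Answer "who else is working on this?" using only profiles the asker may see."""
    return sorted(
        p.owner
        for p in profiles
        if topic in p.topics and (asker in p.visible_to or "*" in p.visible_to)
    )


profiles = [
    Profile("alice", {"haystack"}, {"*"}),
    Profile("bob", {"haystack"}, {"carol"}),
    Profile("dave", {"haystack"}),  # private: never returned
]
seen_by_carol = who_works_on("haystack", profiles, asker="carol")
seen_by_erin = who_works_on("haystack", profiles, asker="erin")
```

The same query returns different people for different askers, which is how the owner's visibility setting takes effect.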
Summary
Next generation information access
Semantic Web

provides a language and capabilities for meta-data
Haystack

teases out individual knowledge,
stores it in a coherent fashion, and
allows a variety of application clients to leverage individual meta-data
Volt

turns individual knowledge into a community resource
More Info
http://haystack.lcs.mit.edu/
http://www.w3c.org/2001/sw