MyLifeBits - Interactive Computing Lab

A Personal Database for Everything
Inspired by Memex
www.MyLifeBits.com
Gordon Bell, Jim Gemmell, Roger Lueder
Original slides:
http://research.microsoft.com/en-us/um/people/gbell/Bell_MyLifeBits_Talk_SIGMOD_050614_web.ppt
Outline
• How has the project evolved?
• How do we use MyLifeBits?
• How is it built?
• How large is the database?
• What is the vision?
• What is left and how can you help?
I am data
Ambience and Presence:
Being there while being here
Dining at home on the “Orient Express”
History: The remote worker
re-discovers the PERSONAL computer
Oct 1998
Can we scan your books and put them online?
Raj Reddy
Sure! Don’t worry about copyright stuff. Microsoft has lots of lawyers
1999 – Scanning starts in earnest
“we” start to scan
My docs and archive
[Diagram: rows of “Library/file cab” boxes labeled Self; Biographical; Project; Employer; Active Employer; X-Employer; Business (Invests, family $s, & Legal); Personal, including Medical; <1980s]
Gordon, you should be using a database.
Jim, I don’t need no stinkin’ database!
Now that it’s in Cyberspace
How do you remember the 20,000+ file names?
Or in which of 1500 folders they live?
What about a tool for finding stuff?
Jan 2001 CACM
“A Personal Digital Store”
• 16 GB; +2/yr
• A good place to stop
• Began search for search engines, especially for email.
• Jim suggests that we build a system that would be easier to use and have many more capabilities.
Re-discovery of Memex
As We May Think, Vannevar Bush, 1945
“A memex is a device in which an individual stores all his
books, records, and communications, and which is
mechanized so that it may be consulted with exceeding
speed and flexibility”
• Full-text search, text & audio annotations, and hyperlinks
2001 Capture goes beyond paper
Even more capture
• Telephone calls, more video, all web pages visited, usage logging, radio, TV…
2003 - SenseCam
Feb 2005
Epiphany!
Memex is a database & personal TP system
Steve Mann timeline
Visually impaired, UW 2004
“I sensed”, Clarkson, MIT, c2001
MyLifeBits Software
Everything goes in a database
• MyLifeBits needs all the features of a database (Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, Replication)
• If we didn’t use one, we would eventually create one!
• Files as blobs; sync with file system for legacy apps
• We are part of Jim Gray’s Bay Area Research Lab
SQL
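The "files as blobs; sync with file system for legacy apps" idea can be sketched as follows. This is a minimal illustration, not the MyLifeBits implementation: it uses Python's built-in sqlite3 as a stand-in for the SQL database, and the table layout, column names, and the functions `open_store`, `add_item`, and `export_item` are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open (or create) a tiny item store; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               item_id INTEGER PRIMARY KEY,
               name    TEXT NOT NULL,
               kind    TEXT NOT NULL,
               created TEXT NOT NULL,
               content BLOB NOT NULL
           )"""
    )
    # Index on time, so items can be listed or filtered chronologically.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_items_created ON items(created)")
    return conn


def add_item(conn, name, kind, content, created=None):
    """Store a file's bytes as a blob and return the new item_id."""
    created = created or datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO items (name, kind, created, content) VALUES (?, ?, ?, ?)",
        (name, kind, created, content),
    )
    conn.commit()
    return cur.lastrowid


def export_item(conn, item_id, path):
    """Write a blob back to the file system so a legacy app can open it."""
    row = conn.execute(
        "SELECT content FROM items WHERE item_id = ?", (item_id,)
    ).fetchone()
    if row is None:
        raise KeyError(item_id)
    with open(path, "wb") as f:
        f.write(row[0])
```

Keeping the bytes in the database means backup, indexing, and queries cover every item in one place; the export step is only there for applications that still expect a file path.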
MyLifeBits Software
[Architecture diagram: tools and interfaces around the MyLifeBits store (database, files)]
• MyLifeBits Shell
• Room capture
• GPS import & Map display
• SenseCam
• Import files
• VIBE logging
• Text annotation tool
• Voice annotation tool
• Screen saver
• Radio capture & EPG
• Internet Browser tool
• IM capture
• MAPI interface
• Outlook interface
• PocketPC transfer tool
• PocketRadio player
• TV capture tool
• TV EPG download tool
• Telephone capture tool
• Legacy applications
• Legacy email client
MyLifeBits Schema (simplified)
[Schema diagram; entities shown:]
• Entity types
• Resource entities
• Items
• Links
• Link types
• Events
• Event types
• Event log
• People
• Tasks
• Notes
• Email
• Web pages
• Images
• Music
• Phone calls
• SenseCam Data
• GPS data
• Saved searches
• Window, key, mouse log
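A rough sketch of how items, links, and link types could support pivoting from one item to related items of another kind. It assumes Python's sqlite3 as a stand-in database; the table layout, the link type names, and the sample rows are hypothetical and only loosely follow the entity names in the schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    kind    TEXT NOT NULL,
    title   TEXT NOT NULL
);
CREATE TABLE links (
    from_id   INTEGER NOT NULL REFERENCES items(item_id),
    to_id     INTEGER NOT NULL REFERENCES items(item_id),
    link_type TEXT NOT NULL
);
"""


def build_demo():
    """Create an in-memory store with a few made-up items and links."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [
            (1, "person", "Dentist"),
            (2, "phone call", "Call, 10 minutes"),
            (3, "web page", "Clinic opening hours"),
            (4, "email", "Appointment reminder"),
        ],
    )
    conn.executemany(
        "INSERT INTO links VALUES (?, ?, ?)",
        [
            (1, 2, "caller in phone call"),
            (2, 3, "viewed during"),
            (1, 4, "sender of"),
        ],
    )
    return conn


def pivot(conn, item_id, kind=None):
    """Return items linked to item_id in either direction, optionally by kind."""
    sql = """
        SELECT i.item_id, i.kind, i.title
          FROM links l JOIN items i ON i.item_id = l.to_id
         WHERE l.from_id = ?
        UNION
        SELECT i.item_id, i.kind, i.title
          FROM links l JOIN items i ON i.item_id = l.from_id
         WHERE l.to_id = ?
    """
    rows = conn.execute(sql, (item_id, item_id)).fetchall()
    return sorted(r for r in rows if kind is None or r[1] == kind)
```

Starting from a contact, `pivot(conn, 1, "phone call")` returns that contact's calls, and pivoting again from a call reaches the web page linked to it.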
Demo Clips & Screens
747 Screen…
Vue de jour
Reports
Add item to collection(s)
Refine email shell
Refine email shell2
Pivoting: contact> call> t> web page
Refine by classification--dentist
GPS Photo location
SenseCam
Timeline
Google??
The Shape & Size of Gordon’s LifeBits
MyLifeBits, 3/26/2005: 206K items, 101 GB

By number of items:
Type         Items
eMail        84345
Web pages    53454
Pictures     38638
Doc&Rtf      13051
Other         6775
.PDF          2793
Tiff          2720
.PPT          1642
.xls          1358

By size (GB):
Type         GB
Video        58.9
Audio        10.9
Tiff          7.5
Pictures      7.1
.PDF          4.4
Web pages     4.1
.PPT          4.0
Doc&Rtf       1.0
Other         1.0
Size (MB) by Type
Bell Growth: 1 GB/month = 1 TB/lifetime
[Chart: 1995-2004 email (incl. attachments); axes: MB and Msg/year, by year]
[Chart: 15,000 photos; vertical axis 1 to 10000, horizontal axis 1895 to 2005]

Year    Mpix    Manufacturer
1997    .25     Ricoh
1999    1       Kodak
2001    2       Canon
2002    3       Sony
2003    4       Sony
2005    5       Panasonic
Monthly & Lifetime Storage Use
Item size    Item                      Daily number    Total* MB/month
1 MB         Books|reports             0.1             3
5 KB         Emails                    100             13
75 KB        Web pages|docs            100             188
100 MB       Music                     0.1             250
100 KB       Image scans               5               13
0.4 MB       Photos                    10              100
1 KB/s       Listened audio, speech    40,000          1,000
50 KB        Daily photos              1,000           1,250
2 GB/hr      TV                        4               200,000
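The monthly figures on this slide follow from item size × daily number × days per month. The sketch below reproduces that arithmetic under stated assumptions: the pairing of each daily count with its item is a reading of the flattened slide table, decimal units are used (1 MB = 1,000 KB), and 25 days per month is an inferred value that happens to match the slide's rounded numbers (for example 75 KB × 100 × 25 ≈ 188 MB). The names `ITEMS`, `monthly_mb`, and `report` are made up for the example.

```python
DAYS_PER_MONTH = 25  # assumption: inferred because it matches the slide's figures

# (label, size in MB per unit, units per day); pairing is an assumption
ITEMS = [
    ("Books|reports", 1.0, 0.1),
    ("Emails", 0.005, 100),
    ("Web pages|docs", 0.075, 100),
    ("Music", 100.0, 0.1),
    ("Image scans", 0.1, 5),
    ("Photos", 0.4, 10),
    ("Listened audio, speech", 0.001, 40_000),  # 1 KB/s for 40,000 s/day
    ("Daily photos", 0.05, 1_000),
    ("TV", 2_000.0, 4),  # 2 GB/hr for 4 hr/day
]


def monthly_mb(size_mb, per_day, days=DAYS_PER_MONTH):
    """Storage consumed per month, in MB, for one kind of item."""
    return size_mb * per_day * days


def report(items=ITEMS):
    """Map each item label to its monthly storage use in MB."""
    return {label: monthly_mb(size, n) for label, size, n in items}


if __name__ == "__main__":
    for label, mb in report().items():
        print(f"{label:<24} {mb:>12,.1f} MB/month")
```

Under these assumptions, TV alone (200,000 MB/month) dwarfs every other source combined, which is why video dominates the storage picture.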
Observations about use(rs)
1. On Apps:
– Search is the “killer app” pretty much as Bush described.
– Screen savers (“memory refreshers”) also provide ambience
– Where did my day go?
2. Users are unwilling to spend time managing their computers or data.
– User-input meta-data, e.g. Dublin Core: a naïve librarian’s dream.
– Meta-data, classification, etc. must be automatic
– Great scheme for classification using facets. It requires work.
3. Time is the most important meta-data. Photos: place (GPS), subject.
4. Folders are a good and bad idea.
– Most users don’t know what they are or how they work
– If used, over time, they become useless: too many, misfiled, etc.
5. User should put “every” information fragment into the system, e.g., to-dos, call backs, business card numbers, attention events. It pays.
6. Same information in multiple places always becomes obsolete.
Evolution: Silo Apps on isolated DB islands
vs.
Cut & Paste across apps
• Contacts: email, instant messages, phone, correspondence
• Family and organizational relationships
• Location of people, organizations, etc.
• Photo database: who, where, when, what…
• Money payees, phone, etc.
• Health providers and caregivers
• User-written apps in Excel or Access
Common ground with WinFS: Items, Links & Meta-data
[Schema diagram: three item tables with links labeled “Photo of Event”, “Caller in Phone Call”, and “Annotates”]
Outlook_CalendarItems2 (keys: PK,FK1)
  item_id, Subject, Start, End, Description, Location, Creation Time, Modified
TAPI_PhoneCalls2 (keys/indexes: PK,FK1,I1; I1)
  item_id, Phone, Call Type, CID, CID Name, CID #, Begin, End, Seconds, Connected, Ended, Roaming, Trimmed, Recorded, Transcript
IMG_Images2 (keys/indexes: PK,FK1,I1,I2,I3; I1; I2; I3)
  item_id, Width, Height, Date Taken, Camera Make, Camera Model, Latitude, Longitude, Elevation
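The tables on this slide each carry an item_id marked as both primary key and foreign key, which suggests type-specific metadata tables hanging off a common item table. A minimal sketch of that pattern follows, assuming Python's sqlite3 as a stand-in. The `Items` table's `name` column, the index on the date column, the helper functions, and the sample values are hypothetical; the IMG_Images2 column names follow the slide with spaces removed.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Items (
    item_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL          -- hypothetical column
);
CREATE TABLE IMG_Images2 (
    item_id     INTEGER PRIMARY KEY REFERENCES Items(item_id),  -- PK and FK
    Width       INTEGER,
    Height      INTEGER,
    DateTaken   TEXT,
    CameraMake  TEXT,
    CameraModel TEXT,
    Latitude    REAL,
    Longitude   REAL,
    Elevation   REAL
);
CREATE INDEX idx_img_date ON IMG_Images2(DateTaken);  -- assumed index
"""


def open_db():
    """Create an in-memory database with the two sketch tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_image(conn, name, date_taken, camera_make, latitude, longitude):
    """Insert the generic item row, then its image-specific metadata row."""
    cur = conn.execute("INSERT INTO Items (name) VALUES (?)", (name,))
    conn.execute(
        "INSERT INTO IMG_Images2 "
        "(item_id, DateTaken, CameraMake, Latitude, Longitude) "
        "VALUES (?, ?, ?, ?, ?)",
        (cur.lastrowid, date_taken, camera_make, latitude, longitude),
    )
    return cur.lastrowid


def images_taken_between(conn, start, end):
    """Names of images with start <= DateTaken < end, oldest first."""
    rows = conn.execute(
        "SELECT i.name FROM Items i "
        "JOIN IMG_Images2 g ON g.item_id = i.item_id "
        "WHERE g.DateTaken >= ? AND g.DateTaken < ? "
        "ORDER BY g.DateTaken",
        (start, end),
    )
    return [r[0] for r in rows]
```

Because every specialized row shares its key with a generic item row, links and annotations can point at any item without knowing its type, while queries on time or place use the type-specific columns.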
PhotoFinder - Shneiderman and Kang
Challenges
The “dear appy” problem
Dear Appy,
How committed are you?
Please come back to me.
Forever yours truly,
Lost and forgotten data
• Who’s responsible?
– Media or 8 track cassette, 8” floppy
– Evolving platform, file, and database
– Evolving, incompatible standards & formats for legacy data that disregard ancestors
– Evolving and/or disappearing apps
Automatic classification problem
• XML on bills and imported content… transactions
• We need to download classifications rather than build them
– Definitions & synonyms should help find what I want
• Today it is too expensive to manually classify scanned paper. E.g. “right time” meta-data is critical!
• We hope “the system” can classify papers and other documents e.g. bills. Ideally, build Dublin Core
• In 10 years we need all documents to appear electronically & classified with a little help from me
More challenges
• Dear Appy: Monitoring and automatic migration of files that are unlikely to be understood on future platforms, as well as platform migration.
• Get What I Need: Endless but evolutionary improvements in search: misspellings, stemming, synonyms
• Endless frontier of schema and extensions to them for new applications, e.g. making org charts, family relationships.
• Capture, Archival and Retrieval of Personal Experiences (CARPE)… a whole new game!
• Versioning is essential
• Scaling… We don’t know what happens at a Terabyte
• What can, should be, or will be in the cloud? Books… videos
• Will we be allowed to use such systems? Copyright laws vary, e.g. ripping CDs, copy of anything, photos, conversations
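The search improvements named under "Get What I Need" (misspellings, stemming, synonyms) can be illustrated with a toy query expander. This is only a sketch under stated assumptions: the synonym table is made up, the stemmer is a crude suffix stripper standing in for a real one, and misspellings are handled with `difflib.get_close_matches` from the Python standard library rather than whatever a production search engine would use.

```python
import difflib

# Hypothetical synonym table, keyed by stemmed term.
SYNONYMS = {"photo": {"picture", "image"}, "bill": {"invoice"}}


def stem(word):
    """Very crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(term, vocabulary):
    """Return the term plus its stem, synonyms, and near-miss spellings."""
    term = term.lower()
    candidates = {term, stem(term)}
    candidates |= SYNONYMS.get(stem(term), set())
    # Misspellings: vocabulary words that are close to the typed term.
    candidates.update(difflib.get_close_matches(term, vocabulary, n=3, cutoff=0.8))
    return candidates


def search(term, documents):
    """Names of documents containing any expanded form of the term."""
    vocabulary = sorted({w for text in documents.values() for w in text.lower().split()})
    wanted = {stem(c) for c in expand_query(term, vocabulary)}
    return sorted(
        name
        for name, text in documents.items()
        if wanted & {stem(w) for w in text.lower().split()}
    )
```

With documents such as `{"a": "dentist invoice march", "b": "vacation pictures"}`, a query for "bills" reaches document a through the synonym table, and the misspelled "dentst" reaches it through fuzzy matching.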
Challenges
• Data-types
– Quantity expanding i.e. info explosion
– New capabilities e.g. real time create new data-types
– Meta-data to increase value & provide pivots
• Going beyond a PC to a distributed environment
– Network environment, including media center
– Into the cloud
– Periphery… smart buildings, objects
– Backup, migration, and caching for beyond a Terabyte
– Expanding network: PC > LANs > web > P2P
• Schema sharing among disparate systems
• CARPE (real time data capture)
– Rooms, phone calls, SenseCam, Health transducers, etc.
• Security, privacy, forgetfulness, deniability, etc.