MLB_Stanford_Talk Nov 11, 2005


Systems to Capture Everything:
Beyond cameras and desktops
www.MyLifeBits.com
Gordon Bell, Jim Gemmell, Roger Lueder
Outline

MyLifeBits aka Memex
- How has the project evolved?
- How do we use MyLifeBits?
- How is it built?
- Shape of the database?

CARPE: Continuous archiving and recording of personal experience
- What is the vision?
- Relevance for devices and software?

I am data
History: Telepresence
Tele-presentations
Tele-meetings
Ambience and Presence:
Being there while being here
Dining at home on the “Orient Express”
History: The remote worker rediscovers the PERSONAL
computer
Oct 1998
Can we scan your books and put them online?
Raj Reddy
Sure! Don’t worry about copyright stuff. Microsoft has lots of lawyers.
1999 – Scanning starts in earnest
“We” start to scan, put content into folders & files
[Diagram: “My docs and archive” folder tree. Labels: Self; Biographical; Active Employer; X-Employer; Employer; Business; Invests, family $s, & Legal; Personal, including Medical; Project (repeated); Library/file cab (repeated, one marked <1980s)]
Now that it’s in Cyberspace
How do you remember the 20,000+ file names?
Or in which of 1500 folders they live?
What about a tool for finding stuff?
Jan 2001 CACM
“A Personal Digital Store”
16 GB; +2/yr
A good place to stop
Began search for search engines, especially for email.
Jim suggests that we build a system that would be easier to use and have many more capabilities.
2001 Capture goes beyond paper
Gordon, you should be using a database.
Jim, I don’t need no stinkin’ database!
Re-discovery of Memex
As We May Think, Vannevar Bush, 1945
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”
- Full-text search, text & audio annotations, and hyperlinks
Even more capture

Telephone calls, more video, all web pages
visited, keyboard and mouse usage logging,
radio, TV…
2003 - SenseCam
Feb 2005
Epiphany!
Memex is a database
&
personal TP system
Demo Clips & Screens
747 Screen…
Vue de jour
Timeline
Pivoting:
contact>
call>
t>
web page
GPS Photo location
Reports
The Stew family tree
Copyright Mark Stewart, 2004
Vibe report
Quindi Meeting Capture
SenseCam
SenseCam around Cambridge
MyLifeBits Software
Everything goes in a database

MyLifeBits needs all the features of a database
(Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, Replication)
If we didn’t use one, we would eventually create one!
Files as blobs; sync with file system for legacy apps
We are part of Jim Gray’s Bay Area Research Lab
SQL
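The "files as blobs; sync with file system for legacy apps" idea can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the SQL database, and the `items` table, its columns, and the helper names `import_file` and `export_for_legacy_app` are hypothetical, not the MyLifeBits schema.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema: one row per captured file, content kept as a blob.
SCHEMA = """
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    created TEXT NOT NULL,      -- ISO 8601 timestamp
    content BLOB NOT NULL
);
CREATE INDEX items_created ON items (created);
"""


def open_store(path=":memory:"):
    """Open a store and create the illustrative schema."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def import_file(db, name, created, data):
    """Store a file's bytes as a blob and return its item id."""
    cur = db.execute(
        "INSERT INTO items (name, created, content) VALUES (?, ?, ?)",
        (name, created, data),
    )
    db.commit()
    return cur.lastrowid


def export_for_legacy_app(db, item_id, folder):
    """Write the blob back out as an ordinary file for a legacy application."""
    name, data = db.execute(
        "SELECT name, content FROM items WHERE item_id = ?", (item_id,)
    ).fetchone()
    target = Path(folder) / name
    target.write_bytes(data)
    return target


db = open_store()
memo_id = import_file(db, "memo.txt", "2005-11-11T09:00:00", b"Stanford talk notes")
with tempfile.TemporaryDirectory() as folder:
    exported = export_for_legacy_app(db, memo_id, folder)
    exported_bytes = exported.read_bytes()
```

The database stays the single source of truth here; the exported file is a copy made on demand for programs that only understand the file system.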
MyLifeBits Software
Room capture
GPS import & map display
SenseCam
Import files
VIBE logging
MyLifeBits Shell
Text annotation tool
Voice annotation tool
Screen saver
MyLifeBits store (database, files)
Radio capture & EPG
Internet browser tool
Legacy applications
IM capture
MAPI interface
PocketPC transfer tool
Outlook interface
TV capture tool
PocketRadio player
Telephone capture tool
TV EPG download tool
Legacy email client
Common ground with WinFS:
Items, Links & Meta-data
Outlook_CalendarItems2
  Keys: item_id (PK, FK1)
  Columns: Subject, Start, End, Description, Location, Creation Time, Modified
TAPI_PhoneCalls2
  Keys: item_id (PK, FK1, I1), plus one more I1-indexed column
  Columns: Phone, Call Type, CID, CID Name, CID #, Begin, End, Seconds, Connected, Ended, Roaming, Trimmed, Recorded, Transcript
IMG_Images2
  Keys: item_id (PK, FK1, I1, I2, I3), plus I1-, I2-, and I3-indexed columns
  Columns: Width, Height, Date Taken, Camera Make, Camera Model, Latitude, Longitude, Elevation
Links: Photo of Event, Caller in Phone Call, Annotates
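A small sketch of the "items, links & meta-data" shape may help show how pivoting falls out of it. It uses sqlite3 as a stand-in; the table and column names below are simplified, hypothetical versions of the ones on the slide, and only the "Photo of Event" link type is taken from it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE calendar_items (
    item_id    INTEGER PRIMARY KEY REFERENCES items(item_id),
    subject    TEXT,
    start_time TEXT
);
CREATE TABLE images (
    item_id    INTEGER PRIMARY KEY REFERENCES items(item_id),
    date_taken TEXT,
    latitude   REAL,
    longitude  REAL
);
CREATE TABLE links (
    from_id   INTEGER REFERENCES items(item_id),
    to_id     INTEGER REFERENCES items(item_id),
    link_type TEXT NOT NULL
);
""")

# One calendar event and two photos, each linked to the event (made-up data).
db.executemany("INSERT INTO items VALUES (?, ?)",
               [(1, "calendar"), (2, "image"), (3, "image")])
db.execute("INSERT INTO calendar_items VALUES (1, 'Stanford talk', '2005-11-11T10:00:00')")
db.executemany("INSERT INTO images VALUES (?, ?, ?, ?)",
               [(2, "2005-11-11T10:05:00", 37.43, -122.17),
                (3, "2005-11-11T10:40:00", 37.43, -122.17)])
db.executemany("INSERT INTO links VALUES (?, ?, ?)",
               [(2, 1, "Photo of Event"), (3, 1, "Photo of Event")])


def photos_of_event(db, event_id):
    """Pivot from a calendar item to the images linked to it."""
    rows = db.execute(
        """SELECT i.item_id, i.date_taken
           FROM links AS l JOIN images AS i ON i.item_id = l.from_id
           WHERE l.to_id = ? AND l.link_type = 'Photo of Event'
           ORDER BY i.date_taken""",
        (event_id,),
    )
    return rows.fetchall()


event_photos = photos_of_event(db, 1)
```

Because every typed row shares an `item_id` with the generic `items` table, a single `links` table can relate any kind of item to any other.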
PhotoFinder - Shneiderman and Kang
The Shape & Size of Gordon’s LifeBits
MyLifeBits, 10/31/2005: 242K items, 110 GB

By number of items
Type         Items
eMail        97271
Web pages    70918
Pictures     43812
Doc&Rtf      13764
Audio         5083
.pdf          3527
Tiff          2832
.PPT          1815
.xls          1455
Video         1303

Size (MB) by type
Type         MB
Video        62735
Audio        12502
Pictures      8998
Tiff          8078
Web pages     5791
PDF           5027
PPT           4637
Doc&Rtf       1198
eMail          343
mny            134
NULL           127
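As a quick consistency check, the per-type figures on these two charts can be summed and compared with the quoted totals of 242K items and 110 GB. The sketch below assumes 1 GB = 1000 MB; the dictionary names are illustrative.

```python
# Per-type counts and sizes as they appear on the slide.
items_by_type = {
    "eMail": 97271, "Web pages": 70918, "Pictures": 43812, "Doc&Rtf": 13764,
    "Audio": 5083, ".pdf": 3527, "Tiff": 2832, ".PPT": 1815, ".xls": 1455,
    "Video": 1303,
}
size_mb_by_type = {
    "Video": 62735, "Audio": 12502, "Pictures": 8998, "Tiff": 8078,
    "Web pages": 5791, "PDF": 5027, "PPT": 4637, "Doc&Rtf": 1198,
    "eMail": 343, "mny": 134, "NULL": 127,
}

total_items = sum(items_by_type.values())
total_size_gb = sum(size_mb_by_type.values()) / 1000  # assuming 1 GB = 1000 MB

# Share of total bytes taken by video, as a fraction between 0 and 1.
video_share = size_mb_by_type["Video"] / sum(size_mb_by_type.values())
```

Email dominates by count while video dominates by size, which is the contrast the two charts are drawing.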
Bell Growth: 1 GB/month = 1.1 TB/lifetime
[Chart: logarithmic vertical axis from 1 to 10,000; horizontal axis 1895 to 2005 in ten-year steps; annotation “15,000 photos”]

Year   Mpix   Manufacturer
1997   .25    Ricoh
1999   1      Kodak
2001   2      Canon
2002   3      Sony
2003   4      Sony
2005   5      Panasonic
Monthly & Lifetime Storage Use
Item                             Daily number   Total* (MB/month)
1 MB Books|reports               0.1            3
5 KB Emails                      100            13
100 KB Image scans               5              13
0.4 MB Photos                    10             100
75 KB Web pages|docs             100            188
100 MB Music                     0.1            250
1 KB/s Listened audio, speech    40,000         1,000
50 KB Daily photos               1,000          1,250
2 GB/hr TV                       4              200,000
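The monthly figures on this slide appear to follow from unit size × daily number × days per month. The sketch below reproduces them under two assumptions that are not stated on the slide: 25 days per month (inferred from the numbers) and decimal units (1 MB = 1000 KB, 1 GB = 1000 MB). For audio and TV the "daily number" is taken to be seconds and hours per day respectively.

```python
from fractions import Fraction
from math import floor

DAYS_PER_MONTH = 25  # assumption inferred from the slide's totals, not stated on it

# (item, size in KB per unit, units per day)
ROWS = [
    ("Books|reports", 1_000, Fraction(1, 10)),
    ("Emails", 5, 100),
    ("Image scans", 100, 5),
    ("Photos", 400, 10),
    ("Web pages|docs", 75, 100),
    ("Music", 100_000, Fraction(1, 10)),
    ("Listened audio, speech", 1, 40_000),  # 1 KB/s, seconds per day
    ("Daily photos", 50, 1_000),
    ("TV", 2_000_000, 4),                   # 2 GB/hr, hours per day
]


def monthly_mb(size_kb, per_day):
    """Monthly volume in MB, rounded half up, using exact arithmetic."""
    exact = Fraction(size_kb) * per_day * DAYS_PER_MONTH / 1000
    return floor(exact + Fraction(1, 2))


monthly = {name: monthly_mb(size_kb, per_day) for name, size_kb, per_day in ROWS}
```

Exact fractions are used so that half-way values such as 2.5 MB and 12.5 MB round up to the 3 and 13 shown on the slide rather than being affected by floating-point or banker's rounding.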
Observations about use(rs)
1. Cell phone sized device (CPSD) will be the platform!
2. On applications… think about CPSD as the platform and context
   - Search is the “killer app” pretty much as Bush described.
   - Screen savers (“memory refreshers”) also provide ambience
   - Where did my day go?
3. Users are unwilling to spend time managing their computers or data.
   - Meta-data, classification, etc. must be automatic
   - User-input meta-data, e.g. Dublin Core: a naïve librarian’s dream.
   - We have a nice scheme for classification using facets. It requires work.
4. Time is the most important meta-data. Photos: place (GPS), subject.
5. Folders are a good and bad idea.
   - Most users don’t know what they are or how they work
   - If used, over time, they become useless: too many, misfiled items, etc.
6. User should put “every” information fragment into the system, e.g. to-dos, call backs, business card numbers, attention events. It pays.
7. Same information in multiple places always becomes obsolete.
Capturing Everything:
- Phone calls in context of cell phone as a platform for communication and capture
- Formal Meetings
- Rooms
- Everything in daily life
- Personal health and medical monitoring
- Memex for scientists and engineers

BodyMedia Output
Polysomnogram for sleep apnea.
Real time health monitoring
Microsoft Research SenseCam II
Sensors:
- VGA camera w/ wide-angle lens
- light level in R, G, B and white
- ambient temperature
- passive infrared for person detection
- accelerometers
- three programmable buttons, LEDs, sounder
- audio level & audio recording
- USB 2 and SD memory. 1-2 K photos/day
- No GPS

SenseCam University Grant Program
MSFT supplies money, software, SenseCams





Memex vision: Notebook for engineers & scientists
Medical & health: observations & memory recall,
including diet and exercise
Education: How do people learn?
Help me learn/remember!
Tourist e.g. museum experience
Plumbing


Security
Filtering many images, voice & location annotation
More real time experience capture
Real time medical & health monitoring
- MIT. Deb Roy home capture to understand how his children learn
- U. of Tokyo. Ubiquitous home
- Columbia U. Voice & sound record & profile
- MIT. iDat. Electronic lab that records everything into your notebook

Experience Retrieval in a Ubiquitous Home
(chamds, byon, yamasaki, aizawa)@hal.k.u-tokyo.ac.jp
MIT iDAT Project aka notebook
Samsung challenge

Going beyond plain old photography and videography




Print, view, and file in scrapbook or shoebox
Digitized bits offer easy, worldwide sharing
Screensaver is useful, but is it a killer app?
The cell phone sized device (CPSD)… one device




Next generation platform
Phones and messaging e.g. sms, mail, web, iM, blogging
Audio, photo, video record and viewing (incl. broadcast)
Within 5 years and with supplemental devices, will take on
the PC
Capture, storage, retrieval, and display
Challenge putting them together

Capture ….




Storage




Cell phone sized devices (CPSD). The “killer app”!!
Consumer… photo, video, audio… experience
Professional
Capture
Archival
Retrieval = f(use). Archive… ambience
Display



Personal: Cell phone
PC
Wall
www.MyLifeBits.com
BONUS SLIDES
Challenges

Data-types




Quantity expanding i.e. info explosion
New capabilities e.g. real time create new data-types
Meta-data to increase value & provide pivots
Going beyond a PC to a distributed environment





Network environment, including media center
Into the cloud. Especially important for social aspects
Periphery… smart buildings, objects,
Backup, migration, and caching for beyond a Terabyte
Expanding network: PC > LANs > web > p2p(eer)

Schema sharing among disparate systems

CARPE (real time data capture)
- Rooms, phone calls, SenseCam, Health transducers, etc.

Security, privacy, forgetfulness, deniability, etc.
More challenges








- Dear Appy: Monitoring and automatic migration of files that are unlikely to be understood on future platforms, as well as platform migration.
- Get What I Need (GWIN): Endless, but evolutionary, improvements in search: misspellings, stemming, synonyms
- Endless frontier of schema and extensions to them for new applications, e.g. making org charts, family relationships.
- CARPE… a whole new game!
- Versioning is essential
- Scaling… We don’t know what happens at a terabyte
- What can, should be, or will be in the cloud? Books… videos
- Will we be allowed to use such systems? Copyright laws vary, e.g. ripping CDs, copy of anything, photos, conversations
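The search improvements named above (misspellings, stemming, synonyms) can be illustrated with a toy query expander. This is a sketch only: the vocabulary, the synonym groups, the suffix-stripping stemmer, and the 0.75 similarity cutoff are all made-up assumptions, and a real system would use a proper stemmer and a much larger lexicon. The only library call is the standard `difflib.get_close_matches`.

```python
import difflib

# Hypothetical vocabulary and synonym groups, for illustration only.
VOCABULARY = ["photo", "picture", "image", "call", "phone", "meeting", "scan"]
SYNONYM_GROUPS = [{"photo", "picture", "image"}, {"call", "phone"}]


def stem(word):
    """Very naive suffix stripping; stands in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(term):
    """Map a query term to a vocabulary word: exact, then stemmed, then fuzzy."""
    word = term.lower()
    if word in VOCABULARY:
        return word
    stemmed = stem(word)
    if stemmed in VOCABULARY:
        return stemmed
    close = difflib.get_close_matches(stemmed, VOCABULARY, n=1, cutoff=0.75)
    return close[0] if close else word


def expand(term):
    """Return the set of terms to search for, including synonyms."""
    word = normalize(term)
    for group in SYNONYM_GROUPS:
        if word in group:
            return set(group)
    return {word}
```

For example, `expand("Photos")` and `expand("pictuer")` both resolve to the photo/picture/image group, while an unknown word such as "memex" is passed through unchanged.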
The “dear appy” problem
Dear Appy,
How committed are you?
Please come back to me.
Forever yours truly,
Lost and forgotten data

Who’s responsible?
Media or 8 track cassette, 8” floppy
Evolving platform, file, and database
Evolving, incompatible standards & formats for
legacy data that disregard ancestors
Evolving and/or disappearing apps
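One small piece of the "dear appy" problem, monitoring for files whose formats may not be readable in future, can be sketched as a directory scan. The extension list here is purely illustrative and not a claim about which formats are actually at risk; `find_at_risk` is a hypothetical helper, and a real monitor would also need to inspect file contents and trigger migration.

```python
import os
import tempfile

# Illustrative only: extensions an archive owner has decided to watch.
AT_RISK_EXTENSIONS = {".mny", ".wps", ".lwp"}


def find_at_risk(root, at_risk=AT_RISK_EXTENSIONS):
    """Return sorted paths under root whose extension is on the watch list."""
    flagged = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in at_risk:
                flagged.append(os.path.join(folder, name))
    return sorted(flagged)


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "old"))
    for rel in ("budget.MNY", "talk.pdf", os.path.join("old", "letter.wps")):
        open(os.path.join(root, rel), "w").close()
    flagged_names = [os.path.basename(p) for p in find_at_risk(root)]
```

Extensions are compared case-insensitively, so "budget.MNY" is flagged along with "letter.wps" while "talk.pdf" is left alone.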
Is Cyberspace a safe store?
Don’t your physical records
e.g. paper last forever?
What about information on
your CDs, tapes, hard drives,
solid state devices?
Automatic classification problem


- XML on bills and imported content… transactions
- We need to download classifications rather than build them
- Definitions & synonyms should help find what I want
- Today it is too expensive to manually classify scanned paper. E.g. “right time” meta-data is critical!
- We hope “the system” can classify papers and other documents e.g. bills. Ideally, build Dublin Core
- In 10 years we need all documents to appear electronically & classified with a little help from me