
My CIDR Epiphany:
Real World Data, Schema, and Environment
Michael Franklin
UC Berkeley
Post SIGMOD PC Research Symposium
(old persons track)
February 11, 2005
Michael Franklin, UC Berkeley
How it Happened, or why it sometimes pays to hang around until the end of a conference
• The “gloom and doom” panel
• DeWitt’s gong show challenge
• Grappa consumption & staying up too late
• A great last session on sensor/stream processing, including:
– Jennifer Widom’s Trio Talk
– Shawn Jeffery’s HiFi Talk
– Sam Madden’s Probabilistic Sensor Net Talk
The SIGMOD Credo
Codd made relations,
all else is the work of man.
Leopold Kronecker (paraphrased by Raghu Ramakrishnan)
Database Management: Then
Database Management: Now
RM has been tremendously successful,
but at a cost
• Shoehorn the world into regular, flat tables.
– This works particularly well for data that looks
like regular, flat tables.
• Ignore inconvenient facts about real world.
– Source of a multi-billion $/yr consulting industry.
• But new applications, environments,
devices, and user expectations are finally
reaching a tipping point, stretching the
model beyond its inherent capabilities.
Relational Model Assumptions:
Real World Data
All data in the database is 100% Valid
The facts in the database are self-consistent
Anything outside of the DB does not exist
Time and space are just regular attributes
Data items unambiguously map to real world
entities
RM Assumptions: Schema
All data conforms to a strict schema
These schemas and their relationship to the
data don't change much
Everyone agrees on the meaning of the data
No one cares where the data came from
RM Assumptions: Environment
Users know exactly what they want to ask of the
database
Users want absolute answers (no satisficing)
Queries can be independent of the user’s context
All data is always available
Bridging the Physical Divide
• We need to build systems that more
realistically model the real world (and all
its ambiguity)
• We need to build systems that support
users and conform to their goals,
requirements, and habits (not vice versa)
• This is going to require new data and
query models, and likely another 30 years
of work to get it right.
RM Assumption Cheat Sheet
(A baker’s dozen)
Real World Data
1) All data in the database is 100% Valid
2) The facts in the database are self-consistent
3) Anything outside of the DB does not exist
4) Time and space are just regular attributes
5) Data items unambiguously map to real world entities
Schema
6) All data conforms to a strict schema
7) These schemas and their relationship to the data don't change much
8) Everyone agrees on the meaning of the data
9) No one cares where the data came from
Environment
10) Users know exactly what they want to ask of the database
11) Users want absolute answers (no satisficing)
12) Queries can be independent of the user’s context
13) All data is always available