Transaction Research History and Challenges

Transaction Research
History and Challenges
Invited talk for session on
Systems Perspectives on Database Technology;
Achievements and Dreams Forgotten
@ ACM SIGMOD 2006, Chicago, Ill, 27 June 2006
Jim Gray
Microsoft
http://research.microsoft.com/~gray/talks/
Thanks to: Phil Bernstein, Surajit Chaudhuri,
Dave DeWitt, Rick Snodgrass, Gerhard Weikum
1
Databases Are State
• DB is a collection of facts
• Store the facts
• Find the facts
• Combine the facts to make new ones
2
Transactions are State Changes
• And of course these changes are state (facts)
my meta data is just your data
It’s all rock ‘n roll to me.
It’s turtles all the way down.
3
Transactions Have a LONG History
Timescale (years before present):
• 6000: First clay tablets were transaction records
• 1000: General ledger – lots of technology there
• 100: Punched cards
• 50: Batch (tape) transactions
• 25: Online (concurrency & durability issues)
• 0: And now…
What next?
• I believe it is back to clay tablets….
4
“Formal” Transaction Notion
Interesting History
• Formalization happened concurrently in
many groups: GE, IBM, MIT, Tokyo, …
• Many others saw it as useless
– Transactions don’t give THE right answer, they just give A answer.
• Heated debate among the “enlightened”
• “Winner” was wrong, but right at the time.
5
ACID came to define
Transaction
“elevator pitch”
• Atomicity: All or nothing
• Consistency: Preserve application invariants
• Isolation: No concurrency surprises
• Durability: No commitments lost
Lesson: Simple Story Matters.
It is IMPORTANT to get it right.
6
Red-Green Balls Example
What’s Wrong With This Picture?
Initial state:
A: Change all Red to Green
B: Change all Green to Red
Schedules A, B and B, A: allowed.
A+B: Not allowed.
Why? Not a one-at-a-time schedule.
Even people who have worked on this for 40 years still puzzle about things like this.
The “answer” is subtle.
Probably there is no “answer”. We get to set the rules.
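The picture for this slide did not survive, so the following is only a sketch of one way to read it. It assumes a two-ball initial state and an interleaving in which both transactions work from the same starting snapshot; the function names are made up for illustration. Under those assumptions the mixed schedule swaps the colours, which matches neither one-at-a-time order.

```python
# Hypothetical model of the Red-Green example: a state is a list of ball colours.

def change_red_to_green(balls):
    """Transaction A: change all Red to Green."""
    return ["green" if colour == "red" else colour for colour in balls]

def change_green_to_red(balls):
    """Transaction B: change all Green to Red."""
    return ["red" if colour == "green" else colour for colour in balls]

def interleaved(balls):
    """A+B (assumed interleaving): both read the same initial snapshot.

    A rewrites the balls it saw as red, B rewrites the balls it saw as green.
    """
    snapshot = list(balls)
    result = list(balls)
    for i, colour in enumerate(snapshot):
        if colour == "red":
            result[i] = "green"   # A's write
        elif colour == "green":
            result[i] = "red"     # B's write
    return result

initial = ["red", "green"]                                       # assumed initial state
serial_ab = change_green_to_red(change_red_to_green(initial))    # A, B
serial_ba = change_red_to_green(change_green_to_red(initial))    # B, A
mixed = interleaved(initial)                                     # A+B

print("A, B:", serial_ab)
print("B, A:", serial_ba)
print("A+B :", mixed, "equals a serial result?", mixed in (serial_ab, serial_ba))
```

The two serial orders leave the balls all one colour; the interleaved run leaves one of each, so it is not equivalent to any one-at-a-time schedule.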
7
The Virtue of Transactions
• They are simple
– Convert complex errors into simple go / no-go
• Simplifies component composition
• Simplifies distributed system error handling
(especially useful in a “cluster”)
• Lampson:
– “Transactions are ‘pixie dust’ that you sprinkle on your program to make it reliable.”
8
But…
Technology Clouded our/my Thinking
• Disk & RAM Storage was expensive
• Accesses were expensive
• So we discarded old values, did update-in-place
(Some kept old versions: Prime Codasyl, Oracle, Rdb, but “garbage collected”)
• Makes it
– easy to find current state
– possible to find old state from log.
• But, many applications want data lineage
– databases don’t optimize for that.
• But… now storage is “free”. Keep everything!
9
Restatement
It is a Mistake to Update Data
• Discards information!
• You should only ADD information
• Examples
– clay tablets,
– general ledger
– punched cards
– batch processing (Old-Master / New-Master)
10
Correct Solution
Temporal Databases
• No Update! No Delete!
• Only Insert and Read-@-Time
grouped into transactions
• Every item has time dimension(s)
– transaction time, valid time,…
• This is BETTER than clay tablets & punched cards
(they did not have valid time & transaction time)
• Same as general ledger
• References:
– Bernstein, Hadzilacos, Goodman “transaction” book.
– Snodgrass et al., Temporal SQL
– Reed Atomic Actions thesis
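A minimal sketch of the "only Insert and Read-@-Time" idea, under loud assumptions: the class and method names are invented here, the clock is a simple counter, and only transaction time is modelled (no valid time, no grouping of inserts into transactions). It is not taken from any of the references above.

```python
from bisect import bisect_right

class TemporalStore:
    """Hypothetical append-only store: no update, no delete."""

    def __init__(self):
        self._versions = {}   # key -> list of (transaction_time, value), in time order
        self._clock = 0       # assumed logical clock, bumped on every insert

    def insert(self, key, value):
        """Add a new fact; older facts for the same key are kept."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read_at(self, key, at_time):
        """Return the value that was current at at_time, or None if none existed."""
        versions = self._versions.get(key, [])
        times = [t for t, _ in versions]
        i = bisect_right(times, at_time)   # count of versions with time <= at_time
        return versions[i - 1][1] if i else None

    def history(self, key):
        """Full lineage of a key, oldest first."""
        return list(self._versions.get(key, []))

store = TemporalStore()
t1 = store.insert("balance", 100)
t2 = store.insert("balance", 80)     # a "change" is just another insert
print(store.read_at("balance", t1))  # value as of the first insert
print(store.read_at("balance", t2))  # value as of the second insert
print(store.history("balance"))      # nothing was discarded
```

Reading the current state is a read at the latest time, and the old state is still there for lineage queries, much as in a general ledger.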
11
Lesson
• Technology can warp your (my) thinking
• No update with clay tablets, cards, tape
• Disks allowed/encouraged update
(precious disk space).
• Now that disk space is free, …
I see the error of my ways
• But… at the time (1970..2005) it was “right”
– Systems that tried, “failed” (e.g. Postgres, TSQL)
• Real lesson:
Good ideas can go bad
Good ideas may have to wait
12
What About Durability
• Discussion so far:
atomic-consistent-isolated (ACI) state change
• Durability always used replicas.
– Log replica is compact but “useless”
• Want object-replica
• Want security, query, …
• Log is just a technology for replicas.
– Replica technology has made huge progress.
– Problem: too many solutions.
• Durability requires geo-plex
Lots of Copies Keep Stuff Safe (LOCKSS)
• Challenge today is to simplify options.
13
What’s WRONG With
ACID Transactions?
• Transactions are an UN-availability feature
• Correctness/consistency…
Fight with “Do it now!!!”
(Lesson: “Do it now!!!” usually wins)
• Users hate to wait!
• Transactions are
– Good within an organization: I trust you!
– Bad across organizations: Can I depend on you?
14
Workflow – Still An Elusive Goal
• If X is good, recursive-X is better
• What is the generalization of transaction?
If they are atomic, what are molecules?
How to compose them?
• Great!! progress on
Multi-level Transaction Model (Weikum-Vossen book)
• Limited progress on
– workflow
– parallelism within transactions…
15
Workflow Progress
• There are LOTS of workflow systems.
• What “concepts” have helped?
– Compensation model
– Simple metaphors (e.g. Sagas)
– Commit–Abort dependencies
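The compensation model and the Saga metaphor can be sketched roughly as follows. This is an illustration only: `run_saga` and the booking steps are hypothetical names, and real workflow systems also have to persist progress and cope with compensations that fail, which this sketch ignores.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of the steps that already
    completed, in reverse order, and report failure.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True

# Hypothetical trip-booking workflow used only to exercise the sketch.
log = []

def book_flight():
    log.append("book flight")

def cancel_flight():
    log.append("cancel flight")

def book_hotel():
    raise RuntimeError("no rooms available")

def cancel_hotel():
    log.append("cancel hotel")

ok = run_saga([(book_flight, cancel_flight), (book_hotel, cancel_hotel)])
print(ok, log)
```

Each step commits on its own, so the flight booking is visible to others before the hotel step fails; the compensation then cancels it rather than rolling it back, which is the trade-off against a single ACID transaction.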
16
Aside: The Software Crisis and
Transactional Memory
• Software systems are getting too complex.
• Try-catch fault handling model
– Huge advance
– Unworkable in complex systems.
• Multi-core and Many-core
force parallel programming
• So, Software is in crisis (as usual).
• Transactional memory (treat methods as subtransactions) simplifies error handling.
• Reminiscent of Randell’s Recovery Blocks and…
• Great progress in this space, challenging problems.
• PS: they definitely update in place.
17
Transaction Research Advice
for 2007…
• Think in terms of temporal databases
– Transactions of Insert and Read@time
• Simplify replication (as a path to Durability)
– LOCKSS is the key to durability
– Temporal model may make it easier
• Don’t give up on workflow
– It is too important.
– Non-ACID workflow?
But, all my advice on it has been a dead end.
• Simpler programming model with Transactions?
– Cleaner & Simpler fault handling.
– Many-core parallelism?
18
19
The abstract I promised to talk about.
Database Operating Systems:
Storage & Transactions
• Database systems now use most of the technologies the research community developed over the last 3 decades: self-organizing data, non-procedural query processors, automatic parallelism, transactional storage and execution, self-tuning, and self-healing.
• After a period of linear evolution, database concepts and systems are undergoing rapid evolution and mutation -- entering a synthesis with programming languages, with file systems, with networking, and with sensor networks. Files are being unified with other types and becoming first-class objects. The transaction model appears to be fundamental to the transactional memory needed to program multicore systems in parallel.
• Workflow systems are now a reality. The long-heralded parallel database machine idea of data-flow programming has begun to bear fruit. Each of these new applications of our ideas raises new and challenging research questions.
Blue are undelivered promises
So it goes.
20