Transaction Models
COMP3017 Advanced Databases
Dr Nicholas Gibbins
2012-2013
Overview
• Data Processing Styles
• Flat Transactions
• Context and Savepoints
• Chained Transactions
• Nested Transactions
• Distributed Transactions
• Multi-level Transactions
You should already know about:
• Transactions
• Schedules
• Serial Schedules
• (Conflict-)Serializable Schedules
• ACID
• Two Phase Locking (2PL)
• Timestamp Ordering
Data Processing Styles
Processing Styles
• Batch processing
• Time sharing
• Real time processing
• Client-server systems
• Transaction-oriented processing
                           Batch       Time Sharing   RTP            Client-Server  Trans Proc
Data                       Private     Private        Private        Shared         Shared
Duration                   Long        Long           Very Short     Long           Short
Guarantees of Reliability  Normal      Normal         Very High      Normal         Very High
Guarantees of Consistency  None        None           None           None           ACID
Work Pattern               Regular     Regular        Random         Random         Random
No of Work Sources         10^1        10^2           10^3           10^2           10^5
Performance Criteria       Throughput  Response Time  Response Time  Throughput     Throughput & Resp Time
Availability               Normal      Normal         High           High           High
Transaction Oriented Processing
• Sharing access to data
• Some batch transactions
• Variable requests
• Many terminals
• Repetitive workload
• High availability
• Mostly simple functions
• System takes care of recovery
• Automatic load balancing
Flat Transactions
• The simplest type of transaction
• Basic building block for other models
• Only one layer of control by the application - hence "Flat"
– Begin Work
– Commit Work or Abort Work
• "All or nothing" processing
Outcomes of Flat Transactions
1. Successful completion
BEGIN WORK
Operation 1
Operation 2
...
Operation k
COMMIT WORK
2. Application requests termination
BEGIN WORK
Operation 1
Operation 2
...
(I can't go on!)
ROLLBACK WORK
3. Forced termination
BEGIN WORK
Operation 1
Operation 2
...
! Rollback due to external cause
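The three outcomes can be sketched in Python with the standard sqlite3 module. This is only an illustration: run_flat, AbortRequested and the table t are hypothetical names, and the "external cause" is simulated with an ordinary exception rather than a real crash or deadlock.

```python
import sqlite3

class AbortRequested(Exception):
    """Raised by the application when it decides it cannot go on."""

def run_flat(conn, operations):
    """Run all operations as one flat transaction and report the outcome."""
    conn.execute("BEGIN")                      # BEGIN WORK
    try:
        for op in operations:
            op(conn)
    except AbortRequested:
        conn.execute("ROLLBACK")               # ROLLBACK WORK
        return "application requests termination"
    except Exception:
        conn.execute("ROLLBACK")               # rollback due to external cause
        return "forced termination"
    conn.execute("COMMIT")                     # COMMIT WORK
    return "successful completion"

def insert(value):
    return lambda conn: conn.execute("INSERT INTO t VALUES (?)", (value,))

def give_up(conn):
    raise AbortRequested()

def external_failure(conn):
    raise RuntimeError("simulated external cause")

# isolation_level=None: the driver issues no implicit BEGIN, so the
# transaction boundaries are exactly the ones written above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (x INTEGER)")

outcomes = [
    run_flat(conn, [insert(1), insert(2)]),
    run_flat(conn, [insert(3), give_up]),
    run_flat(conn, [insert(4), external_failure]),
]
rows = [r[0] for r in conn.execute("SELECT x FROM t ORDER BY x")]
```

Only the rows written by the first transaction survive, which is the "all or nothing" behaviour of the flat model.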
Example
Input: Branch id, Teller id, Account id, Delta
BEGIN WORK
Apply delta to account balance (Account DB)
Read new value of balance
Apply delta to teller balance (Teller DB)
Apply delta to branch balance (Branch DB)
Write everything to history file (History File)
Send output (New balance)
COMMIT WORK
Example
Input: Branch id, Teller id, Account id, Delta
BEGIN WORK
Perform database updates
If withdrawal and account now overdrawn
    ROLLBACK WORK
Else
    Send output (New balance)
    COMMIT WORK
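A minimal sketch of this debit/credit transaction, assuming a single SQLite database holding all four tables. The schema, the function name debit_credit and the choice of returning None on rollback are assumptions made for illustration, not part of the original example.

```python
import sqlite3

def debit_credit(conn, branch_id, teller_id, account_id, delta):
    """Apply delta as one flat transaction; return the new balance, or None on rollback."""
    conn.execute("BEGIN")
    conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                 (delta, account_id))
    (new_balance,) = conn.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)).fetchone()
    conn.execute("UPDATE teller SET balance = balance + ? WHERE id = ?",
                 (delta, teller_id))
    conn.execute("UPDATE branch SET balance = balance + ? WHERE id = ?",
                 (delta, branch_id))
    conn.execute("INSERT INTO history VALUES (?, ?, ?, ?)",
                 (branch_id, teller_id, account_id, delta))
    if delta < 0 and new_balance < 0:
        # Withdrawal and account now overdrawn: undo every update above.
        conn.execute("ROLLBACK")
        return None
    conn.execute("COMMIT")
    return new_balance

# Explicit transaction control: the driver issues no implicit BEGIN.
conn = sqlite3.connect(":memory:", isolation_level=None)
for table in ("account", "teller", "branch"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute(f"INSERT INTO {table} VALUES (1, 100)")
conn.execute("CREATE TABLE history (branch_id, teller_id, account_id, delta)")

deposit = debit_credit(conn, 1, 1, 1, 50)        # commits
withdrawal = debit_credit(conn, 1, 1, 1, -500)   # would overdraw, so rolls back

(balance,) = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()
(history_rows,) = conn.execute("SELECT COUNT(*) FROM history").fetchone()
```

The rejected withdrawal leaves no trace in any of the four tables, including the history table.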
Limitations
• Simple control structure cannot model more complex
applications
– Trip planning
– Bulk updates
• Does not allow much co-operation
• May be too strict
• Databases used in advanced applications
– CAD/CAM
– Office automation
– Software development
Limitations
Long-lived transactions present particular challenges:
• More likely to be interrupted
– Rollback not acceptable
• May access many data items
– Cannot hold locks indefinitely
– Others need the data
– Makes deadlocks more likely
Requirements
Controlling computations in a distributed multi-user
environment primarily means:
– Containing the effects of arbitrary operations as long as there might
be a necessity to revoke them
– Monitoring the dependencies of operations on each other in order to
be able to trace the execution history in case faulty data are found at
some point
Context and Savepoints
Transaction Processing Context
For a simple program:
output_message = f(input_message)
More typically, however:
f(input_message, context) -> (output_message, context)
where context might be a cursor position
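The difference between the two forms can be sketched in Python. The functions to_upper and fetch_next, and the dictionary layout of the context, are hypothetical and chosen only to make the cursor-position example concrete.

```python
def to_upper(input_message):
    """Context-free: output_message = f(input_message)."""
    return input_message.upper()

def fetch_next(input_message, context):
    """Context-sensitive: f(input_message, context) -> (output_message, context)."""
    rows = context["rows"]
    start = context["cursor"]                  # the context is a cursor position
    page = rows[start:start + input_message["page_size"]]
    new_context = {"rows": rows, "cursor": start + len(page)}
    return page, new_context

context = {"rows": ["r1", "r2", "r3", "r4", "r5"], "cursor": 0}
first, context = fetch_next({"page_size": 2}, context)
second, context = fetch_next({"page_size": 2}, context)
```

The same input message produces different output on the second call, because the answer depends on the context carried over from the first.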
Transaction Processing Context
It is normally the private context which matters:
– Transaction
– Program
– Terminal
– User
Context Examples
Counters
– How often the program was called
User-related data
– Last screen sent
– Last order number worked on
Information about durable storage
– Last record read
– Most recently modified tuple
Transaction Processing Context
• Preservation of context is needed for long-lived
transactions
• One way is for the application to maintain the context as a
database record
Flat Transactions and Savepoints
BEGIN WORK (=SAVE WORK:1)
Action
Action
SAVE WORK:2
(Work 'covered' by Savepoint 2)
Action
Action
Action
Action
SAVE WORK:3
Action
ROLLBACK WORK(2)
SAVE WORK:4
Action
Action
ROLLBACK WORK(4)
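The sequence above can be replayed against SQLite, whose SAVEPOINT and ROLLBACK TO statements stand in here for SAVE WORK and ROLLBACK WORK(n). This is a sketch under that assumption; the table log and the helper action are hypothetical.

```python
import sqlite3

# isolation_level=None: transaction boundaries are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (action TEXT)")

def action(name):
    conn.execute("INSERT INTO log VALUES (?)", (name,))

conn.execute("BEGIN")               # BEGIN WORK (= SAVE WORK:1)
action("a1")
action("a2")
conn.execute("SAVEPOINT sp2")       # SAVE WORK:2
for name in ("a3", "a4", "a5", "a6"):
    action(name)
conn.execute("SAVEPOINT sp3")       # SAVE WORK:3
action("a7")
conn.execute("ROLLBACK TO sp2")     # ROLLBACK WORK(2): undoes a3..a7
conn.execute("SAVEPOINT sp4")       # SAVE WORK:4
action("a8")
action("a9")
conn.execute("ROLLBACK TO sp4")     # ROLLBACK WORK(4): undoes a8 and a9
conn.execute("COMMIT")

rows = [r[0] for r in conn.execute("SELECT action FROM log ORDER BY rowid")]
```

Only the two actions performed before savepoint 2 remain once the transaction commits.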
Persistent Savepoints
• Following a system crash, restart in-flight transactions
from their most recent savepoints
• Needs whole context preserved
– System variables
– Program variables
• Programming languages have limitations
– A persistent programming language would be needed
Chained Transactions
BEGIN WORK
CHAIN WORK
- Commit results
- Keep [some] locks
- Keep cursors
- Continue processing
- No-one else can access kept objects
- Application cannot roll back to before 'Chain Work'
Savepoints versus Chained Transactions
Both allow substructure to be imposed on a long-running
application program
– Database context is preserved
– Cursors are kept
Commit vs Savepoint
– Chained: can roll back only to the previous 'savepoint' (the last commit)
– Savepoints: can roll back to an arbitrary savepoint
Locks
– Chained frees unwanted locks
Savepoints versus Chained Transactions
Work lost
– Savepoints more flexible than flat transactions, as long as the system
does not crash
Restart
– Chained can restart from the most recent commit, as long as no
processing context is hidden in local program variables
Both organise work into a sequence of actions
Nested Transactions
Top-level transaction Tk and its subtransactions:
Tk
    Tk1
        Tk11
        Tk12
    Tk2
    Tk3
        Tk31
Every transaction in the tree is bracketed by BEGIN WORK and COMMIT WORK;
a parent invokes its subtransactions in between.
Three Rules for Nested Transactions
• Commit Rule
• Rollback Rule
• Visibility Rule
Commit Rule
The commit of a subtransaction makes the results accessible
only to the parent
The final commit happens only when all ancestors finally
commit
Rollback Rule
If any [sub]transaction rolls back, all of its subtransactions
roll back, including those that have already committed
Visibility Rule
Changes made by a subtransaction become visible to its parent when the subtransaction commits
Objects held by a parent can be made accessible to
subtransactions
Changes made by a subtransaction are not visible to its
siblings
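The three rules can be illustrated with a toy in-memory model. This is a sketch only: the class NestedTx is hypothetical, it keeps uncommitted writes in plain dictionaries, and it ignores locking, concurrency and durable storage.

```python
class NestedTx:
    """Toy nested transaction over a dict that stands in for the database."""

    def __init__(self, store, parent=None):
        self.store = store        # only a top-level commit writes here
        self.parent = parent
        self.writes = {}          # changes not yet passed upwards
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def begin_sub(self):
        return NestedTx(self.store, self)

    def write(self, key, value):
        self.writes[key] = value

    def read(self, key):
        # Visibility rule: see own writes, then those held by ancestors,
        # then the committed store. Siblings' writes are never consulted.
        tx = self
        while tx is not None:
            if key in tx.writes:
                return tx.writes[key]
            tx = tx.parent
        return self.store.get(key)

    def commit(self):
        # Commit rule: a subtransaction hands its results to its parent only;
        # nothing reaches the store until the top-level transaction commits.
        target = self.store if self.parent is None else self.parent.writes
        target.update(self.writes)
        self.writes = {}

    def rollback(self):
        # Rollback rule: all subtransactions roll back too. Results that
        # committed children already passed up are discarded with ours.
        for child in self.children:
            child.rollback()
        self.writes = {}

store = {"x": 0}
top = NestedTx(store)
a, b = top.begin_sub(), top.begin_sub()
a.write("x", 1)
seen_by_sibling = b.read("x")          # a has not committed yet
a.commit()
seen_by_parent = top.read("x")
store_before_top_commit = dict(store)
b.write("y", 2)
b.rollback()
top.commit()
```

The subtransaction's commit is not durable on its own: the store changes only when top commits.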
Observations
Subtransactions are not fully equivalent to flat transactions.
They are
– Atomic
– Consistency preserving
– Isolated
– Not durable, because of the commit rule
Observations
Nesting and program modularisation complement each
other
– Well designed module has a clean interface, and no global variables
– If it touches the database, the database is a large global variable
– If the module is protected as a subtransaction, then
database changes are kept clean too
Nested transactions permit intra-transaction parallelism
Emulation
Emulating Nesting with Savepoints
Tk      BEGIN WORK
Tk1     S1
Tk11    S11
Tk12    S12
Tk2     S2
Tk3     S3
Tk31    S31
Tk      Commit
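One way to sketch this emulation is with SQLite savepoints: each 'subtransaction' sets a savepoint when it starts, releases it on success, and rolls back to it on failure. The helper subtransaction, the table work and the simulated failure of Tk12 are assumptions made for illustration.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def subtransaction(conn, savepoint):
    """Emulate a subtransaction with a named savepoint."""
    conn.execute(f"SAVEPOINT {savepoint}")
    try:
        yield
    except Exception:
        conn.execute(f"ROLLBACK TO {savepoint}")   # undo this subtransaction only
        conn.execute(f"RELEASE {savepoint}")
        raise
    conn.execute(f"RELEASE {savepoint}")           # 'commit' into the parent

# isolation_level=None: transaction boundaries are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE work (done_by TEXT)")

def record(name):
    conn.execute("INSERT INTO work VALUES (?)", (name,))

conn.execute("BEGIN")                        # Tk
with subtransaction(conn, "S1"):             # Tk1
    with subtransaction(conn, "S11"):        # Tk11
        record("Tk11")
    try:
        with subtransaction(conn, "S12"):    # Tk12
            record("Tk12")
            raise RuntimeError("Tk12 fails")
    except RuntimeError:
        pass                                 # Tk1 carries on without Tk12
with subtransaction(conn, "S2"):             # Tk2
    record("Tk2")
with subtransaction(conn, "S3"):             # Tk3
    with subtransaction(conn, "S31"):        # Tk31
        record("Tk31")
conn.execute("COMMIT")                       # Tk commits

rows = [r[0] for r in conn.execute("SELECT done_by FROM work ORDER BY rowid")]
```

The 'subtransactions' necessarily run one after another on the single connection, so this emulation offers no intra-transaction parallelism.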
Observations
Using savepoints is more flexible than true nesting for internal
recovery
– Can roll back further
True nesting is needed in order to run subtransactions in
parallel (intra-transaction parallelism)
– Emulating with savepoints needs 'subtransactions' to be run in strict
sequence
True nesting can pass locks selectively
– More flexible than Savepoints
– “Similar but different”
Distributed Transactions
A flat transaction that has to visit several nodes to collect
data
– Differs from nested; only one level
– If a subtransaction, which is just a slice of the main transaction,
commits or rolls back, the whole transaction does the same
Multi-Level Transactions
• A generalized and more liberal version of Nested
transactions
• Multi-level transactions have the ability to
pre-commit the results of a subtransaction
– Therefore cannot do unilateral backout of updates
• A 'Compensating Transaction' is needed to reverse effects,
if necessary
• Layering the object implementations ensures that the ACID properties
hold at the root level
Compensating Transactions
• The compensating transaction needs careful thought!
• It must be available from the point of subtransaction
commit
• It must not itself abort
– Needs provision to survive system failure
– Needs provision to survive internal error
(e.g. an SQL call fails)
• If it is not needed, the compensating transaction must be
disposed of when the application finally completes
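The bookkeeping these points describe can be sketched as follows. The class MultiLevelTransaction and the stock/order example are hypothetical, and the compensations are held only in memory, so this sketch does not meet the requirement to survive a system failure.

```python
class MultiLevelTransaction:
    """Toy model: subtransactions pre-commit and register a compensation."""

    def __init__(self):
        self._compensations = []

    def run_subtransaction(self, action, compensation):
        action()                                   # effects are visible at once
        self._compensations.append(compensation)   # available from commit onwards

    def abort(self):
        # Updates cannot simply be backed out, so reverse their effects by
        # running the compensations, most recent first.
        while self._compensations:
            self._compensations.pop()()

    def commit(self):
        # Compensations that were never needed are disposed of.
        self._compensations.clear()

stock = {"widgets": 10}
orders = []

def reserve():
    stock["widgets"] -= 3

def unreserve():
    stock["widgets"] += 3

def place_order():
    orders.append("order-1")

def cancel_order():
    orders.remove("order-1")

tx = MultiLevelTransaction()
tx.run_subtransaction(reserve, unreserve)
tx.run_subtransaction(place_order, cancel_order)
visible_before_abort = (stock["widgets"], list(orders))
tx.abort()
```

Between the pre-commit and the abort, other work could have observed the reduced stock, which is why compensation is a semantic reversal rather than a true rollback.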
Example
Transaction
    SELECT ...
    INSERT ...
        Insert Tuple
            Update page
            Insert address table entry
        Insert B-tree Entry
            Locate position
            Insert entry
    UPDATE ...
    SELECT ...
Layering
• Abstraction hierarchy
– The entire system consists of a strict hierarchy of objects with their
associated operations
• Layered abstraction
– The objects of layer 'n' are completely implemented by using
operations of layer 'n-1'
• Discipline
– There are no shortcuts that allow layer 'n' to access objects on a layer
other than 'n-1'
• Other models use early commit
– Open Nested, Sagas, Engineering transactions ...
Exercise
The Problem
How can we update one million bank
accounts as one transaction?
Flat Transaction
Update all accounts using a single flat transaction:
• Operation performed exactly once
• Unacceptable price for restart
Chained Transactions
Decompose into sequence of 1 million single (chained)
transactions:
• Performance overhead
• Only last transaction rolled back if failure
• No longer atomic overall
• No restart context preserved
• No state of chain overall preserved
Locking
• Flat transaction - one ACID unit which locks all records for
the duration
• Chained transaction - one record locked at a time, so other
updates can occur during the run
Recovery
• Recover by reading the log?
– Not standard application code
– Should be a better way
Mini-Batch
• Execute a sequence of transactions
– Each updates a small number of records (a mini-batch)
• Atomicity of whole task maintained by application by
keeping context data in the database
• Could be achieved using a crash-resistant chained
transaction, with maintenance of the state of the chain
made explicit
• This would not be strictly atomic
– Each transaction is atomic
– If there is a failure, the whole task is not rolled back. It restarts with
the most recent mini-batch and carries on
Long-Lived Transaction Requirements
• Minimize work lost
– Split up to control the amount of work lost
• Recoverable computation
– There must be ways to temporarily stop the computation without
having to commit the results and without causing rollback because of
shutdown
• Explicit control flow
– System must be able to control the sequence
– Possible to proceed along prespecified path, or to remove the effects
of what has happened so far
Summary
Why Database Transaction Models?
– To provide a framework for straightforward programming
– Simple models have been enormously successful
– Some further models may do the same