Transaction Management

Download Report

Transcript Transaction Management

Transaction Management
An Overview
Copyright © 2003 – 2013 by Curt Hill
Transaction
• Any one execution of a user
program
– A sequence of SQL statements
– A program that accesses the DBMS to
accomplish a similar action
• Mostly interested in transactions
that change the database
– An interaction with queries is also
interesting
• A transaction is the unit of interest
for concurrent execution and
recovery Copyright © 2003 – 2013 by Curt Hill
Examples
• Buying a product
– Finding how many there are
– Removing the sold ones from stock
– Problem is two requests that appear
simultaneously and cannot both be
satisfied
• Complicated queries
– Any query where a single table appears
more than once in the From
– The multiply referenced tables should
be the same even with concurrent
updates Copyright © 2003 – 2013 by Curt Hill
The acronym ACID
• Atomic
– Transaction perceived to be indivisible
• Consistent
– Transforms database from one consistent
state to another
• Isolated
– Understandable without regards to any
other agents or transactions
• Durable
– Once committed permanence is
guaranteed even with system crashes
Copyright © 2003 – 2013 by Curt Hill
Atomic
• Either all the actions are applied or
none of them are applied
• No incomplete actions are allowed
• Since a transaction is made up of
many smaller actions:
– Some of these may be done before a
problem occurs
– The DBMS must be able to undo any of
the smaller pieces of a transaction if
the entire transaction is aborted
Copyright © 2003 – 2013 by Curt Hill
Errors
• What causes a transaction to fail?
• Expected problem
– The withdraw amount exceeds the ATM
machine or source account
– The transaction aborts itself
• Transient problem
– For reasons seen later the DBMS
aborts the transaction and restarts it
later
• System error
– Disk failure, power failure among
Copyright © 2003 – 2013 by Curt Hill
others
Durability
• A transaction may either commit or
rollback
• Committed transactions must be durable
• Rolled back transaction must be
completely undone
– As if they were never executed at all
– Queries are no problem, updates are
• Once a transaction executes a commit or
rollback this should be accomplished
even if the system crashes
Copyright © 2003 – 2013 by Curt Hill
Consistency
• Two domains:
– A database is consistent if all
constraints are met
– A database is consistent if it correctly
models some real-world situation
• The first is met by the normal
checking of the DBMS
• The second is the responsibility of
the transaction
Copyright © 2003 – 2013 by Curt Hill
Consistency example
• Transferring money from one
account to another
– Removing the money from one account
leaves the database inconsistent in the
second sense
– This is corrected when the money is
added to the second account
• Both actions must be in the same
transaction
Copyright © 2003 – 2013 by Curt Hill
Isolation
• Interleaving of the actions within
several different actions may occur
• The DBMS must guarantee that
result will be the same as if they
were completely serialized
• This interleaving is from concurrent
execution
• Without interleaved execution,
performance will be poor
Copyright © 2003 – 2013 by Curt Hill
Transaction Schedules
• A transaction is a list of actions
• The actions include:
– Reads and writes of tuples
– Commit or rollback commands
– Others: arithmetic, comparisons, etc
• Two lists may only interact through
the reads and writes
• A transaction schedule is a ordering
all of the actions from a group of
transactions
Copyright © 2003 – 2013 by Curt Hill
Schedules
• A schedule interleaves the actions
of several transactions
• A complete schedule has all the
actions of all the transactions
• A serial schedule removes
interleaving
• Isolation dictates that the serial
schedule ends with the same result
as an interleaved schedule
Copyright © 2003 – 2013 by Curt Hill
Concurrency
• Can we afford a serial schedule?
– One transaction completely finished
before the next one is begun
• No
– The impact on performance is too large
• Instead
– We have to handle the transactions
concurrently
• The issues are discussed in the
concurrency presentation
Copyright © 2003 – 2013 by Curt Hill
Serializability
• A serializable schedule has an equivalent
effect to a serial schedule
• The serializable schedule allows
interleaved operations, while the serial
schedule does not
• The idea is that we get concurrent
execution and consistent results
• Consider some examples using two
transactions
Copyright © 2003 – 2013 by Curt Hill
Notes on Examples
• We have two transactions, T1 and T2
• Each does two reads and writes
• These will be denoted by R(A) and
W(A)
– Read A (a page)
– Write A (a page)
Copyright © 2003 – 2013 by Curt Hill
Example 1 – No
Commonality
T1
T2
R(A)
W(A)
R(X)
W(X)
R(B)
R(Y)
W(Y)
W(B)
Copyright © 2003 – 2013 by Curt Hill
No Commonality
• Easy example
• Since there are no pages in common
• Any interleaving works
– Provided the order is maintained on
each side
• Painless and free concurrency
• Somewhat more difficult if the Write
is an insert in a B+Tree index which
causes splits
– Then the pages could be different and
still interfere
Copyright © 2003 – 2013 by Curt Hill
Example 2 – Commonality
T1
T2
R(X)
W(X)
R(X)
W(X)
R(Y)
W(Y)
R(Y)
W(Y)
Copyright © 2003 – 2013 by Curt Hill
Commonality
• This serializes the same as T1
completed and then T2
• The order of this could change and
give a different serial
• Such as the following
Copyright © 2003 – 2013 by Curt Hill
Example 2 – Again
T1
T2
R(X)
W(X)
R(X)
W(X)
R(Y)
W(Y)
R(Y)
W(Y)
Copyright © 2003 – 2013 by Curt Hill
Again
• This serializes the same as T2
completed and then T1
• Either of these are acceptable
• All we have to do is guarantee that
our interleaved schedule is
equivalent to some serial schedule,
not a particular serial schedule
• We do not care which customer gets
the product as long as game is fair
Copyright © 2003 – 2013 by Curt Hill
Non-Serializable Schedule
T1
R(X)
T2
R(X)
W(X)
R(Y)
W(X)
R(Y)
W(Y)
W(Y)
Copyright © 2003 – 2013 by Curt Hill
No Equivalent Serial
Schedule
• T2’s write of X is lost
• T1’s write of Y is also lost
• This assumes that the entire page is
given to or received from buffer pool
• There are many other interleavings
that also lose something
• This lost update is one of several
concurrent execution anomalies
Copyright © 2003 – 2013 by Curt Hill
Interleaved Execution
Anomalies
• The above is similar to updating a
shared variable in memory
• Databases allow both commit and
rollback operations which cause
further problems
• These are unlike the shared memory
problem
Copyright © 2003 – 2013 by Curt Hill
Four combinations
• RR
– Two separate transactions doing a
Read
– This is only always harmless case
• RW
– T1 wants to Read and T2 write page
• WR
– T1 wants to Write and T2 read page
• WW
– Both want to write same page
– This was first example seen
Copyright © 2003 – 2013 by Curt Hill
Reading Uncommitted Data
• Uncommitted data is any data
modified by a transaction
• The modification may be in the past
or future
• Reading uncommitted data is called
a dirty read
Copyright © 2003 – 2013 by Curt Hill
An Aborted Transaction
Problem
T1
T2
R(X)
W(X)
R(X)
Rollback
T1
R(X)
W(X)
T2
R(X)
W(X)
Rollback
Copyright © 2003 – 2013 by Curt Hill
Rollback Problems
• Notice in previous that T1 was a query,
not an update that had problems
• This is an example of a WR problem
• We do not need a rollback to cause the
problem
• Consider the next situation where each
record has $10,000
– T1 increases this by 10% and T2 increase by
$1000
Copyright © 2003 – 2013 by Curt Hill
A WR Problem
T1
T2
R(X)
W(X)
R(X)
W(X)
R(Y)
W(Y)
Commit
R(Y)
W(Y)
Commit
Copyright © 2003 – 2013 by Curt Hill
Results of the WR anomaly
• The X record gets increased by 2000
– 2000 = 10000*10% + 1000
• The Y record gets increased by 2100
– 2100 = (10000+1000)*10%
• Not consistent or serializable
• T2’s read of X and T1’s read of Y are
dirty reads
Copyright © 2003 – 2013 by Curt Hill
RW Problems
• An RW problem can result in an
unrepeatable read
• Two reads in a row that yield
different results
Copyright © 2003 – 2013 by Curt Hill
An Unrepeatable Read
T1
T2
R(X)
R(X)
W(X)
Commit
R(X)
W(X)
Commit
Copyright © 2003 – 2013 by Curt Hill
What’s the fix?
• Lock an object in one of several
ways to prevent this type of
interleaving
• The most common protocol is Strict
Two Phase Locking
• AKA Strict 2PL
• There are other protocols as well
Copyright © 2003 – 2013 by Curt Hill
Types of Locks
• Shared lock
– Used for reading
• Exclusive lock
– Used for writing but also allows reading
• The lock may be on a:
–
–
–
–
Tuple
Page
Relation
This is its granularity
• The larger the granularity the less
concurrency and easier to manage
Copyright © 2003 – 2013 by Curt Hill
Strict 2PL Rules
• A transaction requests a lock when it
desires to access the object
– Shared lock for reads and exclusive for writes
• It holds all locks until complete
– Either a commit or rollback
• These locks are inserted by the DBMS,
not necessarily observed in the
transaction
• A transaction that cannot get the lock is
suspended until the item is available
• If both reading and writing is desired get
the exclusive lock
Copyright © 2003 – 2013 by Curt Hill
Some examples revisited
•
•
•
•
Two new commands are added
S(A) gets a shared lock on A
X(A) gets an exclusive lock on A
These locks are held until a commit
or rollback
Copyright © 2003 – 2013 by Curt Hill
WW Problem with Locks
T1
X(X)
R(X)
W(X)
X(Y)
R(Y)
W(Y)
Commit
T2
X(X)
Suspended
until Commit
R(X)
W(X)
X(Y)
R(Y)
W(Y)
Commit
Copyright © 2003 – 2013 by Curt Hill
WR Problem with locks
T1
S(X)
Shared lock causes
suspension until
rollback
T2
X(X)
R(X)
W(X)
Rollback
R(X)
Copyright © 2003 – 2013 by Curt Hill
Another WR
T1
X(X)
R(X)
W(X)
R(Y)
W(Y)
Commit
T2
X(X)
Exclusive lock suspended
until commit occurs
R(X)
W(X)
R(Y)
W(Y)
Commit
Copyright © 2003 – 2013 by Curt Hill
DeadLock
T1
X(X)
R(X)
W(X)
X(Y)
Suspended since T2 has Y
T2
X(Y)
R(Y)
W(Y)
X(X)
Suspended since T1 has X
Both are now deadlocked
Copyright © 2003 – 2013 by Curt Hill
What do we do?
• The usual solution is timers
• When a transactions suspension for
a lock exceeds a threshold value
– Abort it
– Rollback all actions
– Restart the whole transaction
Copyright © 2003 – 2013 by Curt Hill
Performance
• What does locking do to the
performance of a DBMS?
• It must slow the DBMS
• This is better than incorrect results
• It will slow serializable schedules
that otherwise had no problems
– This disregards lock bookkeeping
• Consider a previous example
Copyright © 2003 – 2013 by Curt Hill
A Serializable Schedule
T1
T2
R(X)
W(X)
R(Y)
W(Y)
Commit
R(X)
W(X)
R(Y)
W(Y)
Commit
Copyright © 2003 – 2013 by Curt Hill
Commentary
• This schedule was equivalent to the
serial schedule of T1 followed by T2
• It also had nice concurrency
• Introducing locking preserves
correctness but destroys
concurrency
Copyright © 2003 – 2013 by Curt Hill
A Serialed Schedule
T1
X(X)
R(X)
W(X)
X(Y)
R(Y)
W(Y)
Commit
T2
X(X)
Lock prevents any action
until T1’s commit
R(X)
W(X)
X(Y)
R(Y)
W(Y)
Commit
Copyright © 2003 – 2013 by Curt Hill
SQL
• The transaction statements of SQL
are:
– Commit
– Rollback
– Begin
• Not needed for simple queries
• Commit is the default for changes if
neither is given
Copyright © 2003 – 2013 by Curt Hill
Begin
• Begin is used to show boundaries
• Form is:
BEGIN;
or
BEGIN TRANSACTION
• The transaction is then terminated
by the Commit or Rollback
Copyright © 2003 – 2013 by Curt Hill
Example
• BEGIN;
Insert Into course VALUES
('PHYS', 141, 4, 'College Physics')
Insert Into course VALUES
(‘CSCI', 242, 3, ‘Data Structures')
COMMIT
• The more common usage is through a
program which determines whether to
commit or rollback
Copyright © 2003 – 2013 by Curt Hill
Save Points
• A save point is a named point to
complete the rollback
• Example:
SAVEPOINT xyz
…
ROLLBACK to SAVEPOINT xyz
• This prevents rolling back to BEGIN
• This was introduced in SQL 1999
Copyright © 2003 – 2013 by Curt Hill
Granularity
• The size of the thing to lock
– Tuple
– Page
– Relation
• Tradeoff
– Locking large things impedes
concurrency
– Locking small things requires more
overhead and allows bad things to
happen
Copyright © 2003 – 2013 by Curt Hill
Phantom Read Problem
• Two reads with different results
• T1 queries table and finds all rows with
certain criteria
• Does exclusive lock on all of these
• T2 inserts a new record which matches
the criteria
• T1 now re-queries the criteria and gets a
different set
• Putting a shared lock on whole table
solves problem but limits concurrency
Copyright © 2003 – 2013 by Curt Hill
Recovery
• The Recovery Manager is responsible for
several important issues
• Removing rollbacks from the file system
• Guaranteeing that the commit changes
are preserved in the file system
• Defending against crashes and rebuilding
the database when one occurs
• Recovery management is covered in a
subsequent presentation
Copyright © 2003 – 2013 by Curt Hill
Closing Thoughts
• ACID is the ideal of database
reliabilty
• This is not always possible in
distributed databases
– CAP theorem precluded
– There we settle for BASE
Copyright © 2003 – 2013 by Curt Hill