Transaction Processing Concepts - Department of Computer Science

Download Report

Transcript Transaction Processing Concepts - Department of Computer Science

Transaction Processing Concepts
Prof. Sin-Min Lee
Department of Mathematics and
Computer Science
Introduction to Transaction
Processing
•
•
•
•
•
Single-User Vs. Multi-User Systems
Read and Write Operations of a Transaction
Problems in Concurrency Operations
Types of Transaction Failures
Serializability of Schedules
Single-User Vs. Multi-User System
• A DBMS is single-user if at most one user at a
time can use the system
• A DBMS is multi-user if many user can use the
system concurrently
• In a multi DBMS, the stored data items are the
primary resources that may be accessed
concurrently by user programs, which are
constantly retrieving information from and
modifying the database. The execution of a
program that accesses or change the contents of
the database is called a transaction.
Objectives
• Function and importance of transactions.
• Properties of transactions.
• Concurrency Control
– Meaning of serializability.
– How locking can ensure serializability.
– Deadlock and how it can be resolved.
– How timestamping can ensure serializability.
– Optimistic concurrency control.
– Granularity of locking.
• Recovery Control
–
–
–
–
Some causes of database failure.
Purpose of transaction log file.
Purpose of checkpointing.
How to recover following database failure.
• Alternative models
transactions.
for
long
duration
Transaction Support
Transaction
Action, or series of actions, carried out by user or
application, which accesses or changes contents of
database.
• Logical unit of work on the database.
• Application program is series of transactions with
non-database processing in between.
• Transforms database from one consistent state to
another, although consistency may be violated
during transaction. (If the system crashes, each
transaction's changes are reflected in the
persistent database either entirely or not at all.)
Transaction Support
– A transaction is a sequence of one or more SQL
operations treated as a unit ( another working
definition).
– SQL standard (and Oracle):
– Transaction begins automatically when first SQL
command is issued.
– Transaction ends (and new one begins) when
"COMMIT" command is issued or session ends.
– Alternative "AUTO COMMIT" mode turns each
statement into a transaction.
Example Transaction
Transaction Example
• Book passenger Smith on Flight QF9 on
22/10/99 in seat A22
• 1.INSERT INTO PASSENGER VALUES
(‘Simth’,’98198333’,’VISA’,’4940123456781234’);
2. INSERT INTO SEAT VALUES
(‘QF9’,’22-10-99’,’A22’,’Smith’,’Vegetarian’);
3. INSERT FLIGHT SET seats_left = seats_left -1
WHERE flight_code = ‘QF9’ AND flight_data = ’2210-99’;
step 1 by itself is inconsistent
step 1&2 without step 3 would make seats_left
inconsistent
Step 1,2 &3 make sense together.
Transaction Support
• Can have one of two outcomes:
– Success - transaction commits and database reaches a
new consistent state.
– Failure - transaction aborts, and database must be
restored to consistent state before it started.
– Such a transaction is rolled back or undone.
• Committed transaction cannot be aborted.
• Aborted transaction that is rolled back can
be restarted later.
State Transition Diagram for
Transaction
Properties of Transactions
•Four basic (ACID) properties of a transaction are:
Atomicity , Consistency, Isolation and Durability
Atomicity Each transaction's operations are
executed all-or-nothing, never left "half
done."
•E.g., If system crashes before transaction commits, no
effects of transaction remain in database - transaction
can start over when system comes back up.
•E.g., If error or exception occurs during a transaction,
partial effects of the transaction are undone.
•Question: How is this guarantee achieved?
•Implemented by COMMIT and ROLLBACK
Properties of Transactions
Consistency Must transform database from
one consistent state to another.
Implemented
by
database
integrity
constraints.
Properties of Transactions
Isolation
Partial effects of incomplete
transactions should not be visible to other
transactions.
Implemented by locking
•Two transactions can be executed in perfect isolation if they
run serially
•However, running transactions strictly serially results in low
throughput and poor response time
•To improve efficiency, transactions can be interleaved. This
is called concurrency.
•The operations of one transaction can be freely interleaved
with another so long as they do not conflict( operate on the
same data)
•Transaction can run concurrently
Properties of Transactions
Durability Effects of a committed transaction
are permanent and must not be lost because
of later failure.
Implemented by transaction logging and
recovery
DBMS Transaction Subsystem
DBMS Transaction Subsystem
Concurrency Control
Process of managing simultaneous operations on
the database without having them interfere with
one another.( Centralized system with concurrent
access by several users)
• Prevents interference when two or more users are
accessing database simultaneously and at least
one is updating data.
• Although two transactions may be correct in
themselves, interleaving of operations may
produce an incorrect result.
Concurrency Example
• Database consisting of two items, X and Y
• Only criterion for correctness: X=Y
• Two concurrent transactions
T1: X
X+1
T2: X
2*X
Y
Y+1
Y
2*Y
Initially, X=10 and Y=10
Need for Concurrency Control
• Three examples of potential problems
caused by concurrency( transaction
conflict scenarios)
– Lost update problem (Dirty Write)
– Uncommitted dependency problem ( Dirty
Read)
– Inconsistent analysis problem ( Fuzzy Read).
Lost Update Problem
• Successfully
completed
update
is
overridden by another user.
• T1 withdrawing £10 from an account with
balx, initially £100.
• T2 depositing £100 into same account.
• Serially, final balance would be £190.
Lost Update Problem
• T1 overwrites data written by T2
• Loss of T2's update avoided by preventing T1
from reading balx until after update.
Uncommitted
Problem
Dependency
• Occurs when one transaction can see
intermediate results of another transaction
before it has committed.
• T4 updates balx to £200 but it aborts, so balx
should be back at original value of £100.
• T3 has read new value of balx (£200) and uses
value as basis of £10 reduction, giving a new
balance of £190, instead of £90.
Uncommitted
Problem
Dependency
• T3 reads sata changed but uncommitted by T4
• Problem avoided by preventing T3 from reading balx
until after T4 commits or aborts.
Inconsistent Analysis Problem
• Occurs when transaction reads several values
but second transaction updates some of them
during execution of first.
• Sometimes referred to as dirty read or
unrepeatable read.
• T6 is totaling balances of account x (£100),
account y (£50), and account z (£25).
• Meantime, T5 has transferred £10 from balx to
balz, so T6 now has wrong result (£10 too high).
Inconsistent Analysis Problem
• T5 modifies data over which T6 is aggregating
• Problem avoided by preventing T6 from
reading balx and balz until after T5 completed
updates.
Serializability
• Objective of a concurrency control protocol is to
schedule transactions in such a way as to avoid
any interference.
• Could run transactions serially, but this limits
degree of concurrency or parallelism in system.
• Serializability identifies those executions of
transactions guaranteed to ensure consistency.
Serializability
Schedule
– Sequence of reads/writes by set of concurrent
transactions.
– A sequence of operations from one or more
transactions
– In concurrent transactions, the operations are
interleaved
Schedule
Operations
– read(Q,q)
read the value of the database item Q and store in the
local variable q.
– write(Q,q)
write the value of the database item Q and store in the
local variable q.
– other operations such as arithmetic
– commit
– rollback
Example: A “Good” Schedule
• One possible schedule,
initially X = 10, Y=10
•Resulting Database: X=21,Y=21, X=Y
Example: A “Bad” Schedule
• Another possible schedule
•Resulting Database X=22,Y=21, X=Y
The Problem
• In the previous schedule, the result was
“incorrect.”
• What do we mean by correctness?
•
Definition of Correctness: The concurrent execution of a set of
transactions is said to be correct if it is “equivalent” to some serial
execution of the those transactions.
• Several notions of equivalence, e.g., result equivalent
• Serial means one after another
• For transactions T1 and T2 there are only two serial
schedules.
• T1, T2
• T2, T1
Serializability
– Serial Schedule
Schedule where operations of each transaction
are executed consecutively without any
interleaved operations from other transactions.
• No guarantee that results of all serial
executions of a given set of transactions
will be identical.
Nonserial Schedule
• Objective of serializability is to find
nonserial schedules that allow transactions
to execute concurrently without interfering
with one another.
• In other words, want to find nonserial
schedules that are equivalent to some serial
schedule. Such a schedule is called
serializable.
Serializable
• Definition: A schedule is said to be serializable if the result of
executing that schedule is the same as the result of executing some
serial schedule.
– A schedule, S, is serializable if S produces the same results as either
– T1, T2
– T2, T1
•
A simplifying assumption: Operations in a schedule are
–
–
–
–
read X
write X
commit
abort
Example
•
Consider a simple transaction that processes an
order
•
PRODUCT( prodno,qoh)
ORDER(ordno,prodno,qty)
Read qoh
SELECT qoh INTO oldqoh FROM PRODUCT WHERE
prodno =1;
Calculate new qoh
newqoh :=oldqoh – 10;
Update qoh
UPDATE STOCK SET qoh = newqoh WHERE prodno
=1;
INSERT order
INSERT INTO ORDERS VALUES (1234,1,10);
1.
2.
3.
4.
Example
• Consider two transactions TA and TB
running concurrently.
• Various schedules are possible for
interleaving the operations.
1 2 3 4 1 2 3 4 Serial (1)
1 2 3 1 4 2 3 4 Seriablizable(1)
1 2 3 1 2 4 3 4 Seriablizable(1)
1 2 3 4 1 2 3 4 Serial (2)
A Nonserial Example
• Example, with an initial
database satisfying X = Y
• S2 is not (conflict or
result) equivalent to a
serial execution of T3, T4
or T4, T3.
Summary of Basic Concepts
• Each transaction preserves database
consistency.
• The serial execution of a set of transactions
preserves database consistency.
• In a concurrent execution, steps of a set of
transactions may be interleaved.
• A (possibly concurrent) schedule is serializable
if it is equivalent to a serial schedule.
Determining Serializability
• Given a schedule S, how do we determine that it
is serializable?
• Use a slightly restricted definition: conflict
serializability.
• Two transactions Ti and Tj conflict if and only if
there exists some item X, accessed by both Ti
and Tj, and at least one of these transactions
wrote X.
• Intuitively, a conflict between two transactions
forces an execution order between them.
Serializability
• In serializability, ordering of read/writes is
important:
(a) If two transactions only read a data item, they
do not conflict and order is not important.
(b) If two transactions either read or write
completely separate data items, they do not
conflict and order is not important.
(c) If one transaction writes a data item and
another reads or writes same data item, order
of execution is important.
Possible Conflicts
•
Transaction
T1
Read
T2
Read
No
conflict
Read Write
Write
Read
operations
Write
Write
Conflicting
Conflict Equivalent Schedules
• Let I and J be consecutive instructions by two
different transactions within a schedule S.
– If I and J do not conflict, we can swap their order to
produce a new schedule S'.
– The instructions appear in the same order in S and S',
except for I and J, whose order does not matter.
• Definition: A schedule is conflict serializable if it
is conflict equivalent to a serial schedule.
– S and S' are termed conflict equivalent schedules.
Example of Conflict
Serializability
Another Example
• S5 is not serial
(obvious!) and is not
conflict serializable since
every pair of operations
conflicts, hence none can
be swapped.
• T2 and T3 have write
instructions without
read instructions,
termed blind writes.
Precedence Graph
• A directed graph depicting conflicts in a
schedule
• A assumption: No blind writes
• Given transactions T1,T2, …, Tn
Precedence Graph
• Create:
– node for each transaction;
– a directed edge Ti  Tj, if Tj reads the value of an
item written by TI;
– a directed edge Ti  Tj, if Tj writes a value into an
item after it has been read by Ti.
– Add and edge labelled X from Ti to Tj if Ti conflict on
X ( and is before ) Tj
• If precedence graph contains cycle schedule is
not conflict serializable.
Precedence Graph
• Example for schedule S1
• If precedence graph contains cycle schedule
is not conflict serializable.
Testing Serializability
• A schedule is ( conflict) serializable if its
precedence (conflict) graph is acyclic; that is,
• If precedence graph contains cycle schedule is
not conflict serializable.
• Need to guarantee serializability
 an inefficient strategy
• Generate a schedule
• Build conflict graph
• Test for cycle
 Impose protocol to generate only serializable
schedules
Example
Non-conflict
serializable schedule
• T9 is transferring £100 from one account
with balance balx to another account with
balance baly.
• T10 is increasing balance of these two
accounts by 10%.
• Precedence graph has a cycle and so is not
serializable.
Example - Non-conflict
serializable schedule
Another Example
Relationship Among
Schedules
Concurrency Control Techniques
• Two basic concurrency control techniques:
–
–
Locking
Timestamping
• Both are conservative approaches: delay
transactions in case they conflict with
other transactions.
• Optimistic methods assume conflict is rare
and only check for conflicts at commit.