Transcript ppt
Introduction to Transaction
Processing Concepts and Theory
1
1 Introduction to Transaction Processing (1)
A transaction is a
sequence of operations whose execution
transforms a database from one consistent state
to another consistent state.
A logical unit of database processing that
includes one or more access operations (read retrieval, write - insert or update, delete).
Transaction boundaries:
Begin and End transaction.
Consistent state: the data currently in the database
satisfy all integrity constraints defined for the
database.
During transaction execution the database may be
inconsistent.
2
Introduction to Transaction Processing (2)
When the transaction is committed, the database
must be consistent.
Execution of transaction
Consistent
Database
State
Possible inconsistent state
Consistent
Database
State
Two main issues to deal with:
Failures of various kinds, such as hardware
failures and system crashes
Concurrent execution of multiple transactions
3
Introduction to Transaction Processing (3)
Consider a transaction that transfers $200 from
account A to account B.
read(A)
A = A - 200
write(A)
read(B)
B = B + 200
write(B)
System crash
4
Introduction to Transaction Processing (4)
Single-User System:
At most one user at a time can use the system.
Multiuser System:
Many users can access the system concurrently.
Concurrency
Interleaved processing:
Concurrent execution of processes is interleaved in
a single CPU
Parallel processing:
Processes are concurrently executed in multiple
CPUs.
5
Introduction to Transaction Processing (5)
A database is a collection of named data items.
The size of a data item is called granularity. It can be
a field, a record , or a whole disk block
Basic operations are read and write
read_item(X): Reads a database item named X into
a program variable. To simplify our notation, we
assume that the program variable is also named X.
write_item(X): Writes the value of program
variable X into the database item named X.
Basic unit of data transfer from the disk to the computer
main memory is one block.
6
Introduction to Transaction Processing (6)
In general, a data item (what is read or written) will be the
field of some record in the database, although it may be a
larger unit such as a record or even a whole block.
read_item(X) command includes the following steps:
Find the address of the disk block that contains item X.
Copy that disk block into a buffer in main memory (if that
disk block is not already in some main memory buffer).
Copy item X from the buffer to the program variable
named X.
7
Introduction to Transaction Processing (7)
write_item(X) command includes the following steps:
Find the address of the disk block that contains item X.
Copy that disk block into a buffer in main memory (if that
disk block is not already in some main memory buffer).
Copy item X from the program variable named X into its
correct location in the buffer.
Store the updated block from the buffer back to disk
(either immediately or at some later point in time).
8
Introduction to Transaction Processing (8)
FIGURE 17.2 Two sample transactions:
(a) Transaction T1(b) Transaction T2
9
Why Concurrency Control is needed
Multiple transactions are allowed to run concurrently.
Advantages are:
increased processor and disk utilization, leading
to better transaction throughput
one transaction can be using the CPU while another is
reading from or writing to the disk
throughput: # of transactions executed in a given
amount if time.
reduced average response time for transactions
short transactions need not wait behind long ones.
Concurrency control is needed to achieve isolation
i.e., to control the interaction among the concurrent
transactions in order to prevent them from destroying
the database consistency.
10
Why Concurrency Control is needed
Problems occur when concurrent transactions
execute in an uncontrolled manner:
The Lost Update
The Temporary Update (or Dirty Read)
The Incorrect Summary
Unrepeatable Read:
A transaction T1 may read a given value. If another
transaction later updates that value and T1 reads that
value again, then T1 will see a different value.
11
The Lost Update Problem
This occurs when two transactions that access the
same database items have their operations
interleaved in a way that makes the value of some
database item incorrect.
12
The Temporary Update Problem
(Dirty Read)
This occurs when one transaction updates a
database item and then the transaction fails for
some reason.
The updated item is accessed by another
transaction before it is changed back to its
original value.
13
Incorrect Summary Problem
If one transaction is calculating an aggregate
summary function on a number of records while
other transactions are updating some of these
records, the aggregate function may calculate some
values before they are updated and others after they
are updated.
14
Incorrect Summary Problem
15
Why Recovery is needed
What causes a Transaction to fail
1. A computer failure (system crash):
A hardware or software error occurs in the
computer system during transaction execution.
If the hardware crashes, the contents of the
computer’s internal memory may be lost.
2. A transaction or system error:
Some operation in the transaction may cause it
to fail, such as integer overflow or division by
zero. Transaction failure may also occur
because of erroneous parameter values or
because of a logical programming error. In
addition, the user may interrupt the transaction
during its execution.
16
Why Recovery is needed
3. Local errors or exception conditions detected by the
transaction:
Certain conditions necessitate cancellation of the
transaction. For example, data for the transaction may
not be found. A condition, such as insufficient account
balance in a banking database, may cause a transaction,
such as a fund withdrawal from that account, to be
canceled.
A programmed abort in the transaction causes it to fail.
4. Concurrency control enforcement:
The concurrency control method may decide to abort
the transaction, to be restarted later, because it violates
serializability or because several transactions are in a
state of deadlock (see Chapter 18).
17
Why Recovery is needed
5. Disk failure:
Some disk blocks may lose their data because
of a read or write malfunction or because of a
disk read/write head crash. This may happen
during a read or a write operation of the
transaction.
6. Physical problems and catastrophes:
This refers to an endless list of problems that
includes power or air-conditioning failure, fire,
theft, sabotage, overwriting disks or tapes by
mistake, and mounting of a wrong tape by the
operator.
18
2 Transaction and System Concepts (1)
A transaction is an atomic unit of work that is either
completed in its entirety or not done at all.
For recovery purposes, the system needs to keep track
of when the transaction starts, terminates, and
commits or aborts.
A transaction must be in one of the following states:
Active:
the initial state; the transaction stays in this state while it is
executing
Partially committed:
after the final statement has been executed.
Failed:
after the discovery that normal execution can no longer
proceed.
19
Transaction and System Concepts (2)
Committed:
after successful completion.
Aborted:
after the transaction has been rolled back and the
database restored to its state prior to the start of the
transaction. Two options after it has been aborted:
– restart the transaction (only if no internal logical error)
– kill the transaction
20
Transaction and System Concepts (3)
Recovery manager keeps track of the following operations:
begin_transaction
This marks the beginning of transaction execution.
read or write:
These specify read or write operations on the database items
that are executed as part of a transaction.
end_transaction:
This specifies that read and write transaction operations have
ended and marks the end limit of transaction execution.
At this point it may be necessary to check whether the changes
introduced by the transaction can be permanently applied to
the database or whether the transaction has to be aborted
because it violates concurrency control or for some other
reason.
21
Transaction and System Concepts (4)
Recovery manager keeps track of the following
operations (cont):
commit_transaction:
This signals a successful end of the transaction so
that any changes (updates) executed by the
transaction can be safely committed to the database
and will not be undone.
rollback (or abort):
This signals that the transaction has ended
unsuccessfully, so that any changes or effects that
the transaction may have applied to the database
must be undone.
22
Transaction and System Concepts (5)
Recovery techniques use the following operators:
undo:
Similar to rollback except that it applies to a single
operation rather than to a whole transaction.
redo:
This specifies that certain transaction operations
must be redone to ensure that all the operations of a
committed transaction have been applied
successfully to the database.
23
State transition diagram illustrating the
states for transaction execution
Rollback
24
The System Log
Log or Journal: The log keeps track of all
transaction operations that affect the values of
database items.
This information may be needed to permit
recovery from transaction failures.
The log is kept on disk, so it is not affected by
any type of failure except for disk failure.
In addition, the log is periodically backed up to
archival storage to guard against such failures.
25
The System Log
T in the following discussion refers to a unique transactionid that is generated automatically by the system and is used
to identify each transaction:
Types of log record:
[start_transaction, T]:
Records that transaction T has started execution.
[write_item, T, X, old_value, new_value]:
Records that transaction T has changed the value of database
item X from old_value to new_value.
[read_item, T, X]:
Records that transaction T has read the value of database item
X.
[commit, T]:
Records that transaction T has completed successfully, and
affirms that its effect can be committed (recorded
permanently) to the database.
[abort, T]:
Records that transaction T has been aborted.
26
The System Log
Protocols for recovery that avoid cascading
rollbacks do not require that read operations be
written to the system log, whereas other protocols
require these entries for recovery.
Strict protocols require simpler write entries that do
not include new value (see Section 17.4).
27
Recovery Using Log Records
If the system crashes, we can recover to a
consistent database state by examining the log and
using one of the techniques described in Chapter
19.
Because the log contains a record of every write
operation that changes the value of some
database item, it is possible to undo the effect
of these write operations of a transaction T by
tracing backward through the log and resetting
all items changed by a write operation of T to
their old_values.
We can also redo the effect of the write
operations of a transaction T by tracing
forward through the log and setting all items
changed by a write operation of T (that did not
get done permanently) to their new_values.
28
Commit Point of a Transaction
Commit Point:
A transaction T reaches its commit point when
all its operations that access the database have
been executed successfully and the effect of all
the transaction operations on the database has
been recorded in the log.
Beyond the commit point, the transaction is said
to be committed, and its effect is assumed to be
permanently recorded in the database.
The transaction then writes an entry [commit,T]
into the log.
29
Commit Point of a Transaction
Roll Back of transactions:
Needed for transactions that have a [start_transaction,T] entry into
the log but no commit entry [commit,T] into the log.
Redoing transactions:
Transactions that have written their commit entry in the log must
also have recorded all their write operations in the log; otherwise
they would not be committed, so their effect on the database can be
redone from the log entries.
Notice that the log file must be kept on disk.
At the time of a system crash, only the log entries that have been
written back to disk are considered in the recovery process because
the contents of main memory may be lost.
Force writing a log:
Before a transaction reaches its commit point, any portion of the
log that has not been written to the disk yet must now be written to
the disk.
This process is called force-writing the log file before committing a
transaction.
30
3 Desirable Properties of Transactions (1)
The ACID properties of a transaction
Atomicity
a transaction is an atomic processing unit; it is either performed
in its entirety or not performed at all.
Consistency
a transaction transforms a database from a consistent state to
another consistent state.
Isolation
A transaction should not make its updates visible to other
transactions until it is committed; this property, when enforced
strictly, solves the temporary update problem.
Durability
committed work must never be lost due to subsequently failure.
31
ACID Properties
Example:
T1
read(X)
X = X + 100
T2
read(X)
X = X - 50
write(X)
write(X)
value of X
200 (initial value)
300 (not saved yet)
200
150 (not saved yet)
300 (saved)
150 (overwrite 300)
lost $100! Correct value of X should be 250.
32
4 Schedules
A schedule S of n transactions T1, T2, ..., Tn is an
ordering of all the operations in these transactions
subject to the constraint that:
for each transaction Ti, the operations of Ti in S
must appear in the same order as they do in Ti.
Note, however, that operations from other transactions Tj
can be interleaved with the operations of Ti in S.
Example: Given
T1 = R1(Q) W1(Q)
a
schedule:
not
a schedule:
& T2 = R2(Q) W2(Q)
R1(Q) R2(Q) W1(Q) W2(Q)
W1(Q) R1(Q) R2(Q) W2(Q)
33
Schedules
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
34
Schedules
Sb: r1(X); w1(X); r2(X); w2(X); r1(Y); a1;
35
Conflict Operations
Instructions (Operations) li and lj of transactions Ti
and Tj respectively, conflict if and only if there
exists some item Q accessed by both li and lj, and at
least one of these instructions wrote Q.
1. li = Read(Q), lj = Read(Q). li and lj don’t conflict
2. li = Read(Q), lj = Write(Q). They conflict
3. li = Write(Q), lj = Read(Q). They conflict
4. li = Write(Q), lj = Write(Q). They conflict
Two operations in a schedule are conflict if:
1) They belong to different transactions,
2) They access the same item Q, and
3) At least one them is a Write(Q) operation.
36
Recoverable Schedule
Recoverable schedule:
One where no committed transaction needs to be
rolled back.
A schedule S is recoverable if no transaction T in S
commits until all transactions T’ that have written an
item that T reads have committed.
T reads from T’ in S if X is first written by T’ and later
read by T.
T’ should not have been aborted before T reads X
There should be no transaction Ti that writes X
after T’ writes it before T reads it (unless Ti, if any,
has aborted before T reads X).
Sa, Sb and Sa’ are recoverable:
Sa’: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;
37
Recoverable Schedule
Consider the following schedules:
Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
Se: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2;
Sc is not recoverable because:
T2 reads item X from T1, and then T2 commits before T1
commits.
If T1 aborts after c2 operation in Sc, then the value of X
that T2 read is no longer valid and T2 must be aborted
after it is committed, leading to a schedule that is not
recoverable.
For the schedule to be recoverable c2 operation in Sc must
be postponed until after T1 commits, as shown in Sd;
If T1 aborts instead of committing, then T2 should also
abort as shown in Se, because the value of X it read is no
longer valid.
38
Cascade-less Schedule
Schedules requiring cascaded rollback:
A schedule in which uncommitted transactions
that read an item from a failed transaction must
be rolled back.
As shown in schedule Se
Cascadeless Schedule:
One where every transaction reads only the
items that are written by committed
transactions.
r2(X) in Sd and Se must be postponed until after T1
has committed (or aborted), thus delaying T2 but
ensuring no cascading rollback if T1 aborts.
39
Cascade-less Schedule
Strict Schedules:
A schedule in which a transaction can neither
read or write an item X until the last
transaction that wrote X has committed.
Consider the following schedule:
Sf: w1(X, 5); w2(X , 8); a1;
Suppose the value of X was originally 9.
If T1 aborts, as in Sf, the recovery system will
restore the value of X to 9, even though it has
already been changed to 8 by T2, thus leading to
incorrect results.
Although Sf is cascade-less, it is not strict
It permits T2 to write X even though T1 that last
wrote X had not yet committed (or aborted).
40
Schedules Classification
In term of:
1. Recoverability
2. Avoidance of cascading rollback
3. Strictness
Condition 2 implies condition 1, and condition 3
implies both 1 and 2.
Thus,
all strict schedules are cascade-less, and
All cascade-less schedules are recoverable
41
Recoverability
Need to address the effect of transaction failures on
concurrently running transactions.
Recoverable schedule
if a transaction Tj reads a data items previously
written by a transaction Ti, the commit
operation of Ti appears before the commit
operation of Tj.
The following schedule (Schedule 11) is not
recoverable if T9 commits immediately after the
read
42
Recoverability
If T8 should abort, T9 would have read (dirty read)
an inconsistent database state. Hence database
must ensure that schedules are recoverable.
43
Recoverability (Cont.)
Cascading rollback
a single transaction failure leads to a series of
transaction rollbacks.
Consider the following schedule where none of
the transactions has yet committed (so the
schedule is recoverable)
If T10 fails, T11 and T12 must also be rolled back.
44
Recoverability (Cont.)
Can lead to the undoing of a significant amount of
work
Cascadeless schedules
cascading rollbacks cannot occur; for each pair of
transactions Ti and Tj such that Tj reads a data
item previously written by Ti, the commit
operation of Ti appears before the read operation
of Tj.
Every cascadeless schedule is also recoverable
It is desirable to restrict the schedules to those that
are cascadeless
45
Schedules
A schedule S is serial,
if for every transaction T participating in the
schedule, all the operations of T are executed
consecutively
if operations from different transactions are not interleaved.
otherwise
the schedule is called nonserial.
Serial schedules:
R1(Q) W1(Q) R2(Q) W2(Q)
R2(Q) W2(Q) R1(Q) W1(Q)
Non-serial schedule:
R1(Q) R2(Q) W1(Q) W2(Q)
46
Example Schedules
The following is a serial schedule (Schedule 1), in
which T1 is followed by T2.
47
Example Schedule (Cont.)
The following schedule (Schedule 3) is not a serial
schedule, but it is equivalent to Schedule 1.
In both Schedule 1 & 3, the sum A+B is preserved.
48
Example Schedules (Cont.)
The following concurrent schedule (Schedule 4)
does not preserve the value of the the sum A + B.
49
Example Schedules
(a) Serial schedule A: T1 followed by T2.
(b) Serial schedules B: T2 followed by T1.
50
Example Schedule (Cont.)
(c) Two nonserial schedules C and D with
interleaving of operations.
51
Several Observations
Serial schedule guarantees database consistency.
n transactions may form n! different serial
schedules.
Different serial schedule may produce different
result.
Suppose Q = 20 initially.
R1(Q), Q=Q+10, W1(Q), R2(Q), Q=Q*2, W2(Q)
produces Q = 60
R2(Q), Q=Q*2, W2(Q), R1(Q), Q=Q+10, W1(Q)
produces Q = 50
Allowing only serial schedule may cause poor
system performance (i.e., low throughput)
52
Several Observations
Serial schedule is not a must for guaranteeing
transaction consistency.
If X and Y are independent, then the following two
schedules always produces the same result:
non-serial schedule:
R1(X) W1(X) R2(X) W2(X) R1(Y) W1(Y)
serial
schedule:
R1(X) W1(X) R1(Y) W1(Y) R2(X) W2(X)
53
Serializability
A schedule S of n transactions is serializable if it is
equivalent to some serial schedule of the same n
transactions.
Basic Assumption
each transaction preserves database consistency
We ignore operations other than read and write
instructions
schedules consist of only read and write
instructions
54
Serializability
One way to ensure correctness of concurrent
transactions is to enforce serializability of
transactions
that is the interleaved execution of the
transactions must be equivalent to some serial
execution of those transactions.
The interleaved execution of a set of transactions is
considered correct iff it is serializable.
A nonserial but serializable schedule often permits
higher degree of concurrency than a serial schedule.
Different forms of schedule equivalence give rise
to the notions of:
conflict serializability
view serializability
55
Serializability
Two schedules that are result equivalent for the
initial value of X = 100, but are not result
equivalent in general.
56
Conflict Serializability
If li and lj are consecutive in a schedule and they do
not conflict, their results would remain the same
even if they had been interchanged in the schedule.
If a schedule S can be transformed into a schedule
S´ by a series of swaps of non-conflicting
instructions, we say that S and S´ are conflict
equivalent.
Two schedules are called conflict equivalent if
the order of any two conflicting operations is the
same in both schedules
We say that a schedule S is conflict serializable if it
is conflict equivalent to a serial schedule
57
Example
Consider two transactions:
T1 = R1(X) W1(X) R1(Y) W1(Y)
T2
= R2(X) W2(X)
The following two schedules are equivalent:
S1: R1(X) W1(X) R2(X) W2(X) R1(Y) W1(Y)
S2: R1(X) W1(X) R1(Y) W1(Y) R2(X) W2(X)
58
Conflict Serializability (Cont.)
Example of a schedule that is not conflict
serializable:
T3
T4
read(Q)
write(Q)
write(Q)
We are unable to swap instructions in the above
schedule to obtain either the serial schedules
T3, T4 or T4, T3
59
Conflict Serializability (Cont.)
Schedule 3 below can be transformed into Schedule
1, a serial schedule where T2 follows T1, by series
of swaps of non-conflicting instructions.
Therefore Schedule 3 is conflict serializable.
60
View Serializability
Let S and S´ be two schedules with the same set of
transactions. S and S´ are view equivalent if the
following three conditions are met:
1.
For each data item Q, if transaction Ti
reads the initial value of Q in schedule S, then
transaction Ti must, in schedule S´, also read the
initial value of Q.
2.
For each data item Q, if transaction Ti
executes read(Q) in schedule S, and that value
was produced by transaction Tj (if any), then
transaction Ti must, in schedule S´, also read the
value of Q that was produced by transaction Tj.
61
View Serializability
3.
For each data item Q, the transaction (if
any) that performs the final write(Q) operation
in schedule S must perform the final write(Q)
operation in schedule S´.
As can be seen, view equivalence is also based
purely on reads and writes alone.
Conditions 1 and 2 ensure that
each transaction reads the same values in both
schedules.
Condition 3, coupled with conditions 1 and 2,
ensures that
both schedules results in the same final state
62
View Serializability (Cont.)
A schedule S is view serializable
if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also
view serializable.
Schedule 9 is view-serializable but not conflict serializable.
Every view serializable schedule that is not conflict
serializable has blind writes.
a write operation without having performed a read
operation
63
Testing for Serializability
Consider some schedule of a set of transactions T1,
T2, ..., Tn
Precedence graph — a direct graph where the
vertices are the transactions (names).
We draw an edge from Ti to Tj if the two transaction
conflict, and Ti accessed the data item on which the
conflict arose earlier.
The edge may be labeled by the item that was
accessed
x
y
64
Example Schedule (Schedule A)
T1
T2
R(X)
T3
T4
T5
R(Y)
R(Z)
R(B)
R(A)
R(A)
R(Y)
W(Y)
T1
W(Z)
T2
R(U)
R(Y)
W(Y)
R(Z)
W(Z)
R(U)
W(U)
T5
T3
65
T4
Test for Conflict Serializability
A schedule is conflict serializable iff its
precedence graph is acyclic.
If the precedence graph of schedule S has no cycle,
then S is equivalent to any serial schedule that can
be generated by a topological sort of the
precedence graph.
For example, a serializability order for schedule A
would be
T5 T1 T3 T2 T4
66
Test Schedule Serializability
Consider the following schedule S
R1(X) R2(Y) W1(X) R2(X) W2(Y) W2(X)
R3(Y) W3(Y) R4(X) W4(X)
T1
T2
T3
T4
Two possible orders of topological sorting:
T1 T2 T3 T4
& T1 T2 T4 T3
S is equivalent to both of the above two serial
schedules
67
Test for View Serializability
The precedence graph test for conflict
serializability must be modified to apply to a test
for view serializability.
The problem of checking if a schedule is view
serializable falls in the class of NP-complete
problems.
Thus existence of an efficient algorithm is
unlikely.
However practical algorithms that just check some
sufficient conditions for view serializability can
still be used.
68
Other Notions of Serializability
Schedule 8 given below produces
same outcome as the serial schedule
T1, T5, yet is not conflict equivalent
or view equivalent to it.
Because addition and subtraction are
commutitiave (they can be applied in
any order), it is possible to produce
correct schedules that are not
serializable.
Determining such equivalence
requires additional knowledge or
sematics of operations other than read
and write.
69
Implementation of Isolation
Schedules must be conflict or view serializable,
and recoverable, for the sake of database
consistency, and preferably cascadeless.
Concurrency-control schemes tradeoff between the
amount of concurrency they allow and the amount
of overhead that they incur.
Some schemes allow only conflict-serializable
schedules to be generated, while others allow
view-serializable schedules that are not conflictserializable.
70
Transaction Definition in SQL
Data manipulation language must include a
construct for specifying the set of actions that
comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by:
Commit work
commits current transaction and begins a new one.
Rollback work
causes current transaction to abort.
READ Section 17.6
71