Transcript Document
Database Systems
DataBase System
Major Content & Grade
Introduction
*
The
***
Relational Model
SQL
****
Transaction
Management
***
Database
Design (E-R)
***
Database
Design (Normalization)
***
Haichang Gao , Software School , Xidian University
2
DataBase System
Table of contents
Motivation
Basic concept of transactions
Issues of concurrency control
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
Operations on databases (SQL commands)
Queries
: select … from … where…
Insertions
Deletions
Updates
Create
: insert … values …
: delete … where …
: update … where …
tables, change attributes etc.
These are basic operations on tables
But are they “too basic” in real life?
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
Consider a database for bank accounts
Basic operations (in the eye of the customers)
Withdraw
Deposit
Transfer
Dividend
(利息)
Each basic operations contains multiple database
operations
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
Example : Transfer $k from x to y (Method 1)
Find tuple for x’s account (database query)
Read x’s account info into main memory
3. Check if x have at least $k
4. Subtract $k from x’s account
5. Write x’s new balance back to the database (database
update)
6. Find tuple for y’s account (database query)
7. Read y’s account info into main memory
8. Add $k to y’s account
9. Write y’s new balance to the database (database update)
1.
2.
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
One needs to maintain
Consistency/Correctness
Efficiency
Consistency/Correctness: The right amount of
money being transferred
Easy to check for normal operations
But what if
System crashes
Multiple users want to update same data
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
System crashes, case 1
Find tuple for x’s account (database query)
Read x’s account info into main memory
3. Check if x have at least $k
4. Subtract $k from x’s account
5. Write x’s new balance back to the database (database
update)
crashes! ------------------------------------------- System crashes!
6. Find tuple for y’s account (database query)
7. Read y’s account info into main memory
8. Add $k to y’s account
9. Write y’s new balance to the database (database update)
1.
2.
What is the database like now?
What happen if we don’t do anything about it?
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
System crashes, case 2
Find tuple for x’s account (database query)
Read x’s account info into main memory
3. Check if x have at least $k
4. Subtract $k from x’s account
5. Write x’s new balance back to the database (database
update)
6. Find tuple for y’s account (database query)
7. Read y’s account info into main memory
8. Add $k to y’s account
9. Write y’s new balance to the database (database
update)
1.
2.
crashes! -------------------------------------------- System crashes!
OK?
But what is output which has being buffered?
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
Two potential problems
System crashes in the middle
Need to make sure the system is consistent after restarting
Some tuples may be updated but others aren’t
System crashes at the “end”
It is unclear if all changes is saved onto the disk
When system crashes, all the unsaved changes is lost
Need to ensure that all changes are reflected
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
Another problem: multiple users
Consider another operation, dividend:
Find tuple for x’s account (database query)
2. Find tuple for y’s account (database query)
3. Read x’s account info into main memory
4. Read y’s account info into main memory
5. Add 1% to x’s account
6. Write x’s new balance back to the database (database
update)
7. Add 1% to y’s account
8. Write y’s new balance back to the database (database
update)
1.
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
Suppose x has $100, y has $200, consider two operations
x transfer $50 to y
Dividend
If transfer comes before dividend
X : 100 -> 50 -> 50.5
Y : 200 -> 250 -> 252.5
If dividend comes before transfer
X : 100 -> 101 -> 51
Y : 200 -> 202 -> 252
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
What if we want concurrent execution?
What does concurrent mean?
Can we concurrently run commands without any
limitations?
What is an acceptable schedule?
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
1.
2.
3.
4.
5.
6.
7.
8.
9.
Find tuple for x’s account (database query)
Read x’s account info into main memory
Check if x have at least $k
Subtract $k from x’s account
Write x’s new balance back to the
1. Find tuple for x’s account (database query)
database (database update)
2. Find tuple for y’s account (database query)
3. Read x’s account info into main memory
4. Read y’s account info into main memory
5. Add 1% to x’s account
6. Write x’s new balance back to the
database (database update)
7. Add 1% to y’s account
8. Write y’s new balance back to the
database (database update)
Find tuple for y’s account (database query)
Read y’s account info into main memory
Add $k to y’s account
Write y’s new balance to the database
(database update)
X : 100 -> 50 -> 50.5;
Y : 200 -> 202 -> 252
Acceptable to the bank, but not the customer….
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
Thus need to define an acceptable standard of
consistency, in the face of concurrent execution with
other commands
A plausible definition:
“If
multiple commands execute concurrently, the results
must looks like that the commands are executed one by
one (sequentially)’’
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
Many of the problems above can be eliminated if we
Disable
concurrency
Forcing
writes to disk immediately
Do
not write anything until the end of the command
However this leads to inefficiency
Thus: how to get the best of both worlds…
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- definition
A transaction is a unit of program execution that
accesses and possibly updates various data items.
Can be defined as
A set of SQL statements
Stored procedures
Initiated by high level programming
languages (Java, C++ etc.)
Delimited by begin transaction & end transaction
Example:
Begin transaction
X = select salary from
person where name = “Chu”
update person set salary = x * 10 where name = “”
Update person set salary = x / 10 where name = “”
End transaction
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- states
A transaction can be in any one of the 5 states:
Active,
the initial state; the transaction stays in this state
while it is executing
Partially committed, after the final statement has been
executed.
Failed, after the discovery that normal execution can no
longer proceed.
Aborted, after the transaction has been rolled back and the
database restored to its state prior to the start of the
transaction. Two options after it has been aborted:
restart the transaction – only if no internal logical error
kill the transaction
Committed, after successful completion.
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- states
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- states
A transaction need not commit immediately after its
last statement
Why?
It is the DBMS’s responsibility to determine which
transactions can commit and which to abort
Also, it is the DBMS’s responsibility to clean up
(roll back) after a transaction aborts
Possibility of cascade aborts
Haichang Gao , Software School , Xidian University
DataBase System
DataBase System
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- consistency
During transaction execution the database may be
inconsistent. When the transaction is committed,
the database must be consistent.
Two main issues to deal with:
Failures
of various kinds, such as hardware failures and
system crashes
Concurrent
execution of multiple transactions
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- ACID
Four basic properties that transactions must be
maintained
Atomicity (原子性): All or nothing
Consistency (一致性) : Each transaction must ensure data
consistency
Isolation (隔离性): Transactions “unaware” of other concurrent
transaction
Durability (持久性): Once committed, changes to database
must be persistent
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- ACID
Atomicity : All or nothing
i.e. : Either all operations of the transaction are properly
reflected in the database or none are.
Implications
If
the system crashes in the middle of a transaction T, when
the system restarts, before any user can use the database
again, the DBMS must ensure either
T is finished
T never started
Which do you think is easier? Make more sense?
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- ACID
Consistency: Each transaction must ensure data
consistency
i.e. Execution of a transaction in isolation preserves the
consistency of the database.
Thus all integrity and other constraints must be
satisfied
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- ACID
Isolation : Transactions “unaware” of other concurrent
transaction
i.e. : Although multiple transactions may execute concurrently,
each transaction must be unaware of other concurrently
executing transactions. Intermediate transaction results must be
hidden from other concurrently executed transactions.
Implications:
for
every pair of transactions Ti and Tj, it appears to Ti that
either Tj, finished execution before Ti started, or Tj started
execution after Ti finished.
Some level of interleaving are not allowed
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- ACID
Durability : Once committed, changes to database
must be persistent
i.e. : After a transaction completes successfully, the
changes it has made to the database persist, even if
there are system failures.
Implications:
Suppose
a transaction commits, and then the system crashes.
When the system restarts, before any user can use the
database again, the DBMS must ensure that the changes
made by this transaction is stored onto the disk.
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics – DBMS support
2 major tasks in DBMS to handle transactions
Concurrency control
Recovery
Handle how concurrent transaction is executed
Goal: Isolation
Handle how to recover a database after a failure
Goal: Atomicity & Durability
Consistency is maintained throughout various part of the DBMS (not
the focus of this course)
Many systems rolls the two part together as a “transaction
manager”
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics – DBMS support
What makes transaction processing tricky(灵活的)
Scheduling is hidden from the DBMS
DBMS cannot enforce which transaction to execute next
Buffer management is hidden from DBMS
Although the transaction write something onto the database, it
is only written to the buffers, to be transferred to the disk at
unspecified time
One can force transfer immediately, but will be very inefficient
Haichang Gao , Software School , Xidian University
DataBase System
Table of contents
Concurrency control
Serializability
Motivating
Conflict
Test
examples
serializability
of serializability
Haichang Gao , Software School , Xidian University
DataBase System
Concurrency control
Why concurrency?
increased
processor and disk utilization, leading to better
transaction throughput: one transaction can be using the CPU
while another is reading from or writing to the disk
reduced
average response time for transactions: short
transactions need not wait behind long ones.
Shared
resources. Many transaction may want to access the
same object/tuple.
Isolation.
One of the key requirement of transactions
Haichang Gao , Software School , Xidian University
DataBase System
Concurrency control -- schedule
Schedules – sequences that indicate the chronological
order(时间顺序) in which instructions of concurrent
transactions are executed
a
schedule for a set of transactions must consist of all
instructions of those transactions
must preserve the order in which the instructions appear in
each individual transaction.
Assumption: at any time, only one operation from
one transaction can be executed
However, DBMS may interleave operations (交叉执
行) from multiple transactions
Haichang Gao , Software School , Xidian University
DataBase System
Concurrency control -- schedule
Serial schedule: schedules that does not allow
interleaving between transactions (i.e. one transaction
finishes before the other begin)
Equivalent schedules: two schedules are equivalent if
they “produce the same results”
Haichang Gao , Software School , Xidian University
DataBase System
Notation used
Database consists of objects (X, Y, Z) , each of them is
an integer
Transactions are labeled T1, T2 etc.
Each transaction has a set of local variables (not
accessible by other transactions) in main memory.
(Labeled as a1, a2, b1, b2 etc.)
Each transaction access the database by the read() &
write() command
Haichang Gao , Software School , Xidian University
DataBase System
Notation used
A read command read a database object into a local variables (a1
<- read(X) )
A write command write a local variable into the database object
(write(X, a1) )
Local variables for read() & write() will not be shown if the
context is clear, or if it is unimportant
Manipulation and calculation on objects can only be done on
local variables (e.g. X <- X + 1 is not allowed, but a1 <- a1+ 1 is
ok)
In some case, the local manipulation is not shown (to highlight
the effects of read() and write())
Haichang Gao , Software School , Xidian University
DataBase System
Notation used
Example; (transfer, assuming overdraft(透支) is
allowed)
1.
a1 <- Read(X)
2.
a1 <- a1 – k
3.
Write(X, a1)
4.
a2 <- Read(Y)
5.
a2 <- a2 + k
6.
Write(Y, a2)
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example
Consider the fund transfer operation
1. Find tuple for x’s account
(database query)
2. Read x’s account info into
main memory
3. Check if x have at least $k
Rewrite
4. Subtract $k from x’s
Using class
account
notation
5. Write x’s new balance back
to the database (database
update)
6. Find tuple for y’s account
(database query)
7. Read y’s account info into
main memory
8. Add $k to y’s account
9. Write y’s new balance to
the database (database
update)
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example
Consider the dividend operation
1. Find tuple for x’s account
(database query)
2. Read x’s account info into
main memory
3. Add 1% to x’s account
4. Write x’s new balance
back to the database
(database update)
5. Find tuple for y’s account
(database query)
6. Read y’s account info into
main memory
7. Add 1% to y’s account
8. Write y’s new balance
back to the database
(database update)
Rewrite
Using class
notation
1. A1 <- Read(X)
2. A1 <- A1* 1.01
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 * 1.01
6. Write(Y, A2)
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example
Suppose x has $100, y has $200
Consider two operations
x transfer $50 to y
Dividend
For serial schedules
If transfer comes before dividend
If dividend comes before transfer
X : 100 -> 50 -> 50.5
Y : 200 -> 250 -> 252.5
X : 100 -> 101 -> 51
Y : 200 -> 202 -> 252
In both case, X + Y = 303
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
But with the following schedule
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
1.
2.
3.
4.
5.
6.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 * 1.01
Write(Y, A2)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
Y: 200 -> 202 -> 252;
X+Y= 302.5
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
What cause the problem?
Contention
of resources(资源竞争)?
Interleaving(交叉执行)?
Is interleaving always bad?
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
But with the following schedule
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
1. A1 <- Read(X)
2. A1 <- A1* 1.01
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 * 1.01
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
Y: 200 -> 250 -> 252.5;
X+Y = 303
In this case, interleaving is ok!
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
Let’s change slightly:
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
1.
2.
3.
4.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
5. A2 <- A2 * 1.01
6. Write(Y, A2)
Y: 200 -> 250 -> 202;
X+Y = 252.5
In this case, interleaving is very bad!
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
Let’s change slightly (again):
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
1. A1 <- Read(X)
2. A1 <- A1* 1.01
3.
4.
5.
6.
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 * 1.01
Write(Y, A2)
Y: 200 -> 250 -> 252.5;
X+Y = 303
In this case, interleaving is good again!
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
What’s going on here?
Interleaving can
However,
How
be very bad.
some interleaving does not cause problems.
can we determine what kind of interleaving is “nice”?
Haichang Gao , Software School , Xidian University
DataBase System
Serializability
How to formalize the notion?
One can look at final results
If the schedule produce the same result as a serial schedule, then it’s fine.
However, this may be due to luck and/or “commutative”(可交换的)
operators
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
1.
2.
3.
4.
5.
6.
A1 <- Read(X)
A1 <- A1 + m
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 - m
Write(Y, A2)
A better notion is needed
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability
Suppose two transactions (T1, T2) want to operate on the
same data object (X)
Four possible scenarios
T1 Read(X), T2 Read(X)
T1 Read(X), T2 Write(X)
T1 Write(X), T2 Read(X)
T1 Write(X), T2 Write(X)
How does the order of these operations affect the results of
the transactions?
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
Remember this schedule?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
i. A1 <- Read(X)
ii. A1 <- A1* 1.01
iii. Write(X, A1)
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
Y: 200 -> 250 -> 252.5;
X+Y = 303
In this case, interleaving is ok!
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
Remember this schedule?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
i.
ii.
A1 <- Read(X)
A1 <- A1* 1.01
iii. Write(X, A1)
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
(4) And (iii) are not in conflict, so can swap
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
Remember this schedule?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
i.
A1 <- Read(X)
ii. A1 <- A1* 1.01
iii. Write(X, A1)
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
(4) And (ii) are not in conflict, so can swap
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
Remember this schedule?
1.
2.
3.
4.
A1 <- Read(X)
A1 <- A1 – k
Write(X, A1)
A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
i. A1 <- Read(X)
ii. A1 <- A1* 1.01
iii. Write(X, A1)
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
(4) And (i) are not in conflict, so can swap
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
Remember this schedule?
1.
2.
3.
4.
5.
A1 <- Read(X)
A1 <- A1 – k
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 + k
6. Write(Y, A2)
i. A1 <- Read(X)
ii. A1 <- A1* 1.01
iii. Write(X, A1)
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
Similarily (5) And (i) – (iii) are not in conflict, so can swap
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
Remember this schedule?
1.
2.
3.
4.
5.
6.
A1 <- Read(X)
A1 <- A1 – k
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 + k
Write(Y, A2)
i.
ii.
iii.
iv.
v.
vi.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 * 1.01
Write(Y, A2)
Similarily (6) And (i) – (iii) are not in conflict, so can swap
We obtain a serial schedule by this transformation (swapping process)
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
Now, remember this schedule?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
i.
ii.
iii.
iv.
v.
vi.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 * 1.01
Write(Y, A2)
Y: 200 -> 202 -> 252;
X+Y = 302.5
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
Now, remember this schedule?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
i.
ii.
iii.
iv.
v.
vi.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 * 1.01
Write(Y, A2)
(3) And (i) has conflict, so can’t swap
(4) And (vi) has conflict, so can’t swap
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
Can you work through this case?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
1.
2.
3.
4.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
5. A2 <- A2 * 1.01
6. Write(Y, A2)
Y: 200 -> 250 -> 202;
X+Y = 250.5
In this case, interleaving is very bad!
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability
From previous slides, it seems
schedule can be transformed to a serial schedule good!
(achieve isolation)
A
schedule cannot be transformed to a serial schedule bad!
(do not achieve isolation)
A
Can we generalize?
Yes.
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- definitions
Schedule: sequences that indicate the chronological order in
which instructions of concurrent transactions are executed
Complete schedule: Schedule that contain commit/abort
decision for each transaction in the schedule
Serial schedule: A schedule where there is no interleaving of
operations by multiple transactions.
Denoted by < T1, T2, … , Tn>
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- definitions
Conflict equivalent transformation: swapping adjacent
operation on a schedule which does not conflict
Conflict equivalent(冲突等价): Two schedules S and S’ are
conflict equivalent if S can be transformed to S’ by successive
conflict equivalent transformations
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- definitions
Conflict serializability(冲突可串行化): a schedule S
is conflict serializable if it is conflict equivalent to a
serial schedule
Example of a schedule that is not conflict serializable:
T3
read(Q)
T4
write(Q)
write(Q)
We are unable to swap instructions in the above schedule to obtain either the
serial schedule < T3, T4 >, or the serial schedule < T4, T3 >.
Haichang Gao , Software School , Xidian University
DataBase System
Test for serializability
Consider the following schedules:
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
i.
ii.
iii.
iv.
v.
vi.
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
i. A1 <- Read(X)
ii. A1 <- A1* 1.01
iii. Write(X, A1)
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
4. A2 <- Read(Y)
A2 <- A2 * 1.01 5. A2 <- A2 + k
Write(Y, A2)
6. Write(Y, A2)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
Not serializable
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
Serializable
Haichang Gao , Software School , Xidian University
DataBase System
How to guarantee serializable
Need to provide protocol (rules on how data item is accessed)
to ensure conflict serializability
Goal of protocol:
To allow access for data items that are not required by
multiple transactions
For those data items required by multiple transaction, restrict
access in some way, or limit it to exclusive access
Balance between safety and efficiency
Too restrictive: little or no concurrency, ineffective
Too lenient: leads to inconsistency
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocols
“Exclusive” access locks
Each database item is associated with locks
Transaction must obtain locks before accessing the
object
Transaction must release lock when it finishes.
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocols
Example:
Lock(X)
Read(X)
Unlock(X)
Read(X)
Lock(X): check if object X is already locked
If not, obtain the lock
If so, wait or “do something” to handle the potential deadlock (like
aborting)
Unlock(X): release the lock on object X
The addition of Lock(X) and Unlock(X) commands are done
by the DBMS
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocols: S and X locks
Two kinds of locks on each object
Shared
locks (S-locks)
Requested before reading
Multiple transactions can hold a S-lock on an object
simultaneously
Exclusive locks (X-locks)
Requested before writing
Only one transaction can hold an X-lock on an object at
any given time
No other transaction can hold any lock (not even a S-lock)
if some transaction has an X-lock
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocols: S and X locks
More on S and X locks
A
transaction that holds an S-lock on an object can read the
object
A
transaction that holds an X-lock on an object can read and
write the objects
Lock-compatibility table
T2 holds
S-lock
X-lock
S-lock
True
False
X-lock
False
False
T1 Request
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocol
Consider the following examples (again):
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
T1 (Transfer)
1. A1 <- Read(X)
2. A1 <- A1* 1.01
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 * 1.01
6. Write(Y, A2)
T2 (Dividend)
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocol -- example
With a lock-based protocol, one possible way T1 is
transformed
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
T1 (Transfer)
1. S-lock(X)
2. A1 <- Read(X)
3. A1 <- A1 – k
4. X-lock(X)
5. Write(X, A1)
6. X-lock(Y)
7. A2 <- Read(Y)
8. A2 <- A2 + k
9. Write(Y, A2)
10.Unlock(X)
11.Unlock(Y)
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocol -- example
Notes from the previous example:
Locks must be obtained before read/write can begin
If a transaction want to read and write the same object, it can either
Obtain an X-lock before reading
Obtain an S-lock before reading, then obtain an X-lock before
writing (it is not automatically granted)
A transaction does not have to release locks immediately after use
Good or bad?
Haichang Gao , Software School , Xidian University
DataBase System
Two phase locking – definition
The basic two-phase locking (2PL) protocol
A transaction T must hold a lock on an item x in the appropriate
mode before T accesses x.
If a conflicting lock on x is being held by another transaction, T waits.
Once T releases a lock, it cannot obtain any other lock subsequently.
Note: a transaction is divided into two phases:
A growing phase (obtaining locks)
A shrinking phase (releasing locks)
Claim : 2PL ensures conflict serializability
Haichang Gao , Software School , Xidian University
DataBase System
Two phase locking – Serializability
Lock-point: the point where the transaction obtains
all the locks
With 2PL, a schedule is conflict equivalent to a a
serial schedule ordered by the lock-point of the
transactions
Haichang Gao , Software School , Xidian University
DataBase System
2-phase locking -- example
1.
2.
3.
4.
S-lock(X)
A1 <- Read(X)
A1 <- A1 – k
X-lock(X)
5. Write(X, A1)
6. S-lock(Y)
7. A2 <- Read(Y)
8. A2 <- A2 + k
9. X-lock(Y)
10. Write(Y, A2)
11. Unlock(X)
Lock point for T1
12. Unlock(Y)
T1
1. S-lock(X)
1.
2.
3.
4.
5.
6.
S-lock(X)
A1 <- Read(X)
A1 <- A1* 1.01
X-lock(X)
Write(X, A1)
S-lock(Y)
T2 waits
T2 waits
6. S-lock(Y)
7. A2 <- Read(Y)
8. A2 <- A2 * 1.01
Lock point for T2
9. X-lock(Y)
10. Write(Y, A2)
11. Unlock(Y)
T2
Haichang
Gao
,
Software
School
, Xidian University
12. Unlock(X)
DataBase System
DataBase System
Haichang Gao , Software School , Xidian University
DataBase System
Table of contents
Motivation
Basic concept of transactions
Issues of concurrency control
Haichang Gao , Software School , Xidian University
DataBase System
Deadlocks
2 phase locking is a blocking protocol (transaction
has to wait if it cannot obtain a lock)
Probability of a deadlock(死锁)
Example:
1. S-lock(X)
2. A1 <- Read(X)
3. A1 <- A1 – k
Can’t proceed
T2 has S-lock
on X
4. X-lock(X)
1. S-lock(X)
2. A1 <- Read(X)
3. A1 <- A1* 1.01
4. X-lock(X)
Can’t proceed
T1 has S-lock
on X
Deadlock!
Haichang Gao , Software School , Xidian University
DataBase System
Deadlocks
How to deal with deadlocks?
Ostrich (驼鸟策略)
Pretend nothing happens and wait for user to hit the <ctrl-alt-del> key
Timeout(超时法)
Assume deadlock after there is no progress for some time, and do
something about it
Detection and recovery(检测与恢复)
Wait until a deadlock occurs and do something about it
Avoidance(避免死锁)
Wait until a deadlock can occur if certain operations is executed and
do something about it
Prevention(预防法)
Set up the system such that there is never a chance of deadlocks
Haichang Gao , Software School , Xidian University
DataBase System
Deadlocks -- timeout
When transaction waits too long for a lock, it is
aborted and restarted again from the beginning
The timeout period should be:
long
enough so that most transactions that are aborted are
actually deadlocked;
short
enough so that deadlocked transactions don’t wait too
long for their deadlocks to be broken.
Haichang Gao , Software School , Xidian University
DataBase System
Deadlocks -- Detection
Deadlocks can be described as a wait-for graph, which
consists of a pair G = (V,E),
V
is a set of vertices (all the transactions in the system)
E
is a set of edges; each element is an ordered pair Ti Tj.
Haichang Gao , Software School , Xidian University
DataBase System
Deadlocks -- Detection
When deadlock is detected :
Some
transaction will have to rolled back (made a victim)
to break deadlock. Select that transaction as victim that
will incur minimum cost.
Rollback -- determine how far to roll back transaction
Total rollback: Abort the transaction and then restart it.
More effective to roll back transaction only as far as necessary to
break deadlock. (but tricky to implement)
Drawback:
maintaining wait-for-graph
is not cheap
Detecting cycles is an overhead
Haichang Gao , Software School , Xidian University
DataBase System
The phantom(幻影) menace
Consider the following 2 transactions
1. Select sum(salary)
From faculty
Where dept = “CS”
2. Select sum(salary)
From faculty
Where dept = “Math”
T1
a. Insert into faculty
values (“Lin”, “CS”, 1000)
b. Insert into faculty
values (“Lam”, “Math”, 5000)
T2
There does not seems to be a conflict (in terms of tuple)
Assume initially CS faculty have total salary 1,000,000 and
Math faculty have total salary 2,000,000
Then T1 -> T2 will imply the select statements return
1,000,000 and 2,000,000
T2 -> T1 will imply the select statements return 1,001,000
and 2,005,000
Haichang Gao , Software School , Xidian University
DataBase System
The phantom menace
But consider the following schedule
1. Select sum(salary)
From faculty
Where dept = “CS”
2. Select sum(salary)
From faculty
Where dept = “Math”
a. Insert into faculty
values (“Lin”, “CS”, 1000)
b. Insert into faculty
values (“Lam”, “Math”, 5000)
T1
T2
The output will be 1,000,100 and 2,000,000
Not conflict serializable!
Haichang Gao , Software School , Xidian University
DataBase System
The phantom problem
This is known as the phantom problem
Why does it occur?
No tuples are in conflict
However, conflict occurs for tuples that satisfy a certain condition
(dept = “CS”, dept = “Math”)
T1 require access for ALL tuples satisfying the condition
However, T2 changes the number of tuples satisfying the condition
No quick solution: index-locking as a possibility
Haichang Gao , Software School , Xidian University
DataBase System
Isolation levels
The goal of locking is to ensure serializability
To ensure serializability, we require 2PL.
No
new locks are acquired once releasing locks begin
Locks acquired need to be held for a long time (long locks)
One cannot acquire a lock, done work with it, and then release it
immediately
Adv: ensure serializability
Dis: less concurrency
Is serializability ( long locks/two-phase locking) necessary
in all cases?
Haichang Gao , Software School , Xidian University
Serializability – In practice
DataBase System
Data manipulation language must include a construct for
specifying the set of actions that comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by:
Commit work commits current transaction and begins a new one.
Rollback work causes current transaction to abort.
Levels of consistency specified by SQL-92:
Serializable — default
Repeatable read
Read committed
Read uncommitted
Haichang Gao , Software School , Xidian University
Serializability – In practice
DataBase System
Serializable — default
Repeatable read — only committed records to be read, repeated
reads of same record must return same value. However, a
transaction may not be serializable – it may find some records
inserted by a transaction but not find others.
Read committed — only committed records can be read, but
successive reads of record may return different (but committed)
values.
Read uncommitted — even uncommitted records may be read.
Haichang Gao , Software School , Xidian University
DataBase System
Levels of Isolation
SQL:
SET TRANSACTION (READ ONLY | READ WRITE)
ISOLATION LEVEL (READ UNCOMMITTED | READ COMMITTED |
REPEATABLE READ | SERIALIZABLE);
Two facts of the transaction defined by the statement:
1). Type of the transaction (Read only/Read Write)
2). Levels of isolation
Haichang Gao , Software School , Xidian University
DataBase System
Levels of Isolation
四种隔离级别的读写锁
短期锁(short-term lock):只是在执行存取操作的一段足够长的时间内保留;
长期锁(long-term lock): 这种锁将一直保持直到事务提交为止。
Read
Uncommitted
(dirty reads)
Read Committed
Write locks on rows
of a table are longterm
Read locks on rows of
a table are long-term
Read and write locks on
predicates are long-term
NO (but it’s readonly)
No Read locks at all
No predicate locks at all
No
Short-term Read predicate
locks
Long-term Write predicate
locks
Yes
Repeatable Read
Yes
Yes
Short-term Read predicate
locks
Long-term Write predicate
locks
Serializable
Yes
Yes
Long-term Read and Write
predicate locks
Haichang Gao , Software School , Xidian University
DataBase System
Levels of Isolation
1) Read Uncommitted
特点:
· 该隔离级别执行的事务只能是读事务,而不允许更新事务;
· 该隔离级别执行的事务无论访问什么数据都不(需要)加锁。
故该隔离级别可能Read到一个Uncommitted结果。
不会产生丢失修改,但是会产生读脏数据。
Haichang Gao , Software School , Xidian University
DataBase System
Levels of Isolation
2) Read Committed
特点:
· 该隔离级别下加的写锁为长期锁,读锁为短期锁;
短期读谓词锁,长期写谓词锁.
· 该隔离级别下对于三类冲突的两种不会发生:
a. Wi(A) →Rj(A)
(2) 故Read到的永远是Committed结果。(读脏不会出现)
b. Wi(A) →Wj(A) (3) (一般的丢失修改不会发生)
· 该隔离级别下对于三类冲突的一种会发生:
c. Ri(A) → Wj(A) (1)
故会导致如下二类问题:
① 不可重复读
② Scholar’s Lost Update Anomaly
Haichang Gao , Software School , Xidian University
DataBase System
Levels of Isolation
3) Repeatable Read
特点:
· 该隔离级别下加的读、写锁均为长期锁;并且仅对读谓词加的是短期锁,写
谓
词加为长期锁。
该隔离级别消除了丢失修改、读脏数据、不可重复读异常。但由于其对读谓词
加的是短期锁,故该隔离级别消除不了“幽灵更新异常”(phantom update anomaly)
。
4) Serializability (an Phantom Updates)
特点:
· 该隔离级别下加的读、写锁均为长期锁;并且对读、写谓词加的均为长期锁
。
该隔离级别可保证并发的事务串行化并消除所有的异常。
Haichang Gao , Software School , Xidian University
DataBase System
Table of contents
Recovery(恢复)
Haichang Gao , Software School , Xidian University
DataBase System
Recovery – why?
ACID properties
Atomicity:
all-or-nothing
Durability: Once
committed, changes must be permanent
If the system is always functional, that’s ok.
However, system may crash (failure).
Haichang Gao , Software School , Xidian University
92
Recovery – why?
DataBase System
Types of failures:
Transaction
failure :
Logical errors: transaction cannot complete due to some
internal error condition (e.g. your program have a division
by zero error)
System errors: the database system must terminate an
active transaction due to an error condition (e.g., deadlock)
Haichang Gao , Software School , Xidian University
93
Recovery – why?
DataBase System
Types of failures (ctd)
System
crash: a power failure or other hardware or software
failure causes the system to crash.
Fail-stop assumption: non-volatile storage (非易失性存储器)
contents are assumed to not be corrupted by system crash
Database systems have numerous integrity checks to prevent
corruption of disk data
Haichang Gao , Software School , Xidian University
94
Recovery – why?
DataBase System
Types of failures (ctd):
Disk
failure: a head crash or similar disk failure destroys all
or part of disk storage
Destruction is assumed to be detectable: disk drives use
checksums to detect failures
With some disk systems, some failures maybe correctable.
We focus on transaction/system failure as well as
other failures that are correctable.
Haichang Gao , Software School , Xidian University
95
DataBase System
Recovery – why?
Failure can occur ANY time
Including
the time that you do not want
We
do assume in this class that writing a page is atomic (i.e.
failure do not occur in the middle of writing a page)
Thus need to maintain atomicity and durability at
all times
Haichang Gao , Software School , Xidian University
96
DataBase System
Recovery – why?
After failure, the system needs to recover
Need
Need
to recover to a consistent state
to ensure all transactions are atomic and durable
Thus when the system go back up and running after a
failure
A recovery module is up and running – before allowing
any other transaction to run
It will restore the database to the consistent state as well
as ensure the ACID properties
After that transactions may continue
Notice that information need to be stored during the
normal running of the transaction for the recovery
module to run properly.
Haichang Gao , Software School , Xidian University
97
DataBase System
Recovery – atomicity requirement
Suppose the system crash while a transaction T is being
executed (but not committed)
During recovery, one need to
Find those transactions
Ensure atomicity is held
Abort or complete?
Abort! (in many case, don’t know how to complete anyway)
Abort means restoring the database to the point where those
transactions has not started (as some changes may have
propagated to the disk)
Thus, after the recovery procedure, those transaction should
seems to have NEVER been executed
Haichang Gao , Software School , Xidian University
98
DataBase System
Recovery – atomicity requirement
Thus, during recovery, one need to
Find those transactions that has started but not yet committed
Ensure atomicity is held
Find all the changes to the database that the transactions has
done
Undo all the changes
Haichang Gao , Software School , Xidian University
99
DataBase System
Recovery – durability requirement
Suppose the crash occurs right after a transaction T
committed.
Seems to be ok, but …
T may have written something onto the disk
However, the writes may not have propagated to the disk
Reasons
The writes may be scheduled/buffered but the system
crashed before such writes can execute
The buffer manager/virtual memory management may
decide to put a disk page into main memory for a long time
(save disk access time)
This is especially true on a network/shared file system
Haichang Gao , Software School , Xidian University
100
DataBase System
Recovery – durability requirement
Remember,
when you issue a write() command, the
system does not necessary write what you have onto
disk immediately.
In
some systems, you may issue a flush() command to
force the writes onto the disk
Haichang Gao , Software School , Xidian University
101
DataBase System
Recovery – durability requirement
Thus, during recovery, one need to
Find
those transactions that has committed
Ensure
durability is held
Check if all the changes made by the transactions is
written onto the disk
If not, then redo all the changes
Haichang Gao , Software School , Xidian University
102
DataBase System
Recovery -- overview
Recovery algorithms have two parts
1.
Actions taken during normal transaction processing to ensure enough
information exists to recover from failures
2.
Actions taken after a failure to recover the database contents to a state
that ensures atomicity, consistency and durability
In order to achieve (1), we need to store the information in
stable storage
Haichang Gao , Software School , Xidian University
103
DataBase System
Log files
Logs are needed to record the operations on the
database during normal operations
The log is a sequence of log records, and maintains a
record of update activities on the database.
Haichang Gao , Software School , Xidian University
104
DataBase System
Log files
Different types of log records during normal operations:
Begin record: <Ti, start> -- registered when transaction Ti begins
Write record: <Ti, X, Vold, Vnew> -- registered when a database item
X is updated by Ti, where Vold, Vnew store the old and new values
respectively
Commit record: <Ti, commit> -- registered when Ti commits.
Vold, Vnew also known as before-image & after-image
respectively
Formally, a transaction commits when the commit log record is
written
Abort record: <Ti, abort> -- registered when Ti aborts
Haichang Gao , Software School , Xidian University
105
DataBase System
Log files
With such information, redoing and undoing
operations can be done:
Undo:
copy Vold back to the object.
Redo:
copy Vnew back to the object
Haichang Gao , Software School , Xidian University
106
DataBase System
Log files
Logs are written onto stable storage
Question: given an operation to be logged, should
we log first or execute first?
Consider the case the system fail between the two
operations
Haichang Gao , Software School , Xidian University
107
DataBase System
Log files
Thus logs must be write-ahead logs (WAL)
i.e.
all operations must be logged first
Log
records must be forced to stable storage before actually
operations can be executed
Haichang Gao , Software School , Xidian University
108
DataBase System
Log-based recovery : basic approach
Given a write-ahead log. How should recovery
proceed after the system crash?
Two major steps:
Locating the
Apply
transaction that need works to be done
compensatory action (补偿) on these transactions
Haichang Gao , Software School , Xidian University
109
DataBase System
Log-based recovery : basic approach
Step 1: locating transactions that needed to be dealt
with.
Uncommitted
but active transactions
Need undo
Transactions that has <start T> in log, but not <commit
T>
Committed
transactions
Need redo
Transactions that has <commit T> in log
Haichang Gao , Software School , Xidian University
110
DataBase System
Log-based recovery : basic approach
Example
a)
Undo T0
b)
Undo T1, Redo T0
c)
Redo T0, T1
Haichang Gao , Software School , Xidian University
111
DataBase System
Log-based recovery : basic approach
Step 2: Apply compensatory actions
Redo
& Undo
Requirement:
actions have to be idempotent(等幂)
That is, even if the operation is executed multiple times
the effect is the same as if it is executed once
Haichang Gao , Software School , Xidian University
112
DataBase System
Log-based recovery : basic approach
For undo: copying old value of the item from the log
to the database
For redo: copying new value of the item from the log
to the database
Both operations are idempotent
Haichang Gao , Software School , Xidian University
113