Document

Transcript Document

Database Systems
DataBase System
Major Content & Grade
 Introduction
*
 The
***
Relational Model
 SQL
****
 Transaction
Management
***
 Database
Design (E-R)
***
 Database
Design (Normalization)
***
Haichang Gao , Software School , Xidian University
2
DataBase System
Table of contents
 Motivation
 Basic concept of transactions
 Issues of concurrency control
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
 Operations on databases (SQL commands)
 Queries
: select … from … where…
 Insertions
 Deletions
 Updates
 Create
: insert … values …
: delete … where …
: update … where …
tables, change attributes etc.
 These are basic operations on tables
 But are they “too basic” in real life?
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
 Consider a database for bank accounts
 Basic operations (in the eye of the customers)
 Withdraw
 Deposit
 Transfer
 Dividend
(利息)
 Each basic operations contains multiple database
operations
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
 Example : Transfer $k from x to y (Method 1)
Find tuple for x’s account (database query)
Read x’s account info into main memory
3. Check if x have at least $k
4. Subtract $k from x’s account
5. Write x’s new balance back to the database (database
update)
6. Find tuple for y’s account (database query)
7. Read y’s account info into main memory
8. Add $k to y’s account
9. Write y’s new balance to the database (database update)
1.
2.
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
 One needs to maintain

Consistency/Correctness
 Efficiency
 Consistency/Correctness: The right amount of
money being transferred


Easy to check for normal operations
But what if
 System crashes
 Multiple users want to update same data
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
System crashes, case 1
Find tuple for x’s account (database query)
Read x’s account info into main memory
3. Check if x have at least $k
4. Subtract $k from x’s account
5. Write x’s new balance back to the database (database
update)
crashes! ------------------------------------------- System crashes!
6. Find tuple for y’s account (database query)
7. Read y’s account info into main memory
8. Add $k to y’s account
9. Write y’s new balance to the database (database update)
1.
2.


What is the database like now?
What happen if we don’t do anything about it?
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
System crashes, case 2
Find tuple for x’s account (database query)
Read x’s account info into main memory
3. Check if x have at least $k
4. Subtract $k from x’s account
5. Write x’s new balance back to the database (database
update)
6. Find tuple for y’s account (database query)
7. Read y’s account info into main memory
8. Add $k to y’s account
9. Write y’s new balance to the database (database
update)
1.
2.
crashes! -------------------------------------------- System crashes!


OK?
But what is output which has being buffered?
Haichang Gao , Software School , Xidian University
DataBase System
Motivation

Two potential problems
 System crashes in the middle



Need to make sure the system is consistent after restarting
Some tuples may be updated but others aren’t
System crashes at the “end”



It is unclear if all changes is saved onto the disk
When system crashes, all the unsaved changes is lost
Need to ensure that all changes are reflected
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
 Another problem: multiple users
 Consider another operation, dividend:
Find tuple for x’s account (database query)
2. Find tuple for y’s account (database query)
3. Read x’s account info into main memory
4. Read y’s account info into main memory
5. Add 1% to x’s account
6. Write x’s new balance back to the database (database
update)
7. Add 1% to y’s account
8. Write y’s new balance back to the database (database
update)
1.
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
 Suppose x has $100, y has $200, consider two operations

x transfer $50 to y

Dividend
 If transfer comes before dividend

X : 100 -> 50 -> 50.5

Y : 200 -> 250 -> 252.5
 If dividend comes before transfer

X : 100 -> 101 -> 51

Y : 200 -> 202 -> 252
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
 What if we want concurrent execution?
 What does concurrent mean?
 Can we concurrently run commands without any
limitations?
 What is an acceptable schedule?
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
1.
2.
3.
4.
5.
6.
7.
8.
9.
Find tuple for x’s account (database query)
Read x’s account info into main memory
Check if x have at least $k
Subtract $k from x’s account
Write x’s new balance back to the
1. Find tuple for x’s account (database query)
database (database update)
2. Find tuple for y’s account (database query)
3. Read x’s account info into main memory
4. Read y’s account info into main memory
5. Add 1% to x’s account
6. Write x’s new balance back to the
database (database update)
7. Add 1% to y’s account
8. Write y’s new balance back to the
database (database update)
Find tuple for y’s account (database query)
Read y’s account info into main memory
Add $k to y’s account
Write y’s new balance to the database
(database update)
X : 100 -> 50 -> 50.5;
Y : 200 -> 202 -> 252
Acceptable to the bank, but not the customer….
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
 Thus need to define an acceptable standard of
consistency, in the face of concurrent execution with
other commands
 A plausible definition:
 “If
multiple commands execute concurrently, the results
must looks like that the commands are executed one by
one (sequentially)’’
Haichang Gao , Software School , Xidian University
DataBase System
Motivation
 Many of the problems above can be eliminated if we
 Disable
concurrency
 Forcing
writes to disk immediately
 Do
not write anything until the end of the command
 However this leads to inefficiency
 Thus: how to get the best of both worlds…
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- definition
 A transaction is a unit of program execution that
accesses and possibly updates various data items.
 Can be defined as
 A set of SQL statements
 Stored procedures
 Initiated by high level programming
languages (Java, C++ etc.)
 Delimited by begin transaction & end transaction
 Example:
 Begin transaction
 X = select salary from
person where name = “Chu”
 update person set salary = x * 10 where name = “”
 Update person set salary = x / 10 where name = “”
 End transaction
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- states
 A transaction can be in any one of the 5 states:
 Active,
the initial state; the transaction stays in this state
while it is executing
 Partially committed, after the final statement has been
executed.
 Failed, after the discovery that normal execution can no
longer proceed.
 Aborted, after the transaction has been rolled back and the
database restored to its state prior to the start of the
transaction. Two options after it has been aborted:
 restart the transaction – only if no internal logical error
 kill the transaction
 Committed, after successful completion.
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- states
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- states
 A transaction need not commit immediately after its
last statement
 Why?
 It is the DBMS’s responsibility to determine which
transactions can commit and which to abort
 Also, it is the DBMS’s responsibility to clean up
(roll back) after a transaction aborts
 Possibility of cascade aborts
Haichang Gao , Software School , Xidian University
DataBase System
DataBase System
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- consistency
 During transaction execution the database may be
inconsistent. When the transaction is committed,
the database must be consistent.
 Two main issues to deal with:
 Failures
of various kinds, such as hardware failures and
system crashes
 Concurrent
execution of multiple transactions
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- ACID
 Four basic properties that transactions must be
maintained
 Atomicity (原子性): All or nothing
 Consistency (一致性) : Each transaction must ensure data
consistency
 Isolation (隔离性): Transactions “unaware” of other concurrent
transaction
 Durability (持久性): Once committed, changes to database
must be persistent
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- ACID
 Atomicity : All or nothing
 i.e. : Either all operations of the transaction are properly
reflected in the database or none are.
 Implications
 If
the system crashes in the middle of a transaction T, when
the system restarts, before any user can use the database
again, the DBMS must ensure either
 T is finished
 T never started
 Which do you think is easier? Make more sense?
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- ACID
 Consistency: Each transaction must ensure data
consistency
 i.e. Execution of a transaction in isolation preserves the
consistency of the database.
 Thus all integrity and other constraints must be
satisfied
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- ACID
 Isolation : Transactions “unaware” of other concurrent
transaction
 i.e. : Although multiple transactions may execute concurrently,
each transaction must be unaware of other concurrently
executing transactions. Intermediate transaction results must be
hidden from other concurrently executed transactions.
 Implications:
 for
every pair of transactions Ti and Tj, it appears to Ti that
either Tj, finished execution before Ti started, or Tj started
execution after Ti finished.
 Some level of interleaving are not allowed
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics -- ACID
 Durability : Once committed, changes to database
must be persistent
 i.e. : After a transaction completes successfully, the
changes it has made to the database persist, even if
there are system failures.
 Implications:
 Suppose
a transaction commits, and then the system crashes.
When the system restarts, before any user can use the
database again, the DBMS must ensure that the changes
made by this transaction is stored onto the disk.
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics – DBMS support
 2 major tasks in DBMS to handle transactions

Concurrency control



Recovery



Handle how concurrent transaction is executed
Goal: Isolation
Handle how to recover a database after a failure
Goal: Atomicity & Durability
Consistency is maintained throughout various part of the DBMS (not
the focus of this course)
 Many systems rolls the two part together as a “transaction
manager”
Haichang Gao , Software School , Xidian University
DataBase System
Transaction basics – DBMS support
 What makes transaction processing tricky(灵活的)

Scheduling is hidden from the DBMS


DBMS cannot enforce which transaction to execute next
Buffer management is hidden from DBMS

Although the transaction write something onto the database, it
is only written to the buffers, to be transferred to the disk at
unspecified time

One can force transfer immediately, but will be very inefficient
Haichang Gao , Software School , Xidian University
DataBase System
Table of contents
 Concurrency control
 Serializability
 Motivating
 Conflict
 Test
examples
serializability
of serializability
Haichang Gao , Software School , Xidian University
DataBase System
Concurrency control
 Why concurrency?
 increased
processor and disk utilization, leading to better
transaction throughput: one transaction can be using the CPU
while another is reading from or writing to the disk
 reduced
average response time for transactions: short
transactions need not wait behind long ones.
 Shared
resources. Many transaction may want to access the
same object/tuple.
 Isolation.
One of the key requirement of transactions
Haichang Gao , Software School , Xidian University
DataBase System
Concurrency control -- schedule
 Schedules – sequences that indicate the chronological
order(时间顺序) in which instructions of concurrent
transactions are executed
a
schedule for a set of transactions must consist of all
instructions of those transactions
 must preserve the order in which the instructions appear in
each individual transaction.
 Assumption: at any time, only one operation from
one transaction can be executed
 However, DBMS may interleave operations (交叉执
行) from multiple transactions
Haichang Gao , Software School , Xidian University
DataBase System
Concurrency control -- schedule
 Serial schedule: schedules that does not allow
interleaving between transactions (i.e. one transaction
finishes before the other begin)
 Equivalent schedules: two schedules are equivalent if
they “produce the same results”
Haichang Gao , Software School , Xidian University
DataBase System
Notation used
 Database consists of objects (X, Y, Z) , each of them is
an integer
 Transactions are labeled T1, T2 etc.
 Each transaction has a set of local variables (not
accessible by other transactions) in main memory.
(Labeled as a1, a2, b1, b2 etc.)
 Each transaction access the database by the read() &
write() command
Haichang Gao , Software School , Xidian University
DataBase System
Notation used
 A read command read a database object into a local variables (a1
<- read(X) )
 A write command write a local variable into the database object
(write(X, a1) )
 Local variables for read() & write() will not be shown if the
context is clear, or if it is unimportant
 Manipulation and calculation on objects can only be done on
local variables (e.g. X <- X + 1 is not allowed, but a1 <- a1+ 1 is
ok)
 In some case, the local manipulation is not shown (to highlight
the effects of read() and write())
Haichang Gao , Software School , Xidian University
DataBase System
Notation used
 Example; (transfer, assuming overdraft(透支) is
allowed)
1.
a1 <- Read(X)
2.
a1 <- a1 – k
3.
Write(X, a1)
4.
a2 <- Read(Y)
5.
a2 <- a2 + k
6.
Write(Y, a2)
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example
 Consider the fund transfer operation
1. Find tuple for x’s account
(database query)
2. Read x’s account info into
main memory
3. Check if x have at least $k
Rewrite
4. Subtract $k from x’s
Using class
account
notation
5. Write x’s new balance back
to the database (database
update)
6. Find tuple for y’s account
(database query)
7. Read y’s account info into
main memory
8. Add $k to y’s account
9. Write y’s new balance to
the database (database
update)
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example
 Consider the dividend operation
1. Find tuple for x’s account
(database query)
2. Read x’s account info into
main memory
3. Add 1% to x’s account
4. Write x’s new balance
back to the database
(database update)
5. Find tuple for y’s account
(database query)
6. Read y’s account info into
main memory
7. Add 1% to y’s account
8. Write y’s new balance
back to the database
(database update)
Rewrite
Using class
notation
1. A1 <- Read(X)
2. A1 <- A1* 1.01
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 * 1.01
6. Write(Y, A2)
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example
 Suppose x has $100, y has $200
 Consider two operations


x transfer $50 to y
Dividend
 For serial schedules

If transfer comes before dividend



If dividend comes before transfer



X : 100 -> 50 -> 50.5
Y : 200 -> 250 -> 252.5
X : 100 -> 101 -> 51
Y : 200 -> 202 -> 252
In both case, X + Y = 303
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
 But with the following schedule
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
1.
2.
3.
4.
5.
6.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 * 1.01
Write(Y, A2)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
Y: 200 -> 202 -> 252;
X+Y= 302.5
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
 What cause the problem?
 Contention
of resources(资源竞争)?
 Interleaving(交叉执行)?
 Is interleaving always bad?
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
 But with the following schedule
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
1. A1 <- Read(X)
2. A1 <- A1* 1.01
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 * 1.01
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
Y: 200 -> 250 -> 252.5;
X+Y = 303
In this case, interleaving is ok!
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
 Let’s change slightly:
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
1.
2.
3.
4.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
5. A2 <- A2 * 1.01
6. Write(Y, A2)
Y: 200 -> 250 -> 202;
X+Y = 252.5
In this case, interleaving is very bad!
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
 Let’s change slightly (again):
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
1. A1 <- Read(X)
2. A1 <- A1* 1.01
3.
4.
5.
6.
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 * 1.01
Write(Y, A2)
Y: 200 -> 250 -> 252.5;
X+Y = 303
In this case, interleaving is good again!
Haichang Gao , Software School , Xidian University
DataBase System
Serializable schedule – example (2)
 What’s going on here?
 Interleaving can
 However,
 How
be very bad.
some interleaving does not cause problems.
can we determine what kind of interleaving is “nice”?
Haichang Gao , Software School , Xidian University
DataBase System
Serializability
 How to formalize the notion?

One can look at final results

If the schedule produce the same result as a serial schedule, then it’s fine.

However, this may be due to luck and/or “commutative”(可交换的)
operators
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
1.
2.
3.
4.
5.
6.
A1 <- Read(X)
A1 <- A1 + m
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 - m
Write(Y, A2)
A better notion is needed
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability
 Suppose two transactions (T1, T2) want to operate on the
same data object (X)
 Four possible scenarios

T1 Read(X), T2 Read(X)

T1 Read(X), T2 Write(X)

T1 Write(X), T2 Read(X)

T1 Write(X), T2 Write(X)
 How does the order of these operations affect the results of
the transactions?
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
 Remember this schedule?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
i. A1 <- Read(X)
ii. A1 <- A1* 1.01
iii. Write(X, A1)
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
Y: 200 -> 250 -> 252.5;
X+Y = 303
In this case, interleaving is ok!
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
 Remember this schedule?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
i.
ii.
A1 <- Read(X)
A1 <- A1* 1.01
iii. Write(X, A1)
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
(4) And (iii) are not in conflict, so can swap
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
 Remember this schedule?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
i.
A1 <- Read(X)
ii. A1 <- A1* 1.01
iii. Write(X, A1)
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
(4) And (ii) are not in conflict, so can swap
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
 Remember this schedule?
1.
2.
3.
4.
A1 <- Read(X)
A1 <- A1 – k
Write(X, A1)
A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
i. A1 <- Read(X)
ii. A1 <- A1* 1.01
iii. Write(X, A1)
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
(4) And (i) are not in conflict, so can swap
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
 Remember this schedule?
1.
2.
3.
4.
5.
A1 <- Read(X)
A1 <- A1 – k
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 + k
6. Write(Y, A2)
i. A1 <- Read(X)
ii. A1 <- A1* 1.01
iii. Write(X, A1)
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
Similarily (5) And (i) – (iii) are not in conflict, so can swap
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
 Remember this schedule?
1.
2.
3.
4.
5.
6.
A1 <- Read(X)
A1 <- A1 – k
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 + k
Write(Y, A2)
i.
ii.
iii.
iv.
v.
vi.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 * 1.01
Write(Y, A2)
Similarily (6) And (i) – (iii) are not in conflict, so can swap
We obtain a serial schedule by this transformation (swapping process)
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
 Now, remember this schedule?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
i.
ii.
iii.
iv.
v.
vi.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 * 1.01
Write(Y, A2)
Y: 200 -> 202 -> 252;
X+Y = 302.5
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
 Now, remember this schedule?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
i.
ii.
iii.
iv.
v.
vi.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
A2 <- A2 * 1.01
Write(Y, A2)
(3) And (i) has conflict, so can’t swap
(4) And (vi) has conflict, so can’t swap
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- example
 Can you work through this case?
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
X: 100 -> 50 -> 50.5;
1.
2.
3.
4.
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
5. A2 <- A2 * 1.01
6. Write(Y, A2)
Y: 200 -> 250 -> 202;
X+Y = 250.5
In this case, interleaving is very bad!
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability
 From previous slides, it seems
schedule can be transformed to a serial schedule  good!
(achieve isolation)
A
schedule cannot be transformed to a serial schedule  bad!
(do not achieve isolation)
A
 Can we generalize?
 Yes.
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- definitions
 Schedule: sequences that indicate the chronological order in
which instructions of concurrent transactions are executed
 Complete schedule: Schedule that contain commit/abort
decision for each transaction in the schedule
 Serial schedule: A schedule where there is no interleaving of
operations by multiple transactions.

Denoted by < T1, T2, … , Tn>
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- definitions
 Conflict equivalent transformation: swapping adjacent
operation on a schedule which does not conflict
 Conflict equivalent(冲突等价): Two schedules S and S’ are
conflict equivalent if S can be transformed to S’ by successive
conflict equivalent transformations
Haichang Gao , Software School , Xidian University
DataBase System
Conflict serializability -- definitions
 Conflict serializability(冲突可串行化): a schedule S
is conflict serializable if it is conflict equivalent to a
serial schedule
 Example of a schedule that is not conflict serializable:
T3
read(Q)
T4
write(Q)
write(Q)
We are unable to swap instructions in the above schedule to obtain either the
serial schedule < T3, T4 >, or the serial schedule < T4, T3 >.
Haichang Gao , Software School , Xidian University
DataBase System
Test for serializability
 Consider the following schedules:
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
i.
ii.
iii.
iv.
v.
vi.
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
i. A1 <- Read(X)
ii. A1 <- A1* 1.01
iii. Write(X, A1)
A1 <- Read(X)
A1 <- A1* 1.01
Write(X, A1)
A2 <- Read(Y)
4. A2 <- Read(Y)
A2 <- A2 * 1.01 5. A2 <- A2 + k
Write(Y, A2)
6. Write(Y, A2)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
Not serializable
iv. A2 <- Read(Y)
v. A2 <- A2 * 1.01
vi. Write(Y, A2)
Serializable
Haichang Gao , Software School , Xidian University
DataBase System
How to guarantee serializable
 Need to provide protocol (rules on how data item is accessed)
to ensure conflict serializability
 Goal of protocol:
 To allow access for data items that are not required by
multiple transactions
 For those data items required by multiple transaction, restrict
access in some way, or limit it to exclusive access
 Balance between safety and efficiency
 Too restrictive: little or no concurrency, ineffective
 Too lenient: leads to inconsistency
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocols
 “Exclusive” access  locks
 Each database item is associated with locks
 Transaction must obtain locks before accessing the
object
 Transaction must release lock when it finishes.
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocols
 Example:
Lock(X)
Read(X)
Unlock(X)
Read(X)
 Lock(X): check if object X is already locked


If not, obtain the lock
If so, wait or “do something” to handle the potential deadlock (like
aborting)
 Unlock(X): release the lock on object X
 The addition of Lock(X) and Unlock(X) commands are done
by the DBMS
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocols: S and X locks
 Two kinds of locks on each object
 Shared
locks (S-locks)
 Requested before reading
 Multiple transactions can hold a S-lock on an object
simultaneously
 Exclusive locks (X-locks)
 Requested before writing
 Only one transaction can hold an X-lock on an object at
any given time
 No other transaction can hold any lock (not even a S-lock)
if some transaction has an X-lock
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocols: S and X locks
 More on S and X locks
A
transaction that holds an S-lock on an object can read the
object
A
transaction that holds an X-lock on an object can read and
write the objects
 Lock-compatibility table
T2 holds
S-lock
X-lock
S-lock
True
False
X-lock
False
False
T1 Request
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocol
 Consider the following examples (again):
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
T1 (Transfer)
1. A1 <- Read(X)
2. A1 <- A1* 1.01
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 * 1.01
6. Write(Y, A2)
T2 (Dividend)
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocol -- example
 With a lock-based protocol, one possible way T1 is
transformed
1. A1 <- Read(X)
2. A1 <- A1 – k
3. Write(X, A1)
4. A2 <- Read(Y)
5. A2 <- A2 + k
6. Write(Y, A2)
T1 (Transfer)
1. S-lock(X)
2. A1 <- Read(X)
3. A1 <- A1 – k
4. X-lock(X)
5. Write(X, A1)
6. X-lock(Y)
7. A2 <- Read(Y)
8. A2 <- A2 + k
9. Write(Y, A2)
10.Unlock(X)
11.Unlock(Y)
Haichang Gao , Software School , Xidian University
DataBase System
Lock-based protocol -- example
 Notes from the previous example:

Locks must be obtained before read/write can begin

If a transaction want to read and write the same object, it can either


Obtain an X-lock before reading

Obtain an S-lock before reading, then obtain an X-lock before
writing (it is not automatically granted)
A transaction does not have to release locks immediately after use

Good or bad?
Haichang Gao , Software School , Xidian University
DataBase System
Two phase locking – definition
 The basic two-phase locking (2PL) protocol



A transaction T must hold a lock on an item x in the appropriate
mode before T accesses x.
If a conflicting lock on x is being held by another transaction, T waits.
Once T releases a lock, it cannot obtain any other lock subsequently.
 Note: a transaction is divided into two phases:


A growing phase (obtaining locks)
A shrinking phase (releasing locks)
 Claim : 2PL ensures conflict serializability
Haichang Gao , Software School , Xidian University
DataBase System
Two phase locking – Serializability
 Lock-point: the point where the transaction obtains
all the locks
 With 2PL, a schedule is conflict equivalent to a a
serial schedule ordered by the lock-point of the
transactions
Haichang Gao , Software School , Xidian University
DataBase System
2-phase locking -- example
1.
2.
3.
4.
S-lock(X)
A1 <- Read(X)
A1 <- A1 – k
X-lock(X)
5. Write(X, A1)
6. S-lock(Y)
7. A2 <- Read(Y)
8. A2 <- A2 + k
9. X-lock(Y)
10. Write(Y, A2)
11. Unlock(X)
Lock point for T1
12. Unlock(Y)
T1
1. S-lock(X)
1.
2.
3.
4.
5.
6.
S-lock(X)
A1 <- Read(X)
A1 <- A1* 1.01
X-lock(X)
Write(X, A1)
S-lock(Y)
T2 waits
T2 waits
6. S-lock(Y)
7. A2 <- Read(Y)
8. A2 <- A2 * 1.01
Lock point for T2
9. X-lock(Y)
10. Write(Y, A2)
11. Unlock(Y)
T2
Haichang
Gao
,
Software
School
, Xidian University
12. Unlock(X)
DataBase System
DataBase System
Haichang Gao , Software School , Xidian University
DataBase System
Table of contents
 Motivation
 Basic concept of transactions
 Issues of concurrency control
Haichang Gao , Software School , Xidian University
DataBase System
Deadlocks
 2 phase locking is a blocking protocol (transaction
has to wait if it cannot obtain a lock)
 Probability of a deadlock(死锁)
 Example:
1. S-lock(X)
2. A1 <- Read(X)
3. A1 <- A1 – k
Can’t proceed
T2 has S-lock
on X
4. X-lock(X)
1. S-lock(X)
2. A1 <- Read(X)
3. A1 <- A1* 1.01
4. X-lock(X)
Can’t proceed
T1 has S-lock
on X
Deadlock!
Haichang Gao , Software School , Xidian University
DataBase System
Deadlocks
 How to deal with deadlocks?
 Ostrich (驼鸟策略)
 Pretend nothing happens and wait for user to hit the <ctrl-alt-del> key
 Timeout(超时法)
 Assume deadlock after there is no progress for some time, and do
something about it
 Detection and recovery(检测与恢复)
 Wait until a deadlock occurs and do something about it
 Avoidance(避免死锁)
 Wait until a deadlock can occur if certain operations is executed and
do something about it
 Prevention(预防法)
 Set up the system such that there is never a chance of deadlocks
Haichang Gao , Software School , Xidian University
DataBase System
Deadlocks -- timeout
 When transaction waits too long for a lock, it is
aborted and restarted again from the beginning
 The timeout period should be:
 long
enough so that most transactions that are aborted are
actually deadlocked;
 short
enough so that deadlocked transactions don’t wait too
long for their deadlocks to be broken.
Haichang Gao , Software School , Xidian University
DataBase System
Deadlocks -- Detection
 Deadlocks can be described as a wait-for graph, which
consists of a pair G = (V,E),
V
is a set of vertices (all the transactions in the system)
E
is a set of edges; each element is an ordered pair Ti Tj.
Haichang Gao , Software School , Xidian University
DataBase System
Deadlocks -- Detection
 When deadlock is detected :
 Some
transaction will have to rolled back (made a victim)
to break deadlock. Select that transaction as victim that
will incur minimum cost.
 Rollback -- determine how far to roll back transaction


Total rollback: Abort the transaction and then restart it.
More effective to roll back transaction only as far as necessary to
break deadlock. (but tricky to implement)
 Drawback:
 maintaining wait-for-graph
is not cheap
 Detecting cycles is an overhead
Haichang Gao , Software School , Xidian University
DataBase System
The phantom(幻影) menace
 Consider the following 2 transactions
1. Select sum(salary)
From faculty
Where dept = “CS”
2. Select sum(salary)
From faculty
Where dept = “Math”
T1
a. Insert into faculty
values (“Lin”, “CS”, 1000)
b. Insert into faculty
values (“Lam”, “Math”, 5000)
T2
 There does not seems to be a conflict (in terms of tuple)
 Assume initially CS faculty have total salary 1,000,000 and
Math faculty have total salary 2,000,000
 Then T1 -> T2 will imply the select statements return
1,000,000 and 2,000,000
 T2 -> T1 will imply the select statements return 1,001,000
and 2,005,000
Haichang Gao , Software School , Xidian University
DataBase System
The phantom menace
 But consider the following schedule
1. Select sum(salary)
From faculty
Where dept = “CS”
2. Select sum(salary)
From faculty
Where dept = “Math”
a. Insert into faculty
values (“Lin”, “CS”, 1000)
b. Insert into faculty
values (“Lam”, “Math”, 5000)
T1
T2
 The output will be 1,000,100 and 2,000,000
 Not conflict serializable!
Haichang Gao , Software School , Xidian University
DataBase System
The phantom problem
 This is known as the phantom problem
 Why does it occur?


No tuples are in conflict
However, conflict occurs for tuples that satisfy a certain condition
(dept = “CS”, dept = “Math”)
 T1 require access for ALL tuples satisfying the condition
 However, T2 changes the number of tuples satisfying the condition
 No quick solution: index-locking as a possibility
Haichang Gao , Software School , Xidian University
DataBase System
Isolation levels
 The goal of locking is to ensure serializability
 To ensure serializability, we require 2PL.
 No
new locks are acquired once releasing locks begin
 Locks acquired need to be held for a long time (long locks)



One cannot acquire a lock, done work with it, and then release it
immediately
Adv: ensure serializability
Dis: less concurrency
 Is serializability ( long locks/two-phase locking) necessary
in all cases?
Haichang Gao , Software School , Xidian University
Serializability – In practice
DataBase System
 Data manipulation language must include a construct for
specifying the set of actions that comprise a transaction.
 In SQL, a transaction begins implicitly.
 A transaction in SQL ends by:


Commit work commits current transaction and begins a new one.
Rollback work causes current transaction to abort.
 Levels of consistency specified by SQL-92:




Serializable — default
Repeatable read
Read committed
Read uncommitted
Haichang Gao , Software School , Xidian University
Serializability – In practice
DataBase System
 Serializable — default
 Repeatable read — only committed records to be read, repeated
reads of same record must return same value. However, a
transaction may not be serializable – it may find some records
inserted by a transaction but not find others.
 Read committed — only committed records can be read, but
successive reads of record may return different (but committed)
values.
 Read uncommitted — even uncommitted records may be read.
Haichang Gao , Software School , Xidian University
DataBase System
Levels of Isolation
SQL:
SET TRANSACTION （READ ONLY | READ WRITE)
ISOLATION LEVEL (READ UNCOMMITTED | READ COMMITTED |
REPEATABLE READ | SERIALIZABLE);
Two facts of the transaction defined by the statement：
1). Type of the transaction (Read only/Read Write)
2). Levels of isolation
Haichang Gao , Software School , Xidian University
DataBase System
Levels of Isolation
四种隔离级别的读写锁
短期锁(short-term lock)：只是在执行存取操作的一段足够长的时间内保留；
长期锁(long-term lock): 这种锁将一直保持直到事务提交为止。
Read
Uncommitted
(dirty reads)
Read Committed
Write locks on rows
of a table are longterm
Read locks on rows of
a table are long-term
Read and write locks on
predicates are long-term
NO (but it’s readonly)
No Read locks at all
No predicate locks at all
No
Short-term Read predicate
locks
Long-term Write predicate
locks
Yes
Repeatable Read
Yes
Yes
Short-term Read predicate
locks
Long-term Write predicate
locks
Serializable
Yes
Yes
Long-term Read and Write
predicate locks
Haichang Gao , Software School , Xidian University
DataBase System
Levels of Isolation
1) Read Uncommitted
特点：
· 该隔离级别执行的事务只能是读事务，而不允许更新事务；
· 该隔离级别执行的事务无论访问什么数据都不(需要)加锁。
故该隔离级别可能Read到一个Uncommitted结果。
不会产生丢失修改，但是会产生读脏数据。
Haichang Gao , Software School , Xidian University
DataBase System
Levels of Isolation
2) Read Committed
特点：
· 该隔离级别下加的写锁为长期锁，读锁为短期锁；
短期读谓词锁，长期写谓词锁．
· 该隔离级别下对于三类冲突的两种不会发生：
a. Wi(A) →Rj(A)
(2) 故Read到的永远是Committed结果。(读脏不会出现)
b. Wi(A) →Wj(A) (3) (一般的丢失修改不会发生)
· 该隔离级别下对于三类冲突的一种会发生：
c. Ri(A) → Wj(A) (1)
故会导致如下二类问题：
① 不可重复读
② Scholar’s Lost Update Anomaly
Haichang Gao , Software School , Xidian University
DataBase System
Levels of Isolation
3) Repeatable Read
特点：
· 该隔离级别下加的读、写锁均为长期锁；并且仅对读谓词加的是短期锁，写
谓
词加为长期锁。
该隔离级别消除了丢失修改、读脏数据、不可重复读异常。但由于其对读谓词
加的是短期锁，故该隔离级别消除不了“幽灵更新异常”(phantom update anomaly)
。
4) Serializability (an Phantom Updates)
特点：
· 该隔离级别下加的读、写锁均为长期锁；并且对读、写谓词加的均为长期锁
。
该隔离级别可保证并发的事务串行化并消除所有的异常。
Haichang Gao , Software School , Xidian University
DataBase System
Table of contents
 Recovery(恢复)
Haichang Gao , Software School , Xidian University
DataBase System
Recovery – why?
 ACID properties
 Atomicity:
all-or-nothing
 Durability: Once
committed, changes must be permanent
 If the system is always functional, that’s ok.
 However, system may crash (failure).
Haichang Gao , Software School , Xidian University
92
Recovery – why?
DataBase System
 Types of failures:
 Transaction
failure :

Logical errors: transaction cannot complete due to some
internal error condition (e.g. your program have a division
by zero error)

System errors: the database system must terminate an
active transaction due to an error condition (e.g., deadlock)
Haichang Gao , Software School , Xidian University
93
Recovery – why?
DataBase System
 Types of failures (ctd)
 System
crash: a power failure or other hardware or software
failure causes the system to crash.

Fail-stop assumption: non-volatile storage (非易失性存储器)
contents are assumed to not be corrupted by system crash

Database systems have numerous integrity checks to prevent
corruption of disk data
Haichang Gao , Software School , Xidian University
94
Recovery – why?
DataBase System
 Types of failures (ctd):
 Disk
failure: a head crash or similar disk failure destroys all
or part of disk storage
 Destruction is assumed to be detectable: disk drives use
checksums to detect failures
 With some disk systems, some failures maybe correctable.
 We focus on transaction/system failure as well as
other failures that are correctable.
Haichang Gao , Software School , Xidian University
95
DataBase System
Recovery – why?
 Failure can occur ANY time
 Including
the time that you do not want
 We
do assume in this class that writing a page is atomic (i.e.
failure do not occur in the middle of writing a page)
 Thus need to maintain atomicity and durability at
all times
Haichang Gao , Software School , Xidian University
96
DataBase System
Recovery – why?
 After failure, the system needs to recover
 Need
 Need
to recover to a consistent state
to ensure all transactions are atomic and durable
 Thus when the system go back up and running after a
failure
 A recovery module is up and running – before allowing
any other transaction to run
 It will restore the database to the consistent state as well
as ensure the ACID properties
 After that transactions may continue
 Notice that information need to be stored during the
normal running of the transaction for the recovery
module to run properly.
Haichang Gao , Software School , Xidian University
97
DataBase System
Recovery – atomicity requirement
 Suppose the system crash while a transaction T is being
executed (but not committed)
 During recovery, one need to


Find those transactions
Ensure atomicity is held
 Abort or complete?
 Abort! (in many case, don’t know how to complete anyway)
 Abort means restoring the database to the point where those
transactions has not started (as some changes may have
propagated to the disk)
 Thus, after the recovery procedure, those transaction should
seems to have NEVER been executed
Haichang Gao , Software School , Xidian University
98
DataBase System
Recovery – atomicity requirement
 Thus, during recovery, one need to


Find those transactions that has started but not yet committed
Ensure atomicity is held


Find all the changes to the database that the transactions has
done
Undo all the changes
Haichang Gao , Software School , Xidian University
99
DataBase System
Recovery – durability requirement
 Suppose the crash occurs right after a transaction T
committed.
 Seems to be ok, but …



T may have written something onto the disk
However, the writes may not have propagated to the disk
Reasons


The writes may be scheduled/buffered but the system
crashed before such writes can execute
The buffer manager/virtual memory management may
decide to put a disk page into main memory for a long time
(save disk access time)

This is especially true on a network/shared file system
Haichang Gao , Software School , Xidian University
100
DataBase System
Recovery – durability requirement
 Remember,
when you issue a write() command, the
system does not necessary write what you have onto
disk immediately.
 In
some systems, you may issue a flush() command to
force the writes onto the disk
Haichang Gao , Software School , Xidian University
101
DataBase System
Recovery – durability requirement
 Thus, during recovery, one need to
 Find
those transactions that has committed
 Ensure
durability is held

Check if all the changes made by the transactions is
written onto the disk

If not, then redo all the changes
Haichang Gao , Software School , Xidian University
102
DataBase System
Recovery -- overview
 Recovery algorithms have two parts
1.
Actions taken during normal transaction processing to ensure enough
information exists to recover from failures
2.
Actions taken after a failure to recover the database contents to a state
that ensures atomicity, consistency and durability
 In order to achieve (1), we need to store the information in
stable storage
Haichang Gao , Software School , Xidian University
103
DataBase System
Log files
 Logs are needed to record the operations on the
database during normal operations
 The log is a sequence of log records, and maintains a
record of update activities on the database.
Haichang Gao , Software School , Xidian University
104
DataBase System
Log files
 Different types of log records during normal operations:


Begin record: <Ti, start> -- registered when transaction Ti begins
Write record: <Ti, X, Vold, Vnew> -- registered when a database item
X is updated by Ti, where Vold, Vnew store the old and new values
respectively


Commit record: <Ti, commit> -- registered when Ti commits.


Vold, Vnew also known as before-image & after-image
respectively
Formally, a transaction commits when the commit log record is
written
Abort record: <Ti, abort> -- registered when Ti aborts
Haichang Gao , Software School , Xidian University
105
DataBase System
Log files
 With such information, redoing and undoing
operations can be done:
 Undo:
copy Vold back to the object.
 Redo:
copy Vnew back to the object
Haichang Gao , Software School , Xidian University
106
DataBase System
Log files
 Logs are written onto stable storage
 Question: given an operation to be logged, should
we log first or execute first?
 Consider the case the system fail between the two
operations
Haichang Gao , Software School , Xidian University
107
DataBase System
Log files
 Thus logs must be write-ahead logs (WAL)
 i.e.
all operations must be logged first
 Log
records must be forced to stable storage before actually
operations can be executed
Haichang Gao , Software School , Xidian University
108
DataBase System
Log-based recovery : basic approach
 Given a write-ahead log. How should recovery
proceed after the system crash?
 Two major steps:
 Locating the
 Apply
transaction that need works to be done
compensatory action （补偿） on these transactions
Haichang Gao , Software School , Xidian University
109
DataBase System
Log-based recovery : basic approach
 Step 1: locating transactions that needed to be dealt
with.
 Uncommitted
but active transactions

Need undo

Transactions that has <start T> in log, but not <commit
T>
 Committed
transactions

Need redo

Transactions that has <commit T> in log
Haichang Gao , Software School , Xidian University
110
DataBase System
Log-based recovery : basic approach
 Example
a)
Undo T0
b)
Undo T1, Redo T0
c)
Redo T0, T1
Haichang Gao , Software School , Xidian University
111
DataBase System
Log-based recovery : basic approach
 Step 2: Apply compensatory actions
 Redo
& Undo
 Requirement:

actions have to be idempotent(等幂)
That is, even if the operation is executed multiple times
the effect is the same as if it is executed once
Haichang Gao , Software School , Xidian University
112
DataBase System
Log-based recovery : basic approach
 For undo: copying old value of the item from the log
to the database
 For redo: copying new value of the item from the log
to the database
 Both operations are idempotent
Haichang Gao , Software School , Xidian University
113

Document

Transcript Document

Directory