Data Integrity

Download Report

Transcript Data Integrity

15.1
General Overview
Where we’ve been...
DBA skills for relational DB’s:
Logical Schema Design
 E/R diagrams
 Decomposition and Normalization
Query languages
 RA, RC, SQL
Integrity Constraints
Transactions
Database Implementation Issues
Files, buffering, indexing
Today: Data Integrity
15.2
What Does a DBMS Manage?
1. Data organization



E/R Model

Relational Model
2. Data Retrieval

Relational Algebra

Relational Calculus

SQL
3. Data Integrity

Integrity Constraints

Transactions
15.3
15.4
Updates in SQL
An example:
UPDATE account
SET balance = balance -50
WHERE acct_no = A102
What takes place:
memory
…
Disk
Dntn: A102: 300
Dntn: A15: 500
…
(1) Read
(2) update
Transaction:
(3) write
1.
2.
3.
Read(A)
A <- A -50
Write(A)
Mian: A142: 300
15.5
account
The Threat to Data Integrity
Consistent DB
Name Acct
bal
-------- ------ -----Joe
A-33 300
Joe A-509 100
Inconsistent DB
Joe’s total: 400
Name Acct
bal
-------- ------ -----Joe
A-33 250
Joe A-509 100
transaction
Joe’s total: 350
Consistent DB
What a Xaction should
look like to Joe
Name Acct
bal
-------- ------ -----Joe
A-33 250
Joe A-509 150
What actually happens
during execution
Joe’s total: 400
15.6
Transactions
What?:
 Updates to db that can be executed concurrently
Why?:
(1) Updates can require multiple reads, writes on a db
e.g., transfer $50 from A-33 to A509
=
read(A)
A  A -50
write(A)
read(B)
BB+50
write(B)
(2) For performance reasons, db’s permit updates to be executed
concurrently
Concern: concurrent access/updates of data can compromise data integrity
15.7
ACID Properties
Properties that a Xaction needs to have:
Atomicity: either all operations in a Xaction take effect, or
none
Consistency: operations, taken together preserve db
consistency
Isolation: intermediate, inconsistent states must be concealed
from other Xactions
Durability. If a Xaction successfully completes (“commits”),
changes made to db must persist, even if system crashes
15.8
Demonstrating ACID
Transaction to transfer $50 from account A to account B:
1.
2.
3.
4.
5.
6.
read(A)
A := A – 50
write(A)
read(B)
B := B + 50
write(B)
Consistency: total value A+B, unchanged by Xaction
Atomicity: if Xaction fails after 3 and before 6, 3 should not affect db
Durability: once user notified of Xaction commit, updates to A,B should
not be undone by system failure
Isolation: other Xactions should not be able to see A, B between steps 3-6
15.9
Threats to ACID
1. Programmer Error
e.g.: $50 substracted from A, $30 added to B
 threatens consistency
2. System Failures
e.g.: crash after write(A) and before write(B)
 threatens atomicity
e.g.: crash after write(B)
 threatens durability
3. Concurrency
E.g.: concurrent Xaction reads A, B between steps 3-6
 threatens isolation
15.10
Isolation
Simplest way to guarantee: forbid concurrent Xactions!
But, concurrency is desirable:
(1) Achieves better throughput (TPS: transactions per second)
one Xaction can use CPU while another is waiting for disk to
service request
(2) Achieves better average response time
short Xactions don’t need to get stuck behind long ones
Prohibiting concurrency is not an option
15.11
Isolation
Approach to ensuring Isolation:
 Distinguish between “good” and “bad” concurrency
 Prevent all “bad” (and sometime some “good”) concurrency from
happening OR
 Recognize “bad” concurrency when it happens and undo its effects
(abort some transactions)
 Pessimistic vs Optimistic CC
Both pessimistic and optimistic approaches require
distinguishing between good and bad concurrency
How: concurrency characterized in terms of possible Xaction “schedules”
15.12
Schedules
Schedules – sequences that indicate the chronological order in
which instructions of concurrent transactions are executed
 a schedule for a set of transactions must consist of all instructions of
those transactions
 must preserve the order in which the instructions appear in each
individual transaction
T1
1
2
3
T2
A
B
C
D
T1
1
T2
A
B
one possible schedule:
2
3
C
D
15.13
Example Schedules
Transactions:
T1: transfers $50 from A to B
T2: transfers 10% of A to B
T1
read(A)
A <- A -50
write(A)
read(B)
B<-B+50
write(B)
Example 1: a “serial” schedule
T2
Constraint: The sum of A+B
must be the same
Before: 100+50
=150, consistent
read(A)
tmp <- A*0.1
A <- A – tmp
write(A)
read(B)
B <- B+ tmp
write(B)
15.14
After: 45+105
Example Schedule
Another “serial” schedule:
T1
T2
read(A)
tmp <- A*0.1
A <- A – tmp
write(A)
read(B)
B <- B+ tmp
write(B)
Before: 100+50
=150, consistent
After: 40+110
Consistent but not the
same as previous schedule..
Either is OK!
read(A)
A <- A -50
write(A)
read(B)
B<-B+50
write(B)
15.15
Example Schedule (Cont.)
Another “good” schedule:
T1
read(A)
A <- A -50
write(A)
T2
Effect:
read(A)
tmp <- A*0.1
A <- A – tmp
write(A)
Before
A
100
B
50
After
45
105
Same as one of the serial schedules
Serializable
read(B)
B<-B+50
write(B)
read(B)
B <- B+ tmp
write(B)
15.16
Example Schedules (Cont.)
A “bad” schedule
T1
read(A)
A <- A -50
T2
Before: 100+50 = 150
read(A)
tmp <- A*0.1
A <- A – tmp
write(A)
read(B)
After: 50+60 = 110 !!
Not consistent
write(A)
read(B)
B<-B+50
write(B)
B <- B+ tmp
write(B)
15.17
Example Schedule (Schedule B)
T1
T2
T3
T4
T5
read(X)
read(Y)
read(Z)
read(V)
read(W)
read(Y)
write(Y)
write(Z)
read(U)
read(Y)
write(Y)
read(Z)
write(Z)
read(U)
write(U)
15.18
Transaction Definition in SQL
Data manipulation language must include a construct for
specifying the set of actions that comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by:
 Commit work commits current transaction and begins a new
one.
 Rollback work causes current transaction to abort.
Levels of consistency specified by SQL-92:
 Serializable — default (more conservative than conflict
serializable)
 Repeatable read
 Read committed
 Read uncommitted
15.19
Transactions in SQL
Serializable — default
- can read only committed records
- if T is reading or writing X, no other Xaction can change X
until T commits
- if T is updating a set of records (identified by WHERE
clause), no other Xaction can change this set until T commits
Weaker versions (non-serializable) levels can also be declared
Idea: tradeoff: More concurrency => more overhead to ensure
valid schedule.
Lower degrees of consistency useful for gathering approximate
information about the database, e.g., statistics for query optimizer.
15.20
15.21
General Overview
Relational model - SQL
 Formal & commercial query languages
Functional Dependencies
Normalization
Physical Design
Indexing
Query Processing and Optimization
Transaction Processing and CC
15.22
Transaction Concept
A transaction is a unit of program execution that
accesses and possibly updates various data items.
A transaction must see a consistent database.
During transaction execution the database may be
inconsistent.
A transaction ends with a Commit or an Abort
When the transaction is committed, the database must
be consistent.
State 1
State 2
15.23
How to enforce serializable schedules?
Prevent P(S) cycles from occurring using a concurrency control
manager: ensures interleaving of operations amongst
concurrent xactions only result in serializable schedules.
T1 T2 …..
Tn
CC Scheduler
Serializable
schedule
15.24
Concurrency Via Locks
Idea:
 Data items modified by one xaction at a time
Locks
 Control access to a resource
 Can block a xaction until lock granted
 Two modes:
 Shared (read only)
 eXclusive (read & write)
15.25
Granting Locks
Requesting locks
 Must request before accessing a data item
Granting Locks
 No lock on data item? Grant
 Existing lock on data item?
 Check compatibility:
 Compatible? Grant
 Not? Block xaction
15.26
shared
exclusive
shared
Yes
No
exclusive
No
No
Lock instructions
New instructions
- lock-S: shared lock request
- lock-X: exclusive lock request
- unlock: release previously held lock
T1
Example:
lock-X(B)
read(B)
B B-50
write(B)
unlock(B)
lock-X(A)
read(A)
A A + 50
write(A)
unlock(A)
T2
lock-S(A)
read(A)
unlock(A)
lock-S(B)
read(B)
unlock(B)
display(A+B)
15.27
Locking Issues
Starvation
 T1 holds shared lock on Q
 T2 requests exclusive lock on Q: blocks
 T3, T4, ..., Tn request shared locks: granted
 T2 is starved!
Solution?
Do not grant locks if other xaction is waiting
15.28
Locking Issues
No xaction proceeds:
T1
T2
Deadlock
- T1 waits for T2 to unlock A
lock-X(B)
- T2 waits for T1 to unlock B
read(B)
B  B-50
write(B)
lock-S(A)
More later…
read(A)
lock-S(B)
lock-X(A)
15.29
Locking Issues
Does not ensure serializability by itself:
T1
lock-X(B)
read(B)
B B-50
write(B)
unlock(B)
T2
lock-S(A)
read(A)
unlock(A)
lock-S(B)
read(B)
unlock(B)
display(A+B)
lock-X(A)
read(A)
A A + 50
write(A)
unlock(A)
15.30
T2 displays 50 less!!
Two-Phase Locking: 2PL
Each xaction has two phases
 Growing : only lock request
 Shrinking: only unlocks
 xactions start in growing phase
 First unlock begins the shrinking phase
Ensures serializabile schedules !
 Order concurrent xactions on the moment of final lock is granted
15.31
2PL
Example: T1 in 2PL
Growing phase
Shrinking phase


T1
lock-X(B)
read(B)
B  B - 50
write(B)
lock-X(A)
read(A)
A  A - 50
write(A)
unlock(B)
unlock(A)
15.32
2PL Issues
2PL does not prevent deadlock
T1
T2
lock-X(B)
read(B)
B  B-50
> 2 xactions involved?
write(B)
- Rollbacks expensive
lock-S(A)
read(A)
lock-S(B)
lock-X(A)
15.33
Dealing with Deadlocks
How do you detect a deadlock?
 Wait-for graph
 Directed edge from Ti to Tj
T2
 Ti waiting for Tj
T4
T1
T1
T2
T3
T4
T3
X(Z)
X(V)
X(W)
Suppose T4 requests lock-S(Z)....
S(V)
S(W)
S(V)
15.34
Detecting Deadlocks
Wait-for graph has a cycle  deadlock
T2
T2, T3, T4 are deadlocked
T4
T1
•Build wait-for graph, check for cycle
T3
•How often?
- Tunable
Expect many deadlocks or many xactions involved
Run often to avoid aborts
Else run less often to reduce overhead
15.35
Recovering from Deadlocks
Rollback one or more xaction
 Which one?
 Rollback the cheapest ones
 Cheapest ill-defined
 Was it almost done?
 How much will it have to redo?
 Will it cause other rollbacks?
 How far?
 May only need a partial rollback
 Avoid starvation
 Ensure same xaction not always chosen to break deadlock
15.36
15.37
Review: The ACID properties
A tomicity:
All actions in the Xaction happen, or none happen.
C onsistency:
If each Xaction is consistent, and the DB starts
consistent, it ends up consistent.
I solation:
D urability:
Execution of one Xaction is isolated from that of other Xacts.
If a Xaction commits, its effects persist.
CC guarantees Isolation and Atomicity.
The Recovery Manager guarantees Atomicity & Durability.
15.38
Why is recovery system necessary?
Transaction failure :
 Logical errors: application errors (e.g. div by 0, segmentation fault)
 System errors: deadlocks
 Aborts
System crash:
 hardware/software failure causes the system to crash.
Disk failure:
 head crash or similar disk failure destroys all or part of disk storage
The lost data can be in main memory or in disk
15.39
Storage Media
Volatile storage:
 does not survive system crashes
 examples: main memory, cache memory
Nonvolatile storage:
 survives system crashes
 examples: disk, tape, flash memory,
non-volatile (battery backed up) RAM
Stable storage:
 a “mythical” form of storage that survives all failures
 approximated by maintaining multiple copies on distinct nonvolatile
media
15.40
Recovery and Durability
To achieve Durability:
Put data on stable storage
To approximate stable storage make two copies of
data
15.41
Stable-Storage Implementation
Solution:
 Write to the first disk
 Write to the second disk when the first disk completes
 The process is complete only after the second write completes
successfully
Recovery (from disk failures, etc):
 Detect bad blocks with the checksum (e.g. parity)
 Two good copies, equal blocks: done
 One good, one bad : copy good to bad
 Two bad copies: ignore write
 Two good, unequal blocks?
Ans: Copy the second to the first
15.42
Recovery and Atomicity
Example: transfer $50 from account A to account B
 goal is either to perform all database modifications made by Ti or
none at all.
Requires several inputs (reads) and outputs (writes)
Failure after output to account A and before output to B….
 DB is corrupted!
15.43
Recovery Algorithms
Recovery algorithms are techniques to ensure database
consistency and transaction atomicity and durability despite
failures
Recovery algorithms have two parts
1. Actions taken during normal transaction processing to ensure
enough information exists to recover from failures
2. Actions taken after a failure to recover the database contents to a
state that ensures atomicity, consistency and durability
ARIES: Algorithms for Recovery and Isolation Exploiting Semantics
http://en.wikipedia.org/wiki/Algorithms_for_Recovery_and_Isolation
_Exploiting_Semantics
15.44
Log-Based Recovery
Simplifying assumptions:
 Transactions run serially
 logs are written directly on the stable storage
Log: a sequence of log records; maintains a record of update
activities on the database. (Write Ahead Log, W.A.L.)
W.A.L. Write-Ahead Log:
 record the operation on the log, before you write it on the db.
 Record everything else on the log before commit
15.45
Log based approach
Log records for transaction Ti:
<Ti start >
<Ti, X, V1, V2>
<Ti commit >
Two approaches using logs
Deferred database modification
Immediate database modification
15.46
Log example
Transaction T1
Read(A)
A =A-50
Write(A)
Read(B)
B = B+50
Write(B)
Log
<T1, start>
<T1, A, 1000, 950>
<T1, B, 2000, 2050>
<T1, commit>
15.47
Deferred Database Modification
Ti starts: write a <Ti start> record to log.
Ti write(X)
 write <Ti, X, V> to log: V is the new value for X
 The write is deferred
 Note: old value is not needed for this scheme
Ti partially commits:
 Write <Ti commit> to the log
DB updates by reading and executing the log:
 <Ti start> …… <Ti commit>
15.48
Deferred Database Modification
How to use the log for recovery after a crash?
Redo: if both <Ti start> and <Ti commit> are there in the log.
Crashes can occur while
 the transaction is executing the original updates, or
 while recovery action is being taken
example transactions T0 and T1 (T0 executes before T1):
T0: read (A)
T1 : read (C)
A: - A - 50
C:- C- 100
Write (A)
write (C)
read (B)
B:- B + 50
write (B)
15.49
Deferred Database Modification (Cont.)
Below we show the log as it appears at three instances of time.
<T0, start>
<T0, A, 950>
<T0, B, 2050>
(a)
<T0, start>
<T0, A, 950>
<T0, B, 2050>
<T0, commit>
<T1, start>
<T1, C, 600>
(b)
<T0, start>
<T0, A, 950>
<T0, B, 2050>
<T0, commit>
<T1, start>
<T1, C, 600>
<T1, commit>
(c)
15.50
Immediate Database Modification
Database updates of an uncommitted transaction is allowed
Tighter logging rules are needed to ensure transactions are
undoable
 Write records must be of the form: <Ti, X, Vold, Vnew >
 log record must be written before database item is written
 Output of DB blocks can occur:
 Before or after commit
 In any order
15.51
Immediate Database Modification Example
Log
Write
Output
<T0 start>
<T0, A, 1000, 950>
<To, B, 2000, 2050>
A = 950
B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
C = 600
BB, BC
<T1 commit>
BA
Note: BX denotes block containing X.
15.52
Immediate Database Modification (Cont.)
Recovery procedure :
 Undo : <Ti, start > is in the log but <Ti commit> is not. Undo:
 restore the value of all data items updated by Ti to their old
values, going backwards from the last log record for Ti
 Redo: <Ti start> and <Ti commit> are both in the log.
 Redo: sets the value of all data items updated by Ti to the new
values, going forward from the first log record for Ti
Both operations must be idempotent: even if the operation is executed
multiple times the effect is the same as if it is executed once
15.53
I M Recovery Example
<T0, start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
(a)
<T0, start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
<T0, commit>
<T1, start>
<T1, C, 700, 600>
<T0, start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
<T0, commit>
<T1, start>
<T1, C, 700, 600>
<T1, commit>
(b)
(c)
Recovery actions in each case above are:
(a) undo (T0): B is restored to 2000 and A to 1000.
(b) undo (T1) and redo (T0): C is restored to 700, and then A and B are
set to 950 and 2050 respectively.
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050
respectively. Then C is set to 600
15.54
Checkpoints
Problems in recovery procedure as discussed earlier :
1. searching the entire log is time-consuming
2. we might unnecessarily redo transactions which have already
output their updates to the database.
How to avoid redundant redoes?

Put marks in the log indicating that at that point DB and log are
consistent. Checkpoint!
15.55
Checkpoints
At a checkpoint:
Output all log records currently residing in main memory onto
stable storage.
Output all modified buffer blocks to the disk.
Write a log record < checkpoint> onto stable storage.
15.56
Checkpoints (Cont.)
Recovering from log with checkpoints:
1. Scan backwards from end of log to find the most recent
<checkpoint> record
2. Continue scanning backwards till a record <Ti start> is found.
3. Need only consider the part of log following above start record.
Why?
4. After that, recover from log with the rules that we had before.
15.57
Example of Checkpoints
Tc
Tf
T1
T2
T3
T4
checkpoint
system failure
checkpoint
T1 can be ignored (updates already output to disk due to checkpoint)
T2 and T3 redone.
T4 undone
15.58
Recovery With Concurrent Transactions
To permit concurrency:
 All transactions share a single disk buffer and a single log
 Concurrency control: Strict 2PL :i.e. Release eXclusive locks only
after commit.
 Logging is done as described earlier.
The checkpointing technique and actions taken on recovery have
to be changed
 since several transactions may be active when a checkpoint is
performed.
15.59