Transcript Recovery
Chapter 20: Recovery
Failure Types
Transaction Failures: local recovery
System Failure: Global recovery
Main memory is lost
Disk survives
Media Failure
421B: Database Systems - Recovery
2
Motivation
Atomicity: All-or-nothing
Transactions may abort (“Rollback”).
Durability:
Changes survive server crash
Desired Behavior after
system restarts:
– T1, T2 & T3 should be
durable.
– T4 & T5 should be
aborted (effects not
seen).
421B: Database Systems - Recovery
T1
T2
T3
T4
T5
c
crash!
c
c
a
3
Update Approaches
In-place updating
Change value of object and write it back to the same
place on stable storage
Used in most DB
Multiversion System
Old object version is kept, new version is created
E.g., in PostgreSQL
Vacuum procedure that from time to time deletes
old versions that are no more needed
421B: Database Systems - Recovery
4
Handling the Buffer Pool
Force/NoForce
If transaction T has written object X on page P: Force page P to disk
(flush) before T commits
All changes of T are in the database on stable storage before T commits
NoForce: Flushing pages to disk is only determined by replacement
policy of buffer manager;
some of the changes of T might not be in the stable database at commit
Steal/NoSteal
NoSteal: If transaction T has updated object X on page P: do NOT
flush P before T commits;
No change of an active, uncommitted transaction is on stable storage
Steal: Replacements strategy is allowed to replace and flush a page
even if the page contains update of uncommitted transaction
Changes of uncommitted transactions might be in the stable database
421B: Database Systems - Recovery
5
Combinations
NoSteal
Force
No Force
Steal
Atomic flush at
Commit Time
flush any time
before commit
flush any time
after commit
Flush any
time
421B: Database Systems - Recovery
6
More on Steal
STEAL (why enforcing Atomicity is hard)
To steal frame F: Current page in F (say P) is written to disk;
some transaction T holds lock on object A on P.
What if the T with the lock on A aborts? What if the
system fails directly after the flush and before T commits?
Must remember the old value of A at steal time (to support
UNDOing the write to page P).
Crash case: we have to do something with transaction T4
ACTIVE at the time of the crash
No Steal (no uncommitted changes are in the stable
database)
At restart after failure we are sure that none of the
changes of T4 are in DB
nothing to do for ACTIVE transactions
421B: Database Systems - Recovery
7
More on Force
Force (write changes before transaction commits)
At restart after failure we are sure that changes of T1,
T2, and T3 are in DB and changes of T5 are NOT in the
DB; nothing to do for TERMINATED transactions
NO FORCE (why enforcing Durability is hard)
Assume a transaction T has modified tuple on page P and
T committed but update is not yet in the stable
database? Now system crashes before modified page is
written to disk?
Write as little as possible, in a convenient place, at
commit time,to support REDOing modifications.
421B: Database Systems - Recovery
8
Combinations
Ideal: FORCE/NO-Steal:
nothing has to be done at recovery
Problem: basically not possible with update-in-place
In reality: mostly NOFORCE/STEAL
421B: Database Systems - Recovery
9
Basic Idea: Logging
A log is a read/append data structure maintained on stable
storage (survives failures)
UNDO information: when transaction T updates an object it
stores the old version of object (before-image); when
transaction aborts, copy before-image to current objectlocation.
REDO information: when transaction T updates an object it
stores the new version of object (after-image); can be used to
redo updates of transactions that committed.
In total: whenever a transaction T updates an object, both
before- and after-image are written as one log-record and
appended to the log.
Additionally: when transaction starts, a BEGIN record is
appended to the log; when transaction commits/aborts, a
commit/abort record is appended to the log.
421B: Database Systems - Recovery
10
Architecture II
Secondary
Storage
(stable)
Upper Layer
Cache/Buffer Manager
Access cost:
~15 ms
Log Disk
Access cost:
~1 ms
421B: Database Systems - Recovery
Buffer Pool (random access)
Log (append/read)
11
DB pages and Log pages
Db page:
Rid = (i,N)
Page i
Rid = (i,2)
Rid = (i,1)
20
N
T255: w(x)
...
16
2
24 N
1 # slots
Log page:
T3: before(y), after (y)
…
…
T255: begin T3: commit
T255: before(z), after (z)
…
T255: before(x), after (x)
421B: Database Systems - Recovery
Log tail
12
When to flush a log page
The Write-Ahead Logging Protocol:
Must force the log entry for an update before the
corresponding data page gets to disk.
Must write all log entries for a Xact before commit.
Note: flushing log page is much cheaper than flushing DB
page!
#1 guarantees Atomicity
Assume active T has changed X; page with X get flushed
to disk (steal); now system crashes before T commits =>
must undo T changes; need before image of X!
#2 guarantees Durability
Assume T has changed X and committed; page with X
does not get flushed to disk (no-force); now system
crashes => must redo T changes; need after image of X!
421B: Database Systems - Recovery
13
Types of Recovery
Local UNDO during normal processing
whenever a transactions aborts, undo updates of aborted Xact
by installing before-images.
Log-records are probably still in main memory; scan backwards
starting from log-tail;
Global UNDO: at restart after system crash
Xacts that aborted before the crash (we find abort record in
log)
Xacts that were active at the time of the crash (we find
neither abort nor commit record in log)
Whenever pages on the disk have updates of such Xacts (we
say the update is reflected in the database), undo these
updates by installing before-images
Pages contain additional information to detect this!
421B: Database Systems - Recovery
14
Types of Recovery
Partial REDO: at restart after system crash
Xacts that committed before the crash (we find a commit record in
the log)
Whenever pages on the disk do not have the updates of such Xacts
(we say the update is not reflected in the database), redo the
updates by installing after-images
Page contains additional information so that we can detect this.
Global REDO: after disk failure
Make snapshot of database (once a day /once a week)
Duplicate log and keep on two disks
Keep log on a second storage
After disk failure
Start with snapshot and then apply log
421B: Database Systems - Recovery
15
Recovery after Crash
Simple procedure:
Backward pass: Scan log from tail to head; For each record
If commit of T, include T in list of committed transactions
C
If abort of T, include T in list of aborted transactions A
If update record of T and T is neither in A or C, include T
in list of aborted transactions
If update record of T on object X and T in A
Read in page P with object X
If update on X performed, install before-image
Forward pass: Scan log from head to tail: for each record
If update record of T on object X and T in C
Read in page P with object X
If update on X not yet performed, install after-image
421B: Database Systems - Recovery
16
Example of Recovery
LOG
1
2
3
4
5
6
7
BM
Backward pass:
7: Put T3 in C
6: Put T2 in A
update: T1 writes A on P5
6: Read P5; nothing
update T2 writes B on P3
P5 is flushed
has to be done
T1 Abort
4,5: nothing
update: T3 writes C on P1
3: put T1 in A
update: T3 write D on P3
2: read p3; install
update: T2 writes Z P5
before-image of B
P3
is
flushed
T3 commit
1: read p5: install
CRASH, RESTART
before image (the
write on A was
flushed to disk but
Forward pass:
not the undo during
Step 4: read P1 install
normal processing)
after-image of P1
Step 5: read P3; nothing
has to be done
421B: Database Systems - Recovery
17
Checkpointing
Log becomes longer and longer => recovery has to read entire log!
Periodically, the DBMS creates a checkpoint, in order to minimize the
time taken to recover in the event of a system crash.
Simple checkpoint:
Goal: only log that was written after the checkpoint has to be
analyzed
Algorithm:
Prevent new transactions from starting
Wait until all transactions have terminated
Flush all dirty pages to stable storage
Write a checkpoint log entry
Start new transactions
Upon recovery: backward pass only goes to last checkpoint entry
In real life more complicated; transaction processing is not interrupted;
no big flush in one step
421B: Database Systems - Recovery
18
Example
Write
Flush Chkpt
Buffer record
start
T1
T2
T3
T4
T5
c
a
421B: Database Systems - Recovery
c
19
Further Issues
Crash during Recovery
Logical logging: instead of physical before/after image
redo operation / inverse operation (e.g. increment by
one, decrement by one)
Hard disk failures:
mirror disk or
Archive copy (consistent copy of database on tape, created e.g.
once every night when no transaction processing) + archive log
(similar to log shown here)
Real Life: much more complicated: see textbook with
ARIES
421B: Database Systems - Recovery
20