Transcript Recovery

Chapter 20: Recovery
Failure Types
Transaction Failures: local recovery
 System Failure: Global recovery

 Main memory is lost
 Disk survives

Media Failure
421B: Database Systems - Recovery
2
Motivation

Atomicity: All-or-nothing
 Transactions may abort (“Rollback”).

Durability:
 Changes survive server crash

Desired Behavior after
system restarts:
– T1, T2 & T3 should be
durable.
– T4 & T5 should be
aborted (effects not
seen).
421B: Database Systems - Recovery
T1
T2
T3
T4
T5
c
crash!
c
c
a
3
Update Approaches

In-place updating
 Change value of object and write it back to the same
place on stable storage
 Used in most DB

Multiversion System
 Old object version is kept, new version is created
 E.g., in PostgreSQL
 Vacuum procedure that from time to time deletes
old versions that are no more needed
421B: Database Systems - Recovery
4
Handling the Buffer Pool

Force/NoForce
 If transaction T has written object X on page P: Force page P to disk
(flush) before T commits

All changes of T are in the database on stable storage before T commits
 NoForce: Flushing pages to disk is only determined by replacement
policy of buffer manager;


some of the changes of T might not be in the stable database at commit
Steal/NoSteal
 NoSteal: If transaction T has updated object X on page P: do NOT
flush P before T commits;

No change of an active, uncommitted transaction is on stable storage
 Steal: Replacements strategy is allowed to replace and flush a page
even if the page contains update of uncommitted transaction

Changes of uncommitted transactions might be in the stable database
421B: Database Systems - Recovery
5
Combinations
NoSteal
Force
No Force
Steal
Atomic flush at
Commit Time
flush any time
before commit
flush any time
after commit
Flush any
time
421B: Database Systems - Recovery
6
More on Steal


STEAL (why enforcing Atomicity is hard)
 To steal frame F: Current page in F (say P) is written to disk;
some transaction T holds lock on object A on P.
 What if the T with the lock on A aborts? What if the
system fails directly after the flush and before T commits?
 Must remember the old value of A at steal time (to support
UNDOing the write to page P).
 Crash case: we have to do something with transaction T4
ACTIVE at the time of the crash
No Steal (no uncommitted changes are in the stable
database)
 At restart after failure we are sure that none of the
changes of T4 are in DB
 nothing to do for ACTIVE transactions
421B: Database Systems - Recovery
7
More on Force


Force (write changes before transaction commits)
 At restart after failure we are sure that changes of T1,
T2, and T3 are in DB and changes of T5 are NOT in the
DB; nothing to do for TERMINATED transactions
NO FORCE (why enforcing Durability is hard)
 Assume a transaction T has modified tuple on page P and
T committed but update is not yet in the stable
database? Now system crashes before modified page is
written to disk?
 Write as little as possible, in a convenient place, at
commit time,to support REDOing modifications.
421B: Database Systems - Recovery
8
Combinations

Ideal: FORCE/NO-Steal:
 nothing has to be done at recovery
 Problem: basically not possible with update-in-place

In reality: mostly NOFORCE/STEAL
421B: Database Systems - Recovery
9
Basic Idea: Logging





A log is a read/append data structure maintained on stable
storage (survives failures)
UNDO information: when transaction T updates an object it
stores the old version of object (before-image); when
transaction aborts, copy before-image to current objectlocation.
REDO information: when transaction T updates an object it
stores the new version of object (after-image); can be used to
redo updates of transactions that committed.
In total: whenever a transaction T updates an object, both
before- and after-image are written as one log-record and
appended to the log.
Additionally: when transaction starts, a BEGIN record is
appended to the log; when transaction commits/aborts, a
commit/abort record is appended to the log.
421B: Database Systems - Recovery
10
Architecture II
Secondary
Storage
(stable)
Upper Layer
Cache/Buffer Manager
Access cost:
~15 ms
Log Disk
Access cost:
~1 ms
421B: Database Systems - Recovery
Buffer Pool (random access)
Log (append/read)
11
DB pages and Log pages
Db page:
Rid = (i,N)
Page i
Rid = (i,2)
Rid = (i,1)
20
N
T255: w(x)
...
16
2
24 N
1 # slots
Log page:
T3: before(y), after (y)
…
…
T255: begin T3: commit
T255: before(z), after (z)
…
T255: before(x), after (x)
421B: Database Systems - Recovery
Log tail
12
When to flush a log page

The Write-Ahead Logging Protocol:
 Must force the log entry for an update before the
corresponding data page gets to disk.
 Must write all log entries for a Xact before commit.
 Note: flushing log page is much cheaper than flushing DB
page!

#1 guarantees Atomicity
 Assume active T has changed X; page with X get flushed
to disk (steal); now system crashes before T commits =>
must undo T changes; need before image of X!

#2 guarantees Durability
 Assume T has changed X and committed; page with X
does not get flushed to disk (no-force); now system
crashes => must redo T changes; need after image of X!
421B: Database Systems - Recovery
13
Types of Recovery

Local UNDO during normal processing
 whenever a transactions aborts, undo updates of aborted Xact
by installing before-images.
 Log-records are probably still in main memory; scan backwards
starting from log-tail;

Global UNDO: at restart after system crash
 Xacts that aborted before the crash (we find abort record in
log)
 Xacts that were active at the time of the crash (we find
neither abort nor commit record in log)
 Whenever pages on the disk have updates of such Xacts (we
say the update is reflected in the database), undo these
updates by installing before-images
 Pages contain additional information to detect this!
421B: Database Systems - Recovery
14
Types of Recovery

Partial REDO: at restart after system crash
 Xacts that committed before the crash (we find a commit record in
the log)
 Whenever pages on the disk do not have the updates of such Xacts
(we say the update is not reflected in the database), redo the
updates by installing after-images
 Page contains additional information so that we can detect this.

Global REDO: after disk failure
 Make snapshot of database (once a day /once a week)
 Duplicate log and keep on two disks
 Keep log on a second storage
 After disk failure

Start with snapshot and then apply log
421B: Database Systems - Recovery
15
Recovery after Crash

Simple procedure:
 Backward pass: Scan log from tail to head; For each record




If commit of T, include T in list of committed transactions
C
If abort of T, include T in list of aborted transactions A
If update record of T and T is neither in A or C, include T
in list of aborted transactions
If update record of T on object X and T in A
 Read in page P with object X
 If update on X performed, install before-image
 Forward pass: Scan log from head to tail: for each record

If update record of T on object X and T in C
 Read in page P with object X
 If update on X not yet performed, install after-image
421B: Database Systems - Recovery
16
Example of Recovery
LOG
1
2
3
4
5
6
7
BM
Backward pass:
 7: Put T3 in C
 6: Put T2 in A
update: T1 writes A on P5
 6: Read P5; nothing
update T2 writes B on P3
P5 is flushed
has to be done
T1 Abort
 4,5: nothing
update: T3 writes C on P1
 3: put T1 in A
update: T3 write D on P3
 2: read p3; install
update: T2 writes Z P5
before-image of B
P3
is
flushed
T3 commit
 1: read p5: install
CRASH, RESTART
before image (the
write on A was
flushed to disk but
 Forward pass:
not the undo during
 Step 4: read P1 install
normal processing)
after-image of P1
 Step 5: read P3; nothing
has to be done
421B: Database Systems - Recovery

17
Checkpointing




Log becomes longer and longer => recovery has to read entire log!
Periodically, the DBMS creates a checkpoint, in order to minimize the
time taken to recover in the event of a system crash.
Simple checkpoint:
 Goal: only log that was written after the checkpoint has to be
analyzed
 Algorithm:
 Prevent new transactions from starting
 Wait until all transactions have terminated
 Flush all dirty pages to stable storage
 Write a checkpoint log entry
 Start new transactions
 Upon recovery: backward pass only goes to last checkpoint entry
In real life more complicated; transaction processing is not interrupted;
no big flush in one step
421B: Database Systems - Recovery
18
Example
Write
Flush Chkpt
Buffer record
start
T1
T2
T3
T4
T5
c
a
421B: Database Systems - Recovery
c
19
Further Issues



Crash during Recovery
Logical logging: instead of physical before/after image
redo operation / inverse operation (e.g. increment by
one, decrement by one)
Hard disk failures:
 mirror disk or
 Archive copy (consistent copy of database on tape, created e.g.
once every night when no transaction processing) + archive log
(similar to log shown here)

Real Life: much more complicated: see textbook with
ARIES
421B: Database Systems - Recovery
20