12. Recovery LECTURE - NDSU Computer Science

12. Recovery
REVIEW:
COMMIT is the successful end-of-transaction operation.
Changes to data items are not made permanent until the COMMIT issued by the TM is acknowledged by the DM.
| TRANSACTION MANAGER(S) |
            |
|       SCHEDULER        |
            |
|      DATA MANAGER      |
            |
|    DATABASE ON DISK    |
Following that ack, the DBMS must guarantee that the updates will never be lost, no matter what happens (the DB must be recoverable).
The DBMS uses recovery techniques to guarantee recoverability to a recent COMMITTED DB STATE (a state in which all data items show the value written by a
committed transaction and which is consistent with the integrity constraints).
ROLLBACK (ABORT) is the unsuccessful end-of-transaction operation. All changes are undone using the LOG (or JOURNAL).
- The on-line LOG holds all updates as they are made.
- When the on-line log fills up, it is written to an off-line log (usually on tape).
- LOGs can grow to be as large as the database itself.
Write-Ahead Logging (WAL) Protocol: requires that a log record holding the "last committed value" of an item is physically written before that item is
changed (overwritten).
The WAL protocol facilitates "UNDO by re-installing before-values" for all changes (this removes the effects of an aborted transaction).
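A minimal Python sketch of the WAL idea, assuming a toy in-memory store. The class name WalStore, its methods, and the tuple layout of a log record are illustrative assumptions, not part of any real DBMS. Each write appends the before-value to the log before overwriting the item, and rollback re-installs before-values newest-first.

```python
_MISSING = object()  # marks "item did not exist before the write"


class WalStore:
    """Toy store (hypothetical): logs the before-value before each overwrite."""

    def __init__(self, initial):
        self.db = dict(initial)  # stands in for the database on disk
        self.log = []            # stands in for the physically written log

    def write(self, txn, item, new_value):
        # WAL rule: write the log record holding the old value FIRST ...
        self.log.append((txn, item, self.db.get(item, _MISSING)))
        # ... and only then overwrite the item.
        self.db[item] = new_value

    def rollback(self, txn):
        # UNDO: re-install before-values, newest change first, so the
        # oldest before-value of each item is the one that survives.
        for t, item, before in reversed(self.log):
            if t != txn:
                continue
            if before is _MISSING:
                self.db.pop(item, None)
            else:
                self.db[item] = before
        # Drop the rolled-back transaction's records from the toy log.
        self.log = [rec for rec in self.log if rec[0] != txn]


store = WalStore({"A": 100})
store.write("T1", "A", 50)
store.write("T1", "A", 20)
store.write("T1", "B", 7)
store.rollback("T1")
print(store.db)  # {'A': 100}
```

The sketch assumes at most one uncommitted writer per item (as under strict locking), so that the logged before-value really is the last committed value.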
Transaction Failure
TYPES OF FAILURES:
Transaction local (ABENDS, NSF check)
System failures (DBMS itself fails)
Media failure (disk crash)
TRANSACTION FAILURE (transactions themselves are responsible for action)
e.g., Abnormal program ends (ABENDS),
Non-Sufficient Funds (NSF)
Transaction code can trap these and specify a remedy (e.g., ROLLBACK). However, to allow proper
transaction actions, the system must hold all output messages until COMMIT. Otherwise, this can happen:
[Figure: an ATM hands cash ($) to the customer before COMMIT; the BANKER then finds NSF and calls for ROLLBACK, but the cash is already gone.]
At an ATM cash machine, if the "message" (the cash) is given to the customer before commit, it is impossible to
ROLLBACK the transaction.
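One way to picture "hold all output messages until COMMIT" is the following Python sketch. The Transaction class and the deliver callback are hypothetical names introduced only for illustration: output is queued while the transaction runs and is released only at commit, so a rollback can simply discard it.

```python
class Transaction:
    """Hypothetical transaction that defers its output messages."""

    def __init__(self, deliver):
        self._deliver = deliver  # callback that really emits a message
        self._pending = []       # messages held back until COMMIT

    def output(self, message):
        # Do not emit yet: the transaction may still be rolled back.
        self._pending.append(message)

    def commit(self):
        # Only now is it safe to release the messages (e.g., the cash).
        for message in self._pending:
            self._deliver(message)
        self._pending.clear()

    def rollback(self):
        # Nothing has left the system, so discarding is enough.
        self._pending.clear()


delivered = []
t = Transaction(delivered.append)
t.output("dispense $100")
t.rollback()       # NSF detected: the customer never sees the cash
print(delivered)   # []
```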
System Failure
SYSTEM FAILURE: The DBMS itself fails and the memory contents (including buffers) are lost, but the data on disk is
undamaged. The Data Manager is allowed to do its job any way it wants to (to optimize its activity); that is the reason
for the component separation in the first place (instead of a monolithic system). So the DM can be implemented so that the
Disk(s) may contain some "uncommitted values" and/or may not contain all committed values.
The Disk(s) may contain uncommitted values if a STEAL policy is used.
STEAL policy: The Buffer Manager can replace a page that still holds uncommitted values (i.e., write a page containing
uncommitted values to disk), in effect "stealing" a page from one transaction to give it to another. (Necessary for very
long-running transactions, e.g., payroll processing.)
The Disk(s) may not contain all committed values if a NO-FORCE policy is used.
NO-FORCE policy: The Buffer Manager may defer writing a page with newly committed values until later. (E.g., a banking
system may not be able to afford forcing every write immediately.)
BUFFER POLICIES:
| FORCE | STEAL |
| YES   | YES   |
| YES   | NO    |
| NO    | YES   | <- the hardest to implement but the best!
| NO    | NO    |
Although there are systems that use either a NO-STEAL or a FORCE policy (or both), we discuss only STEAL, NO-FORCE
(STEAL NO-FORCE requires the most demanding recovery system).
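The link between the buffer policy and the recovery work it implies after a system failure can be sketched as below. The function name recovery_needs is an assumption made for illustration; the rule it encodes is the usual textbook reading of the two policies (STEAL creates the need for UNDO, NO-FORCE creates the need for REDO). Media failure is a separate case and is not covered by this sketch.

```python
def recovery_needs(steal: bool, force: bool) -> dict:
    """What system-failure recovery must be able to do under a buffer policy."""
    return {
        # STEAL lets uncommitted values reach disk, so they must be undoable.
        "undo": steal,
        # NO-FORCE lets committed values sit in (lost) buffers, so redo them.
        "redo": not force,
    }


for force in (True, False):
    for steal in (True, False):
        print("FORCE" if force else "NO-FORCE",
              "STEAL" if steal else "NO-STEAL",
              recovery_needs(steal, force))
```

Only the STEAL, NO-FORCE row needs both UNDO and REDO, which is why it is the most demanding combination.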
Steal, No-force buffer policy
In a STEAL NO-FORCE system:
All transactions active at fail-time (BEGUN, not ENDed) must be UNDONE (because some of the changes they made may
have been written to disk under the STEAL policy).
All transactions committed at fail-time must be idempotently REDONE (because some of the committed changes they made may
not have been written to disk under the NO-FORCE policy).
One way is to UNDO all active transactions and then idempotently REDO all committed transactions.
Do we have to go all the way back to IPL (Initial Program Load) and REDO all committed transactions?
Can that be avoided? YES! Through checkpointing!
The System periodically takes a CHECKPOINT.
Steal, No-force checkpoint
There are many checkpointing methods; this slide shows a "standard" CHECKPOINT.
It is usually taken at a quiescent point in time (no activity going on), but not necessarily
(there are "on-the-fly" checkpointing methods, but they are complex).
A standard checkpoint:
1. force-writes all buffers to disk immediately (flushes the buffers).
2. force-writes a "checkpoint" record to the log. A CHECKPOINT record must have an "active
list" containing all currently active transactions.
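The two checkpoint steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the buffers and the disk are modelled as plain dicts and lists, db_buffer is assumed to hold dirty items only, and the function name take_checkpoint and the record layout are hypothetical.

```python
def take_checkpoint(db_buffer, log_buffer, disk_db, disk_log, active):
    """Flush all buffers, then force-write a checkpoint record (toy model)."""
    # Step 1: flush the buffers. Log records go out first, so the log
    # on disk never lags behind the database pages written after it.
    disk_log.extend(log_buffer)
    log_buffer.clear()
    disk_db.update(db_buffer)
    db_buffer.clear()
    # Step 2: write the checkpoint record with the active list.
    disk_log.append(("CHECKPOINT", sorted(active)))


disk_db, disk_log = {"A": 1}, []
db_buffer, log_buffer = {"A": 2}, [("T2", "A", 1)]
take_checkpoint(db_buffer, log_buffer, disk_db, disk_log, {"T3", "T2"})
print(disk_db)   # {'A': 2}
print(disk_log)  # [('T2', 'A', 1), ('CHECKPOINT', ['T2', 'T3'])]
```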
Trans | "ca-chunk"
|
.- | "change record"
|
: | "ca-chunk"
|
log
: | "COMMIT *-1st flush
|-.
record: |
.
| :
////
.- - | then"check-point-rec"
| ::
////
: : |
|/ /| ((0 -)0): \|
.
| / O `-'
`-' //
2
| log |
|database| / `.__/|_/
_|_/
:.<- | buff |
| buffer |/ - - -' |
:*
|______|__
|_______ |
|
@@@ ::
/
)
^
@ o >::
(
/
| |
@\/ ::
`----'|
| |
|--::
\___/
L L
/() ::
|
| ::
V
/^\ ::
|
| disk copy|
L L:`>tr-log
|
| database |
`>_chpt-rec |
|__________|
Steal, No-force checkpoint
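With standard SNF checkpointing (described above), which of the following transactions must be undone and which must be redone?
T1: BEGINs and COMMITs before the CHECKPOINT.
T2: active at the CHECKPOINT; COMMITs before the CRASH.
T3: active at the CHECKPOINT; still active at the CRASH.
T4: BEGINs after the CHECKPOINT; COMMITs before the CRASH.
T5: BEGINs after the CHECKPOINT; still active at the CRASH.
T6: active at the CHECKPOINT; COMMITs before the CRASH.
(Commit order after the CHECKPOINT: T6, T4, T2.)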
After the crash, the RECOVERY PROCESS would:
1. Start at the most recent CHECKPOINT record in the LOG, which contains ACTIVE-list = {T2, T3, T6}. Set UNDO-list = ACTIVE-list
(i.e., UNDO = {T2, T3, T6}) and REDO-list = empty.
2. Scan forward in the LOG from the CHECKPOINT record. For each BEGIN encountered, put the transaction in the UNDO-list
(e.g., add T4 and T5). For each COMMIT encountered, move the transaction from UNDO to REDO (e.g., move T6, T4, T2).
3. When the LOG is exhausted, idempotently REDO the REDO-list in commit order (e.g., {T6, T4, T2}), and UNDO all transactions in
the UNDO-list (e.g., {T3, T5}).
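The list-building part of this recovery process can be sketched in Python. The record layout and the function name analyze_log are assumptions for illustration, and the exact interleaving of BEGIN and COMMIT records in the sample log is invented; it is chosen only to be consistent with the T1..T6 example (active list {T2, T3, T6}, commit order T6, T4, T2).

```python
def analyze_log(log):
    """Return (REDO-list in commit order, UNDO-list) from the last checkpoint."""
    # Step 1: start at the most recent checkpoint record.
    # (Raises ValueError if the log holds no checkpoint at all.)
    start = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")
    undo = set(log[start][1])  # UNDO-list starts as the ACTIVE-list
    redo = []                  # REDO-list starts empty
    # Step 2: scan forward from the checkpoint.
    for kind, txn in log[start + 1:]:
        if kind == "BEGIN":
            undo.add(txn)
        elif kind == "COMMIT":
            undo.discard(txn)  # move from UNDO ...
            redo.append(txn)   # ... to REDO, preserving commit order
    return redo, undo


log = [
    ("BEGIN", "T1"), ("BEGIN", "T2"), ("BEGIN", "T3"),
    ("COMMIT", "T1"), ("BEGIN", "T6"),
    ("CHECKPOINT", ["T2", "T3", "T6"]),
    ("BEGIN", "T4"), ("COMMIT", "T6"), ("BEGIN", "T5"),
    ("COMMIT", "T4"), ("COMMIT", "T2"),
]
redo, undo = analyze_log(log)
print(redo)          # ['T6', 'T4', 'T2']
print(sorted(undo))  # ['T3', 'T5']
```

T1 appears in neither list: it committed before the checkpoint, and the checkpoint flushed its changes to disk.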
Note: Since transactions are redone in commit order (REDO order = commit order), the serial order to which the
execution is equivalent must be COMMIT order. That is, if the execution is equivalent to some other serial order,
the REDO must be done in that order instead.
For T2 and T4 above, messages may have gone back to the users that were based on an execution equivalent to
SOME serial order (the values reported to users were generated by the execution in that order).
Thus, RECOVERY must regenerate the values in the same order.
The only way the RECOVERY process can know which serial order the original execution was equivalent to is for that
execution to be equivalent to some serial order identifiable from the LOG.
One order identifiable from the LOG is COMMIT order. Therefore, it is common to demand that the order of execution be
equivalent to the serial COMMIT-order.
(S2PL does that. Is that why it is so popular?)
Media Failure
MEDIA FAILURE (from disk crash) RECOVERY
ARCHIVE: periodically dump the database (i.e., make an ARCHIVE copy, e.g., on off-line tape):
1. Shut down the DBMS (e.g., late at night or during "quiescent" period)
2. Copy the entire database to off-line storage (tape)
3. Bring up the DBMS again
4. Erase the LOG and restart logging
[Figure: the disk copy of the database is copied to tape.]
Following a media failure (disk crash),
1. RESTORE DB from archive,
[Figure: the tape is copied back onto the disk copy of the database.]
2. REDO the transaction log from archive-time to as near to crash-time as possible, using both the off-line and the on-line
log (the on-line log is kept on a separate disk from the database itself for durability). This is called ROLL-FORWARD:
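RESTORE followed by ROLL-FORWARD can be sketched in Python as below. The sketch assumes that the log records the after-value of every write and that the off-line and on-line logs have already been concatenated into one list; the function name media_recover and the record layout are hypothetical.

```python
def media_recover(archive, log):
    """Rebuild the database from an archive copy plus the log (toy model)."""
    # Step 1: RESTORE the database from the archive copy.
    db = dict(archive)
    # Step 2: ROLL-FORWARD. Redo committed transactions in commit order;
    # writes of transactions that never committed are simply skipped.
    commit_order = [rec[1] for rec in log if rec[0] == "COMMIT"]
    for txn in commit_order:
        for rec in log:
            if rec[0] == "WRITE" and rec[1] == txn:
                _, _, item, after_value = rec
                db[item] = after_value  # installing a value is idempotent
    return db


archive = {"A": 100, "B": 50}
log = [
    ("WRITE", "T1", "A", 80),
    ("WRITE", "T2", "B", 70),
    ("COMMIT", "T1"),
    ("WRITE", "T3", "A", 10),
]
print(media_recover(archive, log))  # {'A': 80, 'B': 50}
```

Because each redo installs an after-value rather than applying a change, running the roll-forward twice gives the same result as running it once.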
[Figure: roll-forward. The COMPUTER reads change records from the LOG through the log buffer and redoes each transaction against the database buffer.]
There are many other methods.
DUPLEXING = make two copies of every data item on separate disks (or at least on devices with separate failure modes).
The amount of extra disk space used can be reduced, by methods such as Hamming coding, to as low as 5% extra disk
space; however, in this Age Of Infinite Storage, is it worth doing? Hamming coding is used in some RAID
systems (Redundant Arrays of Independent Disks).
APPENDIX
Storage past, present and future: In 1956, IBM developed RAMAC, a refrigerator-sized disk system with fifty 2-ft-diameter platters.
RAMAC had a capacity of 5 megabytes.
Since then: 1. The amount of data stored on a given area has increased 1,000,000-fold. 2. The transfer speed has increased
3,000-fold. 3. The cost per bit has decreased 500,000-fold (in comparable dollars).
This is due to breakthroughs in 1. "areal density" (# bits/square inch), 2. revolution speeds, 3. read-write head technologies.
How much higher can disk capacity go? So far, the "upper limits" predicted by engineers
have always been wrong (way wrong).
However, we are approaching a limit determined by fundamental physics, not engineering ingenuity. There comes a point
beyond which the random jiggle of electron spins due to temperature is likely to cause the direction of a bit's magnetization
to spontaneously reverse within the expected lifetime of the disk.
This is called the SUPERPARAMAGNETIC LIMIT, and it may limit the progress that can be achieved through
miniaturizing or "scaling down" existing technologies.
Where is the superparamagnetic limit? Most agree it will be encountered at densities of ~120 Gbits/square inch.
At 6.5 sq in per 3.5-inch surface, that gives ~800 Gigabits/surface, or ~100 GigaBytes/surface. Times 50 surfaces, we can
conclude that a 3.5-inch hard drive may go to 5000 GigaBytes/disk = 5 TeraBytes/disk.
Note that COMMODITY drives today have reached 500 GigaBytes/drive, so another 10-fold increase and we're there with
commodity drives!
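The capacity estimate can be checked with a few lines of Python. The function name drive_capacity_gbytes is made up for this sketch; the inputs (120 Gbits per square inch, 6.5 square inches per surface, 50 surfaces) are the figures quoted in the text.

```python
def drive_capacity_gbytes(gbits_per_sq_in, sq_in_per_surface, surfaces):
    """Drive capacity in gigabytes from areal density and geometry."""
    gbits_per_surface = gbits_per_sq_in * sq_in_per_surface  # 780 here
    gbytes_per_surface = gbits_per_surface / 8               # 8 bits per byte
    return gbytes_per_surface * surfaces


print(drive_capacity_gbytes(120, 6.5, 50))  # 4875.0, i.e. roughly 5 TeraBytes
```

The unrounded figure is 4875 GigaBytes; the text rounds 780 Gigabits per surface up to ~800, which gives the round 5 TeraBytes.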
Indexing and providing reference paths and access paths to data stores of this size is nearly impossible!
What are we going to do? Holographic storage? (From "holographic storage 1" and "holographic storage 2":) "Storing data as
holograms has intrigued scientists for decades.
In the early 1960s, former Almaden Research Center scientist Glenn Sincerbox helped IBM develop the world's first working
holographic data storage system,
a write-once-read-many (WORM) technology using photographic film, for the US Air Force.
Today, IBM participates in two industry/university/government consortia that aim to demonstrate holographic storage
technologies by the turn of the century.
A traditional hologram is produced when a beam of laser light, the reference beam, interferes with another beam reflected
from the object to be recorded.
The pattern of interference is captured by photographic film, a light-sensitive crystal or some other optical material.
Illuminating the pattern with the reference beam reproduces a 3-D image of the object (the technology is called interferometry).
Each viewing angle gives you a different view of the same object.
Holographic data storage works in exactly the same way. But for every angle, instead of having another view of object, we
have a completely different page of information."
Up to 10,000 pages have been stored in a single cube of recording material 1 cm on a side.
Each page contains one megabit of information, which means that the cube can store ~10 gigabits.
Since there are approximately 16.4 cubic cm in a cubic inch and approximately 43 cubic inches in a 3.5 inch cube
(3.5 inch diskettes piled 3.5 inches high), a 3.5 inch holographic cube would hold ~7 terabits of data.
Holographic recording has the advantage of being inherently non-linear (parallel).
It reads and stores an entire page at a time. The technology permits data rates of up to one gigabit (or 125 megabytes) per
second, making it ideal for storing image data.
Another advantage of holographic storage, largely untapped, lies in its use as associative memory. Just as illuminating a
hologram with a reference beam recovers the stored info, illuminating it with a pattern of info will reproduce the
corresponding reference beam and angle, which immediately identifies the page on which the information is stored.
In other words, holographic memories can be searched extremely quickly for data patterns (associative memory).
This would allow database searches using physics rather than software.
Note that holographic storage may make the current access path technologies (indexes) obsolete.
Why would anyone use indexes, hash functions, SQL, Relational Algebra, Relational Calculus... when you can simply pattern match
in a holo cube?!
It should be interesting. Spintronics is another solution: IBM and Stanford University (and the NDSU CNSE center in our
Research Park) are putting their heads together on a new microelectronics technology dubbed "spintronics" that
promises breakthroughs in computer processors and other electronics components while extending Moore's Law for chip
design.
In setting up a spintronics lab, researchers at the two organizations plan to control the spin, or magnetic orientation, of
electrons within nano-scale electronic structures comprising super-thin layers to produce devices for low-power
switching and non-volatile information storage.
Magnetic Properties: Electron spin is a quantum property that has "up" or "down" states. Aligning spins in a material creates
magnetism, and magnetic fields affect the passage of electrons differently. Understanding and controlling this property is
central to creating a whole new breed of electronic applications.
Among the possibilities are reconfigurable logic devices, room-temperature superconductors and quantum computers. The 1st
commercial products, ranging from digital cameras to instant-on computers, will not be available for at least 5 years.
Current chip technology relies on the charges of electrons in circuitry, explained Mike Ross, a spokesperson for IBM's
Almaden Research Center. Spintronics uses the quantum "spin" property of electrons to create magnetism, just as an
electron's negative charge property creates electricity.
MRAM In the Works: By designing and making stacks of different materials -- some with layers only two to three atoms
thick -- researchers can create devices that have novel properties. The spintronic GMR head, for example, has boosted
the disk-drive industry, Ross told NewsFactor.
"This sensitive magnetic sensor, introduced by IBM, has resulted in a 40-fold increase in data storage in the past seven
years," he said.
Magnetic RAM (MRAM) is the next spintronic device in the works.
It has the potential to be a non-volatile memory that runs circles around the non-volatile Flash memory typically used in cell
phones, memory cards and other products. Current fast memory (SRAM, SDRAM, etc.) technology is volatile, meaning
that its contents are lost at power-off, so devices must be booted up to reload data.
"We want to learn more about using this technology in the sensor realm, and we see big benefits to logic and other types of
electronics circuits," said Ross.
The IBM-Stanford Spintronic Science and Applications Center (SpinAps) will involve about a half-dozen Stanford professors
and a similar number of IBM scientists. Research projects are funded by the two partners and agencies, including the
Defense Advanced Research Projects Agency, the U.S. Department of Energy, and the National Science Foundation.
RAM Revolution: Spintronics "has quickly revolutionized magnetic recording technology and is going to revolutionize
random access memory (RAM)," University of Utah physics researcher Jing Shi told NewsFactor.
Compared with electronic computers, computers with spintronic memory should be able to store more data, process it faster,
and consume less power.
Spintronics also may yield "instant-on" computers.
Aligned spins stay aligned until a magnetic field changes them -- even if a computer is shut off. Consequently, spin-based
instant-on computers do not require booting to move data from the hard drive to the memory.
The data never left.
Thank you.