PPT Chapter 19

Download Report

Transcript PPT Chapter 19

Chapter 19
Recovery and Fault Tolerance
Copyright © 2008
Introduction
•
•
•
•
•
Faults, Failures, and Recovery
Byzantine Faults and Agreement Protocols
Recovery
Fault Tolerance Techniques
Resiliency
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.2
2
Faults, Failures, and Recovery
• A fault may damage the state of a system
– Error: a part of the system state that is erroneous
• Failure: unexpected behavior or situation
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.3
3
Faults, Failures, and Recovery
(continued)
• Recovery: for reliable operation, system is restored to a
consistent state, and operation resumed
– A recovery is performed when a failure is noticed
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.4
4
Classes of Faults
• Fault model: properties that determine the kinds of
errors/failures that might result from a fault
• Classes of faults:
– System fault  system crash
• Amnesia and partial amnesia faults
• A fail-stop fault brings a system to a halt
– Process fault
• Byzantine faults: malicious or arbitrary actions
– Storage fault  amnesia faults
– Communication fault  nonamnesia faults
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.5
5
Overview of Recovery Techniques
• For non-Byzantine faults, recovery involves restoring
system or application to a consistent state
– Involves reexecuting some actions
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.6
6
Overview of Recovery Techniques
(continued)
• Recovery approaches are classified into:
– Backward recovery: resetting state of entity affected by
fault to a prior state and resuming its operation
• Involves reexecution of some actions
– Forward recovery: repairing erroneous state of a system
so system can continue its operation
• Repair cost depends on the nature of the computation
• May involve a certain amount of reexecution
• Backward recovery is simpler to implement
– But, requires a practical method of producing a consistent
state recording of a system
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.7
7
Byzantine Faults and Agreement
Protocols
• Byzantine faults have been studied only in the restricted
context of agreement between processes
– Byzantine generals problem: Attack or retreat
• Agreement protocols used for:
– Byzantine agreement problem
– Consensus problem
– Interactive consistency problem
• Impossibility result: a group of three (m) processes
containing one (k) faulty cannot reach agreement
– If m > 3, agreement is possible if m > 3 k
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.8
8
Recovery
• A recovery scheme consists of two components:
– Checkpointing algorithm
• Decides when to take a checkpoint for a process
– Recovery algorithm
• Uses checkpoints to roll back processes such that new
process states are mutually consistent
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.9
9
Recovery (continued)
• State of a process cannot be recovered in isolation
– Must restore state of computation S’ in which states of all
pairs of processes are mutually consistent
• Goal of recovery algorithm is to decide:
– Whether a process Pi should be rolled back
– Identify checkpoint to which Pi should be rolled back
• Asynchronous or synchronous checkpointing
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.10
10
Fault Tolerance Techniques
• Basic principle in fault tolerance is to ensure that a fault
either does not cause an error
– Or that the error can be removed easily
• Two facets of the tolerance of system faults that follow
the fail-stop model:
– Fault tolerance for replicated data
– Fault tolerance for distributed data
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.11
11
Logs, Forward Recovery, and
Backward Recovery
• A log is a record of actions or activities in a process
– Do logs (also called redo logs)
• Used to implement forward recovery
– Undo logs
• Used to implement backward recovery
• A write-ahead logging principle is used
• A log could be an operation log or a value log
– Example: intentions list (for atomic actions) is a value log
used as a redo log
• Value logs provide idempotency
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.12
12
Handling Replicated Data
•
Availability of data D provided through replication
– At least one copy of D would be accessible from any
node despite anticipated faults in the system
•
If D may be modified, it is essential to use rules to
ensure correctness of data access and updates:
1. Many processes can concurrently read D
2. Only one process can write into D at any time
3. Reading and writing can’t be performed concurrently
4. A process reading D must see the latest value of D
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.13
13
Handling Replicated Data (continued)
• Quorum: number of copies of D that must be accessed
to perform a specific operation on D
• Quorum algorithms enforce Rules 1–4 by specifying a
read quorum Qr and a write quorum Qw
– 2 x Qw > n and Qr + Qw > n
– Two kinds of locks used on D: read and write locks
– If a system is required to tolerate faults in up to k nodes,
we could choose:
• Qr = k + 1
• Qw = n − k
• n>2×k
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.14
14
Handling Distributed Data
• Distributed transaction: facility for manipulating files
located in different nodes of a distributed system in a
mutually consistent manner
– Also called a multisite transaction
– Each node contains a transaction manager
– Originating node contains a transaction coordinator
• Coordinator implements the all-or-nothing property of
transactions with two-phase commit (2PC) protocol
– Depending on responses from participating nodes, decides
whether to commit or abort transaction
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.15
15
Two-Phase Commit Protocol
•
Phase 1
1. Actions of transaction coordinator:
a. Write a ‘Prepare Ti’ record in the log
b. Set a time-out and send a ‘Prepare Ti’ message to all nodes
participating in the transaction
2. Actions of a participating node:
a. If it is ready to commit, write updates in stable storage and a
‘Prepared Ti’ record in the log, and send a ‘Prepared Ti’
message to coordinator
b. Otherwise, write an ‘Abandoned Ti’ record in the log and
send an ‘Abandoned Ti’ message to coordinator
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.16
16
Two-Phase Commit Protocol
•
Phase 2
1. Actions of transaction coordinator:
a. If it receives a ‘Prepared’ reply from all nodes before timeout occurs, write a ‘Commit Ti’ record in the log and send
‘Commit Ti’ messages to all nodes
b. Otherwise, write an ‘Abort Ti’ record in the log and send
‘Abort Ti’ messages to all nodes
c. Wait for an acknowledgment from each node and write a
‘Complete Ti’ message in the log
2. Actions of a participating node: Depending on the
coordinator’s message,
a. Write a ‘Commit Ti’ record in the log and perform commit
processing
b. Write an ‘Abort Ti’ record in log and abandon updates of Ti
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.17
17
Resiliency
• Resiliency techniques focus on minimizing the cost of
reexecution when faults occur
– Basis for resiliency: failures in a distributed system are
partial, so some parts of a distributed computation may
survive a failure
• For example, distributed transactions use resiliency
techniques:
– Nested transactions
– Tentative commits
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.18
18
Nested Transaction
• A nested transaction Tik is a part of an atomic
transaction Ti
– It commits only if its parent transaction Ti commits
– It is implemented as follows
• When Tik completes, it is said to have reached a tentative
commit
• When Ti wishes to commit, it checks whether all nested
transactions have reached a tentative commit and can
participate in commit processing
– It is implemented using a 2PC protocol
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.19
19
Nested Transaction
• Resiliency is implemented as follows:
– If a nested transaction Tik does not respond to a ‘Prepare’
message, the coordinator can retry Tik in the same node
or in some other node
• If Tik had reached tentative commit and its node had failed
when ‘Prepare’ message was sent
• If the failed node recovers and the coordinator retries Tik in
it
• The results of Tik, computed before the failure, can be used
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.20
20
Summary
• Recovery and fault tolerance are two approaches to
reliability of a computer system
– Generically called recovery
• A third recovery approach is resiliency
• A fault causes an error in the state of the system, which
leads to a failure
– A fail-stop fault brings the system to a halt
– An amnesia fault makes it lose a part of its state
– A Byzantine fault makes it behave in an unpredictable
manner
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.21
21
Summary (continued)
• Recovery from non-Byzantine faults can be performed
by using two approaches:
– Backward recovery and forward recovery
• Fault tolerance implemented by maintaining logs
– E.g., undo or do logs
– Logs used to implement atomic transactions
• Two-phase commit protocol (2PC protocol) is used
• Nested transactions are a resiliency technique
– Used when transaction involves data in many nodes
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
19.22
22