Slides - NC State University
Download
Report
Transcript Slides - NC State University
Fault Tolerant Distributed
Systems
A Survey
by
Nirmit Desai
The track
Review the approaches to fault tolerance (coupled with mutex algorithms
in some cases, 1 - 3), Get an idea of the terminology and field
#1: Revannaswamy, Bhatt - ‘97
#2: Chang, Singhal, Liu - ‘90
#3: Helary, Mostefaoui - ‘94
#4: Stumm, Zhou - ‘90
#5: Nett, Mock, Theisohn - ‘97
#6: Ballarini, Bernardi, Donatelli - ‘02
#7: Hardekopf, Kwiat, Upadhyaya - ‘01
#8: Injong Rhee - ‘95
#9: Goyer, Momtahan, Selic - ‘90
#10: Frank Mueller - ‘01
Summary
Questions
10/17/2002
Fault tolerant DS - a survey
2
#1: Revannaswamy, Bhatt - ‘97
O(log N), N = Number of nodes in the spanning tree
Communicate with neighbours only (Raymonds algorithm)
Tolerant to single link / single node failures
Eliminate the failed component and obtain different tree
structure (network is biconnected)
Mechanism to detect the failures is assumed to exist
Only “Branches” are used for message exchanges
10/17/2002
Fault tolerant DS - a survey
3
#1 Contd - Reconfiguaration
Repeat the steps below till all connected OR insufficient chords
At the end of reconfiguration, all data-structures are reset and
algorithm is restarted, older token holder is pre-empted and
token is transferred to a newly elected leader
10/17/2002
Fault tolerant DS - a survey
4
#2: Chang, Singhal, Liu - ‘90
1. O((2+K) * logN), Each node has K alternate paths to token node
2. Works with DAG, no cycles should ever develop
3. FIFO violation for greedy strategy
10/17/2002
Fault tolerant DS - a survey
5
#2 Contd. - Fault Tolerance
Mechanism to detect the failures is assumed to exist
Single link / single node failure and recovery
Any path constructed by taking outgoing edges always leads to
the token holder
Studies all the states of the system at which the failure may
occur and gives solutions to each of the states for tolerance
At recovery, the node asks its neighbours about its state in order
to reconstruct the data structures
Fairness compromized at recovery - queue order is arbitrary
10/17/2002
Fault tolerant DS - a survey
6
#3: Helary, Mostefaoui - ‘94
Worst case O(lg n) for mutex, average O(lg n) for fault-tolerance
Fairness compromized at recovery - queue order is arbitrary
Best of both the worlds: Naimi (dynamic), Raymond(static)
Node topology preserves binomial tree (open cube) structure
Assumes finite message delay for detection of node failure
10/17/2002
Fault tolerant DS - a survey
7
#3 Contd. - Fault Tolerance
Handles multiple and simultaneous node failures, no link failures
Due to bounded message delays and estimated critical section
lengths, a node can suspect a failure by timeouts.
Enquire appropriate nodes about their state and deduce
whether the suspicion is correct.
In case of failure, some data structures are updated, tree is
reconfigured
10/17/2002
Fault tolerant DS - a survey
8
#4: Stumm, Zhou
Extension of coherent DSM algorithms for fault tolerance
Tolerance to single host failures, no link failures
Availability of broadcast (multicast) communication is assumed
Communication cost assumed to be larger than memory access
costs
Critical state (DSM data, state info) replicated to multiple hosts
Central server algo: central server manages data
(request/response)
Backup the central server state, fault detection by timeouts
Full replication algo: all data on each host, local reads,
sequenced writes (by sequencers), simple recovery by obtaining
latest copy from other hosts
10/17/2002
Fault tolerant DS - a survey
9
#4 Contd.
Migration algo: Data always migrated to the accessing host, can
be integrated with the VMM of the host OS, maintain sequential
consistency
Read replication algo: Similar to migration but no migration for
read accesses
Comparisons:
Fault tolerant versions of central server and full replication
do not introdue a substantial overhead
Central server is better for infrequent shared data accesses
Full replication is better for sparse write accesses
10/17/2002
Fault tolerant DS - a survey
10
#5: Nett, Mock, Theisohn
Specification and realization of a generalized distributed
dependency management system for fault tolerance
Key observation: Management of dynamically evolving
dependencies is a key problem for fault tolerance
Dependency graph: nodes as sites, edges as dependencies
Dependency set: set of nodes reachable from start nodes
Traversal of dependency set yields solution: at each node
The node is added to the dep. set ?
Search continues ?
Which dep. types to follow now ?
In which direction ?
10/17/2002
Fault tolerant DS - a survey
11
#5 Contd - Example
Stability: completed actions have
stable dependencies
The proposed system supports
only applications that have
stable dependency states
Does our application conform to
this requirement ?
Next slide
10/17/2002
Fault tolerant DS - a survey
12
#5 Contd - Language
Generate a state machine with
this language and, verify for
stability
10/17/2002
Fault tolerant DS - a survey
13
#5 Contd - Example
Send messages and generate
global dependency graph to look
for inconsistencies
10/17/2002
Fault tolerant DS - a survey
14
#6: Ballarini, Bernardi, Donatelli
Formal and theoretical discussion of synchronization problem in
distributed systems
10/17/2002
Fault tolerant DS - a survey
15
#7: Hardekopf, Kwiat, Upadhyaya
Security and fault tolerance for aerospace distributed
communications by means of voting mechanism
Basic voting protocol:
Initiator sends a request to one of the voters
The voter multicasts a request to other voters
Voters decide locally and send their vote to the client
client waits for f+1 alike votes where f is the degree of fault
tolerance
Problems: not scalable, assumes exact voting
Proposes a model of set of voters, interface module and user
where majority of voters are assumed to be always correct !!
10/17/2002
Fault tolerant DS - a survey
16
#7 Contd - The algorithm
Voters are authenticated when they
commit
O(1) messages with any number of
voters
Hashing and digital signatures for
WAN
Each voter will follow these steps
If no other voter has committed an answer, commit self-answer and skip
If some voter has committed and the result conflict, then commit self-answer
After everyone had a chance, determine the majority result
If majority is in conflict, then go to step 1.
Interface module will receive first commit, buffer it and start the timer.
If new commit is received old is over-written and timer is restarted.
Otherwise the buffered result is output to the client
10/17/2002
Fault tolerant DS - a survey
17
#8: Injong Rhee
Fault tolerance in distributed resource allocation by locality
Failure locality = max. number of processes dead due to failure
Depends on degree of distributedness of algorithm
Tight lower bound on failure locality
is 2 where, is the maximum
number of conflicting processes
Dining philosopher problemhas the
optimal failure locality
Proposed algorithm breaks the chain by not allowing a process
to wait for a process which is also blocked, achieves 2
10/17/2002
Fault tolerant DS - a survey
18
#8 Contd - Model
Each process is modeled in steps: send, receive, local, fail and
local state
Guarded command set: [receive -> Action,...]
Configuration C is a set of current local
Network is a process too
Enabling and execution of guarded sommand set
System is initial configuration C0 and execution of guarded
command set of all the processes
No constraints on relative timing of execution of process steps
Priorities by logical clocks (with waiting factor)
Centralized resource managers manage exclusion by
responding to the requests for resource (with preemption)
10/17/2002
Fault tolerant DS - a survey
19
#9: Goyer, Momtahan, Selic
Hierarchical control strategy for fault tolerance (centralized app)
Simplicity, efficiency, predictability
Transient and hard failures - recovers or doesn’t recover ?
Soft QoS guarantees in terms of response times
10/17/2002
Fault tolerant DS - a survey
20
#10: Frank Mueller
Extension of Naimi’s protocol - single node failures
Backup communication in form of ring structure
Fault handler node: Find out the faulty node by timeout, collect
fault information (from 2 nodes) and take appropriate action
10/17/2002
Fault tolerant DS - a survey
21
Some ideas - for our protocol
The first node experiencing timeout may send a message to
every other node asking for their states
The node experiencing timeout must be: waiting for a
grant/token to arrive OR waiting for a release to arrive
In case of grant/token, propagate the message to the parent
chain to detect the pooint of failure
In case of release, send the message down the chain of children
to see who is supposed to send a release
Once detected the point, recalculate the state of the affected
nodes - the ones falling on the chain
Assuming no link failures for now, multiple failures can be
detected
10/17/2002
Fault tolerant DS - a survey
22
Summary
Approaches fall into following categories:
Embedded fault tolerance into the base algorithms - inspiring,
concrete, but hard to import
Generalized approaches - demonstrate strength of
applicability but no concrete algorithmic description
Voting approaches - can be used to reach consensus, useful
in case of multiple failures
Centralized approach - simple, predictable but may be not
desirable
Replication based approaches - simple, incur overhead,
sometimes the only solution (esp for complex algorithms)
No prominent solution for detecting failure - even for very specific
problems/algorithms
10/17/2002
Fault tolerant DS - a survey
23
Questions
??
10/17/2002
Fault tolerant DS - a survey
24