PPT Chapter 18

Download Report

Transcript PPT Chapter 18

Chapter 18
Distributed Control Algorithms
Copyright © 2008
Introduction
•
•
•
•
•
•
•
•
Operation of Distributed Control Algorithms
Correctness of Distributed Control Algorithms
Distributed Mutual Exclusion
Distributed Deadlock Handling
Distributed Scheduling Algorithms
Distributed Termination Detection
Election Algorithms
Practical Issues in Using Distributed Control Algorithms
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.2
2
OS Control Functions in a
Distributed Environment
• Special features of distributed OS control functions
– Mutual exclusion
• Involves synchronization of processes in different computers
– Deadlock handling
• Deadlocks may involve use of resources in different hosts
– Scheduling
• Perform load balancing for comparable loading of computers
– Termination detection
• Check whether all processes of a computation, which may
operate in different computers, have completed
– Election
• Elect coordinator for a privileged function
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.3
3
Operation of Distributed Control
Algorithms (continued)
• A distributed control algorithm operates in parallel with
its clients, so that it can respond readily to events
related to its service
– Each process has a control part and a basic part
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.4
4
Correctness of Distributed Control
Algorithms
• Algorithm correctness has two facets:
– Liveness: eventually performs correct actions
• i.e., without indefinite delays
– Safety: does not perform wrong actions
• Proving correctness of a distributed algorithm is a
complex task
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.5
5
Correctness of Distributed Control
Algorithms (continued)
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.6
6
Distributed Mutual Exclusion
• A Permission-Based Algorithm
• Token-Based Algorithms
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.7
7
A Permission-Based Algorithm
• Ricart and Agrawala algorithm: CS entry in FCFS order
– Requires 2 x (n – 1 ) messages per CS entry
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.8
8
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.9
9
Maekawa Algorithm
• Each process has a request set of processes; it seeks
the permission of only processes in the request set
(Ri represents the request set of process Pi)
– Correctness is ensured through the following rules:
• For all Pi :
Pi is included in Ri
• For all Pi, Pj: Ri ∩ Rj is non-null
– The algorithm requires 2 x √n messages per CS entry
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.10
10
Token-Based Algorithms for Mutual
Exclusion
• A token represents the privilege to use a CS
– Only a process possessing token can enter CS
• Safety of algorithm follows from this rule
• Liveness: must ensure token eventually reaches a
(requesting) process
• These algorithms use abstract system models
– Edges represent the paths used to pass control
messages
– For example:
• Abstract ring and tree topologies
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.11
11
An Algorithm Employing the Ring
Topology
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.12
12
Raymond’s Token-Based Algorithm
•
Features of the algorithm
– The algorithm uses an abstract inverted tree to reduce
the number of messages. It has three invariants
• Pholder, the process holding the token, is at root of the tree
• Each process in the system belongs to the tree
• Each process other than the Pholder has only one edge,
which points to its parent in the tree
Thus, each process has a path that ends on Pholder
– Each process has a local request queue
• When it receives a request, it puts the requestor’s id in the
queue
• When it makes a request, it puts its own id in the queue
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.13
13
Raymond’s Algorithm
• Uses an abstract inverted tree as the system model
Token is transferred to P3
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.14
14
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.15
15
Distributed Deadlock Handling
• Deadlock detection, prevention, and avoidance
approaches studied earlier use state information
– Problems arise in extending these approaches to a
distributed system
• Approaches in distributed systems:
– Detection: centralized and distributed
– Prevention
• No special techniques for distributed deadlock
avoidance have been discussed in OS literature
• Model used in this section:
– SISR model of resource allocation
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.16
16
Problems in Centralized Deadlock
Detection
• Steps in centralized deadlock detection:
– Collect WFGs of all nodes at a central node
– Superimpose them to form a merged WFG
– Use a conventional deadlock detection algorithm
• Problem: can lead to phantom deadlocks
– Violates safety property in deadlock detection
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.17
17
Distributed Deadlock Detection
• Key issues in deadlock detection:
– A cycle indicates a deadlock in an SISR system
– A knot indicates a deadlock in an MISR system
• Distributed deadlock detection approach:
– Cycles and knots detected through joint actions of nodes
in system
– Every node can detect and declare a deadlock
• Two such algorithms:
– Diffusion computation-based algorithm
– Edge chasing algorithm
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.18
18
Diffusion Computation-Based
Deadlock Detection
• Diffusion computation: used to collect info about nodes
– Diffusion phase
• Computation that has originated in one node, spreads to
other nodes
– A control message called a query is sent along each edge
– The first query received by a node is called an engaging query.
On receiving it, the node sends queries along all its out-edges
– Information collection phase
• Each node sends information in response to each query
– It sends a dummy reply for a non-engaging query
– It collects information from all replies it received, adds its own
information, and sends the result as the reply to the engaging
query. We call it an engaging reply
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.19
19
Diffusion Computation-Based
Algorithm
• Algorithm 18.4 was proposed by Chandy, Misra, and
Haas (1983)
– Works for SISR and MISR systems
• P2, P3 are blocked. P1 becomes blocked and sends a query but does
not receive a reply because P4 is not blocked
• P4 requests a resource held by P1, becomes blocked and sends a query.
It would receive a reply and declare a deadlock
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.20
20
Diffusion Computation-Based
Algorithm (continued)
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.21
21
Mitchell–Merritt Algorithm
for Distributed Deadlock Detection
• It is an edge chasing algorithm—control messages are
sent over WFG edges to detect cycles
– A provision is made to ensure that the cycle has not been
broken before it was detected
• Each process is assigned a public label and a private label
– The labels are identical when a process is created
– The public label of a process changes when it gets blocked on
a resource request
– It also changes when it waits for a process having a larger
public label
• A wait-for edge with a specific relation between public and
private labels of its source and destination processes
indicates presence of a deadlock
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.22
22
Mitchell-Merritt Algorithm
• Rules of the algorithm:
public label
private label
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.23
23
Distributed Deadlock Prevention
• Cycles are prevented as follows:
– A pair (local time, node id) is used to time-stamp creation
of a process
– When process Pi requests a resource allocated to Pj,
time-stamps of Pi, Pj are used to decide whether Pi can
wait for Pj
• Two approaches
– Wait-or-die
• Pi is allowed to wait if older than Pj; otherwise, it is killed
– Would-or-wait
• Pi is allowed to wait if younger than Pj; otherwise Pj is killed
• A killed process retains original timestamp if restarted
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.24
24
Distributed Scheduling Algorithms
• A distributed scheduling algorithm balances
computational loads in the nodes
– Uses process migration
• Apply a threshold to decide if a node is heavily or lightly
loaded
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.25
25
Distributed Scheduling Algorithms
(continued)
• Migration may be preemptive or nonpreemptive
• Stability is an important issue
– An unstable algorithm may lead to a situation similar to
thrashing
• A process is transferred very often; does not make progress
• Algorithms can be sender- or receiver-initiated
– A heavily loaded node is a sender
– A lightly loaded node is a receiver
– A symmetrically initiated algorithm contains features of
both
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.26
26
Distributed Scheduling Algorithms
(continued)
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.27
27
Distributed Scheduling Algorithms
(continued)
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.28
28
Distributed Termination Detection
• When a process terminates, OS frees its resources
– This approach is not adequate for distributed systems
• Processes of a distributed computation execute in
different nodes of a distributed system. They should be
terminated when all of them have completed their tasks
– These processes perform work assigned to them
• A process is active when it is performing work, and passive
when it has no work
• Work is assigned to a process through a message
– Passive process becomes active on receiving a message
– Distributed termination condition (DTC):
• All processes of a distributed computation are passive
• No basic messages are in transit
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.29
29
Distributed Termination Detection
(continued)
• Credit-distribution-based termination detection
– Every process is assigned a numerical credit
• It sends some of its credit in each message
– When a process becomes passive, it sends its credit to
collector process
– The distributed computation is known to have terminated
when credit accumulated by collector is C
• Diffusion computation-based termination detection
– Each process that becomes passive initiates a diffusion
computation to determine if DTC holds
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.30
30
Distributed Termination Detection
(continued)
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.31
31
Election Algorithms
• Critical system functions are assigned to a coordinator
(for the function)
– E.g., replacing lost token in a token-based algorithm
• Coordinator is typically the highest-priority process in
set
• A process that finds that coordinator is not responding
assumes it has failed
– Initiates election algorithm
• Algorithm chooses highest-priority nonfailed process as
new coordinator
– Then, announces its id to all nonfailed processes
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.32
32
Election Algorithms for Unidirectional
Ring Topologies
• Links in ring are assumed to be FIFO channels
• Assumption: control part of failed process continues to
function and forwards received messages along outedge
• Election performed by obtaining ids of all nonfailed
processes and electing highest-priority process
• Two types of messages:
– “elect me” and “new coordinator”
• O(n2) messages or 3n−1 (worst case) in refined version
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.33
33
Election Algorithms
•
Algorithm for the ring topology
• Process Pi initiates by sending (“elect me”, Pi) message
• Process Pj receiving an (“elect me”, Pi) message sends an
(“elect me”, Pj) message and then forwards Pi’s message
• Pi receives back its own message after receiving message
of every other process; it elects the highest priority process
as leader
– It sends a “new coordinator” message to inform others
– Algorithm 2: Refinement of algorithm 1
• In Step 2, Pj sends its own message if its priority is higher
than Pi’s; otherwise, it sends Pi’s message
• Only highest priority process would get back its own
message
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.34
34
Election Algorithm
•
Bully algorithm
1. Initiator Pi sends (“elect me”, Pi) messages to all higher
priority processes and starts a time-out interval T1
a. If a time-out occurs, it sends a “new coordinator” message
to lower priority processes
b. If it receives a “don’t you dare” message from a higher
priority process Pj, it starts another time-out interval T2
i.
If a time-out occurs, it starts a new election
2. If a process Pj receives an “elect me” message from a
lower priority process
a. It sends a “don’t you dare” message to its sender
b. Starts a new election by sending (“elect me”, Pj) messages
•
Requires O(n2) messages per election
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.35
35
Practical Issues in Using Distributed
Control Algorithms
• Resource Management
• Process Migration
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.36
36
Resource Management
• Name server in each node must be updated when
resources are added
– Solution: use an arrangement of name servers as in the
domain name service (DNS)
• Only the name server of a domain needs to be updated
when a resource is added
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.37
37
Process Migration
• Reasons for process migration:
– To achieve load balancing
– To reduce network traffic involved in utilizing a remote
resource
– To provide availability of services when a node has to be
shut down for maintenance
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.38
38
Process Migration
• Difficulties
– Process state is distributed in various data structures of
the OS
– Process id’s may change due to migration
• Process id’s are used in interprocess communication
• Solution: Use global process ids as in Sun cluster
– Delivery of messages requires a special provision
• A node receiving a message would redirect it if the
destination process has migrated out of it
– This residual state causes poor reliability
• Alternatively, all processes may be informed when a process
is migrated
– Requires a complex protocol
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.39
39
Summary
• Actions of a distributed control algorithms are
performed in many nodes of the system
– Two aspects of correctness are liveness and safety
• Distributed system models: physical and logical
• Examples of distributed control algorithms:
–
–
–
–
–
Distributed mutual exclusion: e.g., token based
Distributed deadlock detection: diffusion computation
Distributed scheduling (to balance load)
Distributed termination: e.g., credit-based
Election (highest priority process wins)
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
18.40
40