Distributed synchronization

Download Report

Transcript Distributed synchronization

Distributed Synchronization: outline
 Introduction
 Causality and time
o Lamport timestamps
o Vector timestamps
o Causal communication
 Snapshots
This presentation is based on the book: “Distributed operating-systems & algorithms” by Randy Chow and Theodore Johnson
Operating Systems, 2012, Danny Hendler & Roie Zivan
1
Distributed systems
 A distributed system is a collection of
independent computational nodes,
communicating over a network, that is
abstracted as a single coherent system
o Grid computing
o Cloud computing (“infrastructure as a
service”, “software as a service”)
o Peer-to-peer computing
o Sensor networks
o …
 A distributed operating system allows
sharing of resources and coordination
of distributed computation in a
transparent manner
Operating Systems, 2012, Danny Hendler & Roie Zivan
(a), (b) – a distributed system
(c) – a multiprocessor
2
Distributed synchronization
 Underlies distributed operating systems and algorithms
 Processes communicate via message passing (no shared memory)
 Inter-node coordination in distributed systems challenged by
o
o
o
o
o
Lack of global state
Lack of global clock
Communication links may fail
Messages may be delayed or lost
Nodes may fail
 Distributed synchronization supports correct coordination in distributed
systems
o May no longer use shared-memory based locks and semaphores
Operating Systems, 2012, Danny Hendler & Roie Zivan
3
Distributed Synchronization: outline
 Introduction
 Causality and time
o Lamport timestamps
o Vector timestamps
o Causal communication
 Snapshots
Operating Systems, 2012, Danny Hendler & Roie Zivan
4
Distributed computation model
 Events
o Sending a message
o Receiving a message
o Timeout, internal interrupt
 Processors send control messages to each other
o send(destination, action; parameters)
 Processes may declare that they are waiting for events:
o Wait for A1, A2, …., An
A1(source; parameters)
code to handle A1
.
.
An(source; parameters)
code to handle An
Operating Systems, 2012, Danny Hendler & Roie Zivan
5
Causality and events ordering
 A distributed system has no global state nor global clock
 no global order on all events may be determined
 Each processor knows total orders on events occurring in it
 There is a causal relation between the sending of a message and its
receipt
Lamport's happened-before relation H
1.
2.
3.
e1 <p e2  e1 <H e2 (events within same processor are ordered)
e1 <m e2  e1 <H e2 (each message m is sent before it is received)
e1 <H e2 AND e2 <H e3  e1 <H e3 (transitivity)
Leslie Lamport (1978): “Time, clocks, and the ordering of events in a distributed system”
Operating Systems, 2012, Danny Hendler & Roie Zivan
6
Causality and events ordering (cont'd)
P1
P3 Time
P2
e1
e2
e3
e4
e5
e6
e7
e8
e1 <H e7 ?
Yes.
e5 <H e7 ?
No.
e1 <H e3 ?
Operating Systems, 2012, Danny Hendler & Roie Zivan
Yes.
e1 <H e8 ?
Yes.
7
Lamport's timestamp algorithm
1 Initially my_TS=0
2 Upon event e,
3 if e is the receipt of message m
4
my_TS=max(m.TS, my_TS)
5 my_TS++
6 e.TS=my_TS
7 If e is the sending of message m
8
m.TS=my_TS
To create a total order, ties are broken by process ID
Operating Systems, 2012, Danny Hendler & Roie Zivan
8
Lamport's timestamps (cont'd)
P1
1.1
P2
e1
e2
e3
2.1
1.2
2.2
e4
e5
3.2
e6
3.1
Time
P3
1.3
e7
e8
Operating Systems, 2012, Danny Hendler & Roie Zivan
4.3
9
Distributed Synchronization: outline
 Introduction
 Causality and time
o Lamport timestamps
o Vector timestamps
o Causal communication
 Snapshots
Operating Systems, 2012, Danny Hendler & Roie Zivan
10
Vector timestamps - motivation
 Lamport's timestamps define a total order
o e1 <H e2  e1.TS < e2.TS
o However, e1.TS < e2.TS  e1 <H e2 does not hold, in general.
(concurrent events ordered arbitrarily)
Definition: Message m1 casually precedes message m2 (written as m1 <c m2) if
s(m1) <H s(m2) (sending m1 happens before sending m2)
Definition: causality violation occurs if m1 <c m2 but r(m2) <p r(m1).
In other words, m1 is sent to processor p before m2 but is received after it.
 Lamport's timestamps do not allow to detect (hence nor prevent)
causality violations.
Operating Systems, 2012, Danny Hendler & Roie Zivan
11
Causality violations – an example
P1
P2
P3
Migrate O
to P2
Where is O?
On p2
M2
M1
M3
Where is O?
I don’t
know
ERROR!
Causality violation between… M1 and M3.
Operating Systems, 2012, Danny Hendler & Roie Zivan
12
Vector timestamps
1 Initially my_VT=[0,…,0]
2
3
4
5
6
7
8
9
Upon event e,
if e is the receipt of message m
for i=1 to M
my_VT[i]=max(m.VT[i], my_VT[i])
My_VT[self]++
e.VT=my_VT
if e is the sending of message m
m.VT=my_VT
e1.VT ≤V e2.VT if and only if e1.VT[i] ≤ e2.VT[i], for every i=1,…,M
e1.VT <V e2.VT if and only if e1.VT ≤V e2.VT and e1.VT ≠ e2.VT
For vector timestamps it does hold that: e1 <VT e2
Operating Systems, 2012, Danny Hendler & Roie Zivan
e1 <H e2
13
An example of a vector timestamp
P1
P2
P3
P4
1
1
1
1
2
2
3
2
3
4
2
3
3
4
5
4
5
6
e
e.VT=(3,6,4,2)
Operating Systems, 2012, Danny Hendler & Roie Zivan
14
Comparison of vector timestamps
P1
P2
P3
P4
1
1
1
1
2
2
3
2
3
4
2
3
e3
3
4
5
4
5
e1
6
e2
e1.VT=(5,4,1,3)
Operating Systems, 2012, Danny Hendler & Roie Zivan
e2.VT=(3,6,4,2)
e3.VT=(0,0,1,3)
15
VTs can be used to detect causality violations
P1
P2
P3
(1,0,0)
Migrate O
to P2
(0,0,1)
Where is O?
(2,0,1)
On p2
M2
(3,0,1)
M1
(3,1,3)
M3
(3,0,2)
Where is O?
(3,0,3)
I don’t
know
(3,2,3)
ERROR!
(3,2,4)
(3,3,3)
Causality violation between… M1 and M3.
Operating Systems, 2012, Danny Hendler & Roie Zivan
16
Distributed Synchronization: outline
 Introduction
 Causality and time
o Lamport timestamps
o Vector timestamps
o Causal communication
 Snapshots
Operating Systems, 2012, Danny Hendler & Roie Zivan
17
Preventing causality violations
A processor cannot control the order in which it receives messages…
But it may control the order in which they are delivered to applications
Application
Deliver in
FIFO order
Message arrival order
FIFO ordering
Assign
timestamps
Message passing
x1
x2
x4
x5
x3
Deliver
x1
Deliver
x2
Buffer
x4
Buffer
Deliever
x3 x4 x5
x5
Protocol for FIFO message delivery
(as in, e.g., TCP)
Operating Systems, 2012, Danny Hendler & Roie Zivan
18
Preventing causality violations (cont'd)
 Senders attach a timestamp to each message
 Destination delays the delivery of out-of-order messages
 Hold back a message m until we are assured that no message m’ <H m will
be delivered
o For every other process p, maintain the earliest timestamp of a message m
that may be delivered from p
o Do not deliver a message if an earlier message may still be delivered from
another process
Operating Systems, 2012, Danny Hendler & Roie Zivan
19
Algorithm for preventing causality violations
1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]
2 blocked[1…M] initially [ {}, …, {} ]
3
4
5
6
7
8
9
10
11
12
13
14
15
For each processor p,
the earliest timestamp
with which a message
from p may still be
delivered
Upon the receipt of message m from processor p
Delivery_list = {}
If (blocked[p] is empty)
earliest[p]=m.timestamp
Add m to the tail of blocked[p]
While ( k such that blocked[k] is non-empty AND
 i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])
remove the message at the head of blocked[k], put it in delivery_list
if blocked[k] is non-empty
earliest[k]  m'.timestamp, where m' is at the head of blocked[k]
else
increment the k'th element of earliest[k]
End While
Deliver the messages in delivery_list, in causal order
Operating Systems, 2012, Danny Hendler & Roie Zivan
20
Algorithm for preventing causality violations
1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]
2 blocked[1…M] initially [ {}, …, {} ]
3
4
5
6
7
8
9
10
11
12
13
14
15
Upon the receipt of message m from processor p
For each process p,
Delivery_list = {}
the messages from p
that were received but
If (blocked[p] is empty)
were not delivered yet
earliest[p]=m.timestamp
Add m to the tail of blocked[p]
While ( k such that blocked[k] is non-empty AND
 i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])
remove the message at the head of blocked[k], put it in delivery_list
if blocked[k] is non-empty
earliest[k]  m'.timestamp, where m' is at the head of blocked[k]
else
increment the k'th element of earliest[k]
End While
Deliver the messages in delivery_list, in causal order
Operating Systems, 2012, Danny Hendler & Roie Zivan
21
Algorithm for preventing causality violations
1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]
2 blocked[1…M] initially [ {}, …, {} ]
3
4
5
6
7
8
9
10
11
12
13
14
15
Upon the receipt of message m from processor p
Delivery_list = {}
If (blocked[p] is empty)
List of messages to be
earliest[p]=m.timestamp
delivered as a result
Add m to the tail of blocked[p]
of m’s receipt
While ( k such that blocked[k] is non-empty AND
 i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])
remove the message at the head of blocked[k], put it in delivery_list
if blocked[k] is non-empty
earliest[k]  m'.timestamp, where m' is at the head of blocked[k]
else
increment the k'th element of earliest[k]
End While
Deliver the messages in delivery_list, in causal order
Operating Systems, 2012, Danny Hendler & Roie Zivan
22
Algorithm for preventing causality violations
1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]
2 blocked[1…M] initially [ {}, …, {} ]
If no blocked
3
4
5
6
7
8
9
10
11
12
13
14
15
messages from p,
update earliest[p]
Upon the receipt of message m from processor p
Delivery_list = {}
If (blocked[p] is empty)
earliest[p]=m.timestamp
Add m to the tail of blocked[p]
While ( k such that blocked[k] is non-empty AND
 i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])
remove the message at the head of blocked[k], put it in delivery_list
if blocked[k] is non-empty
earliest[k]  m'.timestamp, where m' is at the head of blocked[k]
else
increment the k'th element of earliest[k]
End While
Deliver the messages in delivery_list, in causal order
Operating Systems, 2012, Danny Hendler & Roie Zivan
23
Algorithm for preventing causality violations
1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]
2 blocked[1…M] initially [ {}, …, {} ]
3
4
5
6
7
8
9
10
11
12
13
14
15
Upon the receipt of message m from processor p
m is now the most
Delivery_list = {}
recent message from
If (blocked[p] is empty)
p not yet delivered
earliest[p]=m.timestamp
Add m to the tail of blocked[p]
While ( k such that blocked[k] is non-empty AND
 i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])
remove the message at the head of blocked[k], put it in delivery_list
if blocked[k] is non-empty
earliest[k]  m'.timestamp, where m' is at the head of blocked[k]
else
increment the k'th element of earliest[k]
End While
Deliver the messages in delivery_list, in causal order
Operating Systems, 2012, Danny Hendler & Roie Zivan
24
Algorithm for preventing causality violations
1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]
2 blocked[1…M] initially [ {}, …, {} ]
3
4
5
6
7
8
9
10
11
12
13
14
15
Upon the receipt of message m from processor p
Delivery_list = {}
If there is a process k with delayed
messages for which it is now safe
If (blocked[p] is empty)
to deliver its earliest message…
earliest[p]=m.timestamp
Add m to the tail of blocked[p]
While ( k such that blocked[k] is non-empty AND
 i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])
remove the message at the head of blocked[k], put it in delivery_list
if blocked[k] is non-empty
earliest[k]  m'.timestamp, where m' is at the head of blocked[k]
else
increment the k'th element of earliest[k]
End While
Deliver the messages in delivery_list, in causal order
Operating Systems, 2012, Danny Hendler & Roie Zivan
25
Algorithm for preventing causality violations
1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]
2 blocked[1…M] initially [ {}, …, {} ]
3
4
5
6
7
8
9
10
11
12
13
14
15
Upon the receipt of message m from processor p
Remove k'th earliest message
Delivery_list = {}
from blocked queue, make sure it
If (blocked[p] is empty)
is delivered
earliest[p]=m.timestamp
Add m to the tail of blocked[p]
While ( k such that blocked[k] is non-empty AND
 i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])
remove the message at the head of blocked[k], put it in delivery_list
if blocked[k] is non-empty
earliest[k]  m'.timestamp, where m' is at the head of blocked[k]
else
increment the k'th element of earliest[k]
End While
Deliver the messages in delivery_list, in causal order
Operating Systems, 2012, Danny Hendler & Roie Zivan
26
Algorithm for preventing causality violations
1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]
2 blocked[1…M] initially [ {}, …, {} ]
3
4
5
6
7
8
9
10
11
12
13
14
15
Upon the receipt of message m from processor p
Delivery_list = {}
If there are additional blocked
If (blocked[p] is empty)
messages of k, update earliest[k] to be
the timestamp of the earliest such
earliest[p]=m.timestamp
message
Add m to the tail of blocked[p]
While ( k such that blocked[k] is non-empty AND
 i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])
remove the message at the head of blocked[k], put it in delivery_list
if blocked[k] is non-empty
earliest[k]  m'.timestamp, where m' is at the head of blocked[k]
else
increment the k'th element of earliest[k]
End While
Deliver the messages in delivery_list, in causal order
Operating Systems, 2012, Danny Hendler & Roie Zivan
27
Algorithm for preventing causality violations
1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]
2 blocked[1…M] initially [ {}, …, {} ]
3
4
5
6
7
8
9
10
11
12
13
14
15
Upon the receipt of message m from processor p
Delivery_list = {}
Otherwise the earliest message that
If (blocked[p] is empty)
may be delivered from k would have
earliest[p]=m.timestamp
previous timestamp with the k'th
component incremented
Add m to the tail of blocked[p]
While ( k such that blocked[k] is non-empty AND
 i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])
remove the message at the head of blocked[k], put it in delivery_list
if blocked[k] is non-empty
earliest[k]  m'.timestamp, where m' is at the head of blocked[k]
else
increment the k'th element of earliest[k]
End While
Deliver the messages in delivery_list, in causal order
Operating Systems, 2012, Danny Hendler & Roie Zivan
28
Algorithm for preventing causality violations
1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]
2 blocked[1…M] initially [ {}, …, {} ]
3
4
5
6
7
8
9
10
11
12
13
14
15
Upon the receipt of message m from processor p
Delivery_list = {}
Finally, deliver set of messages that
If (blocked[p] is empty)
will not cause causality violation (if
there are any).
earliest[p]=m.timestamp
Add m to the tail of blocked[p]
While ( k such that blocked[k] is non-empty AND
 i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])
remove the message at the head of blocked[k], put it in delivery_list
if blocked[k] is non-empty
earliest[k]  m'.timestamp, where m' is at the head of blocked[k]
else
increment the k'th element of earliest[k]
End While
Deliver the messages in delivery_list, in causal order
Operating Systems, 2012, Danny Hendler & Roie Zivan
29
Execution of the algorithm as multicast
p3 state
P1
P2
(0,1,0)
P3
earliest[1] earliest[2] earliest[3]
(1,0,0)
(0,1,0)
(0,0,1)
m1
Upon receipt of m2
(1,1,0) m2
(0,1,0)
(0,0,1)
(1,1,0)
Upon receipt of m1
m2
(1,1,0) m2
(0,1,0) m1
(0,0,1)
deliver m1
(1,1,0) m2
(0,2,0)
(0,0,1)
deliver m2
(2,1,0)
(0,2,0)
(0,0,1)
Since the algorithm is “interested” only in causal order of sending
events, vector timestamp is incremented only upon send events.
Operating Systems, 2012, Danny Hendler & Roie Zivan
30
Distributed Synchronization: outline
 Introduction
 Causality and time
o Lamport timestamps
o Vector timestamps
o Causal communication
 Snapshots
Operating Systems, 2012, Danny Hendler & Roie Zivan
31
Snapshots motivation – phantom deadlock
 Assume we would like to implement a distributed deadlock-detector
 We record process states to check if there is a waits-for-cycle
P1
r1
r2
P2
request
r1
OK
p1
request
1
2
p2
r2
Observed waits-for graph
OK
release
request
r1
3
OK
request
p1
4
p2
r2
Actual waits-for graph
Operating Systems, 2012, Danny Hendler & Roie Zivan
32
What is a snapshot?
 Global system state:
o S=(s1, s2, …, sM) – local processor states
o The contents Li,j =(m1, m2, …, mk) of each communication channel Ci,j
(channels assumed to be FIFO)
These are messages sent but not yet received
 Global state must be consistent
o If we observe in state si that pi received message m from pk, then in
observation sk ,k must have sent m.
o Each Li,j must contain exactly the set of messages sent by pi but not
yet received by pj, as reflected by si, sj.
• Snapshot state much be consistent, one that might have existed
during the computation.
• Observations must be mutually concurrent that is, no observation
casually precedes another observation (a consistent cut)
Operating Systems, 2012, Danny Hendler & Roie Zivan
33
Snapshot algorithm – informal description
 Upon joining the algorithm, a process records its local state
 The process that initiates the snapshot sends snapshot tokens to its
neighbors (before sending any other messages)
o Neighbors send them to their neighbors – broadcast
 Upon receiving a snapshot token:
o a process records its state prior to sending/receiving additional messages
o Must then send tokens to all its other neighbors
 How shall we record sets Lp,q?
o q receives token from p and that is the first time q receives token:
Lp,q={ }
o q receives token from p but q received token before:
Lp,q={all messages received by q from p since q received token}
Operating Systems, 2012, Danny Hendler & Roie Zivan
34
Snapshot algorithm – data structures
 The algorithm supports multiple ongoing snapshots – one per process
 Different snapshots from same process distinguished by version number
Per process variables
integer my_version initially 0
integer current_snap[1..M] initially [0,…,0]
integer tokens_received[1..M]
processor_state S[1…M]
channel_state [1..M][1..M]
Operating Systems, 2012, Danny Hendler & Roie Zivan
The version number
of my current
snapshot
35
Snapshot algorithm – data structures
 The algorithm supports multiple ongoing snapshots – one per process
 Different snapshots from same process distinguished by version number
Per process variables
integer my_version initially 0
integer current_snap[1..M] initially [0,…,0]
integer tokens_received[1..M]
processor_state S[1…M]
channel_state [1..M][1..M]
Operating Systems, 2012, Danny Hendler & Roie Zivan
current_snap[r]
contains version
number of snapshot
initiated by processor r
36
Snapshot algorithm – data structures
 The algorithm supports multiple ongoing snapshots – one per process
 Different snapshots from same process distinguished by version number
Per process variables
integer my_version initially 0
integer current_snap[1..M] initially [0,…,0]
integer tokens_received[1..M]
processor_state S[1…M]
channel_state [1..M][1..M]
Operating Systems, 2012, Danny Hendler & Roie Zivan
tokens_received[r]
contains the number
of tokens received for
the snapshot initiated
by processor r
37
Snapshot algorithm – data structures
 The algorithm supports multiple ongoing snapshots – one per process
 Different snapshots from same process distinguished by version number
Per process variables
integer my_version initially 0
integer current_snap[1..M] initially [0,…,0]
integer tokens_received[1..M]
processor_state S[1…M]
channel_state [1..M][1..M]
Operating Systems, 2012, Danny Hendler & Roie Zivan
processor_state[r]
contains the state
recorded for the
snapshot of processor r
38
Snapshot algorithm – data structures
 The algorithm supports multiple ongoing snapshots – one per process
 Different snapshots from same process distinguished by version number
Per process variables
integer my_version initially 0
integer current_snap[1..M] initially [0,…,0]
integer tokens_received[1..M]
processor_state S[1…M]
channel_state [1..M][1..M]
Operating Systems, 2012, Danny Hendler & Roie Zivan
channel_state[r][q]
records the state of
channel from q to
current processor for
the snapshot initiated
by processor r
39
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
Snapshot request:
1
my_version++, current_snap[self]=my_version
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN, my_version)
4
tokens_received[self]=0
TOKEN(q; r,version):
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
40
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
Snapshot request:
1
my_version++, current_snap[self]=my_version
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN, my_version)
4
tokens_received[self]=0
TOKEN(q; r,version):
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
41
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
Increment version
number of this snapshot,
record my local state
Snapshot request:
1
my_version++, current_snap[self]=my_version
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN, my_version)
4
tokens_received[self]=0
TOKEN(q; r,version):
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
42
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
Send snapshot-token on all outgoing
channels, initialize number of
received tokens for my snapshot to 0
Snapshot request:
1
my_version++, current_snap[self]=my_version
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN; self, my_version)
4
tokens_received[self]=0
TOKEN(q; r,version):
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
43
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
Upon receipt from q of TOKEN for
snapshot (r,version)
Snapshot request:
1
my_version++, current_snap[self]=my_version
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN; self, my_version)
4
tokens_received[self]=0
TOKEN(q; r,version):
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
44
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
If this is first token for snapshot
Snapshot request:
(r,version)
1
my_version++, current_snap[self]=my_version
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN; self, my_version)
4
tokens_received[self]=0
TOKEN(q; r,version):
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
45
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
Snapshot request:
Record local state for r'th snapshot
1
my_version++, current_snap[self]=my_versionUpdate version number of r'th snapshot
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN; self, my_version)
4
tokens_received[self]=0
TOKEN(q; r,version):
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
46
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
Snapshot request:
1
my_version++, current_snap[self]=my_version
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN; self, my_version)
4
tokens_received[self]=0
Set of messages on channel from q is empty
Send token on all outgoing channels
TOKEN(q; r,version):
Initialize number of received tokens to 1
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
47
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
Snapshot request:
1
my_version++, current_snap[self]=my_version
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN; self, my_version)
4
tokens_received[self]=0
Not the first token for (r,version)
TOKEN(q; r,version):
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
48
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
Snapshot request:
1
my_version++, current_snap[self]=my_version
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN; self, my_version)
4
tokens_received[self]=0
Yet another token for snapshot (r,version)
TOKEN(q; r,version):
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
49
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
Snapshot request:
1
my_version++, current_snap[self]=my_version
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN; self, my_version)
4
tokens_received[self]=0
These messages are the state of the channel
from q for snapshot (r,version)
TOKEN(q; r,version):
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
50
Snapshot algorithm – pseudo-code
execute_snapshot()
Wait for a snapshot request or a token
Snapshot request:
1
my_version++, current_snap[self]=my_version
2
S[self]  my_state
3
for each outgoing channel q, send(q, TOEKEN; self, my_version)
4
tokens_received[self]=0
If all tokens of snapshot (r,version) arrived,
TOKEN(q; r,version):
snapshot computation is over
5. If current_snap[r] < version
6.
S[r]  my state
7.
current_snap[r]=version
8.
L[r][q]  empty, send token(r, version) on each outgoing channel
9.
tokens_received[r]  1
10. else
11. tokens_received[r]++
12. L[r][q]  all messages received from q since first receiving token(r,version)
13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished
Operating Systems, 2012, Danny Hendler & Roie Zivan
51
Sample execution of the snapshot algorithm
 A token passing system
 Each processor receives, in its turn, a privilege token, uses it to perform
privileged operations and then passes it on
 Assume processor p did not receive the token for a long duration so it
invokes the snapshot algorithm to find out why
p
Operating Systems, 2012, Danny Hendler & Roie Zivan
q
52
Sample execution of the snapshot algorithm
p
1. p requests a snapshot
t
q
State(p)={}
p
2. q sends privilege token
t
q
State(p)={}
p
3. Snapshot token arrives at q
State(p)={}
p
4. Snapshot token arrives at p
State(p)={}
Lq,p={ }
Observed state:
Operating Systems, 2012, Danny Hendler & Roie Zivan
p
q
t
State(q)={}, Lp,q={}
q
State(q)={}, Lp,q={}
q
53