Modeling and Analyzing Fault-Tolerant Real

Download Report

Transcript Modeling and Analyzing Fault-Tolerant Real

Modeling and Analyzing
FaultTolerant, Real-Time Communication
Protocols
Nancy Lynch
Theory of Distributed Systems
MIT
Second MURI Workshop
Berkeley, California
June 4, 2001
1
MIT Participants
• Leaders: Nancy Lynch, Idit Keidar
• Students: Carl Livadas, Roger Khazan, Ziv
Bar-Joseph
• Collaborators: Paul Attie, Alex Shvartsman,
Roberto Segala, Frits Vaandrager
2
At Last Year’s Workshop…
3
General Models and Proof Methods
• I/O automaton models [Lynch, Tuttle 87]
– Nondeterministic, infinite-state machines
– Input/output/internal actions, traces
– Modularity: Composition, levels of abstraction
• Mathematical, language-independent
• Used to model distributed algorithms,
communication protocols
• Validation, code generation,
upper and lower bounds
4
Timing, Hybrid Considerations
• Timing: TIOAs [Lynch, Vaandrager]
– Timeout-based algorithms.
– Local clocks, clock synchronization
– Performance analysis
• Hybrid: HIOAs [L, Segala, V, Weinberg 96]
– Real world + computer components
– Continuous flows of data
5
Other Embellishments
• Probabilities: PIOA, PTIOA [Segala 95]
– Probabilistic and nondeterministic behavior.
– Randomized distributed algorithms
– Systems with probabilistic assumptions
• Dynamic systems: DIOA [Attie, Lynch 99]
– Run-time process creation and destruction, mobility.
– Agent systems
6
Communication Protocol Modeling
and Analysis.
•
•
•
•
•
•
At-most-once (AMO) Message Delivery
TCP, T/TCP
Reliable channels from unreliable channels
Self-stabilizing communication protocols
Network clock synchronization
Group communication systems
7
Group Communication Service
•
•
•
•
Communication middleware
Manages group membership, current view
Handles joins, leaves, failures, partitions, merges
Multicast communication among members
– Multicasts respect views
– Ordering, reliability constraints for message delivery,
e.g., FIFO, causal within each view.
• Isis, Transis, Totem,…
8
VStoTO
brcv
bcast
TO
VStoTO
VStoTO
gprcv
newview
gpsnd
VS
9
Conditional Performance Analysis
• Assume VS satisfies:
– If a network component C stabilizes, then soon
thereafter, views become consistent within C, and
messages sent in the final view are delivered
everywhere in C, within bounded time.
• And VStoTO satisfies:
– Simple timing, fault-tolerance assumptions.
• Then TO satisfies:
– If C stabilizes, then soon thereafter, any message
sent or delivered anywhere in C is delivered
everywhere in C, within bounded time.
10
Conditional Performance Analysis
• Give conditional claims about system performance
under particular assumptions about behavior of
environment and of network substrate, e.g.:
–
–
–
–
–
Stabilization of underlying network.
Limited rate of change.
Bounds on message delay.
Limited amount of failure (number, density).
Limited input arrivals (number, density).
• Assumptions => Guarantees.
• Get probabilistic statements as corollaries.
• Composable
11
What we proposed:
1. Model, analyze communication protocols.
2. Develop conditional performance analysis
techniques.
3. Extend I/O automata theory to accommodate
performance, reliability, hybrid, probability,
dynamic considerations.
4. Relate, integrate I/O automata with other
frameworks.
12
Progress this year
1. Communication protocol design/analysis
– Scalable Group Communication
– Totally Ordered Multicast with QoS
– Scalable Reliable Multicast
2. Conditional performance analysis methods
– Evolving…
3. I/O automaton models
– Hybrid I/O Automata
– Dynamic I/O Automata
– IOA language support
4. Comparing, integrating with other models
–
A start…
13
1. Protocol Modeling/Analysis
14
Scalable Group Communication
[Keidar, Khazan 00]
15
Group Communication Service
• Manages group membership, current view.
• Multicast communication among group members,
with ordering, reliability guarantees.
• Virtual Synchrony [Birman, Joseph 87]
– Integrates group membership and group communication.
– Processes that move together from one view to another deliver
the same messages in the first view.
– Useful for replicated data management.
– Before announcing new view, processes must synchronize,
exchange messages.
16
Example: Virtual Synchrony
i
j
3: i,j,k
k
3: i,j,k
3: i,j,k
mcast(m)
rcv(m)
4: i, j
rcv(m)
4: i, j
VS algorithm supplies missing m17
Group Communication in WANs
• Difficulties:
– High message latency, message exchanges are expensive
– Frequent connectivity changes
• New, scalable GC algorithm:
– Uses scalable GM service of [Keidar, Sussman, et al. 00],
implemented on special membership servers.
– GC (with virtual synchrony) implemented on clients.
VS
Net
VSGC
GM
18
Group Communication in WANs
• Try to minimize time from when network stabilizes
until GC delivers new views to clients.
• After stabilization: GM forms view, VSGC algorithm
synchronizes.
Net event
VSGC Algorithm
GM Algorithm
view(v)
• Existing systems (LANs):
– GM, VSGC uses several message exchange rounds
– Continue in spite of new network events
• Inappropriate for WANs
19
New Algorithm
• VSGC uses one message exchange round, in parallel with
GM’s agreement on views.
• GM usually delivers views in one message exchange.
Net event
VSGC Algorithm
GM Algorithm
view(v)
• Responds to new network events during reconfiguration:
– GM produces new membership sets
– VSGC responds to membership changes
• Distributed implementation [Tarashchanskiy 00]
20
Correctness Proofs
• Models, proofs (safety and liveness)
• Developed new incremental modeling, proof methods
[Keidar, Khazan, Lynch, Shvartsman 00]
– Proof Extension Theorem:
S
S’
A
A’
• Used new methods for the safety proofs.
21
Performance Analysis
• Analyze time from when network stabilizes
until GC delivers new views to clients.
• Compare with other strategies.
• System is a composition:
– Network and GM services, plus
– VSGC processes
• Use composition in the analysis.
22
Performance Analysis
1. Analyze the VSGC algorithm alone, in terms
of its inputs and timing assumptions.
2. State reasonable performance guarantees for
GM and Network.
3. Combine to get conditional performance
properties for the system as a whole.
23
1. Analysis of VSGC algorithm
• Assume component C stabilizes:
– GM delivers same views
– Net provides reliable communication with latency .
• Let
– T[start], T[view] be times of last GM events for C
–  be upper bound on local step time.
• Then VSGC outputs new views by time
max (T[start] +  + x, T[view]) + 
24
Analysis of VSGC Algorithm
 + x
VS Algorithm
view(v)
Net Event
start
start
view(v)
GM algorithm
T[start]
T[view]
25
2. Assumed Bounds for GM
T[start]
start
T[view]
start

GM
view(v)

• Bounds for “Fast Path” of [Keidar, et al. 00],
observed empirically in almost all cases.
26
3. Combining VSGC and GM Bounds
• Bounds for system, conditional on GM bounds.
 + x
VSGC
start
view(v)
start
view(v)
T[start]

T[view]
GM

27
Totally Ordered Multicast with QoS
[Bar-Joseph, Keidar, Anker, Lynch 00]
28
Totally Ordered Multicast with QoS
• Multicast to dynamic group, subject to joins, leaves,
and failures.
• Global total ordering of messages
• QoS: Message delivery latency
• Built on reliable network with latency guarantees
• Add ordering guarantees, preserve latency bounds.
• Target applications
–
–
–
–
State machine replication
Military command and control
Distributed games
Shared editing
29
Two Algorithms
• Algorithm 1: Basic Totally Ordered Multicast
– Sends, receives consistent with total ordering of messages.
– Non-failing processes agree on messages from non-failing processes.
– Latency: Constant, even with joins, leaves, failures.
• Algorithm 2: Atomic Multicast
– Non-failing processes agree on all messages.
– Latency:
• Joins, leaves only: Constant
fail_i
• With failures: Linear in f
TOM
Net
fail_j
30
Local Node Process
rcv(m)
join
leave
mcast(m)
FrontEnd_i
joiners(s,J),
leavers(s,J)
end-slot(s)
members(s,J)
Ord_i
Memb_i
mcast(m)
join
leave
mcast(join)
mcast(leave)
progress(s,j)
Sniffer_i
rcv(join)
rcv(leave)
rcv(m)
Net
31
Local Algorithm Operation
• FrontEnd divides time into slots, tags messages with slots.
• Ord delivers messages by slot, in order of process indices.
• Memb determines slot membership.
– Join, leave messages
– Failures:
• Algorithm 1 uses local failure detector.
• Algorithm 2 uses consensus on failures.
– Requires new dynamic version of consensus.
• Timing-dependent
32
Architecture for Algorithm 2
TO-QoS
Net
GM
33
Performance Analysis (Planned)
1. Latency of TO-QoS in terms of GM
2. GM latency bounds
3. Combine
34
Using Caching to Improve Reliable
Multicast Algorithms
[Livadas]
35
SRM [Floyd, et al.]
•
•
•
•
Reliable multicast to dynamic group.
Built over IP multicast
Based on requests (NACKs) and retransmissions
Limits duplicate requests/replies using:
– Deterministic suppression: Ancestors suppress
descendants, by scheduling requests/replies based on
distance to source.
– Probabilistic suppression: Siblings suppress each other,
by spreading out requests/replies.
36
SRM Architecture
SRM
IPMcast
37
New Protocol
• Tries to improve SRM by using loss history
information.
– Useful if future losses occur on same link.
• Uses deterministic suppression for siblings also
• Determines, caches best requestor, best replier
– Chooses requestor closest to source.
– Chooses replier closest to requestor.
– Break ties with processor ids.
• Defaults to SRM
Replier
Requestor
38
Performance
• Metrics:
– Loss recovery latency: Time from detection of packet loss to
receipt of first retransmission
– Loss recovery overhead: Number of messages multicast to
recover from a message loss
• Protocol performance benefits:
– Removes delays caused by probabilistic suppression
– Following election of requestor and replier:
• Reduces latency by using best requestor and replier.
• Reduces overhead by using single requestor and replier.
• Latency analysis (Planned)
39
3. I/O Automaton Models
40
Hybrid I/O Automata
[Lynch, Segala, Vaandrager, HSCC 01]
• New, simpler version of HIOA model of [LSVW96]
• Supports decomposing hybrid system descriptions:
– External behavior: Discrete actions and continuous flows
– Composition: Synchronizes external actions and flows,
respects external behavior
– Abstraction: Implementation and simulation relation
notions, respect external behavior.
• Separate mechanisms:
– External actions for discrete communication.
– External variables for continuous flow.
41
Example: Delay Buffer Del(d)
• Accepts discrete and continuous input,
produces isomorphic output, with delay d.
• Compose in sequence, in cycle:
Del(d1)
Del(d2)
• Composition implements
Del(d1 + d2):
Del(d1)
Del(d2)
Del(d1 + d2)
Del(d1)
Del(d2)
42
Example: Vehicle and Controller
• Keep vehicle speed in [v1, v2].
Vehicle
• Sensor senses velocity, reports to
acc-in
vel-out
Controller every time d.
• Controller suggests
Sensor
Actuator
acceleration.
report(v)
• Vehicle follows suggested
suggest(a)
acceleration, with uncertainty ε.
• Compose: Discrete, continuous
Controller
interactions
• Prove invariant: velocity in [v1,v2].
• Use auxiliary invariants, including timing.
43
HOIA definition
•
•
•
•
•
U, X, Y: Input, output, internal (state) variables
Θ: Initial states
I, O, H: Input, output, internal actions
D, discrete transitions
T, trajectories
– Mappings from time intervals to valuations of variables
• Closure properties
• Input-enabling for actions, flows
• Execution: τ0, a1, τ1, a2, τ2, …
• Trace: Restrict to external variables and actions
44
Composition and Abstraction
• Abstraction:
– A implements B if comparable and traces(A) subset of traces(B).
– Simulation relation: Start, step, trajectory conditions
– Theorem: Simulation relation implies implementation
• Composition:
– Synchronize external actions and variables
– Theorems: Projection, pasting, substitutivity
• Receptiveness:
– Doesn’t cooperative in producing Zeno behavior
– Theorem: Closed under composition (with technical assumption).
45
Dynamic I/O Automata
[Attie, Lynch, Concur 01]
• Dynamic version of I/O automata, including:
– Automaton creation and destruction
– Signature change
• Two-level model: Automata, configurations.
• Mobility modeled using signature change
46
IOA Language and Tools
• Language for describing I/O automata:
[Garland, Lynch]
I
• Front end: [Garland]
– Translates to Java objects
– Completely rewritten this year.
– Needs support for composition.
O A
• Theorem-prover connection: [Garland, Bogdanov]
– Connection with LP
– Seeking connections: SAL, Isabelle, STeP, NuPRL
47
IOA Language and Tools
• Simulator: [Chefter, Ramirez, Dean]
– Has support for paired simulation.
– Needs additions.
– Being instrumented for invariant discovery using
Daikon [Ernst]
• Code generator: Tauber, Tsai
– Local code-gen (translation to Java) running.
– Needs composition, communication service calls,
correctness proof.
• Challenge examples
48
Plans
49
Plans
1. Protocol modeling/verification
– Finish analysis of Scalable GC, Totally Ordered
Multicast with QoS, SRM
– Other protocols from this project.
2. Conditional analysis methods
– Develop general methods
– Compare with other methods (Trivedi)
50
Plans
3. I/O automaton models
– Timed models:
• Composition theorems for timing properties
• Specially structured TIOAs for CP analysis
– Hybrid models:
• Finish basic model
• Integrate control theory methods
– Probabilistic models:
• Compositional analysis methods
• Combine hybrid and probabilistic models
51
Plans
I/O automaton models, cont’d
– Dynamic models: External behavior notion,
composition results
– Combine extensions
– Language constructs to support extensions
– IOA tools: Finish, make portions available
4. Integration with other models/methods
– Shared variable models
52
Thank you!
53
Best Requestor and Replier
S
Replier
Requestor
54
Latency Analysis (Planned)
1.
2.
3.
4.
Bound recovery latency assuming cache hits.
Bound latency for cache misses.
Combine
Compare with SRM
55
1. Cache Hit Performance
• Recovery latency bounded by
c tSA* + tAB + d tAB* + tBA,
where:
–
–
–
–
–
S = source,
A = requestor, B = retransmitter
c,d : delay parameters of deterministic suppression
tSA*, tAB* : inter-host delay estimates
tAB ,, tBA : actual inter-host delays
56
2. Cache Miss Performance
• Depends on:
– Loss location
– Locations of requestor, retransmitter, …
– Timing
• Needs analysis
• Hope: Similar to SRM
57