Transcript PPT

Paxos
Student Presentation by Jeremy Trimble
CS 5204 – Operating Systems
1
Paxos
Overview

Distributed State Machine

Distributed Consensus


Fault-tolerance

Requirements/Assumptions
Paxos Algorithm

Concepts

The Algorithm Itself

Examples

Multi-Paxos

Variations of Paxos
CS 5204 – Operating Systems
2
Paxos
Distributed State Machine

Fault-tolerance through replication.

Need to ensure that replicas remain consistent.

Replicas must process requests in the same order.
CS 5204 – Operating Systems
3
Paxos
The Distributed Consensus Problem

In a distributed system, how can we:

Select a single action among many options?


Simple solution:

A single node acts as the “decider.”


How can this be done in a fault-tolerant way?
But this is not fault tolerant. (What if the decider fails?)
A better solution: Paxos
CS 5204 – Operating Systems
4
Paxos
Fault-Tolerant Consensus

Requirements

Safety




Only a value that has been proposed may be chosen.
Only a single value is chosen.
A process never learns that a value has been chosen
until it actually has been.
Goals

Liveness

FLP Impossibility Proof (1985)
CS 5204 – Operating Systems
5
Paxos
Assumptions

Failures

“Fail Stop” assumption



Messages





When a node fails, it ceases to function entirely.
May resume normal operation when restarted.
May be lost.
May be duplicated.
May be delayed (and thus reordered).
May not be corrupt.
Stable Storage
CS 5204 – Operating Systems
6
Paxos
Paxos Terms

Proposer





Proposal

Advocates for a client.

A1
Renders an accept/reject
decision.
An alternative proposed by a
proposer.
Consists of a unique number
and a proposed value.
( 42, B )
Considers the values proposed by
proposers.
Learner



Suggests values for consideration
by Acceptors.
Acceptor

P1

We say a value is chosen when
consensus is reached on that
value.
Learns the chosen value.
In practice, each node will usually play
all three roles.
CS 5204 – Operating Systems
7
Paxos
Strong Majority

“Strong Majority” / “Quorum”



Any two quorums have a nonempty
intersection.
A1

Acceptors decisions are not in
agreement.
Common node acts as “tie-breaker.”
In a system with 2F+1 acceptors, F
acceptors can fail and we'll be OK.
CS 5204 – Operating Systems
A5
A6
Helps avoid “split-brain” problem.


A2
A set of acceptors consisting of more
than half of all acceptors.
A7
A4
A3
Quorums in a system with
seven acceptors.
8
Paxos
Consensus
(N1, V1)
A1
(N5, V3)
(N2, V2)
A2
time
(N7, V3)
(N4, V1)
A3
(N6, V3)
(N3, V3)
A4
(N2, V2)
A5
(N7, V3)
consensus reached, V3 chosen


Values proposed by proposers are constrained so that once consensus has been reached, all
future proposals will carry the chosen value.
P2c . For any v and n, if a proposal with value v and number n is issued, then there is a set S
consisting of a majority of acceptors such that either:


(a) no acceptor in S has accepted any proposal numbered less than n, or
(b) v is the value of the highest-numbered proposal among all proposals numbered less
than n accepted by the acceptors in S.
CS 5204 – Operating Systems
9
Paxos
Basic Paxos Algorithm
Phase 1a:
“Prepare”
Select proposal number* N and send a prepare(N) request to a quorum
of acceptors.
Proposer
Phase 1b:
“Promise”
If N > number of any previous promises or acceptances,
* promise to never accept any future proposal less than N,
- send a promise(N, U) response
(where U is the highest-numbered proposal accepted so far (if any))
Phase 2a:
“Accept!”
If proposer received promise responses from a quorum,
- send an accept(N, W) request to those acceptors
Acceptor
(where W is the value of the highest-numbered proposal among the promise
responses, or any value if no promise contained a proposal)
Phase 2b:
“Accepted”
If N >= number of any previous promise,
* accept the proposal
- send an accepted notification to the learner
CS 5204 – Operating Systems
* = record to stable storage
10
Paxos
P1
A1
A2
A3
start
prepare(1)
prepare(1)
promise(1, -)
promise(1)
accept(1, A)
prepare(1)
accepted(1, A)
prepare(1)
time
CS 5204 – Operating Systems
11
Paxos
P1
P2
A1
A2
A3
accepted(1, A)
prepare(1)
continued...
prepare(2)
prepare(1)
promise(2, -)
promise(2, (1,A))
accept(2, A)
prepare(1)
accepted(2, A)
accepted(2, I)
time
CS 5204 – Operating Systems
12
Paxos
P1
P2
A1
A2
A3
start
prepare(1)
prepare(1)
promise(1)
promise(1)
prepare(2)
prepare(1)
promise(2, -)
prepare(1)
accept(1, C)
prepare(1)
accepted(1, C)
accept(2, D)
prepare(1)
accepted(2, D)
accepted(1, I)
CS 5204 – Operating Systems
time
13
Paxos
P1
P2
A1
A2
A3
start
prepare(1)
prepare(1)
promise(1)
promise(1)
prepare(2)
prepare(1)
promise(2, -)
prepare(1)
accept(1, F)
prepare(1)
accepted(1, F)
prepare(3)
prepare(1)
...
prepare(4)
prepare(1)
time
...
CS 5204 – Operating Systems
14
Paxos
Other Considerations
Liveness



Can't be guaranteed in
general.

Learning the Chosen Value

Distinguished Proposer



All proposals are funneled
through one node.
Can re-elect on failure.

Acceptors notify some set
of learners upon
acceptance.
Distinguished Learner
A node may play the role of both distinguished proposer and
distinguished learner – we call such a node the master.
CS 5204 – Operating Systems
15
Paxos
Multi-Paxos


chosen value
S
P
???
O
instance
39
40
41
42
A single instance of Paxos yields a single chosen value.
To form a sequence of chosen values, simply apply Paxos
iteratively.


“time”
To distinguish, include an instance number in messages.
Facilitates replication of a state machine.
CS 5204 – Operating Systems
16
Paxos
Paxos Variations

Cheap Paxos



Reconfiguration
Fast Paxos


Eject failed acceptors.

Fault-tolerant with only F+1
nodes (vs 2F+1).

Failures must not happen
too quickly.


Clients send accept
messages to acceptors.
Master is responsible for
breaking ties.
Reduces message traffic.
Byzantine Paxos


Arbitrary failures – lying, collusion, fabricated messages,
selective non-participation.
Adds an extra “verify” phase to the algorithm.
CS 5204 – Operating Systems
17
Paxos
Conclusion

State-Machine Replication

Distributed Consensus

Basic Paxos

Examples

Optimizations

Variations
CS 5204 – Operating Systems
18
Paxos
Questions?
CS 5204 – Operating Systems
19
Paxos
References

Paxos Made Simple


Paxos Made Live


http://en.wikipedia.org/wiki/Paxos_algorithm
The Byzantine Generals Problem


http://courses.cs.vt.edu/cs5204/fall10-kafura-NVC/Papers/FaultTolerance/Paxos-Chubby.pdf
Wikipedia – Paxos Algorithm


http://courses.cs.vt.edu/cs5204/fall10-kafura-NVC/Papers/FaultTolerance/Paxos-Simple-Lamport.pdf
http://research.microsoft.com/en-us/um/people/lamport/pubs/byz.pdf
Impossibility of distributed consensus with one faulty process

http://portal.acm.org/citation.cfm?doid=3149.214121
CS 5204 – Operating Systems
20