Transcript PPT
Paxos
Student Presentation by Jeremy Trimble
CS 5204 – Operating Systems
1
Paxos
Overview
Distributed State Machine
Distributed Consensus
Fault-tolerance
Requirements/Assumptions
Paxos Algorithm
Concepts
The Algorithm Itself
Examples
Multi-Paxos
Variations of Paxos
CS 5204 – Operating Systems
2
Paxos
Distributed State Machine
Fault-tolerance through replication.
Need to ensure that replicas remain consistent.
Replicas must process requests in the same order.
CS 5204 – Operating Systems
3
Paxos
The Distributed Consensus Problem
In a distributed system, how can we:
Select a single action among many options?
Simple solution:
A single node acts as the “decider.”
How can this be done in a fault-tolerant way?
But this is not fault tolerant. (What if the decider fails?)
A better solution: Paxos
CS 5204 – Operating Systems
4
Paxos
Fault-Tolerant Consensus
Requirements
Safety
Only a value that has been proposed may be chosen.
Only a single value is chosen.
A process never learns that a value has been chosen
until it actually has been.
Goals
Liveness
FLP Impossibility Proof (1985)
CS 5204 – Operating Systems
5
Paxos
Assumptions
Failures
“Fail Stop” assumption
Messages
When a node fails, it ceases to function entirely.
May resume normal operation when restarted.
May be lost.
May be duplicated.
May be delayed (and thus reordered).
May not be corrupt.
Stable Storage
CS 5204 – Operating Systems
6
Paxos
Paxos Terms
Proposer
Proposal
Advocates for a client.
A1
Renders an accept/reject
decision.
An alternative proposed by a
proposer.
Consists of a unique number
and a proposed value.
( 42, B )
Considers the values proposed by
proposers.
Learner
Suggests values for consideration
by Acceptors.
Acceptor
P1
We say a value is chosen when
consensus is reached on that
value.
Learns the chosen value.
In practice, each node will usually play
all three roles.
CS 5204 – Operating Systems
7
Paxos
Strong Majority
“Strong Majority” / “Quorum”
Any two quorums have a nonempty
intersection.
A1
Acceptors decisions are not in
agreement.
Common node acts as “tie-breaker.”
In a system with 2F+1 acceptors, F
acceptors can fail and we'll be OK.
CS 5204 – Operating Systems
A5
A6
Helps avoid “split-brain” problem.
A2
A set of acceptors consisting of more
than half of all acceptors.
A7
A4
A3
Quorums in a system with
seven acceptors.
8
Paxos
Consensus
(N1, V1)
A1
(N5, V3)
(N2, V2)
A2
time
(N7, V3)
(N4, V1)
A3
(N6, V3)
(N3, V3)
A4
(N2, V2)
A5
(N7, V3)
consensus reached, V3 chosen
Values proposed by proposers are constrained so that once consensus has been reached, all
future proposals will carry the chosen value.
P2c . For any v and n, if a proposal with value v and number n is issued, then there is a set S
consisting of a majority of acceptors such that either:
(a) no acceptor in S has accepted any proposal numbered less than n, or
(b) v is the value of the highest-numbered proposal among all proposals numbered less
than n accepted by the acceptors in S.
CS 5204 – Operating Systems
9
Paxos
Basic Paxos Algorithm
Phase 1a:
“Prepare”
Select proposal number* N and send a prepare(N) request to a quorum
of acceptors.
Proposer
Phase 1b:
“Promise”
If N > number of any previous promises or acceptances,
* promise to never accept any future proposal less than N,
- send a promise(N, U) response
(where U is the highest-numbered proposal accepted so far (if any))
Phase 2a:
“Accept!”
If proposer received promise responses from a quorum,
- send an accept(N, W) request to those acceptors
Acceptor
(where W is the value of the highest-numbered proposal among the promise
responses, or any value if no promise contained a proposal)
Phase 2b:
“Accepted”
If N >= number of any previous promise,
* accept the proposal
- send an accepted notification to the learner
CS 5204 – Operating Systems
* = record to stable storage
10
Paxos
P1
A1
A2
A3
start
prepare(1)
prepare(1)
promise(1, -)
promise(1)
accept(1, A)
prepare(1)
accepted(1, A)
prepare(1)
time
CS 5204 – Operating Systems
11
Paxos
P1
P2
A1
A2
A3
accepted(1, A)
prepare(1)
continued...
prepare(2)
prepare(1)
promise(2, -)
promise(2, (1,A))
accept(2, A)
prepare(1)
accepted(2, A)
accepted(2, I)
time
CS 5204 – Operating Systems
12
Paxos
P1
P2
A1
A2
A3
start
prepare(1)
prepare(1)
promise(1)
promise(1)
prepare(2)
prepare(1)
promise(2, -)
prepare(1)
accept(1, C)
prepare(1)
accepted(1, C)
accept(2, D)
prepare(1)
accepted(2, D)
accepted(1, I)
CS 5204 – Operating Systems
time
13
Paxos
P1
P2
A1
A2
A3
start
prepare(1)
prepare(1)
promise(1)
promise(1)
prepare(2)
prepare(1)
promise(2, -)
prepare(1)
accept(1, F)
prepare(1)
accepted(1, F)
prepare(3)
prepare(1)
...
prepare(4)
prepare(1)
time
...
CS 5204 – Operating Systems
14
Paxos
Other Considerations
Liveness
Can't be guaranteed in
general.
Learning the Chosen Value
Distinguished Proposer
All proposals are funneled
through one node.
Can re-elect on failure.
Acceptors notify some set
of learners upon
acceptance.
Distinguished Learner
A node may play the role of both distinguished proposer and
distinguished learner – we call such a node the master.
CS 5204 – Operating Systems
15
Paxos
Multi-Paxos
chosen value
S
P
???
O
instance
39
40
41
42
A single instance of Paxos yields a single chosen value.
To form a sequence of chosen values, simply apply Paxos
iteratively.
“time”
To distinguish, include an instance number in messages.
Facilitates replication of a state machine.
CS 5204 – Operating Systems
16
Paxos
Paxos Variations
Cheap Paxos
Reconfiguration
Fast Paxos
Eject failed acceptors.
Fault-tolerant with only F+1
nodes (vs 2F+1).
Failures must not happen
too quickly.
Clients send accept
messages to acceptors.
Master is responsible for
breaking ties.
Reduces message traffic.
Byzantine Paxos
Arbitrary failures – lying, collusion, fabricated messages,
selective non-participation.
Adds an extra “verify” phase to the algorithm.
CS 5204 – Operating Systems
17
Paxos
Conclusion
State-Machine Replication
Distributed Consensus
Basic Paxos
Examples
Optimizations
Variations
CS 5204 – Operating Systems
18
Paxos
Questions?
CS 5204 – Operating Systems
19
Paxos
References
Paxos Made Simple
Paxos Made Live
http://en.wikipedia.org/wiki/Paxos_algorithm
The Byzantine Generals Problem
http://courses.cs.vt.edu/cs5204/fall10-kafura-NVC/Papers/FaultTolerance/Paxos-Chubby.pdf
Wikipedia – Paxos Algorithm
http://courses.cs.vt.edu/cs5204/fall10-kafura-NVC/Papers/FaultTolerance/Paxos-Simple-Lamport.pdf
http://research.microsoft.com/en-us/um/people/lamport/pubs/byz.pdf
Impossibility of distributed consensus with one faulty process
http://portal.acm.org/citation.cfm?doid=3149.214121
CS 5204 – Operating Systems
20