Transcript ppt

QUORUMS
By gil ben-zvi
definition
• Assume a universe U of servers, sized n. A
quorum system S is a set of subsets of U,
every pair of which intersect, each Q
belongs to S is called a quorum.
EXAMPLES
• Weighted majorities: assume that
every server s in the universe U is
assigned a number of votes w(s). Then
weighted majorities is a quorum set
defined by
S  {Q  U :  w( s )   w( s ) / 2}
sQ
sU
EXAMPLES
• MAJORITIES : a weighted majorities
quorum system when all weights are the
same.
• Singleton: a weighted majorities quorum
system when for one server s: w(s)=1, and
for each v of the other servers w(v)=0.
(only quorum is s)
EXAMPLES
• Grid: suppose n is a square of some
integer k. arrange the universe in a k x k
grid. A quorum is the union of a full row
and one element from each row below.
• FPP: suppose a projective plane over a
field sized q. each point is an element, and
each line is a quorum. By projective plane
attributes, each quorum intersect.
More definitions
• Coterie: a coterie S is a quorum system
such that for any Q1,Q2 quorums in S: Q1
isn’t included in Q2
• Domination: coterie S1 dominates coterie
S2 if for every quorum Q2 belongs to S2,
there exist Q1 in S1, such that S1 is
contained in S2.
• Strategy: a probability vector representing
the probability to access each quorum.
measures
• Load: the load L(S) of a quorum system is
the minimal access probability minimized
over the strategies.
• Resilience: resilience is k, if k is the
largest number such that for every k
server crashes, one quorum remains
unhit.
measures
• Failure probability: if every server has
certain probability to crash (assuming
independently here), the probability that
each quorum is hit. Usually assuming
each server has the same crash
probability p.
Measures examples
• Singleton: load=1, resilience=0, failure
probability=p
• Majorities: load is about ½. Resilience
about (n-1)/2. failure probability (if p < ½)
smaller than exp(e,-n).
• Grid: load is O(1/sqrt(n)). Resilience =
sqrt(n)-1, failure probability tends to 1 as n
grows.
Access protocol
• Implements the semantics of a multi-writer
multi-reader atomic variable.
• Assumes all clients and servers are non
byzantine, unique timestamp for a client
• Write: a client asks some quorum to
obtain a set of value/timestamps pairs,
then he writes his value with higher
timestamp than each of the timestamps
received to each server in the quorum.
Access protocol
• Read: a client asks for each server in
some quorum to obtain a set of
value/timestamp. The client chooses the
pair with the highest timestamp. It writes
back the pair to each server in some
quorum
• Server S updates a pair of
value/timestamp, only if the timestamp is
greater than the timestamp currently in S
Byzantine quorum systems
• We will use access protocol to
demonstrate the subject
• Assuming communication is reliable,
clients are correct, servers can be
byzantine, assuming that a non-empty set
of subsets of U: BAD, is known, some B in
BAD contains all the faulty servers.
Masking quorum systems
• A quorum system S is a masking quorum
system for a fail-prone system BAD if the
following properties are satisfied:
M  Consistency : (Q1, Q 2  S )B1, B 2  BAD :
(Q1  Q 2) \ B1  B 2
M  Availabili ty : B  BAD , Q  S : B  Q  
Access protocol
• write: remains the same
• Read: for a client to read the variable x, it
queries servers for some quorum Q to
obtain a set of value/timestamp pairs
A  { vu , t u }uQ
computes :
C  { v, t : B1  Q[B  BAD[ B1  B] 
u  B1[vu  v  t u  t ]}
Access protocol
• The client chooses the pair with the
highest timestamp in C, or null if C is
empty.
Access protocol
• Claim: a read operation that is concurrent
with no write operations return the value
written by the last preceding write
operation in some serialization of all
preceding write operations.
• Claim: there exists a masking quorum
system for BAD iff S  {U \ B : B  BAD}
is a masking quorum system for BAD
Access protocol
• Criterion: there exists a masking quorum
system for BAD iff for all
B1, B 2, B3, B 4  BAD ,U  B1  B 2  B3  B 4
proof : B1, B 2, B3, B 4  BAD :
((U \ B1)  (U \ B 2)) \ B3  B 4 
U \ ( B1  B 2)  B3  B 4 
U  B1  B 2  B3  B 4
F-masking quorum systems
• F-masking quorum system: A masking
quorum system where BAD is the set of all
groups of servers sized f.
• By previous claims:
• There exists a masking quorum system for
BAD iff n>4f
• Each pair of quorums must intersect by at
least 2f+1 elements.
examples
• For f-masking
quorums:
Q  {Q  U :| Q | (n  2 f  1) / 2}
grid :
(3 f  1  n )
Q \ {C j   Ri : I , { j}  {1... n}, | I | 2 f  1}
iI
Dissemination quorum systems
• Assumes clients can digitally sign the
value/timestamp they propagate.
• Therefore weaker demands than masking
• A quorum system S is a dissemination
quorum system for a fail-prone system BAD
if the following properties are satisfied:
D  Consistency : Q1, Q 2  S , B  BAD : Q1  Q 2  B
D  Availabili ty : B  BAD , Q  S : B  Q  
Dissemination quorum systems
• The same way as masking we reach the
(different) criterion:
• There exists a dissemination quorum system for
BAD iff B1, B 2, B3  BAD ,U  B1  B 2  B3
• If no more than f servers can fail, but any set of f
servers can fail, then must hold: n>3f
Opaque masking quorum systems
• Motivation: We want not to expose the failprone system BAD.
• done by majority decision.
• properties for quorum system to become opaque
masking system:
O  Consistency1 : Q1, Q 2  S , B  BAD :
| (Q1  Q 2) \ B || (Q 2  B )  (Q 2 \ Q1) |
O  Consistency 2 : Q1, Q 2  S , B  BAD :
| (Q1  Q 2) \ B || Q 2  B |
O  Availabili ty : B  BAD , Q  S : B  Q  
Opaque masking quorum systems
• Read: the modification is that the client
choose the pair <v,t> that appears most
often, if there are multiple such sets, it
chooses the newest one.
• Claim: Suppose maximum f servers can
fail, there exists an opaque quorum
system for BAD iff n>=5f, sufficient
because quorums sized [(2n+2f)/3] is an
opaque quorum system for B.
Opaque masking quorum system
• Claim: The load of any opaque system is
at least ½.
• Proof: if we sum up the load of a certain
quorum, we’ll get it’s bigger than it’s size/2.
the claim follows.
• Example: hadamard matrix, world of size
exp(2,l)
Faulty clients
• Solves the problem that a client will try to
fail the protocol.
• The treatment here provides a singlewriter multi-reader semantics.
• The write operation starts when the 1st
server receives update request, and ends
when the last server sent
acknowledgment.
Faulty clients
• Write: for a client c to write the value v, it
chooses legal timestamp, larger than any
timestamp it has chosen before, chooses
a quorum Q, And then it sends
<update,Q,v,t> to each server in Q, if after
some timeout period it has not received
acknowledgment, than it chooses another
quorum.
Faulty clients-servers protocol
• The servers protocol is as follows:
1. if a server receives <update,Q,v,t> from a
client c, with legal timestamp, then it
sends <echo,Q,v,t> to each member of
Q.
2. If a server receives identical echo
messages <echo,Q,v,t> from every
server in Q, then it sends <ready,Q,v,t>
to each member of Q.
Faulty clients-servers protocol
3. If a server receives identical ready
messages <ready,Q,v,t> from a set of
servers that certainly doesn’t contain faulty
server, it sends <ready,Q,v,t> to Q.
4. If a server receives identical ready
messages <ready,Q,v,t> from a set Q1 of
servers, such that Q1=Q\B for some B in
BAD, it sends acknowledgment for c, and
update the pair if t is greater than the
timestamp it currently has.
Faulty servers-properties
• Agreement: if a correct server delivers
<v,t> and a correct server delivers <r,t>
then r=v
• Proof: if a correct server delivers <v,t>,
then echo must have been send by all
correct servers in Q1. same about Q2,
they intersect in a correct server, which
doesn’t send different value with the same
timestamp
Faulty servers-properties
• Claim: Read received last written value if
it’s not concurrent with write operations.
• Proof: same as masking quorum system.
• Propagation: similar ideas to r.b, and
byzantine agreement, if server decides to
deliver, it is promised that all other decides
that too.
• Validity: at the end a correct quorum will
be accessed, so the write can end.
Load, capacity, availability
• Load: we will mark L(S), definition as
before
• availability: failure probability with the
same “p” for all the servers, we will mark it
as Fp(S)
• Capacity: we’ll define a(S,k) as the
maximum number of quorum accesses
that S can handle during a period of k time
units. Capp(S) is the limit of a(S,k)/k as k
tends to infinity.
Load, capacity, availability
• Example: majorities
• The claim is that cap(S)=1/L(S), and there
is a trade off between good availability and
good load.
definitions
• The cardinality of the smallest quorum is
denoted by c(S)
• The degree of an element i in a quorum
system S is the number of quorums that
contain i
• Let S be a quorum system. S is a suniform if |Q| = s for each Q in S
• S is (s,d) fair if it is s-uniform and deg(i)=d
foreach i, it is called s-fair if it is (s,d) fair
for some d.
LP
• We can use a linear programming to
calculate the load and the strategy
achieving the load.
m
w
j 1
j
1
LP : L( S )  min L, s.t  w j  L, i  U
is j
w j 0, L  0
DUAL LP
• Some time we want to use the dual linear
program, in which we give probabilities
over the elements of the world. It is a
known fact that DLP<=LP
n
{ yi  1
i 1
DLP : t ( S )  max T , s.t { y i  T , Q  S
iQ
{ yi  0
{T  0
The load with failures
• A configuration is a vector x  {0,1}
in which it holds 1 in places representing
the failing elements in the world
• Dead(x) is the group of elements failed,
live(x) is the non failed ones
• S(x) is the sub collection of functioning
quorums
n
S ( x)  {Q  S : Q  live( x)}
Load with failiures
• The load of quorum system S over a
configuration x, if S(x) is empty then
L(S(x)) = 1, if there are functioning
quorums we define it in similar way as
before by linear programming problem.
• Let the elements fail with probabilities
P=(p1,………,pn). Then the load is a
random variable Lp(S) defined by:
P( Lp( S )  L) 
 
X
idead ( x )
L ( Sx )  L
pi
 (1  pi)
ilive( x )
Load with fails
• Claim: E(Lp(S))>=Fp(S)
• Claim: If (configurations) x>=z than
L(S(x))>=L(S(z))
• Proof: S(z) contains S(x), strategy for S(x)
is for S(z) too.
• Claim: E(Lp(S)) is a non decreasing
function.
Properties of the load
• Claim: L(S)>=c(S)/n
• Claim: L(S)>=1/c(S)
• Proof: if we choose probability 1/c(S) for
every element in c(S) and 0 in the rest, we
achieve possible solution for the DLP
problem.
• Conclusion: L(S)>1/sqrt(n) (achieved
when c(S) is close to sqrt(n)
Load/fail probability trade off
• Claim: Fp(S)>=exp(p,n*L(S))
• Proof: the probability that all the elements
in the smallest quorum will fail, (and
therefore the quorum system fails) =
exp(p,c(S)). Since c(S)<=nL(S) the claim
follows.
examples
• Optimal load, optimal load/ failure tradeoff,
good failure load – paths system
• B-grid system
• SC-grid system
• AndOr system
Load analyses
• Claim: Non dominated coteries have
lower bounds.
• The claim follows if you choose strategy
for the dominator by giving the probability
only in quorums which contained by a
quorum in the dominated quorum system
• Claim: voting systems have high load
(more than ½)
Last slide!!!!
• Proof: if we define V=the sum of all votes
(Vi), then the vector Yi=Vi/V is a solution
for DLP larger than ½.