Transcript Kelvin

Epidemic Techniques
Chiu Wah So (Kelvin)
Database Replication

Why do we replicate database?
–
–

Low latency
High availability
To achieve strong (sequential) consistency
on replicated database. Not scalable.
–
–
One primary database
Quorum system (contact over half of replicas)
Database Replication

To scale and have high availability, we need
a weaker consistency model
–

Eventual consistency: If all updating stops then
eventually all replicas will converge to the
identical values.
The two papers talk about how to use
epidemic techniques to achieve eventual
consistency to scale.
Epidemic Techniques for replicated
database

Epidemic Algorithms for Replicated Database
Maintenance
–

Look at different epidemic algorithms to reduce
bandwidth consumption to maintain replicated
database
Astrolabe
–
Scalable and Robust information management
system.
Motivation on the first paper



Clearinghouse service maintains translations
from names to machine addresses. (like DNS)
Problem: Using direct mail and anti-entropy,
too much traffic to maintain consistency
between highly replicated servers. Some key
links are overloaded.
Look at techniques to reduce bandwidth:
rumor spreading and spatial distributions.
Direct mail


Direct mail: each server sends update to all
other servers.
Advantage
–
–

Easy to implement
Good enough for small and static servers
Disadvantage:
–
–
Not scale (O(n) message for each update)
Updates may get lost.
Anti-entropy


Servers pick random server and resolve differences.
3 ways to resolve differences: push, pull, and pushpull.
Anti-entropy Example
Push
Pull
Push-Pull
Anti-entropy Average time

Average converge in O(log(n)) steps
Pull, push-pull
Push
Anti-entropy (2)


Very expensive to send the whole database across
network to compare
Some techniques for optimizing comparing
bandwidth
–
–
–

Compute Checksum
Exchange list of of recent updates. Then apply the update
and compute checksum
Exchange updates in reverse chronological order until
checksums agree
Still too much bandwidth…..
Rumor spreading





Main idea: send out updates randomly. Instead of
comparing whole database.
Three states: susceptible, infective, and removed.
Initially all servers are susceptible
Once server has a rumor (infective), and then pick a
random server to send the rumor.
With probability 1/k, the server loses interest
(removed) to spread rumor
Rumor spreading (2)

But maybe not every server got the rumor.
–

With probability of remaining susceptible after the
epidemic finishes:
Run anti-entropy infrequently to make sure
every server gets the update.
Three goals in rumor spreading



Low Residue: the probability of remaining
susceptible when the epidemic finishes
Low Traffic: total traffic sent per site
Low Delay: Average time and the last time
between the injection of an update and the
arrival of update.
Variations in rumor spreading

Many variations in rumor spreading.
–
–
–
–
–
Blind with Coin vs Feedback with Counter
Push vs Pull
Increase the smaller counter of the two
Connection limit
Hunting
Feedback Counter vs Blind Prob.
Feedback and Counter
Blind and Probability
Deletion and Death Certificates


Simple solution: death certificates and store
for a fixed threshold of time
2nd solution: dormant death certificates. Use
two threshold time, and some servers keep it
longer. 2 different timestamp: original
timestamp and reactivation timestamp.
Motivation of Spatial Distributions


Network is not uniform.
Certain key links in the network are overload.
–

Transatlantic links about 80 conversations, but on
average conversations per link is 6.
Therefore, we should favor nearby neighbors.
Spatial Distributions



Each servers, sort the list of sites by distance
from s.
Select anti-entropy exchange partners from
the sorted list according to a function f(i), i =
index on the sorted list.
We can use f(i) = i^(-a), where a is the
parameter for tuning spatial distribution.
Spatial Distribution
Next Paper: Astrolabe



The first paper talks about how to use rumor
spreading and spatial distribution to reduce
bandwidth.
But the storage grows O(n) and total
bandwidth taken up by gossip grows O(n^2)
We need a more scalable solution.
Astrolabe



Scalable and Robust information
management system.
Monitors the dynamically changing state of a
collection of distributed resources.
Reports summaries of this information to
users.
Four design goals




Scalability: scale through its zone hierarchy.
Information is summarized before exchanges.
Flexibility: easy to install new aggregated
function in a form of SQL aggregation query
Robustness: randomized peer-to-peer
approach to exchange information.
Security: use signed certificates.
Structure of Astrolabe


Structure of Astrolabe’s
zones can be viewed
as a trees. Leaves of
this tree are hosts.
Each hosts run an
astrolabe agent.
Astrolabe Detail




Each agent is a virtual database.
Each agent has a path name.
(For example: /USA/Cornell/pc3)
Each agent contains information,
called MIB, for all the ancestor
zone (For example, it contains /,
/USA, /USA/Cornell)
Each ancestor MIB is generated
using aggregation for scalability,
instead of having O(n) entries.
Astrolabe Detail (2)




Each zone can be viewed as relational table of the
attributes of its child zone.
How do we gather or generate the information in the
zone relational table?
Two ways: If the agent is in the zone, use
aggregation to construct the MIB for the zone.
Otherwise, gossip for the information.
Therefore, MIB for internal zones has to be small in
order to scale.
Aggregation

Aggregation Function Certificates contain
information on how to collect and aggregate
attributes of child zone MIBs into entries for
internal zone MIBs.
–
–
Programmed in SQL-like language
Propagates by two ways: copying to parent
(propagates like other normal attributes), and look
for new AFC from its ancestor zone
Aggregation (2)

Here are the SQL aggregation functions that
are provided by Astrolabe.
Gossip



Each zone has a small set of addresses for
representative agents.
Representative agents are computed using
an aggregation function, such as using load
and longevity.
An agent gossips on behalf of those zones
for which it is a representative.
Gossip (2)



Periodically, the agent picks one of the child
zones, and talks to one of the contact agents.
(anti-entropy)
Then, it sends all the child zones at that level,
and does the same thing for the higher levels
in the tree up until the root level.
Then the two agents can compare which
entries are newer and keep them.
Example of gossip (taken from ken slides)
Name
Time
Load
Weblogic?
SMTP?
Word
Versio
n
swift
2011
2.0
0
1
6.2
falcon
1971
1.5
1
0
4.1
cardinal
2004
4.5
1
0
6.0
swift.cs.cornell.edu
cardinal.cs.cornell.edu
Name
Time
Load
Weblogic?
SMTP?
Word
Version
swift
2003
.67
0
1
6.2
falcon
1976
2.7
1
0
4.1
cardinal
2201
3.5
1
1
6.0
Example of gossip (2)
Name
Time
Load
Weblogic?
SMTP?
Word
Versio
n
swift
2011
2.0
0
1
6.2
falcon
1971
1.5
1
0
4.1
cardinal
2004
4.5
1
0
6.0
swift.cs.cornell.edu
swift
cardinal
cardinal.cs.cornell.edu
Name
Time
Load
Weblogic?
SMTP?
Word
Version
swift
2003
.67
0
1
6.2
falcon
1976
2.7
1
0
4.1
cardinal
2201
3.5
1
1
6.0
2011
2201
2.0
3.5
Example of gossip (3)
Name
Time
Load
Weblogic?
SMTP?
Word
Versio
n
swift
2011
2.0
0
1
6.2
falcon
1971
1.5
1
0
4.1
cardinal
2201
3.5
1
0
6.0
swift.cs.cornell.edu
cardinal.cs.cornell.edu
Name
Time
Load
Weblogic?
SMTP?
Word
Version
swift
2011
2.0
0
1
6.2
falcon
1976
2.7
1
0
4.1
cardinal
2201
3.5
1
1
6.0
Dynamically changing
query output is visible
system-wide
Name
Avg
Load
WL contact
SMTP contact
SF
2.6
123.45.61.3
123.45.61.17
NJ
1.8
127.16.77.6
127.16.77.11
Paris
3.1
14.66.71.8
14.66.71.12
Name
Load
Weblogic?
SMTP?
Word
Version
swift
2.0
0
1
falcon
1.5
1
cardinal
4.5
1
…
SQL query
“summarizes”
data
Name
Load
Weblogic?
SMTP?
Word
Version
6.2
gazelle
1.7
0
0
4.5
0
4.1
zebra
3.2
0
1
6.2
0
6.0
gnu
.5
1
0
6.2
San Francisco
New Jersey
…
Membership


When an agent has not seen an update for a zone
from a particular representative for some time Tfail.
Remove its MIB.
Connect different pieces of the trees and add in new
machines
–
–
–

IP multicast
Broadcast
Relatives
Administrators responsible for configuring the
system by assigning zone names.
Communication


Through Http and UDP(need to fragment the
messages into more than one UDP packets)
If there is firewall,
–
Use ALG in core internet or an astrolabe agent in
core internet.
Security



Each zone is a management unit.
Children have a way to override policy
enforced by parents.
Each zone: 2 pairs of key, CA and zone keys
–
–
–
–
Zone certificate
MIB certificate
Aggregation function certificate
Client certificate
Related work





Directory Services (Clearinghouse, Bayou,
Globe)
Network Monitoring
Event Notification
Sensor Networks
Peer-to-peer routing
Measurement on expected # rounds
Measurement on expected # rounds (2)
Measurement on Latency
Real
Simulation
Conclusion ??