Distributed Hash Tables and
Structured P2P Systems
Ningfang Mi
September 27, 2004
Outline
The Lookup problem
Definition
Approaches
Issues
Case study: CAN
Overview
Basic design
More design improvements (Performance, robustness)
Topology-aware routing in CAN
Conclusion
The Lookup Problem -- Definition
Given a data item X stored at some dynamic set of nodes, find it
This is a fundamental and critical problem common to all P2P systems.
How can X be found efficiently?
The Lookup Problem -- Approaches
Central database: Napster
Central point of failure
Use a hierarchy: DNS for name lookup
Failure or removal of the root or of nodes high in the hierarchy
The root and nodes high in the hierarchy become overloaded
Symmetric lookup algorithm: Gnutella
No node is more important than any other
“Flooding” of requests does not scale well
“Superpeers” in a hierarchical structure: KaZaA
Failure of “superpeers”
Innovative symmetric lookup strategy: Freenet
May need to visit a large number of nodes, with no guarantee of success
The Lookup Problem -- Approaches
Structured and symmetric algorithms:
Implement a DHT (distributed hash table)
Convert a name to a key using a hash function such as SHA-1
One operation: Lookup(key)
Examples:
CAN [2] -- ACM SIGCOMM 2001
Chord [3] -- ACM SIGCOMM 2001
Pastry [4] -- Middleware 2001
Tapestry [5] -- UC Berkeley 2001
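As a rough illustration of this interface (a minimal sketch, not code from any of the cited systems; the `name_to_key` helper and `DHT` class are hypothetical), a name can be hashed to a fixed-size key with SHA-1, and the whole system then exposes a single lookup operation:

```python
import hashlib

def name_to_key(name: str) -> int:
    """Hash a human-readable name to a 160-bit DHT key using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

class DHT:
    """Hypothetical interface: each system above exposes essentially this."""
    def lookup(self, key: int) -> bytes:
        """Return the value stored under `key`, wherever it lives."""
        raise NotImplementedError

key = name_to_key("my-file.mp3")  # same name -> same key on every node
```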
The Lookup Problem -- Issues
Mapping keys to nodes in a load-balanced way
Forwarding a lookup for a key to an
appropriate node
Distance function
Building routing tables adaptively
We discuss these issues through a case study: CAN
“What is a scalable content-addressable network?”
--- CAN’s Design Overview
Features:
Completely distributed system
No centralized control, coordination, or configuration
Scalable
Each node maintains only a small amount of control state,
independent of the number of nodes
Content-addressable
Resembling a hash table: keys map to values
Fault-tolerant
Can route around failures
“How does a scalable CAN work?”
--- CAN’s Design Overview
Using a virtual d-dimensional Cartesian
coordinate space
The space is partitioned into zones (hyper-rectangles)
CAN is composed of individual nodes
Each node “owns” its individual, distinct zone
Each node “holds” state information about its neighbor
zones
Three operations on (key, value) pairs:
insertion, lookup, and deletion
Requests for a key are routed by using a greedy
routing algorithm
Map key k1 onto a point p1 using a uniform hash function
Store (k1,v1) at the node that owns the zone containing p1
To retrieve (k1,v1), apply the same hash function to map k1 onto point p1 and get the value from the node that owns p1
[Figure: a 2-d coordinate space with 5 nodes; (k1,v1) is stored at point p1 and (k2,v2) at point p2]
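A minimal sketch of this key-to-point mapping, assuming d = 2 and SHA-1 as the uniform hash (the `key_to_point` helper is illustrative, not the paper’s code). Because insertion and lookup recompute the same deterministic point, both operations reach the same zone owner:

```python
import hashlib

def key_to_point(key: str, d: int = 2):
    """Map a key to a point in the unit d-cube, one hash per coordinate."""
    coords = []
    for i in range(d):
        h = hashlib.sha1(f"{i}:{key}".encode()).digest()
        coords.append(int.from_bytes(h[:8], "big") / 2**64)
    return tuple(coords)

p1 = key_to_point("k1")           # insertion routes (k1, v1) toward p1
assert key_to_point("k1") == p1   # a later lookup lands on the same point
```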
Basic design for a CAN
CAN routing
Construction of the CAN overlay
Maintenance of the CAN overlay
These are the three most basic pieces of the design.
“How can we find a point (x,y)?”
--- Routing in a CAN
Each node maintains a routing table of the IP address and
virtual coordinate zone of each of its neighbors
Greedy forwarding: pass the CAN message to the
neighbor closest to the destination
For a d-dimensional space partitioned into n equal zones:
Each node maintains 2d neighbors
Average routing path length: (d/4)(n^(1/d))
[Figure: a 2-d coordinate space with nodes A-G; a message is routed greedily toward the point (0.2, 0.4)]
D’s neighbor set = {C,E}
C’s neighbor set = {B,D,E,F}
B’s neighbor set = {A,C,G}
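A toy sketch of the greedy forwarding rule, under simplifying assumptions: zone centers stand in for whole zones, and the neighbor table is a plain dict (both are illustrative, not the paper’s data structures):

```python
import math

def greedy_next_hop(neighbor_table, destination):
    """Forward toward the neighbor whose zone center is closest to the
    destination point.  `neighbor_table` maps a neighbor's IP address to
    the center of its virtual coordinate zone."""
    return min(neighbor_table,
               key=lambda ip: math.dist(neighbor_table[ip], destination))

neighbors = {"10.0.0.2": (0.25, 0.75), "10.0.0.3": (0.75, 0.25)}
print(greedy_next_hop(neighbors, (0.2, 0.4)))  # -> 10.0.0.2
```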
“I want to join!”
--- CAN Construction
Find a node already in the CAN
Retrieve a bootstrap node’s IP address by looking up the CAN
domain name in DNS
Bootstrap node provides IP addresses
of random nodes in CAN
Finding a zone in the space
Randomly choose a point P in the space
Send a “JOIN” request destined for P via an existing CAN node A
Node B, whose zone contains P, splits that zone and allocates one “half”,
along with the corresponding (key, value) pairs, to the new node
Joining the routing: notify the neighbors
Learn the IP addresses of the neighbor set from the previous occupant
The previous occupant updates its own neighbor set
New and old neighbors are informed of the reallocation
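A sketch of the zone split performed by node B, under one simplifying assumption: the zone is halved along its longest side, whereas the CAN paper cycles through the dimensions in a fixed order:

```python
def split_zone(zone, point):
    """Split `zone` (a list of (lo, hi) intervals, one per dimension) in
    half; the joining node takes the half that contains `point`.
    Returns (half_kept_by_B, half_given_to_new_node)."""
    dim = max(range(len(zone)), key=lambda i: zone[i][1] - zone[i][0])
    lo, hi = zone[dim]
    mid = (lo + hi) / 2.0
    low = [iv if i != dim else (lo, mid) for i, iv in enumerate(zone)]
    high = [iv if i != dim else (mid, hi) for i, iv in enumerate(zone)]
    new = low if lo <= point[dim] < mid else high
    return (high if new is low else low), new

kept, given = split_zone([(0.0, 1.0), (0.0, 1.0)], (0.2, 0.1))
# kept -> [(0.5, 1.0), (0.0, 1.0)], given -> [(0.0, 0.5), (0.0, 1.0)]
```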
[Figure: node H joins the CAN at point (0.2, 0.1); A’s zone is split between A and H]
Before the join:
A’s neighbor set = {B,G}
B’s neighbor set = {A,C,G}
G’s neighbor set = {A,B,F}
After the join:
A’s neighbor set = {G,H}
B’s neighbor set = {C,G,H}
G’s neighbor set = {A,B,F,H}
H’s neighbor set = {A,B,G}
“I want to leave!”
--- CAN Maintenance
Zone must be taken over by some node
Normal procedure
Explicitly hand over zone and (key, value) pairs to one
neighbor
Merge into a valid zone if possible; otherwise the neighbor
with the smallest zone handles both zones temporarily
Failure: immediate takeover algorithm
Neighbors initialize takeover timers
The live neighbor with the smallest zone takes over the zone
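The CAN paper sizes each takeover timer in proportion to the neighbor’s own zone volume, so the smallest live zone fires first. A minimal sketch (the constant c is an assumed tuning parameter, and zones use the interval-list layout from the earlier sketches):

```python
def zone_volume(zone):
    """Volume of a zone given as (lo, hi) intervals per dimension."""
    v = 1.0
    for lo, hi in zone:
        v *= hi - lo
    return v

def takeover_delay(my_zone, c=10.0):
    """Timer for the immediate-takeover algorithm: proportional to my own
    zone volume, so the neighbor with the smallest zone wins the race."""
    return c * zone_volume(my_zone)

# Each neighbor of the failed node schedules a TAKEOVER message after
# takeover_delay(...); on hearing someone else's TAKEOVER first, it
# cancels its own timer.
```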
[Figure: a 2-d CAN with nodes A-H, illustrating three takeover scenarios]
Scenario 1: H leaves explicitly
Scenario 2: C leaves explicitly
Scenario 3: G dies
“Can we do better?”
--- Design Improvements (1)
Basic design
Balances low per-node state (O(d)) against short
path lengths (O(d·n^(1/d)))
These are application-level hops, not IP-level hops
Neighbor nodes may be geographically distant,
many IP hops away
Average total lookup latency
= Avg(# of CAN hops) × Avg(latency of each CAN hop)
“Can we do better?”
--- Design Improvements (2)
Primary goal:
Reduce the latency of CAN routing
Path length
Per-CAN-hop latency
Other achievements
Robustness: routing & data availability
Load balancing
The CAN design was simulated on Transit-Stub (TS)
topologies produced by the GT-ITM topology generator
Multi-Dimensioned Coordinate Spaces (1)
Increasing the number of dimensions d:
Decreases path length (latency)
Increases the size of the coordinate routing table
(which stays small)
Improves fault tolerance
[Figure: effect of dimensions on path length]
Realities: Multiple Coordinate Spaces (2)
Maintain multiple independent coordinate spaces (realities)
For a CAN with r realities:
Each node is assigned r zones and holds r
independent neighbor sets
The contents of the hash table are replicated in each reality
[Figure: a CAN with r = 2 realities; (k1,v1) hashes to the point (0.6, 0.8) in both realities but is stored by a different node in each]
Realities: Multiple Coordinate Spaces (2)
Increasing the number of realities r:
Improves data availability
Improves fault tolerance
Decreases path length (latency)
[Figure: effect of multiple realities on path length]
Dimensions vs. Realities
More dimensions have a greater effect on path length
More realities provide stronger fault tolerance
and increased data availability
[Figure: path length with increasing neighbor state]
Better CAN Routing Metrics (3)
(Basic CAN routing metric:
Cartesian distance in the virtual coordinate space)
RTT-weighted routing (RTT = round-trip time)
Reflects the underlying IP topology
Each node measures the RTT to each of its neighbors
Forward the message to the neighbor with the maximum
ratio of progress to RTT
Reduces the latency of individual hops
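A sketch of the RTT-weighted choice, with an assumed data layout (each neighbor entry carries a zone center and a measured RTT; real CAN works with whole zones rather than centers):

```python
import math

def rtt_weighted_next_hop(neighbors, my_point, destination):
    """Among neighbors that make progress toward the destination in the
    coordinate space, pick the one with the best progress-to-RTT ratio.
    `neighbors` maps id -> (zone_center, rtt_seconds)."""
    my_dist = math.dist(my_point, destination)
    best, best_ratio = None, 0.0
    for nid, (center, rtt) in neighbors.items():
        progress = my_dist - math.dist(center, destination)
        if progress > 0 and progress / rtt > best_ratio:
            best, best_ratio = nid, progress / rtt
    return best
```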
Overloading Coordinate Zones (4)
Multiple nodes share the same zone, up to
a threshold MAXPEERS
A node knows all peers in its own zone, and one
node in each of its neighbor zones
How to achieve overloaded zones? When node A asks to join node B’s zone:
If B’s zone holds fewer than MAXPEERS peers:
A joins B’s zone
The key-value pairs are replicated to A
If B’s zone already holds MAXPEERS peers:
Split B’s zone
Divide B’s peer list
Divide the key-value pairs
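A sketch of that JOIN branch, with the division of key-value pairs elided; MAXPEERS = 4 is an assumed value, and the peer list is modeled as a plain Python list:

```python
MAXPEERS = 4  # assumed threshold

def handle_join(zone_peers, joiner):
    """Return (peers_staying_in_this_zone, peers_in_the_new_half_zone).
    Below the threshold the joiner is simply admitted (and would receive
    replicas of the key-value pairs); at the threshold the zone splits."""
    if len(zone_peers) < MAXPEERS:
        return zone_peers + [joiner], None
    half = len(zone_peers) // 2
    return zone_peers[:half], zone_peers[half:] + [joiner]
```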
Overloading Coordinate Zones (4)
Advantages:
Reduces path length (number of hops)
Reduces per-hop latency
Improves fault tolerance
Disadvantage:
Adds somewhat to system complexity
Multiple Hash Functions (5)
Use k different hash functions to map a key
onto k points, and replicate each (key, value) pair
at k distinct nodes
Improves data availability
Reduces average query latency
(the k copies can be queried in parallel)
Increases the size of the (key, value) database
and the query traffic by a factor of k
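A sketch of the k-fold mapping, assuming the k hash functions are derived from SHA-1 by salting with the function index (an illustrative choice, not the paper’s construction):

```python
import hashlib

def points_for_key(key: str, k: int = 3, d: int = 2):
    """Map one key to k distinct points in the unit d-cube; the (key, value)
    pair is stored at (and can be fetched in parallel from) all k owners."""
    points = []
    for i in range(k):
        h = hashlib.sha1(f"{i}:{key}".encode()).digest()
        points.append(tuple(
            int.from_bytes(h[4 * j:4 * j + 4], "big") / 2**32
            for j in range(d)))
    return points
```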
Topologically-Sensitive Construction of
the CAN Overlay Network (6)
Problem with randomly allocating nodes to zones:
Strange routing scenarios, i.e. inefficient routing
Goal: construct CAN overlays that are congruent with
the underlying IP topology
Landmarks: a well-known set of machines, like DNS
servers
Each node measures its RTT to each landmark
and sorts the landmarks in order of increasing RTT
For m landmarks there are m! possible orderings, so
partition the coordinate space into m! equal-sized partitions
A node joins the CAN at a random point in the partition
corresponding to its landmark ordering
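A sketch of the binning rule: sort the landmarks by measured RTT, then index into the m! possible orderings (the enumeration order of the partitions is an assumption of this sketch):

```python
from itertools import permutations

def landmark_partition(rtts):
    """Given RTTs to the m landmarks, return which of the m! equal
    partitions of the coordinate space this node should join."""
    m = len(rtts)
    ordering = tuple(sorted(range(m), key=lambda i: rtts[i]))
    return sorted(permutations(range(m))).index(ordering)

# RTTs of 90 ms, 20 ms, 50 ms give the ordering (l2, l3, l1):
print(landmark_partition([0.090, 0.020, 0.050]))  # partition index 3
```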
[Figure: for m = 3 landmarks there are m! = 6 equal partitions, each labeled with one landmark ordering (l1l2l3, l1l3l2, l2l1l3, l2l3l1, l3l1l2, l3l2l1); each node joins at a random point in the partition matching its own ordering]
Topologically-Sensitive Construction of
the CAN Overlay Network (6)
Evaluation metric:
Latency stretch =
(latency on the CAN network) /
(average latency on the IP network)
Drawback: the coordinate space is no longer uniformly
populated, so background load-balancing
techniques are needed
Load Balancing Techniques (7)
More uniform partitioning -- volume balancing
When a “JOIN” is received, the node compares its own zone volume with
its neighbors’ zone volumes
The zone with the largest volume is the one split
Results (V = VT/n, where VT is the total volume and n the number of nodes):

Type      % of nodes with volume V   Largest volume
without   40%                        8V
with      90%                        4V

Caching and replication for “hot spots”
A node caches the data keys it has recently accessed
Replicate a “hot” data key at the neighboring nodes
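A sketch of volume balancing, with zones modeled as interval lists as in the earlier sketches: on receiving a JOIN, the node hands the split to whichever zone is largest among itself and its neighbors:

```python
def volume(zone):
    """Volume of a zone given as (lo, hi) intervals per dimension."""
    v = 1.0
    for lo, hi in zone:
        v *= hi - lo
    return v

def zone_to_split(my_zone, neighbor_zones):
    """Instead of always splitting its own zone, the node receiving a JOIN
    compares volumes and forwards the request to the largest zone."""
    return max([my_zone] + list(neighbor_zones), key=volume)
```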
Design Review
d : number of dimensions
r : number of realities
p : number of peers per zone
k : number of hash functions
n = 2^18 nodes
Topology-Aware Routing in CAN
“It is critical for overlay routing to be aware of the
network topology” [6]
Approaches to Topology-Aware Routing in CAN
Proximity routing
Topology-based nodeId assignment
Proximity neighbor selection
Does not work for CAN, since a node’s neighbor set is fixed by zone adjacency
Proximity Routing
The overlay is constructed without regard
for the physical network topology
Among the set of possible next hops, select the one that
is closest in the physical network, or that
represents a good compromise between
progress in the id space and proximity
Example:
the RTT-weighted CAN routing metric of improvement (3)
Topology-based NodeId Assignment
Map the overlay’s logical id space onto the
physical network
Example 1: landmark binning [2][7]
Destroys the uniform population of the id space
Landmark sites can become overloaded
Coarse-grained; it is difficult to distinguish
relatively close nodes
Example 2: SAT-Match [8]
Self-Adaptive Topology (SAT) Matching
Two phases:
The probing phase: probe nearby nodes for distance
measurements as soon as a node joins the system
The jumping phase: pick the closest zone and
jump to it
The process iterates until the node is close
enough to the zone where all its physically close
neighbors are located
Globally, a topology-matching optimization is
achieved.
SAT-Matching -- Probing Phase (1)
Effective flooding: flooding with a small number of
TTL hops is highly effective and produces few
redundant messages
Having joined the system via the normal DHT
assignment, the source node floods a message to
its neighbors:
(source IP address, source timestamp, small TTL k)
A node that receives the message responds to the
source with its IP address, and re-floods the message
to its neighbors with TTL k-1 if k-1 > 0
SAT-Matching -- Probing Phase (2)
Example
[Figure: TTL-2 neighborhoods in a 14-node overlay, highlighting node 2’s TTL-2 neighborhood and node 13’s TTL-2 neighborhood]
SAT-Matching -- Probing Phase (3)
After collecting a list of IP addresses, the source
node uses a ping facility to measure the
RTTs (round-trip times) to each node
that has responded
Sort these RTTs and select the two nodes with
the smallest RTTs
Select a zone associated with one of
the two nodes to jump into
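A toy, in-process model of the whole probing phase (the real system exchanges messages carrying the source IP, a timestamp, and the TTL over the network; `graph` and `rtt` here are assumed local stand-ins):

```python
def probe(graph, source, ttl=2):
    """TTL-limited flood over the overlay `graph` (node -> neighbor list);
    returns the set of nodes that would respond to the source."""
    frontier, seen = {source}, {source}
    for _ in range(ttl):
        frontier = {n for u in frontier for n in graph[u]} - seen
        seen |= frontier
    return seen - {source}

def two_closest(responders, rtt):
    """Sort responders by measured RTT and keep the two nearest; one of
    their zones becomes the candidate to jump into."""
    return sorted(responders, key=lambda n: rtt[n])[:2]
```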
SAT-Matching -- Jumping Phase
Simply jumping to the zone is not enough: the stretch
reduction could be offset by latency increases
on the other new connections
Jumping criterion:
jump only when the local stretch of the source’s and the
sink’s (the two closest nodes’) TTL-1 neighborhoods is
reduced
How to jump? (X jumps to Y)
X returns its zone and (key, value) pairs to its
neighbor
Y allocates half of its zone and pairs to X
Discussion
Is there a way to make structured P2P more
practical?
It is not free for peers: in CAN, peers are
required to follow the protocol and be
responsible for some data keys
Only exact-key search is supported
Can topology-aware routing be made cost-effective
in a highly dynamic environment?
Conclusion
CAN, Chord, Pastry and Tapestry are representative
structured P2P systems
Self-organizing
Scalable
Distributed DHT
Fault-tolerant

Structured P2P   Size of routing table   Number of hops
CAN              O(d)                    O(n^(1/d))
Chord            O(log n)                O(log n)
Pastry           O(log n)                O(log n)
Tapestry         O(log n)                O(log n)
References
[1] Hari Balakrishnan, et al., “Looking Up Data in P2P Systems”, Communications of
the ACM, Vol. 46, No. 2, 2003.
[2] Sylvia Ratnasamy, Paul Francis, Mark Handley, and Richard Karp, “A Scalable
Content-Addressable Network”, ACM SIGCOMM 2001.
[3] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari
Balakrishnan, “Chord: A Scalable Peer-to-peer Lookup Service for Internet
Applications”, ACM SIGCOMM 2001.
[4] Antony Rowstron and Peter Druschel, “Pastry: Scalable, decentralized object
location and routing for large-scale peer-to-peer systems”, Middleware 2001.
[5] B. Zhao, J. Kubiatowicz, and A. Joseph, “Tapestry: An infrastructure for
fault-tolerant wide-area location and routing”, U.C. Berkeley, 2001.
[6] M. Castro, P. Druschel, Y. C. Hu, and A. Rowstron, “Topology-aware routing in
structured peer-to-peer overlay networks”, presented at the Intl. Workshop on Future
Directions in Distributed Computing, June 2002.
[7] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, “Topologically-aware overlay
construction and server selection”, INFOCOM 2002.
[8] Shansi Ren, Lei Guo, Song Jiang, and Xiaodong Zhang, “SAT-Match: a self-adaptive
topology matching method to achieve low lookup latency in structured P2P
overlay networks”, Proceedings of the 18th International Parallel and Distributed
Processing Symposium (IPDPS ’04), April 26-30, 2004.