Early Measurements of a Cluster-based Architecture for P2P
Download
Report
Transcript Early Measurements of a Cluster-based Architecture for P2P
Early Measurements of a
Cluster-based Architecture for
P2P Systems
Yinglian Xie
Carnegie Mellon University
Balachander Krishnamurthy, Jia Wang
ATT Labs---Research
Motivation
Peer-to-peer(P2P) applications provide us
with a new content service model
End-hosts self organized into an overlay network
and share content with each other
For a wide deployment of P2P applications
7/21/2015
We need a scalable content location and routing
scheme in the application layer
We need to study and understand P2P traffic
patterns
2
Recent Work
Existing approaches for content location
Napster: uses a centralized server
Gnutella: relies on flooding of queries
Recent designs
Distributed indexing schemes based on
hash functions
CAN, Chord, Pastry, Tapestry
7/21/2015
3
Our Work
A Cluster-based architecture (CAP) for P2P
systems
Example application: distributed search
(support keyword searching)
Design: using network-aware clustering
Early measurements of CAP
trace analysis + simulations
7/21/2015
4
CAP System Design
Network-aware clustering
B. Krishnamurthy and J.Wang. On Network-Aware Clustering of
Web Clients. In proceedings of ACM Sigcomm, August 2000
An effective technique to group clients that are
topologically close and under common
administrative domain
Apply network-aware clustering to P2P
applications
7/21/2015
An additional level in the hierarchy
Less dynamism
More scalability
5
CAP Architecture
Clustering server
Three entities
Two operations
client
7/21/2015
Clustering server
Delegate
Client
Node join and
node leave
Query lookup
delegate
6
Inter-cluster Routing
Each query has a maximum search depth
Each delegate keeps a neighbor list
Assigned randomly when the delegate joins the
network
Updated gradually based on application
requirements
Depth-first search among neighbors
7/21/2015
7
CAP Evaluation
Collect Gnutella traces, apply networkaware clustering in trace data analysis
To examine the potential advantage of using
network-aware clustering
Trace-driven simulations
Measure CAP system performance based on
real deployment (ongoing work)
7/21/2015
8
Collecting Gnutella Trace
A modified open source Gnutella client (gnut) to
passively monitor and log all Gnutella messages
Location Trace length Number of IP
addresses
Location
CMU
10 hours
799,386
CMU
89 hours
301,025
ATT
14 hours
302,262
ATT
139 hours
261,094
6 hours
185,905
96 hours
409,084
75 hours
292,759
10 hours
69,285
ACIRI
UKY
WPI
Table 1
Traces with unlimited connections
7/21/2015
Trace length
Number of IP
addresses
Table 2
Traces with limited connections
9
Cluster Distribution
CMU trace
7/21/2015
5/24/2001 – 5/25/2001, 799,386 IP addresses, 45,129 clusters
Clustering helps reduce query latency by caching repeated queries
10
Client and Cluster Distribution
along Time
Network-aware
clustering helps reduce
dynamism in the P2P
network
7/21/2015
11
Simulation
Trace-driven simulation
Performance metrics
7/21/2015
Use Gnutella trace to generate “join, leave, search”
Assume the query distribution follows the file
distribution
Hit rate
Overhead
Search Latency
12
Hit Rate
7/21/2015
Use CMU trace
1,000 node stationary
network
311 clusters
4,615search messages
3,793 unique files
13
Overhead and Search Latency
Overhead
Messages per search, forward operations
per delegate
In Gnutella, overhead grows exponentially
In CAP, overhead grows linearly
Search Latency
Application level hop length
In CAP, search path length is short
7/21/2015
14
Summary
CAP is promising to increase stability and
scalability of distributed applications
Ongoing work: We are implementing CAP,
deploying it in machines around the world,
and measuring the performance
7/21/2015
15