Early Measurements of a Cluster-based Architecture for P2P

Download Report

Transcript Early Measurements of a Cluster-based Architecture for P2P

Early Measurements of a
Cluster-based Architecture for
P2P Systems
Yinglian Xie
Carnegie Mellon University
Balachander Krishnamurthy, Jia Wang
ATT Labs---Research
Motivation

Peer-to-peer(P2P) applications provide us
with a new content service model


End-hosts self organized into an overlay network
and share content with each other
For a wide deployment of P2P applications


7/21/2015
We need a scalable content location and routing
scheme in the application layer
We need to study and understand P2P traffic
patterns
2
Recent Work


Existing approaches for content location
 Napster: uses a centralized server
 Gnutella: relies on flooding of queries
Recent designs
 Distributed indexing schemes based on
hash functions
 CAN, Chord, Pastry, Tapestry
7/21/2015
3
Our Work
 A Cluster-based architecture (CAP) for P2P
systems
 Example application: distributed search
(support keyword searching)
 Design: using network-aware clustering
 Early measurements of CAP
 trace analysis + simulations
7/21/2015
4
CAP System Design

Network-aware clustering



B. Krishnamurthy and J.Wang. On Network-Aware Clustering of
Web Clients. In proceedings of ACM Sigcomm, August 2000
An effective technique to group clients that are
topologically close and under common
administrative domain
Apply network-aware clustering to P2P
applications



7/21/2015
An additional level in the hierarchy
Less dynamism
More scalability
5
CAP Architecture
Clustering server

Three entities




Two operations


client
7/21/2015
Clustering server
Delegate
Client
Node join and
node leave
Query lookup
delegate
6
Inter-cluster Routing


Each query has a maximum search depth
Each delegate keeps a neighbor list



Assigned randomly when the delegate joins the
network
Updated gradually based on application
requirements
Depth-first search among neighbors
7/21/2015
7
CAP Evaluation

Collect Gnutella traces, apply networkaware clustering in trace data analysis



To examine the potential advantage of using
network-aware clustering
Trace-driven simulations
Measure CAP system performance based on
real deployment (ongoing work)
7/21/2015
8
Collecting Gnutella Trace
A modified open source Gnutella client (gnut) to
passively monitor and log all Gnutella messages

Location Trace length Number of IP
addresses
Location
CMU
10 hours
799,386
CMU
89 hours
301,025
ATT
14 hours
302,262
ATT
139 hours
261,094
6 hours
185,905
96 hours
409,084
75 hours
292,759
10 hours
69,285
ACIRI
UKY
WPI
Table 1
Traces with unlimited connections
7/21/2015
Trace length
Number of IP
addresses
Table 2
Traces with limited connections
9
Cluster Distribution

CMU trace


7/21/2015
5/24/2001 – 5/25/2001, 799,386 IP addresses, 45,129 clusters
Clustering helps reduce query latency by caching repeated queries
10
Client and Cluster Distribution
along Time
 Network-aware
clustering helps reduce
dynamism in the P2P
network
7/21/2015
11
Simulation

Trace-driven simulation

Performance metrics





7/21/2015
Use Gnutella trace to generate “join, leave, search”
Assume the query distribution follows the file
distribution
Hit rate
Overhead
Search Latency
12
Hit Rate





7/21/2015
Use CMU trace
1,000 node stationary
network
311 clusters
4,615search messages
3,793 unique files
13
Overhead and Search Latency
 Overhead
 Messages per search, forward operations
per delegate
 In Gnutella, overhead grows exponentially
 In CAP, overhead grows linearly
 Search Latency
 Application level hop length
 In CAP, search path length is short
7/21/2015
14
Summary
 CAP is promising to increase stability and
scalability of distributed applications
Ongoing work: We are implementing CAP,
deploying it in machines around the world,
and measuring the performance
7/21/2015
15