p2p_traf_char

Download Report

Transcript p2p_traf_char

A Measurement Study of Peer-to-Peer
File Sharing Systems
Presented by
Cristina Abad
Motivation

In a P2P file sharing system, peers are
usually in the “edge” of the network
 Does this affect/limit the quality of the
infrastructure?
 What are the characteristics of hosts that
choose to participate?
 Solution: Measure Gnutella and Napster
traffic to help understand these issues
Napster
Gnutella
Methodology

Crawler periodically takes “snapshot” of
Napster/Gnutella
– capture basic info (peers, files shared, …)

For peers discovered
– measure bottleneck bandwidth
– measure latency
– track content and degree of sharing

Measure lifetime
– track availability of peers (at P2P and IP level)
Crawling Napster

Peers can only be discovered by querying
index
 Crawler issues queries with names of
popular song artists
 Query responses contain
– IP, reported bandwidth, files shared (number,
names and sizes)

Results:
– Captured 40-60% of Napster hosts
(contributing to 80-95% of total files)
– Could not capture peers that do not share files
Crawling Gnutella

Crawler uses ping/pong to discover peers
 Each crawl captured aprox. 10000 peers
Measuring bandwidth

Reported bandwidth may not be accurate
(ignorance or lies)
 Use bottleneck bandwidth as approximation
to available bandwidth
– capacity of slowest host along path between
two hosts

Used SProbe to actively measure both
upstream and downstream bottleneck
bandwidth
– Similar to “packet pair” technique
Packet Pair Technique

Two packets queued next to each other at
bottleneck link exit the link t seconds apart:
t

s2
bbnl
s2 :
bbnl:
Then,
bbnl
size of second packet
bottleneck bandwidth
s2

t
Kevin Lai and Mary Baker. “Measuring bandwidth”. In Proceedings of IEEE INFOCOM '99. 1999.
How many peers are server-like?

High-bandwidth, low latency, high
8% have upstream
availability
bb  10Mbps

Availability – Host uptimes

Availability – Session duration
Free-riders
Is Gnutella robust?
Measurement, Modeling, and
Analysis of a Peer-to-Peer
File-Sharing Workload
Presented by
Cristina Abad
Three-tiered approach
1.
Analyze 200-day trace of Kazaa traffic
 Considered only traffic going from U.
Washington to the outside
2.
Develop a model of multimedia workloads
 Analyze and confirm hypothesis
3.
Explore potential impact of locality awareness in Kazaa
Contributions

Obtained some useful characterizations of
Kazaa’s traffic
 Showed that Kazaa’s workload is not Zipf
– Showed that other workloads (multimedia) may
not be Zipf either

Presented a model of P2P file-sharing
workloads based on their trace results
– Validated the model through simulations that
yielded results very similar to those from traces

Proved the usefulness of exploiting localityaware request routing
Measurement results

Users are patient
 Users slow down as they age
 Kazaa is not one workload
 Kazaa clients fetch objects at-most-once
 Popularity of objects is often short-lived
 Kazaa is not Zipf
User characteristics (1)

Users are patient
User characteristics (2)

Users slow down as they age
– clients “die”
– older clients ask for less each time they use
system
User characteristics (3)

Client activity
– Tracing used could only detect users when their
clients transfer data
– Thus, they only report statistics on client
activity, which is a lower bound on availability
– Avg session lengths are typically small
(median: 2.4 mins)
• Many transactions fail
• Periods of inactivity may occur during a request if
client cannot find an available server with the object
Object characteristics (1)

Kazaa is not
one workload
Object characteristics (2)

Kazaa object dynamics
– Kazaa clients fetch objects at most once
– Popularity of objects is often short-lived
– Most popular objects tend to be recently born
objects
– Most requests are for old objects
Object characteristics (3)

Kazaa is not Zipf
 Web access patterns are Zipf: small number of
objects are extremely popular, but there is a long
tail of unpopular requests.
 Zipf’s law: popularity of ith-most popular object is
proportional to i-α, (α: Zipf coefficient)
 (Zipf) looks
linear on
log-log scale
Model of P2P file-sharing
workloads

On average, a client requests 2 objects/day
 P(x): probability that a user requests an
object of popularity rank x  Zipf(1)
– Adjusted so that objects are requested at most
once

A(x): probability that a newly arrived object
is inserted at popularity rank x  Zipf(1)
 All objects are assumed to have same size
 Use caching to observe performance
changes (effectiveness  hit rate)
Model – Simulation results

File-sharing effectiveness diminishes with
client age
– System evolves towards one with no locality
and objects chosen at random from large space

New object arrivals improve performance
– Arrivals replenish supply of popular objects

New clients cannot stabilize performance
– Can’t compensate for increasing number of old
clients
– Overall bandwidth increases in proportion to
population size
Model validation

By tweaking the arrival rate of of new
objects, were able to match trace results
(with 5475 new arrivals per year)
Exploring locality-awareness

Currently organizations shape or filter P2P
traffic
 Alternative strategy: exploit locality in filesharing workload
– Caching; or,
– Use content available within organization to
substantially decrease external bandwidth usage
– Result: 86% of externally downloaded bytes
could be avoided by using an organizational
proxy
Questions?
Analysis

How can results obtained be used when
evaluating P2P schemes?
 Are any of the measurements obtained
biased?
 Peers are heterogeneous
– Incentives
– Enforcement (e.g. super-peers in Kazaa)
SProbe

Works in uncooperative environments
 Works on asymmetric network paths
 Exploit properties of TCP protocol
– Send SYN packet with large payload; then,
measure time dispersion of received RST
packet
Zipf

Linguist George Kingsley Zipf observed that for
many frequency distributions, the n-th largest
frequency is proportional to a negative power of
the rank order n
 "Zipf's law" is also sometimes used to refer to the
corresponding probability distribution
 Is an instance of a power law
 Zipf's law is often demonstrated by plotting the
data, with the axes being log(rank order) and
log(frequency). If the points are close to a single
straight line, the distribution follows Zipf's law.