Biased Peer Selection in BitTorrent

Download Report

Transcript Biased Peer Selection in BitTorrent

Peer-Assisted Content
Distribution Networks:
Techniques and Challenges
Pei Cao
Stanford University
Traditional Intra-Provider Content
Distribution Networks
National Center
Regional Center
. . .
Branch
Users
...
...
...
...
...
...
Peer-to-Peer Content Distribution
National Center
Regional Center
. . .
Branch
Users
...
...
...
...
...
...
P2P vs CDN
• P2P:
– No infrastructure cost
– Supply grows linearly with demand
– Simple distributed, randomized algorithms
– No QoS
• CDN:
– Initial infrastructure cost
– Centralized scheduling algorithms
– Network efficiency
– Capable of supporting QoS
Combine P2P with CDN?
• Use P2P to complement CDN
– P2P reduces load on the CDN, covers areas where
CDN is not installed
– Must be able to control, or “shape”, P2P traffic
• Use CDN to complement P2P
– CDN steps in when peer-based distribution is falling
short, enabling QoS
– Must be able to detect when peers won’t meet the
delivery time guarantee
Outline
• Review of BitTorrent
• Traffic-shaping BitTorrent: biased neighbor
selection
• QoS in BitTorrent: delivery time prediction
BitTorrent File Sharing Network
Goal: replicate K chunks of data among N
nodes
• Form neighbor connection graph
• Neighbors exchange data
BitTorrent: Neighbor Selection
Seed
Whole file
Tracker
file.torrent
1
4
3
2
5
A
BitTorrent: Piece Replication
Seed
Whole file
1
Tracker
file.torrent
3
5
A
BitTorrent: Piece Replication
Algorithms
• “Tit-for-tat” (choking/unchoking):
– Each peer only uploads to 7 other peers at a time
– 6 of these are chosen based on amount of data
received from the neighbor in the last 20 seconds
– The last one is chosen randomly, with a 75% bias
toward newcomers
• (Local) Rarest-first replication:
– When peer 3 unchokes peer A, A selects which piece
to download
Analysis of BitTorrent
• Conclusion from modeling studies:
BitTorrent is nearly optimal in idealized,
homogeneous networks
– Demonstrated by simulation studies
– Confirmed by theoretical modeling studies
• Intuition: in a random graph,
Prob(Peer A’s content is a subset of Peer B’s) ≤ 50%
Traffic-Shaping BitTorrent
Random Neighbor Graph
• Existing studies all assume random
neighbor selection
– BitTorrent no longer optimal if nodes in the
same ISP only connect to each other
• Random neighbor selection  high crossISP traffic
Difficulty in Traffic-Shaping P2P
Applications
• ISPs:
– Different links have different monetary costs
– Prefer “clustering” of traffic
• P2P Applications:
– No knowledge of underlying ISP topology
– Use randomized algorithms that don’t do well
under clustering
• Current solution: throttling  users suffer
A Network-Friendly BitTorrent?
• ISPs inform BitTorrent of its link
preferences
• Algorithm of BitTorrent is adjusted such
that both users and ISPs benefit
• Example: Biased Neighbor Selection
– Works when cost function is transitive
Biased Neighbor Selection
• Idea: of N neighbors, choose N-k from peers in
the same ISP, and choose k randomly from
peers outside the ISP
ISP
Implementing Biased Neighbor
Selection
• By Tracker
– Need ISP affiliations of peers
• Peer to AS maps
• Public IP address ranges from ISPs
• Special “X-” HTTP header
• By traffic shaping devices
– Intercept “peer  tracker” messages and
manipulate responses
– No need to change tracker or client
Evaluation Methodology
• Event-driven simulator
– Use actual client and tracker codes as much as possible
– Calculate bandwidth contention, assume perfect fairshare from TCP
• Network settings
– 14 ISPs, each with 50 peers, 100Kb/s upload, 1Mb/s
download
– Seed node, 400Kb/s upload
– Optional “university” nodes (1Mb/s upload)
– Optional ISP bottleneck to other ISPs
Limitation of Throttling
Throttling: Cross-ISP Traffic
Redundancy: Average # of times a data chunk enters the ISP
Redundancy
50
40
30
20
10
0
No
2.5Mbps
throttling
1.5Mbps
Bottleneck Bandwidth
500kbps
Biased Neighbor Selection:
Download Times
Biased Neighbor Selection: CrossISP Traffic
Redundancy
50
40
30
20
10
0
Regular
k=17
k=5
Neighbor Selection
k=1
Importance of Rarest-First
Replication
• Random piece replication performs badly
– Increases download time by 84% - 150%
– Increase traffic redundancy from 3 to 14
• Biased neighbors + Rarest-First  More
uniform progress of peers
Presence of External HighBandwidth Peers
• Biased neighbor selection alone:
– Average download time same as regular BitTorrent
– Cross-ISP traffic increases as # of “university” peers
increase
• Result of tit-for-tat
• Biased neighbor selection + Throttling:
– Download time only increases by 12%
• Most neighbors do not cross the bottleneck
– Traffic redundancy (i.e. cross-ISP traffic) same as the
scenario without “university” peers
Comparison with Simple Clustering
• Gateway peer: only one peer connects to
the peers outside the ISP, all other peers
only connect to peers inside the ISP
– Gateway peer must have high bandwidth
• It is the “seed” for this ISP
– Ends up benefiting peers in other ISPs
Combining Biased Neighbor
Selection with Caches
• Under random neighbor selection
– bandwidth requirement of cache is high
• Under biased neighbor selection
– bandwidth needed from the cache is reduced
by an order of magnitude
Conclusions
• By choosing neighbors well, BitTorrent can
achieve high peer performance without
increasing ISP cost
– Biased neighbor selection: choose initial set of
neighbors well
– Can be combined with throttling and caching
 BitTorrent’s algorithm can be shaped!
Delivery Time Prediction
Motivation
• Provide delivery time guarantee under
P2P+CDN
• What contributes to delivery time of a
download via BitTorrent?
– From simulations: seed bandwidth and even
replication of blocks
– Missing: node join/leave dynamics, TCP effects,
etc.
Side-by-Side Live Experiments
• Two clients, running on the same
machine, starting at the same time,
downloading the same
• 13 experiments from Apr-May 2006
• File sizes: 700MB ~ 1.4GB
• Network size: 1100 ~ 2100 peers
• Duration: 10 hrs ~ 2 days
Results from Experiments
• Effective download rate: 10 ~ 30KB/s
• Speed difference between the two peers:
3% ~ 82%
• What made the slower peer slow?
Suspicion #1: Slower Neighbors?
• Calculate unweighted average of observed throughput at
application level
– R1: average from all neighbors
– R2: average from neighbors uploading >250KB of data
– R3: average from neighbors uploading >2.5MB of data
• Low correlation between download-time ratio and
neighbor-speed ratio
– 0.57 for R1, 0.43 for R2, 0.47 for R3
– Faster neighbors corresponds to slower downloads in 3
experiments
Suspicion #2: Fewer Neighbors
Uploading to the Peer?
• Slot analysis: calculate download concurrency
– Maximum number of neighbors: 35
– Neighbors come and go  align neighbors into 35
slots
– Calculate time-average of number of concurrent slots
with neighbors uploading
• Upload concurrency varies from 7 to 11
– Explains one of the download-time/neighbor-speed
reversal case
– But doesn’t explain the two others
“Close” Neighbors
• 90% of data downloaded from 1-4% of
neighbors
• Let F(p) and G(p) be the number of
neighbors that provides p of data to peers
F and G, then
F(p) > G(p)  peer F is slower than G
– This holds for p = 90%, 75%, and 50%
What makes a neighbor close?
• Not related to speed, or order of
connection to peer, or order of unchoking
by peer
Cost of Departure of a Close
Neighbor
• Departure cost: if one close neighbor
leaves, calculate the time until the earliest
next close neighbor
• The average departure cost: 30 min
 The convergence time of the tit-for-tat
algorithm is slow
Why Do Close Neighbors Leave
• Five possible reasons
– A: Random disconnect
– B: Finished downloading
– C: Peer broke off the relationship
– D: Neighbor broke off the relationship
• Results: B is most common, followed by
C/D, then A
Conclusions
• Content delivery time in BitTorrent is
determined by:
– Neighbor upload speed
– Stability of neighbor relationship
• Disruption of the pairing leads to long delivery time
• Neighbors may leave due to random disconnection,
completion of download, or finding faster
neighbors
Using CDN to Complement P2P
• Use nodes CDN as high-speed specially
managed seeds
• Seeds are called to help whenever a node
loses a close neighbor
Summary
• A way to shape BitTorrent traffic
• Predicting BitTorrent performance by
monitoring close peer relationship
Related Work
• Many modeling studies of BitTorrent
• Simulation studies
• Measurements of real torrents
Ongoing Work
• Live experiments with biased neighbor
selections
• A k-regular graph algorithm with faster
convergence
• Prototype implementation of “P2P+CDN”