Lecture 18 - The Chinese University of Hong Kong
Stochastic Analysis of File Swarming Systems
John C.S. Lui
The Chinese University of Hong Kong
Collaborators: D.M. Chiu, M.H. Lin, B. Fan
Background
Traditional Client/Server Sharing
Performance deteriorates rapidly as the number of clients increases
IP Multicast
Application Multicast (e.g., CDN, ESM)
Issues: reliability, unused resources at leaf nodes
P2P (e.g., Napster, Gnutella)
Free riders only download without contributing to the network.
BitTorrent P2P systems:
Good scalability
Built-in incentive mechanism to contribute
BT Components
Obtain the .torrent file from a public web site, for example:
http://bt.btchina.net
http://bt.ydy.com/
Web Server
The Lord of Ring.torrent
Transformer.torrent
Harry Potter.torrent
BT Components
The .torrent file
A static “metainfo” file containing the necessary information:
File name
Number of chunks and chunk size
Checksums
IP address of the tracker, etc.
A BitTorrent tracker
Non-content-sharing node
Tracks peers
File: $F = \{C_1, C_2, \ldots, C_m\}$, with $C_i \cap C_j = \emptyset$ for $i \neq j$ (non-overlapping chunks)
Chunk size: 256 KB; each chunk has its own hash in the .torrent file
Types of peers:
Leechers: peers still downloading the file
Seeders: peers holding the complete file
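The slides only say that each chunk has its own hash in the .torrent file. As an illustrative sketch, assuming SHA-1 as in the BitTorrent v1 metainfo format (which stores one 20-byte SHA-1 digest per chunk, called a piece there) and the 256 KB chunk size quoted above, a downloader could split a file and verify each received chunk as follows. The function names are hypothetical:

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size quoted on the slide


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split the file into fixed-size chunks and hash each one with SHA-1."""
    return [hashlib.sha1(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]


def verify_chunk(index: int, chunk: bytes, hashes: list[bytes]) -> bool:
    """Check a downloaded chunk against the hash stored in the metainfo."""
    return hashlib.sha1(chunk).digest() == hashes[index]


data = bytes(600 * 1024)     # a 600 KB file: chunks of 256, 256 and 88 KB
hashes = chunk_hashes(data)  # the last chunk is shorter than CHUNK_SIZE
```

A chunk that fails verification is useless to the peer and has to be fetched again.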
BT: publishing a file
[Figure: Harry Potter.torrent on a web server, a tracker, seeder John, downloaders Larry and Curly, and Moe]
Simple example
[Figure: seeder John holds chunks {1,...,10}; downloaders Moe and Larry start with no chunks and build up chunk sets such as {1,2,3}, {1,2,3,4}, {1,2,3,5} and {1,2,3,4,5}]
BT: internal chunk selection mechanisms
Strict Priority: first priority
Rarest First: general rule
Random First Piece: special case, at the beginning
Endgame Mode: special case, at the end
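A minimal sketch of two of the selection rules above, assuming a peer knows which chunks each neighbour holds (in BitTorrent this comes from bitfield and have messages). Strict Priority and Endgame Mode are not modelled, and triggering Random First Piece only while the peer holds no chunk at all is a simplification chosen for this example; the function name is hypothetical:

```python
import random
from collections import Counter
from typing import Optional


def select_chunk(my_chunks: set[int], neighbor_chunks: list[set[int]],
                 rng: random.Random) -> Optional[int]:
    """Return the next chunk to request, or None if no neighbour can help."""
    # Count how many neighbours hold each chunk.
    counts = Counter(c for chunks in neighbor_chunks for c in chunks)
    candidates = sorted(c for c in counts if c not in my_chunks)
    if not candidates:
        return None
    if not my_chunks:
        # Random First Piece: a newcomer grabs any available chunk at random.
        return rng.choice(candidates)
    # Rarest First: among the needed chunks, keep those with the fewest copies.
    fewest = min(counts[c] for c in candidates)
    rarest = [c for c in candidates if counts[c] == fewest]
    return rng.choice(rarest)  # break ties at random
```

For instance, a peer holding {1, 2} whose neighbours hold {1, 2, 3, 4}, {2, 3} and {3, 5} will ask for chunk 4 or 5 (one copy each), not chunk 3 (three copies).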
BT: internal mechanism
Built-in incentive mechanism (where all the magic happens):
Choking Algorithm
Optimistic Unchoking
BT: internal mechanism
• Choking is a temporary refusal to upload
• Each peer unchokes a fixed number of peers
• Reasons for choking:
– Avoid free riders
– Avoid network congestion
– Contribute to “useful” peers
[Figure: example with two choked connections]
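Each peer unchokes a fixed number of peers. A minimal sketch of one choking round, assuming (as in BitTorrent's tit-for-tat reciprocation) that the unchoked peers are the interested peers currently uploading to us fastest, and assuming four upload slots by default. The slot count, the function name and the rates in the usage lines are hypothetical:

```python
def choke_round(download_rates: dict[str, float],
                slots: int = 4) -> tuple[set[str], set[str]]:
    """Split interested peers into (unchoked, choked) by the rate they upload to us."""
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:slots])  # reward the best uploaders
    return unchoked, set(download_rates) - unchoked  # everyone else is choked


# Hypothetical download rates (kb/s) observed from five neighbours.
rates = {"Moe": 20.0, "Larry": 70.0, "Curly": 10.0, "Melinda": 30.0, "John": 5.0}
unchoked, choked = choke_round(rates, slots=3)
```

A free rider that never uploads sits at the bottom of the ranking and stays choked, which is the incentive the slide refers to.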
BT: internal mechanism (optimistic unchoking)
A BitTorrent peer has a single “optimistic unchoke”: a peer it uploads to regardless of the current download rate from that peer. The optimistic unchoke rotates every 30 s.
Reasons:
To discover whether currently unused connections are better than the ones being used
To provide minimal service to new peers
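A minimal sketch of the 30-second rotation, assuming the optimistic unchoke is drawn uniformly at random from the currently choked peers and that a new draw avoids repeating the previous peer when another candidate exists. Both rules and the class name are assumptions of this example:

```python
import random
from typing import Optional

ROTATION_PERIOD = 30.0  # seconds between rotations, as on the slide


class OptimisticUnchoker:
    """Tracks the single optimistically unchoked peer of one client."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.current: Optional[str] = None
        self.last_rotation: Optional[float] = None

    def update(self, now: float, choked_peers: list[str]) -> Optional[str]:
        """Return the optimistic unchoke at time `now` (seconds), rotating if due.

        The choice ignores download rates entirely, which is the point of it.
        """
        due = self.last_rotation is None or now - self.last_rotation >= ROTATION_PERIOD
        # Also re-draw if the current pick is no longer choked (e.g. it was
        # promoted to a regular unchoke or disconnected).
        if due or self.current not in choked_peers:
            others = [p for p in choked_peers if p != self.current]
            candidates = others or choked_peers
            self.current = self.rng.choice(candidates) if candidates else None
            self.last_rotation = now
        return self.current
```

Because the pick ignores rates, a newcomer with nothing to offer still gets an occasional upload slot, and a possibly better connection gets a chance to prove itself.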
Example: optimistic unchoking
[Figure: downloaders Moe, Larry, Melinda, Curly and John Lui connected by links labelled with rates from 5 kb/s to 110 kb/s]
P2P content distribution
BitTorrent
Sends a file to a large number of peers, with the help of those peers
Produces the most Internet traffic today (over 50% of traffic); creates contention, but ....
Does what IP multicast tried to support
Modeling these systems yields insights
Why Study BitTorrent-like Systems?
BitTorrent is very efficient.
Which features make it perform so well?
Motivating questions
What is the effect of bandwidth constraints?
Is the Rarest First policy really necessary?
Must nodes perform seeding after file downloading?
How serious is the Last Piece Problem?
Is source coding useful?
Does the incentive mechanism affect the performance much?
Our aim is to develop mathematical models of file swarming systems,
allowing us to investigate these issues via analytical means.
Model for the File Swarming System
A file has K non-overlapping chunks.
Peers arrive according to a Poisson process. Each peer is initialized
with one random chunk.
Peers leave the system immediately once they finish downloading.
The system is slotted: every peer's downlink bandwidth is one chunk per time slot (download constraint).
In each time slot, each peer contacts m neighbors chosen uniformly at random from the system to see whether they are useful, i.e., hold a chunk it lacks. If some neighbors are useful, it randomly chooses one of them and requests a random useful chunk.
If a peer receives several requests, it satisfies all of them (without upload constraint) or only one, chosen at random (with upload constraint).
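To make the model concrete, here is a small Monte Carlo sketch of one possible reading of the rules above: arrivals are drawn at the start of each slot, all requests in a slot are computed from the state before any transfer, and a requester picks uniformly among its useful contacted neighbours. The arrival rate, slot count, seed and function names are hypothetical parameters of this example, not values from the slides. The result is the mean number of slots a completed peer spent in the system, which can never be below K − 1:

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson(lam) variate with Knuth's method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(K: int, lam: float, m: int, upload_constraint: bool,
             slots: int, seed: int = 0) -> float:
    """Mean time (in slots) that completed peers spent in the system."""
    rng = random.Random(seed)
    peers = []  # each peer is [set of chunk indices held, arrival slot]
    times = []
    for t in range(slots):
        # Poisson arrivals; each newcomer starts with one random chunk.
        for _ in range(poisson(lam, rng)):
            peers.append([{rng.randrange(K)}, t])
        # Requesting: each peer contacts m random others (state as of slot start).
        requests = {}  # server index -> list of (requester index, chunk)
        for i, (mine, _) in enumerate(peers):
            others = [j for j in range(len(peers)) if j != i]
            contacted = rng.sample(others, min(m, len(others)))
            useful = [j for j in contacted if peers[j][0] - mine]
            if useful:
                j = rng.choice(useful)
                chunk = rng.choice(sorted(peers[j][0] - mine))
                requests.setdefault(j, []).append((i, chunk))
        # Downloading: serve every request, or one random request per server.
        for reqs in requests.values():
            served = [rng.choice(reqs)] if upload_constraint else reqs
            for i, chunk in served:
                peers[i][0].add(chunk)
        # Peers that now hold all K chunks leave immediately.
        remaining = []
        for chunks, arrived in peers:
            if len(chunks) == K:
                times.append(t - arrived + 1)
            else:
                remaining.append([chunks, arrived])
        peers = remaining
    return sum(times) / len(times) if times else float("nan")
```

Running `simulate` with the same parameters and `upload_constraint=False` versus `True`, and dividing by K − 1, gives a rough per-chunk download time for each variant of the model.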
Model for the File Swarming System
Example: m=2
[Figure: peers A-E, without upload constraint and with upload constraint]
The case “m = 1, no upload constraint” was studied by L. Massoulie et al. in “Coupon replication systems”.
Model 1: Download Constraint Only
Classify peers into K−1 types: peers holding i chunks (1 ≤ i ≤ K−1) are called type i peers, and the system state is the number of peers of each type.
We are interested in the average sojourn time Ti for type i peers.
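The average downloading time is $T = \sum_{i=1}^{K-1} T_i$, since every peer enters as a type 1 peer and, downloading at most one chunk per slot, passes through each type once before leaving.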
For a type i peer, the probability that a type j peer is useful:
For a type i peer, the probability that a randomly picked peer is useful:
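The expressions for these two probabilities are not reproduced here. As a sketch, assuming each peer's chunk set is a uniformly random subset of its size, independent across peers (an assumption of this example), a type j peer is useless to a type i peer exactly when all of its j chunks lie among the i chunks the type i peer already holds. That gives p_ij = 1 − C(i, j)/C(K, j) for j ≤ i and p_ij = 1 for j > i. The probability that a randomly picked peer is useful is then the average of p_ij weighted by the number of type j peers. The notation p_ij and the function names are hypothetical:

```python
from math import comb


def p_useful(i: int, j: int, K: int) -> float:
    """P(a type j peer holds some chunk a type i peer lacks), uniform-subset assumption."""
    if j > i:
        return 1.0  # j chunks cannot all fit inside i < j chunks
    # Useless exactly when the j-subset falls inside the i-subset: C(i, j) / C(K, j).
    return 1.0 - comb(i, j) / comb(K, j)


def p_useful_random(i: int, counts: dict[int, int], K: int) -> float:
    """P(a randomly picked peer is useful to a type i peer); counts[j] = # type j peers."""
    total = sum(counts.values())
    return sum(n * p_useful(i, j, K) for j, n in counts.items()) / total
```

For example, with K = 10 and one peer of each type 1 to 9, a type 9 peer finds a randomly picked peer useful with probability 0.5 under this sketch, one way to see why the last chunks are slow.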
Model 1: Download Constraint Only
Given the system state (the number of peers of each type), the system evolves as a multi-dimensional, infinite state-space Markov process:
It is hard to solve this Markov chain directly.
Transform the Markov chain into a “density dependent jump Markov process”.
Focus on its steady state and asymptotic behavior.
We derive tight bounds.
Model 1: Download Constraint Only
Bounds on the average downloading time:
The case m=1 was studied in [1], in which the authors gave a looser bound:
[1] L. Massoulie, M. Vojnović, “Coupon replication systems”, SIGMETRICS, 2005.
[Figure: lower bound vs. upper bound (K=200), for m=1 and m=2]
Last Piece Problem
It takes a peer longer to download the last few chunks of the file, since it becomes increasingly difficult to find other peers that can help.
[Figure: bounds vs. simulation (K=200), for m=1 and m=2]
The simulation results show that our model is accurate.
How can the last piece problem be relieved?
System with Source Coding
[Figure: a source with K=4 and Q=6, and peers A-E]
System with Source Coding
The source encodes the original K chunks into Q coded chunks (Q > K).
Any peer can reconstruct the original file after it receives any K distinct coded chunks.
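The slides do not say which code is used. A standard way to get the “any K of Q” property is a maximum-distance-separable erasure code such as Reed-Solomon. A minimal sketch, assuming a toy Reed-Solomon-style code over the prime field of size 2^31 − 1 (practical codes usually work over a binary field such as GF(2^8)) and one field element per chunk: the K data symbols are the values of a polynomial of degree < K at x = 1..K, a coded chunk is its value at some x in 1..Q, and any K distinct coded chunks pin the polynomial down by Lagrange interpolation. Names are hypothetical:

```python
PRIME = 2**31 - 1  # a Mersenne prime; toy field chosen for this example


def _interpolate_at(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod PRIME)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Divide by den via Fermat's little theorem (PRIME is prime).
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total


def encode(data: list[int], q: int) -> list[tuple[int, int]]:
    """Turn K data symbols into q >= K coded chunks (x, y); chunks 1..K equal the data."""
    points = [(x, d % PRIME) for x, d in enumerate(data, start=1)]
    return [(x, _interpolate_at(points, x)) for x in range(1, q + 1)]


def decode(chunks: list[tuple[int, int]], k: int) -> list[int]:
    """Rebuild the K data symbols from any k coded chunks with distinct x values."""
    points = chunks[:k]
    return [_interpolate_at(points, x) for x in range(1, k + 1)]


coded = encode([11, 22, 33, 44], q=6)  # K = 4, Q = 6 as in the figure
print(decode([coded[5], coded[1], coded[4], coded[3]], k=4))  # [11, 22, 33, 44]
```

A real chunk holds many symbols; the same code is simply applied to each symbol position. Since any K of the Q chunks suffice, a peer near completion no longer needs one specific missing chunk.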
Source Coding vs. No Coding (K=200)
[Figure: m=1 with no coding vs. m=1 with source coding]
Source coding eliminates the Last Piece Problem!
Download constraint only
[Figure: K=200 and K=500, m=1]
Download Constraint
[Figure: K=200 and K=500, m=2]
Model 2: Download & Upload Constraints (m=1)
[Figure: example with peers A-E]
Model 2: Download & Upload Constraints (m=1)
Stage One: Requesting
The same as in Model 1.
Stage Two: Downloading
Derive the distribution of the number of requests a peer receives (it depends on the peer's type).
Only one request is satisfied.
Still a density dependent jump Markov process
The transition rates are more complicated.
Model 2: Download & Upload Constraints (m=1)
≈ 1.58
[Figure: bounds vs. simulation (K=200, without source coding), m=1, each peer satisfying one request]
Ti is NOT close to 1 any more, i.e., the downloading time is far from optimal.
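The ≈ 1.58 on this slide is consistent with a simple back-of-the-envelope argument, offered as an assumption rather than the authors' derivation. Suppose nearly every peer sends one request per slot to a uniformly random peer and every contacted peer is useful. Then a tagged request competes with N others at its target, with N roughly Poisson(λ) and λ ≈ 1, and is served with probability E[1/(1+N)] = (1 − e^(−λ))/λ. For λ = 1 that is 1 − 1/e ≈ 0.632, i.e., about e/(e − 1) ≈ 1.58 slots per chunk. The sketch computes the closed form and checks it against a direct simulation of request rounds; the function names are hypothetical:

```python
import math
import random


def served_fraction_poisson(lam: float) -> float:
    """E[1/(1+N)] for N ~ Poisson(lam): chance a tagged request wins the single upload."""
    return (1.0 - math.exp(-lam)) / lam


def served_fraction_simulated(num_peers: int, slots: int, seed: int = 0) -> float:
    """Every peer sends one request to a uniformly random other peer; each peer
    that receives requests serves exactly one of them (the upload constraint)."""
    rng = random.Random(seed)
    served = 0
    for _ in range(slots):
        targets = set()
        for p in range(num_peers):
            t = rng.randrange(num_peers - 1)
            targets.add(t if t < p else t + 1)  # uniform over peers other than p
        served += len(targets)  # one request served per distinct target
    return served / (num_peers * slots)


p = served_fraction_poisson(1.0)  # 1 - 1/e, about 0.632
slots_per_chunk = 1 / p           # e / (e - 1), about 1.58
```

The simulation ignores usefulness, so it only explains the upload-constraint penalty, not the full model.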
Model 3: An Incentive Mechanism
Assume peers are matched randomly at the beginning of each time slot. Each pair transfers chunks if and only if both peers are useful to each other.
[Figure: peers A-E matched in pairs, with requests for chunks C1, C2 and C5]
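A minimal sketch of one slot of this incentive mechanism, assuming a random matching in which an odd peer out stays idle, and assuming each side of a trading pair receives one random chunk it lacks. The function name is hypothetical:

```python
import random


def barter_slot(peers: list[set[int]], rng: random.Random) -> int:
    """Match peers randomly in pairs; a pair trades one chunk each way only if
    each peer holds a chunk the other lacks. Returns the number of trades."""
    order = list(range(len(peers)))
    rng.shuffle(order)
    trades = 0
    for a, b in zip(order[0::2], order[1::2]):  # an odd peer out stays idle
        a_needs = peers[b] - peers[a]
        b_needs = peers[a] - peers[b]
        if a_needs and b_needs:  # both must be useful to each other
            peers[a].add(rng.choice(sorted(a_needs)))
            peers[b].add(rng.choice(sorted(b_needs)))
            trades += 1
    return trades
```

A peer that holds nothing its partner lacks cannot trade, even when the partner could help it; newly arrived peers with few chunks are the most exposed to this.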
Model 3: An Incentive Mechanism
[Figure: bounds vs. simulation (K=200, without source coding)]
First Piece Problem
It is not easy to download the first few chunks when a peer enters the system, but this can be solved in various ways.
Incentive Mechanism
[Figure: K=200 and K=500, m=1]
Conclusion
Given many peers, a steady state, and some mechanism to ensure file availability (e.g., some seeders):
The nature of swarming makes P2P systems very efficient.
The Rarest First policy is not necessary for performance. If peers are cooperative, a “random policy” is good enough, though Rarest First may help enhance file availability.
Peers do not need to seed after finishing their download.
Simple strategies (everything is random) can make the downloading time near optimal.
Source coding is useful for relieving the last piece problem.
With a suitable incentive mechanism, the downloading time can still approach the optimum.
Our mathematical models provide a basis for designing new BT-like protocols.
Research Questions
What about fairness?
How can file swarming be extended to multimedia streaming (e.g., Joost)?
What about wide area network exchange?
What happens if there is “network congestion”? What is its impact?
Network Coding?
Security?
Q & A
Thank You