Transcript BitTorrent
Overview
Centralized Database
Napster
Swarming
BitTorrent
Hierarchical Query Flooding
KaZaA
Skype (VOIP)
1
P2P
P2P: centralized directory
original “Napster” design
(1999, S. Fanning)
1) when peer connects, it
informs central server:
Bob
centralized
directory server
1
peers
IP address
content
2) Alice queries directory
server for “file A”
3) Alice requests file from
Bob
Problems?
1
3
1
2
r Single point of failure
r Performance bottleneck
1
Alice
r Copyright infringement
2
P2P
Napster: Publish
insert(X,
123.2.21.23)
...
Publish
I have X, Y, and Z!
123.2.21.23
3
P2P
Napster: Search
123.2.0.18
Fetch
Query
search(A)
-->
123.2.0.18
Reply
Where is file A?
4
P2P
Napster: Discussion
Pros:
Simple
Search scope is O(1)
Controllable (pro or con?)
Cons:
Server maintains O(N) State
Server does all processing
Single point of failure
5
P2P
BitTorrent: History
In 2002, B. Cohen debuted BitTorrent
Workshop on Economics of Peer-to-Peer Systems, 2003
Key Motivation:
Popularity exhibits temporal locality
Now
New file releases
Linux ISO檔, 魔獸世界更新檔, etc.
Network TV(PPstream) also borrows the idea from BT
Focused on Efficient Fetching, not Searching
Does not include a search mechanism, but rather, it relies on
central-directory based search facilities as provided by Web sites
such as suprnova.org
Has some “real” publishers:
Blizzard Entertainment using it to distribute games
6
Something you may know
種子
擁有檔案的人
File is partitioned into several pieces
.torrent與tracker
紀錄檔案片段全部名稱
下載成員名單與網路位置
斷種
7
BitTorrent-Characteristic
Fast download speed
Reduce the load of file publisher
Fairness
Choke/unchoke
8
Policies
Relay on .torrent file to know the file
pieces and members
Rarest First Policy
Choking Policy
Optimistic Unchoking
9
BitTorrent: Overview
Swarming:
Join: contact centralized “tracker” server, get a
list of peers.
Publish: Run a tracker server. Create .torrent
file.
Search: Out-of-band. E.g., use Google to find a
torrent for the file you want.
Fetch: Download pieces of the file from your
peers. Upload pieces you have to them.
10
P2P
Components
Peers
Seed (file owner)
Leech (downloader)
Data
divided into 256KB pieces
利用hash 檢查檔案正確性
Metainfo File (.torrent file)
Data資訊和各個pieces的hash value
Tracker的url
Etc…
Tracker
HTTP/HTTPS services
維護torrent中各個peers的資訊
幫助peers追蹤其他peers的途徑
BT client
BitComet
mTorrent
Etc…
11
File distribution: BitTorrent
P2P file distribution
tracker: tracks peers
participating in torrent
torrent: group of
peers exchanging
chunks of a file
obtain list
of peers
trading
chunks
peer
12
2: Application Layer
BitTorrent - Algorithms
Two components in BitTorrent downloading
algorithm:
Peer Selection – determines from whom to
download the piece?
Piece Selection – determines which piece
to download?
13
Tit for Tat
"tip for tat", an agent using this strategy will respond
in kind to a previous opponent's action.
If the opponent previously was cooperative, the
agent is cooperative. If not, the agent is not.
This strategy is dependent on the following
conditions :
1. Unless provoked, the agent will always cooperate
2. If provoked, the agent will retaliate
3. The agent is quick to forgive
14
BitTorrent - Peer selection
Choke Algorithm
Choking is a temporal refusal to upload
Each peer unchokes a fixed number of peers
(default = 4)
3 peers on tit-for-tat basis
1 peer on optimistic unchoke basis
15
BitTorrent - Peer selection (cont’d)
Tit-for-tat peer selection
Select the 3 peers from which you downloaded
most and that are interested in your chunks
Peer selection is done every 10 seconds, based
on the download rates are of the last 30
seconds.
16
BitTorrent - Peer selection (cont’d)
Optimistic unchoke peer selection
Select one peer at random that is interested in
your chunks, regardless of the current
download rate from it
Rotates every 30 seconds.
Reason:
To discover currently unused connections that
are better than the ones being used
17
BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With higher upload rate,
can find better trading
partners & get file faster!
2: Application Layer
18
BitTorrent: Sharing Strategy
“Tit-for-tat” sharing strategy
“I’ll share with you if you share with me”
Be optimistic: occasionally let freeloaders download
Otherwise no one would ever start!
Also allows you to discover better peers to download from
when they reciprocate
19
P2P
BitTorrent - Piece selection
Random first piece
Only applies if leecher has downloaded less than 4
pieces (chunks)
Choose randomly the next piece to download
Local rarest first policy
Determine the pieces that are most rare among your
peers and download those first
Ensures that the most common pieces are left till the
end to download
Rarest first also ensures that a large variety of pieces
are downloaded from the seed
20
Mathematic model(Markov Chain)
Four cities
A city : the men not join BT
B city : Leeches (downloaders) in BT
C city : Seeds (file owners) in BT
D city : the men finish the downloading and
leave BT
Parameters
A->B : λ
C->D : g
B->C :
How many peers in BT
Average bandwidth
21
We want to find in a static state
X’: How many peers live in B city (下載者人數)
Y’: How many peers live in C city (seeds人數)
T: Average moving speed from B to C (下載時間)
X’= l/(b(1+q/b))
Y’= l/(g(1+q/b))
Remind
g : 完成下載後離開的比例
q : 中途取消下載的比例
b
受分享頻寬、檔案分享效率與種子離開率影響的一個變22
數
We know X’ and Y’, by queuing theory
T=1/(q+b)
Properties
平均下載時間與人潮湧進的速度無關
檔案分享的效率越高,下載時間越短
種子離開系統的速率會影響下載時間
23
Another Property
多數人下載慢,少數人下載速度超快
下載速度仍比ftp快
上傳頻寬最大的族群會優先獲得檔案,當
這個族群的下載速度已經快到不能再快的
時候,多出來的上傳頻寬會分享給第二快
的族群,然後依此類推…。
只要每個族群的人數到達一定的數量,就
會達到人人有檔抓、速度都不慢的平衡點。
24
How many bandwidth can free rider
get?
Assume only one free rider in the system
Parameters
N(總人數)
μ(每人平均上傳頻寬)
n (當下有上傳關係的人數)
N*m*(1/(N-n))*(1/(n+1))
N*m : 總上傳頻寬
1/(N-n) : free rider被選到的機率
1/(n+1) : optimistic unchoking分到的頻寬比例
N>>n
μ/(n+1)
25
BitTorrent - Summary
Efficient file download thanks to simple incentive
mechanisms
Local rarest first
High piece entropy
Tit-for-tat
Avoids free-riding
Optimizes resource utilization
26
BitTorrent References
Section inspired by:
“Rarest First and Choke Algorithms are Enough”,
Arnaud Legout, G. Urvoy-Keller, P. Michiardi,
IMC 2006.
“The Bittorrent P2P File-sharing System:
Measurements and Analysis”, J.A Pouwelse, P.
Garbacki, D.H.J Epema, H.J. Sips, IPTPS 05,
February 2005.
“Incentives Build Robustness in BitTorrent”,
Bram Cohen, First Workshop on Economics of
Peer-to-peer Systems, June 2003.
27
BTW: File Distribution: Server-Client vs
P2P
Question : How much time to distribute file
from one server to N peers?
us: server upload
bandwidth
Server
us
u1
d1
u2
ui: peer i upload
bandwidth
d2
File, size F
di: peer i download
bandwidth
uN
Network (with
abundant bandwidth)
dN
28
2: Application Layer
BTW: File distribution time: server-client
server
sequentially
sends N copies:
NF/us time
client i takes F/di
time to download
Time to distribute F
to N clients using
client/server approach
Server
F
us
dN
u1 d1 u2
d2
Network (with
abundant bandwidth)
uN
= dcs = max { NF/us, F/min(di) }
i
increases linearly in N
(for large N)
2: Application Layer
29
BTW: File distribution time: P2P
server must send one
copy: F/us time
client i takes F/di time
to download
NF bits must be
downloaded
(aggregate)
Server
u1 d1 u2
F
us
uN
d2
Network (with
abundant bandwidth)
dN
fastest possible upload rate: us + Sui
dP2P = max { F/us, F/min(di) , NF/(us + Sui) }
i
30
2: Application Layer
Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us
Minimum Distribution Time
3.5
P2P
Client-Server
3
2.5
2
1.5
1
0.5
0
0
5
10
15
20
25
30
35
N
31
2: Application Layer
All Peers Equal?
1.5Mbps DSL
1.5Mbps DSL
56kbps Modem
1.5Mbps DSL
10Mbps LAN
1.5Mbps DSL
56kbps Modem
56kbps Modem
32
P2P
KaZaA: History
In 2001, KaZaA created by Dutch company
Kazaa BV
Popular file sharing network with >10 million
users (number varies)
33
P2P
KaZaA: Overview
“Smart” Query Flooding:
Join: on startup, client contacts a “supernode” ... may at
some point become one itself
Publish: send list of files to supernode
Search: send query to supernode, supernodes flood
query amongst themselves.
Fetch: get the file directly from peer(s);
34
P2P
KaZaA: Network Design
“Super Nodes”
35
P2P
KaZaA: File Insert
insert(X,
123.2.21.23)
...
Publish
I have X!
123.2.21.23
36
P2P
KaZaA: File Search
search(A)
-->
123.2.22.50
123.2.22.50
Query
Replies
search(A)
-->
123.2.0.18
Where is file A?
123.2.0.18
37
P2P
KaZaA: Discussion
Pros:
Tries to take into account node heterogeneity:
Bandwidth
Host Computational Resources
Host Availability
Cons:
Mechanisms easy to circumvent
Still no real guarantees on search scope or search time
P2P architecture used by Skype
38
P2P
Skype Overlay
Protocol not fully understood today
Content and control messages are encrypted
Builds upon an unstructured overlay
Two tier hierarchy
Super Nodes (SN)
Ordinary Nodes (ON)
39
Skype Overlay (cont’d)
Super Nodes (SN)
Connect to each other, building a flat unstructured
overlay (similar to the Gnutella overlay)
Ordinary Nodes (ON)
Connect to Super Nodes
Skype login server
Only central component
Stores and verifies usernames and passwords
40
Skype Overlay (cont’d)
Skype
login
server
Message
exchange during
login for
authentication
SN
ON
Neighbor relationship
41
How is the overlay constructed? - Super
Node Lists
Each node keeps a host cache(HC) with a
list of Super Nodes IP-addresses and port
pairs
Up to 200 entries
Cache in the windows registry
42
How is the overlay constructed? Login
Contact login server and authenticate
Advertise your presence to other peers
Contact a Super Node
Contact your buddies (through Super Node),
and notify your presence
43
Login processcontact a
super node
44
Super Nodes – Index servers
Super Nodes are index servers
I.e. index of locally connected Skype users (and
their IP addresses)
If buddy is not found in local index of a
Super Node
Spread node search to neighboring Super
Nodes
Not clear how this is implemented
Possibly flood the request similar to Gnutella
45
Super Nodes – Relay nodes
Super nodes also act as relay nodes
Enables NAT traversals
Alice would like to call Bob (or inversely)
Alice
Bob
46
Super Nodes – Relay nodes
Alice would like to call Bob (or inversely)
Alice
Contact Call
Relay Node
Skype
relay node
Bob
47
Super Node election
When does an ordinary node becomes a
super node?
High bandwidth, Public IP address, but details
not clear
48
Skype References
Section inspired by:
“An Analysis of the Skype Peer-to-Peer Internet
Telephony Protocol”, S.A. Baset and H.G.
Schulzrinne, Infocom 2006, April 2006.
“An Experimental Study of the Skype Peer-toPeer VoIP System”, Saikat Guha, Neil Daswani,
Ravi Jain, IPTPS’06, February 2006.
“Characterizing and Detecting Skype-Relayed
Traffic”, K. Suh, D. R. Figueiredo, J. Kurose, D.
Towsley, Infocom 2006, April 2006.
49