Transcript BitTorrent

Overview
Centralized Database
 Napster
Swarming
 BitTorrent
Hierarchical Query Flooding
 KaZaA
 Skype (VOIP)
1
P2P
P2P: centralized directory
original “Napster” design
(1999, S. Fanning)
1) when peer connects, it
informs central server:
Bob
centralized
directory server
1
peers
 IP address
 content
2) Alice queries directory
server for “file A”
3) Alice requests file from
Bob
Problems?
1
3
1
2
r Single point of failure
r Performance bottleneck
1
Alice
r Copyright infringement
2
P2P
Napster: Publish
insert(X,
123.2.21.23)
...
Publish
I have X, Y, and Z!
123.2.21.23
3
P2P
Napster: Search
123.2.0.18
Fetch
Query
search(A)
-->
123.2.0.18
Reply
Where is file A?
4
P2P
Napster: Discussion
 Pros:
Simple
Search scope is O(1)
Controllable (pro or con?)
 Cons:
Server maintains O(N) State
Server does all processing
Single point of failure
5
P2P
BitTorrent: History
 In 2002, B. Cohen debuted BitTorrent
 Workshop on Economics of Peer-to-Peer Systems, 2003
 Key Motivation:
 Popularity exhibits temporal locality
 Now
 New file releases
 Linux ISO檔, 魔獸世界更新檔, etc.
 Network TV(PPstream) also borrows the idea from BT
 Focused on Efficient Fetching, not Searching
 Does not include a search mechanism, but rather, it relies on
central-directory based search facilities as provided by Web sites
such as suprnova.org
 Has some “real” publishers:
 Blizzard Entertainment using it to distribute games
6
Something you may know
種子
擁有檔案的人
File is partitioned into several pieces
.torrent與tracker
紀錄檔案片段全部名稱
下載成員名單與網路位置
斷種
7
BitTorrent-Characteristic
Fast download speed
Reduce the load of file publisher
Fairness
Choke/unchoke
8
Policies
Relay on .torrent file to know the file
pieces and members
Rarest First Policy
Choking Policy
Optimistic Unchoking
9
BitTorrent: Overview
Swarming:
Join: contact centralized “tracker” server, get a
list of peers.
Publish: Run a tracker server. Create .torrent
file.
Search: Out-of-band. E.g., use Google to find a
torrent for the file you want.
Fetch: Download pieces of the file from your
peers. Upload pieces you have to them.
10
P2P
Components
 Peers
 Seed (file owner)
 Leech (downloader)
 Data
 divided into 256KB pieces
 利用hash 檢查檔案正確性
 Metainfo File (.torrent file)
 Data資訊和各個pieces的hash value
 Tracker的url
 Etc…
 Tracker
 HTTP/HTTPS services
 維護torrent中各個peers的資訊
 幫助peers追蹤其他peers的途徑
 BT client
 BitComet
 mTorrent
 Etc…
11
File distribution: BitTorrent
P2P file distribution
tracker: tracks peers
participating in torrent
torrent: group of
peers exchanging
chunks of a file
obtain list
of peers
trading
chunks
peer
12
2: Application Layer
BitTorrent - Algorithms
Two components in BitTorrent downloading
algorithm:
Peer Selection – determines from whom to
download the piece?
Piece Selection – determines which piece
to download?
13
Tit for Tat
 "tip for tat", an agent using this strategy will respond
in kind to a previous opponent's action.
 If the opponent previously was cooperative, the
agent is cooperative. If not, the agent is not.
 This strategy is dependent on the following
conditions :
 1. Unless provoked, the agent will always cooperate
 2. If provoked, the agent will retaliate
 3. The agent is quick to forgive
14
BitTorrent - Peer selection
Choke Algorithm
Choking is a temporal refusal to upload
Each peer unchokes a fixed number of peers
(default = 4)
3 peers on tit-for-tat basis
1 peer on optimistic unchoke basis
15
BitTorrent - Peer selection (cont’d)
Tit-for-tat peer selection
Select the 3 peers from which you downloaded
most and that are interested in your chunks
Peer selection is done every 10 seconds, based
on the download rates are of the last 30
seconds.
16
BitTorrent - Peer selection (cont’d)
Optimistic unchoke peer selection
Select one peer at random that is interested in
your chunks, regardless of the current
download rate from it
Rotates every 30 seconds.
Reason:
To discover currently unused connections that
are better than the ones being used
17
BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With higher upload rate,
can find better trading
partners & get file faster!
2: Application Layer
18
BitTorrent: Sharing Strategy
“Tit-for-tat” sharing strategy
“I’ll share with you if you share with me”
Be optimistic: occasionally let freeloaders download
 Otherwise no one would ever start!
 Also allows you to discover better peers to download from
when they reciprocate
19
P2P
BitTorrent - Piece selection
 Random first piece
Only applies if leecher has downloaded less than 4
pieces (chunks)
Choose randomly the next piece to download
 Local rarest first policy
Determine the pieces that are most rare among your
peers and download those first
Ensures that the most common pieces are left till the
end to download
Rarest first also ensures that a large variety of pieces
are downloaded from the seed
20
Mathematic model(Markov Chain)
Four cities
A city : the men not join BT
B city : Leeches (downloaders) in BT
C city : Seeds (file owners) in BT
D city : the men finish the downloading and
leave BT
Parameters
A->B : λ
C->D : g
B->C :
How many peers in BT
Average bandwidth
21
We want to find in a static state
X’: How many peers live in B city (下載者人數)
Y’: How many peers live in C city (seeds人數)
T: Average moving speed from B to C (下載時間)
X’= l/(b(1+q/b))
Y’= l/(g(1+q/b))
Remind
g : 完成下載後離開的比例
q : 中途取消下載的比例
b
受分享頻寬、檔案分享效率與種子離開率影響的一個變22
數
We know X’ and Y’, by queuing theory
T=1/(q+b)
Properties
平均下載時間與人潮湧進的速度無關
檔案分享的效率越高,下載時間越短
種子離開系統的速率會影響下載時間
23
Another Property
多數人下載慢,少數人下載速度超快
下載速度仍比ftp快
上傳頻寬最大的族群會優先獲得檔案,當
這個族群的下載速度已經快到不能再快的
時候,多出來的上傳頻寬會分享給第二快
的族群,然後依此類推…。
只要每個族群的人數到達一定的數量,就
會達到人人有檔抓、速度都不慢的平衡點。
24
How many bandwidth can free rider
get?
 Assume only one free rider in the system
 Parameters
N(總人數)
μ(每人平均上傳頻寬)
n (當下有上傳關係的人數)
 N*m*(1/(N-n))*(1/(n+1))
N*m : 總上傳頻寬
1/(N-n) : free rider被選到的機率
1/(n+1) : optimistic unchoking分到的頻寬比例
 N>>n
μ/(n+1)
25
BitTorrent - Summary
 Efficient file download thanks to simple incentive
mechanisms
Local rarest first
High piece entropy
Tit-for-tat
Avoids free-riding
Optimizes resource utilization
26
BitTorrent References
Section inspired by:
 “Rarest First and Choke Algorithms are Enough”,
Arnaud Legout, G. Urvoy-Keller, P. Michiardi,
IMC 2006.
 “The Bittorrent P2P File-sharing System:
Measurements and Analysis”, J.A Pouwelse, P.
Garbacki, D.H.J Epema, H.J. Sips, IPTPS 05,
February 2005.
 “Incentives Build Robustness in BitTorrent”,
Bram Cohen, First Workshop on Economics of
Peer-to-peer Systems, June 2003.
27
BTW: File Distribution: Server-Client vs
P2P
Question : How much time to distribute file
from one server to N peers?
us: server upload
bandwidth
Server
us
u1
d1
u2
ui: peer i upload
bandwidth
d2
File, size F
di: peer i download
bandwidth
uN
Network (with
abundant bandwidth)
dN
28
2: Application Layer
BTW: File distribution time: server-client
server
sequentially
sends N copies:
NF/us time
client i takes F/di
time to download
Time to distribute F
to N clients using
client/server approach
Server
F
us
dN
u1 d1 u2
d2
Network (with
abundant bandwidth)
uN
= dcs = max { NF/us, F/min(di) }
i
increases linearly in N
(for large N)
2: Application Layer
29
BTW: File distribution time: P2P
 server must send one
copy: F/us time
 client i takes F/di time
to download
 NF bits must be
downloaded
(aggregate)
Server
u1 d1 u2
F
us
uN
d2
Network (with
abundant bandwidth)
dN
fastest possible upload rate: us + Sui
dP2P = max { F/us, F/min(di) , NF/(us + Sui) }
i
30
2: Application Layer
Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us
Minimum Distribution Time
3.5
P2P
Client-Server
3
2.5
2
1.5
1
0.5
0
0
5
10
15
20
25
30
35
N
31
2: Application Layer
All Peers Equal?
1.5Mbps DSL
1.5Mbps DSL
56kbps Modem
1.5Mbps DSL
10Mbps LAN
1.5Mbps DSL
56kbps Modem
56kbps Modem
32
P2P
KaZaA: History
 In 2001, KaZaA created by Dutch company
Kazaa BV
 Popular file sharing network with >10 million
users (number varies)
33
P2P
KaZaA: Overview
 “Smart” Query Flooding:
Join: on startup, client contacts a “supernode” ... may at
some point become one itself
Publish: send list of files to supernode
Search: send query to supernode, supernodes flood
query amongst themselves.
Fetch: get the file directly from peer(s);
34
P2P
KaZaA: Network Design
“Super Nodes”
35
P2P
KaZaA: File Insert
insert(X,
123.2.21.23)
...
Publish
I have X!
123.2.21.23
36
P2P
KaZaA: File Search
search(A)
-->
123.2.22.50
123.2.22.50
Query
Replies
search(A)
-->
123.2.0.18
Where is file A?
123.2.0.18
37
P2P
KaZaA: Discussion
 Pros:
 Tries to take into account node heterogeneity:
 Bandwidth
 Host Computational Resources
 Host Availability
 Cons:
 Mechanisms easy to circumvent
 Still no real guarantees on search scope or search time
 P2P architecture used by Skype
38
P2P
Skype Overlay
 Protocol not fully understood today
Content and control messages are encrypted
 Builds upon an unstructured overlay
Two tier hierarchy
 Super Nodes (SN)
 Ordinary Nodes (ON)
39
Skype Overlay (cont’d)
 Super Nodes (SN)
Connect to each other, building a flat unstructured
overlay (similar to the Gnutella overlay)
 Ordinary Nodes (ON)
Connect to Super Nodes
 Skype login server
Only central component
Stores and verifies usernames and passwords
40
Skype Overlay (cont’d)
Skype
login
server
Message
exchange during
login for
authentication
SN
ON
Neighbor relationship
41
How is the overlay constructed? - Super
Node Lists
Each node keeps a host cache(HC) with a
list of Super Nodes IP-addresses and port
pairs
Up to 200 entries
Cache in the windows registry
42
How is the overlay constructed? Login
Contact login server and authenticate
Advertise your presence to other peers
Contact a Super Node
Contact your buddies (through Super Node),
and notify your presence
43
Login processcontact a
super node
44
Super Nodes – Index servers
Super Nodes are index servers
I.e. index of locally connected Skype users (and
their IP addresses)
If buddy is not found in local index of a
Super Node
Spread node search to neighboring Super
Nodes
Not clear how this is implemented
Possibly flood the request similar to Gnutella
45
Super Nodes – Relay nodes
Super nodes also act as relay nodes
Enables NAT traversals
Alice would like to call Bob (or inversely)
Alice
Bob
46
Super Nodes – Relay nodes
Alice would like to call Bob (or inversely)
Alice
Contact Call
Relay Node
Skype
relay node
Bob
47
Super Node election
When does an ordinary node becomes a
super node?
High bandwidth, Public IP address, but details
not clear
48
Skype References
Section inspired by:
 “An Analysis of the Skype Peer-to-Peer Internet
Telephony Protocol”, S.A. Baset and H.G.
Schulzrinne, Infocom 2006, April 2006.
 “An Experimental Study of the Skype Peer-toPeer VoIP System”, Saikat Guha, Neil Daswani,
Ravi Jain, IPTPS’06, February 2006.
 “Characterizing and Detecting Skype-Relayed
Traffic”, K. Suh, D. R. Figueiredo, J. Kurose, D.
Towsley, Infocom 2006, April 2006.
49