Transcript BitTorrent

BitTorrent: The protocol, its
background and uses
1. BitTorrent Background
a) What is BitTorrent?
b) Who’s the author, history
2. The Protocol
a) Terminology
b) Distributed Scenario
c) Structure of .torrent files
d) Protocol between peers and trackers
3. BitTorrent Applications
a) BitTorrent Inc, usages throughout industry
BitTorrent
“You get so tired of having your work die,”
he says. “I just wanted to make something
that people would actually use.”
• The above quote is from Bram Cohen, BitTorrent’s
author, in an interview with Wired in 2005.
What is BitTorrent?
From 10,000 feet
Efficient content distribution system using
file swarming. Does not perform
all the functions of a typical p2p system,
like searching.
http://www.cs.uiowa.edu/~ghosh/bittorrent.ppt
What is BitTorrent?
• BitTorrent introduced two novel concepts
• Rather than providing a search protocol itself, it
was designed to integrate seamlessly with
the Web and made files (torrents) available via
Web pages, which could be searched for using
standard Web search tools.
• It enabled so-called file swarming; that is, once
a peer starts downloading that file, it also
makes whatever portion of the file that is
downloaded immediately available for sharing.
What is BitTorrent?
• The file-swarming process is enabled through
the use of a tracker:
• an HTTP-based server used to dynamically
synchronise and update the peers as they are
downloading - tracks availability of pieces of the file
on the network.
• The tracker can also monitor users’ usage of the
network – how much do they contribute?
• Peers then apply a tit-for-tat scheme, which
divides bandwidth according to how much a peer
contributes to the other peers in the network – if you
do not share, you cannot consume.
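The tit-for-tat idea above can be illustrated with a minimal sketch. The function name split_upload_bandwidth and the exact proportional rule are assumptions made for illustration only; the slides do not say how the division is computed, and real clients do not necessarily split bandwidth this precisely.

```python
def split_upload_bandwidth(total_kbps, contributed_bytes):
    """Divide our upload bandwidth in proportion to what each peer gave us.

    contributed_bytes maps a peer name to the bytes that peer uploaded to us.
    A peer that contributed nothing receives nothing.
    """
    total_contributed = sum(contributed_bytes.values())
    if total_contributed == 0:
        # Nobody has shared anything yet, so nobody earns a share.
        return {peer: 0.0 for peer in contributed_bytes}
    return {
        peer: total_kbps * given / total_contributed
        for peer, given in contributed_bytes.items()
    }


# Hypothetical peers A, B and C; C has shared nothing.
shares = split_upload_bandwidth(100, {"A": 300, "B": 100, "C": 0})
print(shares)
```

In this example the peer that contributed three quarters of the data receives three quarters of the bandwidth, and the peer that contributed nothing receives none.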
BitTorrent Bram Cohen
• Born 1975 - computer programmer
• Engineered large parts of Mojo
Nation (mojonation.net) - parts of it
similar in flavour to BitTorrent (pre-
April 2001).
• April 2001: focused on authoring the
peer-to-peer (P2P) BitTorrent
protocol and writing the first file
sharing program to use the protocol,
also known as BitTorrent.
• He is also the organizer of the San
Francisco Bay Area P2P-hackers
meeting, and the co-author of
Codeville.
Currently lives in the
San Francisco Bay
Area
Start of BitTorrent - CodeCon
• Cohen unveiled his novel ideas at the first
CodeCon conference in 2002
• CodeCon is a conference for hackers and
technology enthusiasts.
• Co-organised by Bram and his roommate Len
Sassaman.
• CodeCon is intended to be a low-cost conference
(i.e. <$100) with a focus on developers doing
presentations of working code, rather than on
companies with products to sell.
• It remains an event for those seeking information
about new directions in software, though
BitTorrent continues to lay claim to the title of
"most famous presentation".
Features?
• Peer-to-peer in nature
Taxonomy for Distributed Systems
Taxonomy is based on following factors and their relation to centralization:
1. Resource Discovery: Mechanism for discovering resources on a distributed
system?
• Examples: DNS, Napster Lookup, Jini LUS, UDDI,
Gnutella broadcast etc
2. Resource Availability: Scalability – do resources scale with network?
- does access to them scale with network?
3. Resource Communication: Two types:
Brokered Communication (centralized): communication is passed
through a central server - resources do not have direct references
to each other.
Point to point (decentralized -peer to peer): a direct connection
between the sender and the receiver.
Centralization of Point-to-Point Connections
[Figure comparing three cases:
True peer-to-peer, e.g. Gnutella: equal peers, balanced (equal) load on communication.
Web server: many-to-one relationship between users and the web server, so this can be considered centralized communication.
BitTorrent: pieces pass between the peers.]
Features?
• Peer-to-peer in nature
• Central server called a tracker
• Tracker uses HTTP
• Download and upload at the same time
• Efficiency improves the more a file is
downloaded
Downloading Speeds
Download speeds depend on two factors:
• BitTorrent keeps track of how much you
contribute to hosting files for the group.
• The more you share, the faster your downloads.
• The more people trading a file, the more
options for obtaining its pieces.
• So, unlike the old Napster, popularity doesn't bog
down the process -- it gives it a shot of adrenaline
• Trackers are also more dynamic than Napster servers
- they provide updates
File Swarming
• File swarming lets users download files at the
maximum download capacity of their broadband
connection
• Enables simultaneous downloads of pieces
of the same file from multiple users.
• Significant because broadband has far
lower upload bandwidth than download
• Upload bandwidth can be ten times slower than
download
• Connecting to, say, ten peers balances
this mismatch and enables full download capacity
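The arithmetic behind this can be sketched as follows. The function name swarm_download_rate and the figures (a 1000 kbps downlink, 100 kbps uplinks) are illustrative assumptions based on the ten-to-one ratio mentioned above, not measurements.

```python
def swarm_download_rate(download_cap_kbps, peer_upload_rates_kbps):
    """Rate obtained from several uploaders, limited by our own download capacity."""
    return min(download_cap_kbps, sum(peer_upload_rates_kbps))


# One uploader at a tenth of our download capacity leaves the link mostly idle.
single = swarm_download_rate(1000, [100])
# Ten such uploaders together can fill it.
swarm = swarm_download_rate(1000, [100] * 10)
print(single, swarm)
```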
BitTorrent Protocol
• The BitTorrent protocol is an open
specification
• Can be found in full on the BitTorrent
Web site
• Is updated periodically in order to keep
various BitTorrent applications
compatible.
Terminology 1
• Torrent - metadata file containing the
information about a file to be shared on the
BitTorrent network
• Peer - a participant in the network
• Seed - a peer that has a complete copy of
the file (the first seed probably created the torrent)
• Swarm - the peers that are connected to
(interested in) a particular file
• Tracker - server responsible for keeping
track of the people in a swarm
Terminology 2
• Choked - state of a connection when a peer does not wish
to upload information at this time (perhaps because s/he
already has too many connections)
• Interested - a client is “interested” if it wants to
download pieces from another BT node.
• Piece - a fixed-size portion of a file in BitTorrent - the size is
typically a power of 2 and depends on file size - common sizes
are 256K, 512K or 1MB.
• Bencoding - terse format for BitTorrent messages
BitTorrent
A BitTorrent application generally has the following
components:
• An 'original' downloader - seed
• An ordinary web server
• The end user web browsers - they click on a:
• A static 'metainfo' file (a .torrent file)
• Start the end user downloading apps (BitTorrent)
• A BitTorrent tracker
• There are ideally many end users for a single file.
Lectures as .Torrent
[Figure: a seed (Ian T.), a Web server whose Web sites contain .torrent files, a tracker, and several BitTorrent clients (enthusiastic students), one of them launched from a user's Web browser.]
1. Ian creates IansLectures.torrent (metadata) and uploads it to a Web site.
2. A user clicks IansLectures.torrent, which launches the BitTorrent client because of the MIME mapping from .torrent to the BitTorrent application.
3. Clients show interest in IansLectures.torrent.
4. The BitTorrent client contacts the specified tracker and finds “interested” clients.
5. Clients connect to each other and to the seed to download pieces.
BitTorrent Messages - Bencoding
• Bencoding is a way to specify and organize data in
a terse format. It supports the following 4 types:
• Strings are encoded as follows: <string length>:<string
data> e.g. 4:spam represents the string "spam"
• Integers are encoded as follows: i<integer>e e.g. i3e
represents the integer "3"
• Lists are encoded as follows: l<bencoded values>e e.g.
l4:spam4:eggse represents the list of two strings:
[ "spam", "eggs" ]
• Dictionaries are encoded as follows: d<bencoded
string><bencoded element>e - note keys must be
bencoded strings. E.g. d4:spaml1:a1:bee represents the
dictionary { "spam" => [ "a", "b" ] }
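The four encoding rules above are small enough to sketch in full. This is a minimal illustration rather than a reference implementation: the function names bencode and bdecode are assumptions, error handling is minimal, and dictionary keys are sorted so that the same dictionary always encodes the same way.

```python
def bencode(value):
    """Encode an int, string, list or dict in bencoded form (as bytes)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not a bencoding type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("dictionary keys must be strings")
            items.append((key, item))
        items.sort(key=lambda pair: pair[0])  # one canonical key order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def bdecode(data, pos=0):
    """Decode one value starting at data[pos]; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead in (b"l", b"d"):
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        if lead == b"l":
            return items, pos + 1
        # A dictionary is a flat run of key, value, key, value, ...
        return dict(zip(items[0::2], items[1::2])), pos + 1
    # Anything else is a string: <length>:<data>
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


print(bencode("spam"))
print(bencode(3))
print(bencode(["spam", "eggs"]))
print(bencode({"spam": ["a", "b"]}))
print(bdecode(b"d4:spaml1:a1:bee")[0])
```

Decoded strings come back as bytes because a bencoded string is a byte string, not necessarily text.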
.torrent Files
The content of a ".torrent" is a bencoded dictionary,
containing:
• announce: The URL of the tracker (string) - later
versions have lists of trackers.
• info: a dictionary that describes the file(s) of the torrent. It contains the following:
• name - name for the file
• piece length: number of bytes in each piece (integer)
• pieces: string consisting of the concatenation of all
20-byte SHA1 hash values, one per piece (byte
string)
• The format depends on whether there is one file (as above) or many.
With many files, a files list holds one entry per file, and each
entry uses path in place of name for uniqueness.
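As a rough sketch of how the pieces string could be produced, the code below splits some content into fixed-size pieces and concatenates their SHA1 digests. The function name piece_hashes, the file name and the all-zero content are made up for illustration. The lowercase key spellings and the length key for a single-file torrent follow the published protocol specification rather than this slide.

```python
import hashlib


def piece_hashes(data, piece_length):
    """Return the concatenated 20-byte SHA1 digests, one per piece of data."""
    digests = []
    for offset in range(0, len(data), piece_length):
        piece = data[offset:offset + piece_length]  # the last piece may be shorter
        digests.append(hashlib.sha1(piece).digest())
    return b"".join(digests)


# Hypothetical content: 600 KiB of zero bytes split into 256 KiB pieces.
content = bytes(600 * 1024)
piece_length = 256 * 1024
info = {
    "name": "IansLectures.pdf",      # made-up file name
    "piece length": piece_length,
    "pieces": piece_hashes(content, piece_length),
    "length": len(content),          # single-file key, per the specification
}
print(len(info["pieces"]) // 20, "pieces")
```

A downloader can recompute the digest of each piece it receives and compare it with the matching 20-byte slice of pieces to detect corrupt data.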
BitTorrent - Trackers
Centralised: All
clients go to
one server
The BitTorrent
Solution:
customers help
distribute content
Their contribution grows at the same rate as their demand, creating
limitless scalability for a fixed cost.
Tracker maintains the process
Tracker Scenario
[Figure: a seed, a tracker ("Update!") and three peers, BT 1, BT 2 and BT 3.
Step 1 - Pieces 1, 2 and 3.
Step 2 - Pieces 4, 5 and 6, while pieces 1, 2 and 3 pass between the peers.]
Tracker GET Request
Peer -> Tracker
• Info_hash - 20 byte SHA1 hash of the bencoded form of
the info value from the metainfo file.
• Peer_id - string of length 20 containing ID of downloader
- generated at random at the start of a new download.
• IP - IP address (or DNS name) of the peer.
• Port - port number the peer is listening on - the downloader tries
port 6881 and, if that port is taken, 6882, then 6883, etc., giving up
after 6889.
• Uploaded - total amount uploaded so far.
• Downloaded - The total amount downloaded so far.
• Left - number of bytes this peer still has to download
• Event - optional key which maps to started, completed,
or stopped (or empty, which is the same as not being
present).
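The request parameters above could be assembled into a GET URL roughly as follows. This is a sketch under stated assumptions: the function name build_announce_url, the tracker URL and the sample bencoded info value are hypothetical, and the optional IP key is left out.

```python
import hashlib
import os
from urllib.parse import urlencode


def build_announce_url(announce, info_bencoded, port, uploaded, downloaded,
                       left, event=None, peer_id=None):
    """Build the tracker GET URL from the keys listed above."""
    params = {
        # 20-byte SHA1 of the bencoded info value from the metainfo file
        "info_hash": hashlib.sha1(info_bencoded).digest(),
        # 20 bytes generated at random at the start of a new download
        "peer_id": peer_id if peer_id is not None else os.urandom(20),
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
    }
    if event:  # started, completed or stopped; omitted when empty
        params["event"] = event
    # urlencode percent-escapes the raw hash and ID bytes
    return announce + "?" + urlencode(params)


url = build_announce_url(
    "http://tracker.example.com/announce",  # hypothetical tracker
    b"d4:name4:spame",                       # hypothetical bencoded info value
    port=6881, uploaded=0, downloaded=0, left=614400, event="started",
)
print(url)
```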
Tracker Response
• Tracker -> peer
• Tracker responses are bencoded dictionaries.
• If a tracker response has a key failure reason,
then that maps to a human readable string which
explains why the query failed, and no other keys
are required.
• Otherwise, it must have two keys:
• Interval which maps to the number of seconds the
downloader should wait between regular rerequests
• Peers, which maps to a list of dictionaries corresponding to
peers, each of which contains the keys peer id, ip, and
port, which map to the peer's self-selected ID, IP
address or DNS name as a string, and port number,
respectively.
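Handling such a response might look like the sketch below. It assumes the response has already been decoded from its bencoded form into a Python dictionary with text keys; the function name read_tracker_response and the sample values are illustrative only.

```python
def read_tracker_response(response):
    """Return (interval, [(peer id, ip, port), ...]) from a decoded response.

    Raises RuntimeError with the tracker's message if the query failed.
    """
    if "failure reason" in response:
        # No other keys are required when the query failed.
        raise RuntimeError(response["failure reason"])
    peers = [(p["peer id"], p["ip"], p["port"]) for p in response["peers"]]
    return response["interval"], peers


# Hypothetical decoded response listing two peers.
sample = {
    "interval": 1800,
    "peers": [
        {"peer id": "peer-one", "ip": "192.0.2.10", "port": 6881},
        {"peer id": "peer-two", "ip": "peer.example.org", "port": 6882},
    ],
}
interval, peers = read_tracker_response(sample)
print(interval, peers)
```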
Scenario
[Figure: a tracker, a web server, the downloader ("US") and other peers, with nodes labelled A, B and C and marked as leeches or a seed.]
Strengths
• Better bandwidth utilization
• Download speeds not seen before.
• Up to 7 MB/s from the Internet.
• Limit free riding – tit-for-tat
• Limit leech attack – coupling upload &
download
• Spurious files not propagated
• Ability to resume a download
• Open Source implementations !
Potential Drawbacks
• Small files – latency, overhead
• Scalability
• Millions of peers – Tracker behavior (uses 1/1000
of bandwidth)
• Single point of failure - although there can be
many trackers, there is only one tracker assigned
to each torrent file
• Difficult to load balance
• Solved later by having lists of alternative trackers
• Robustness
• System progress dependent on altruistic nature of
seeds (and peers)
• Malicious attacks and leeches.
Who Uses it?
• 160 million clients, 100 million active users
• According to their website, the company has
announced partnerships with some 55
companies, including:
BitTorrent: summary
1. BitTorrent
a) Underlying file sharing protocol
b) Role of the .torrent
c) Use and role of the tracker
d) BitTorrent Scenario
e) How file swarming works