BitTorrent
Introduction to BitTorrent
• BT: BitTorrent
• BT is not itself a network
– it allows small Internet networks to be created to share files
– Does not perform all the functions of a typical p2p system, like
searching
– Its virtual network is called a data-oriented overlay
• Written by Bram Cohen in 2001
• Written in Python and it uses GTK for its GUI
• It is an efficient content distribution system using file swarming
– Each file split into smaller pieces
• equal-sized blocks (typically 32-256 KB)
– Nodes request desired pieces from neighbors
• Encourages contribution by all nodes
– The throughput increases with the number of downloaders via
the efficient use of network bandwidth
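The splitting step above can be sketched in a few lines of Python. This is only an illustration, not the client's actual code: the function name `split_into_pieces` and the 600 KB dummy content are assumptions made for the example.

```python
def split_into_pieces(data: bytes, piece_length: int) -> list[bytes]:
    """Cut data into equal-sized pieces; only the last one may be shorter."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


if __name__ == "__main__":
    # 600 KB of dummy content cut into 256 KB pieces (hypothetical values).
    content = bytes(600 * 1024)
    pieces = split_into_pieces(content, 256 * 1024)
    print([len(p) for p in pieces])  # [262144, 262144, 90112]
```

Joining the pieces in order gives back the original content, which is what a downloader does once it holds every piece.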
About GTK
• GTK is a cross-platform toolkit for creating graphical user interfaces
• GTK stands for GIMP ToolKit
– GIMP stands for GNU Image Manipulation Program
• It is an image retouching and editing tool
• It is open-source software
• It is developed by a self-organized group of volunteers under the banner of
the GNOME Project
• It is licensed under the terms of the GNU LGPL
– GNU Lesser General Public License
• It allows developers and companies to use and integrate LGPL
software into their own (even proprietary) software without being
required to release the source code of their own software-parts
• GNU is a recursive acronym meaning "GNU's not Unix"
Terminology
Seeder = a peer that provides the complete file
Initial seeder = a peer that provides the file that is torrented
Leecher = one who is downloading
[Figure: an initial seeder, a seeder, and two leechers]
Terminology
• Peer: a client to the network dedicated to a torrent
• Seeding: serving a file for download
• Leeching: downloading without serving a complete file
for download
• Leech: peer that’s downloading the file
– Fairer term might have been “downloader”
• Subpiece: Further subdivision of a piece
– The “unit for requests” is a subpiece
– But a peer uploads only after assembling a complete piece
Leecher can become seeder
• As a leecher downloads pieces of the file, replicas of the
pieces are created
– More downloads mean more replicas available
• As soon as a leecher has a complete piece, it can potentially
share it with other downloaders
– Eventually each leecher becomes a seeder by
• obtaining all the pieces,
• assembling the file, and
• verifying the checksum
Swarm
• Swarm
– Set of peers all downloading the same file
– Organized as a random mesh
• BT differs from traditional file sharing in its use of swarming
• File is divided into many small pieces for distribution
– Each node knows the list of pieces downloaded by its neighbors
– A node requests pieces it does not own from its neighbors
– Clients request different pieces from the seeder or from other clients
– Clients become seeders for the pieces they have downloaded
• When all pieces are downloaded, clients can reconstruct the whole
file
• There exists no single BT network, but thousands of temporary networks
consisting of clients downloading the same file
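As a rough sketch of the bookkeeping described above, a node can compare its own piece set with the sets its neighbors advertise. The function name `requestable_pieces` and the peer names are hypothetical, and pieces are identified by plain integers for simplicity.

```python
def requestable_pieces(own: set[int], neighbors: dict[str, set[int]]) -> dict[int, list[str]]:
    """Map each piece we lack to the neighbors that advertise it."""
    wanted: dict[int, list[str]] = {}
    for name, pieces in sorted(neighbors.items()):
        for piece in sorted(pieces - own):
            wanted.setdefault(piece, []).append(name)
    return dict(sorted(wanted.items()))


if __name__ == "__main__":
    own = {1, 2, 3}
    neighbors = {"A": set(range(1, 11)), "C": {1, 2, 3, 5}}
    for piece, holders in requestable_pieces(own, neighbors).items():
        print(piece, holders)
```

Here piece 5 can be requested from either A or C, while pieces 4 and 6 to 10 are only available from A.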
Tracker
• The tracker is a central server keeping a list of all
peers participating in the swarm
– A peer joins a swarm by asking the tracker for a peer list and
connects to those peers
– The tracker gives the requesting peer a random selection of
peers
• The GET request consists of:
– File ID
– Peer ID
– Peer IP
– Peer Port
• The tracker responds with:
– List of peers, containing the ID, IP and port of each peer
• Peers may re-request at unscheduled times if they
need more peers
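The request and response above can be illustrated with a small in-memory stand-in for a tracker. This is a sketch under stated assumptions, not a real tracker: the class name `ToyTracker`, the method name `announce`, and the default limit of 50 peers per response are all invented for the example, and no HTTP is involved.

```python
import random
from typing import Optional


class ToyTracker:
    """In-memory stand-in for a tracker: one peer list per file ID."""

    def __init__(self, max_peers: int = 50, rng: Optional[random.Random] = None):
        self.max_peers = max_peers          # assumed cap on peers per response
        self.rng = rng or random.Random()
        self.swarms = {}                    # file_id -> {peer_id: (ip, port)}

    def announce(self, file_id: str, peer_id: str, ip: str, port: int):
        """Register the requesting peer and return a random selection of the others."""
        swarm = self.swarms.setdefault(file_id, {})
        others = [(pid, addr[0], addr[1]) for pid, addr in swarm.items() if pid != peer_id]
        swarm[peer_id] = (ip, port)
        return self.rng.sample(others, min(self.max_peers, len(others)))


if __name__ == "__main__":
    tracker = ToyTracker()
    tracker.announce("popeye.mp4", "peer-1", "10.0.0.1", 6881)
    print(tracker.announce("popeye.mp4", "peer-2", "10.0.0.2", 6881))
    # [('peer-1', '10.0.0.1', 6881)]
```

A peer that needs more neighbors would simply call `announce` again, which corresponds to the re-request mentioned above.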
How a node enters a swarm
for file “popeye.mp4”
• The file distributor publishes the .torrent file on a (well-known) web server
– File popeye.mp4.torrent hosted at the web server
• The .torrent has the address of the tracker for the file
• The tracker, which runs on a web server as well, keeps track of all peers
downloading the file
• The tracker supplies peers with the addresses of other peers that share the wanted
file
How a node enters a swarm
for file “popeye.mp4”
[Figure: step 1; labels: www.bittorrent.com, Peer]
• URL of tracker
– The .torrent file refers to the tracker
• which steers the download process
– This method makes it very clear who is responsible for the
legitimacy of the content:
• the operator of the tracker server
How a node enters a swarm
for file “popeye.mp4”
[Figure: step 2; labels: www.bittorrent.com, Peer, Tracker]
• To download the file, peers access the tracker and join the torrent
– A torrent is a group of peers connected to the same tracker
• The .torrent file is downloaded and the peer registers with the tracker, which provides a
list of available peers and seeds
How a node enters a swarm
for file “popeye.mp4”
[Figure: step 3; labels: www.bittorrent.com, Peer, Tracker, Swarm]
• Piece length – usually 256 KB
• SHA-1 hashes of each piece in the file
– For reliability
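The piece length and per-piece SHA-1 hashes listed above are what let a downloader check each piece on arrival. A minimal sketch follows, using Python's standard `hashlib`; the helper names `piece_hashes` and `verify_piece` are assumptions for illustration, and a tiny piece length is used so the example stays readable.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-1 digest (20 bytes) of every piece, in piece order."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """True if a downloaded piece matches the hash published for that index."""
    return hashlib.sha1(piece).digest() == expected[index]


if __name__ == "__main__":
    hashes = piece_hashes(b"abcdefg", 3)       # pieces: b"abc", b"def", b"g"
    print(verify_piece(0, b"abc", hashes))     # True
    print(verify_piece(0, b"abd", hashes))     # False: corrupted piece
```

A piece that fails the check would be discarded and requested again, so corruption is caught piece by piece rather than at the end of the download.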
BT Client Software
• The BT client enables a host of features including multiple
parallel downloads
• The BT client also intermediates peering between itself,
source file servers (trackers) and other clients
– Thereby yielding great distribution efficiencies
• The BT client also enables users to create and share
torrent files
• When a peer has finished downloading a file, it may
become a seed by staying online for a while and sharing
the file for free
– i.e., without bartering
Requirements from the Web Server
and the Tracker
• The requirements on the Web hosting end are modest
• To transmit a torrent you only need a standard HTTP Web server
and a free program called a tracker
• The Web server should be configured to use MIME type
application/x-bittorrent for any file with the ".torrent" extension
• The tracker's job is:
– to keep track of which clients can serve which files to other
clients
• The tracker
– can be installed
• either on individual Web servers
• or operated centrally by the Web host
– Its traffic load is relatively light, and
– offering a tracker to your hosting customers can make using BT
to distribute content a much simpler process for your customers
Overall Architecture
[Figure: a web server, a tracker, and three peers A, B, and C; one peer is the downloader (“US”), and the peers are labeled [Leech], [Leech], and [Seed]]
Simple example
[Figure: seeder A holds pieces {1,2,3,4,5,6,7,8,9,10}; downloaders B and C start with {} and their piece sets grow over time: {} → {1,2,3} → {1,2,3,5} and {} → {1,2,3} → {1,2,3,4} → {1,2,3,4,5}]
Pipelining
• When transferring data over TCP, always have several
requests pending at once (typically 5), to avoid a delay
between pieces being sent
• Every time a piece or a sub-piece arrives, a new request is
sent out
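The pipelining rule above amounts to topping up a fixed-depth queue whenever something arrives. The sketch below models only that bookkeeping, with no networking; the class name `RequestPipeline` and the integer sub-piece identifiers are assumptions for the example.

```python
from collections import deque


class RequestPipeline:
    """Keeps up to `depth` sub-piece requests outstanding on one connection."""

    def __init__(self, subpieces, depth: int = 5):
        self.todo = deque(subpieces)   # sub-pieces not yet requested
        self.pending = set()           # requested, not yet received
        self.depth = depth

    def fill(self) -> list:
        """Return the new requests to send so that `depth` are pending."""
        sent = []
        while self.todo and len(self.pending) < self.depth:
            subpiece = self.todo.popleft()
            self.pending.add(subpiece)
            sent.append(subpiece)
        return sent

    def on_arrival(self, subpiece) -> list:
        """A sub-piece arrived: retire it and immediately top up the queue."""
        self.pending.discard(subpiece)
        return self.fill()


if __name__ == "__main__":
    pipeline = RequestPipeline(range(8))
    print(pipeline.fill())         # [0, 1, 2, 3, 4]
    print(pipeline.on_arrival(2))  # [5]
```

Because a replacement request goes out as soon as one completes, the connection never sits idle waiting for a round trip.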
Piece Selection
• The order in which pieces are selected by different peers
is critical for good performance
• If an inefficient policy is used, then peers may end up in a
situation where each has an identical set of easily
available pieces, and none of the missing ones
• If the original seed is prematurely taken down, then the
file cannot be completely downloaded
Piece Selection
• Small overlap is good
• Large overlap is bad
– Wastes bandwidth
Piece selection
• Strict Priority
• Rarest First
– General rule
• Random First Piece
– Special case, at the beginning
• Endgame Mode
– Special case
Random First Piece
• Initially, a peer has nothing to trade
• Important to get a complete piece ASAP
• So as to assemble first complete piece quickly
• Then participate in uploads
• Select a random piece of the file and download it
• When the first complete piece is assembled, switch to rarest first
Rarest Piece First
• Determine the pieces that are most rare among your peers,
and download those first
– This ensures that the most commonly available pieces are left
till the end to download
– Increases diversity in the pieces downloaded
– Avoids the case where a node and each of its peers have exactly
the same pieces; increases throughput
– Increases likelihood all pieces still available even if original
seed leaves before any one node has downloaded entire file
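Rarest first, together with the random-first-piece special case, can be sketched as a single selection function. This is an illustrative simplification: the name `pick_piece` is hypothetical, and breaking ties between equally rare pieces at random is an assumption of the sketch rather than something stated above.

```python
import random
from collections import Counter


def pick_piece(own: set, neighbor_pieces: list, rng=random):
    """Choose the next piece to download, or None if nothing useful is on offer."""
    counts = Counter()
    for pieces in neighbor_pieces:
        counts.update(pieces - own)        # only count pieces we still lack
    if not counts:
        return None
    if not own:
        # Random first piece: nothing to trade yet, so take any available piece.
        return rng.choice(sorted(counts))
    rarest = min(counts.values())
    # Rarest first; ties are broken at random (an assumption of this sketch).
    return rng.choice(sorted(p for p, c in counts.items() if c == rarest))


if __name__ == "__main__":
    neighbors = [{1, 2, 3, 4}, {1, 2, 3}, {1, 2, 5}]
    print(pick_piece({1, 4}, neighbors))   # 5: held by only one neighbor
```

In the example, pieces 2 and 3 are held by several neighbors, so piece 5 is fetched first while it is still scarce.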
Endgame Mode
• Near the end, missing pieces are requested from every peer
containing them
• This ensures that a download is not prevented from
completion due to a single peer with a slow transfer rate
– Such a peer can otherwise delay the finishing of a download
• Once requests for all the sub-pieces a peer lacks have been issued
– these requests are flooded to all peers
– This helps to get the last chunk of the file as quickly as
possible
– To speed up completion of download
• Once a sub-piece arrives, the peer sends cancel messages
for it to the other peers, indicating that it has obtained it
• Some bandwidth is of course wasted by this flooding
– but not much because of the short period of the endgame
mode
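The flooding and cancelling in endgame mode can be sketched as two small functions. The names `endgame_requests` and `cancels_after_arrival` are hypothetical, and sub-pieces are identified by plain integers to keep the example short.

```python
def endgame_requests(missing: set, peer_has: dict) -> list:
    """Request every missing sub-piece from every peer that has it."""
    return [
        (peer, subpiece)
        for peer, have in sorted(peer_has.items())
        for subpiece in sorted(missing & have)
    ]


def cancels_after_arrival(subpiece, from_peer, requests: list) -> list:
    """Peers that should get a cancel once `subpiece` has arrived from `from_peer`."""
    return [peer for peer, sp in requests if sp == subpiece and peer != from_peer]


if __name__ == "__main__":
    requests = endgame_requests({7, 9}, {"A": {7, 9}, "B": {7}, "C": {1}})
    print(requests)                                   # [('A', 7), ('A', 9), ('B', 7)]
    print(cancels_after_arrival(7, "B", requests))    # ['A']
```

Sub-piece 7 is requested twice, which is the bandwidth waste mentioned above; the cancel to A keeps that waste small.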
Difficulties with BitTorrent
• Works best with files that are widely copied on the
network
• In practice, files appear and disappear
– There is no permanent archive, or incentive for users to
keep old files
• Since blocks are not downloaded sequentially, a partial
file is not useful
Reasons for the success
• The success of BitTorrent is unlikely to be due purely to
the use of the tit-for-tat inspired protocol
• The real driving force behind the high cooperation might
be the by-product of the lack of meta-data search within
BitTorrent
– This results in the creation of a number of disconnected “tribes”
at both the swarm and the tracker level
• The users are active in the tribal dynamics by selecting
those tribes that best satisfy their needs
– hence tribes filled with free-riders will tend not to operate
An example
• The BitTorrent client BitTornado
displays information about the peers and
seeders engaged in sharing and
distributing a torrent
Additional features
• BitTorrent uses 6881 as the default port
– if that port is unreachable BitTorrent tries to connect to a number
of successive ports up to 6889
– If the client cannot connect to port 6889, it gives up
• BitTorrent supports resuming: it resumes where it left off
after checking the partial download
• How does the user know the download is not corrupted?
– BitTorrent does cryptographic hashing (SHA-1) of all data
– When seeing "Download succeeded!" the user can be sure that
BitTorrent has already verified the integrity of the data
– The integrity and authenticity of a BitTorrent download is as
good as the original request to the tracker
– Checking the MD5/CRC32/other hash of a file downloaded via
BitTorrent is redundant
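The port fallback described at the top of this slide (6881 first, then successive ports up to 6889) can be sketched as a simple loop. The function name `choose_listen_port` and the `can_listen` callback are assumptions for illustration; the callback stands in for whatever attempt the client actually makes to use a port.

```python
def choose_listen_port(can_listen, first: int = 6881, last: int = 6889):
    """Try each port from `first` to `last`; return the first usable one, else None."""
    for port in range(first, last + 1):
        if can_listen(port):       # hypothetical check supplied by the caller
            return port
    return None                    # every port failed: the client gives up


if __name__ == "__main__":
    # Pretend that ports below 6884 are unreachable.
    print(choose_listen_port(lambda port: port >= 6884))  # 6884
    print(choose_listen_port(lambda port: False))         # None
```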
An application of the BitTorrent
Technology
• Cranberry Publishing uses BitTorrent as one of the
means of distribution for its free Home Computer
Magazine
– As the magazine is free to its readers, it is available simultaneously as a
free download from the magazine website and as a torrent file for
BitTorrent users
• However, the potential load on their server is enormous,
so they wanted a way to ensure that it could be delivered
• BitTorrent offered the means to make sure the users can
get the files they want faster
• The best thing about it is that the more people that
download it, the faster it gets for everyone, not slower
• Torrent users can just grab the torrent and download
Why is (studying) BitTorrent
important?
[Figure from CacheLogic, 2004]
Legal Issues
• You should know about some inherent dangers in using BitTorrent
to download movies and TV shows
• In the USA, organizations like the Recording Industry Association of
America (RIAA) and the Motion Picture Association of America
(MPAA) actively pursue legal action against people and companies that are engaged
in making copyrighted content available to others illegally
• Internationally, the laws are even more complex
• You are not anonymous when you use BitTorrent, because the
process itself involves sharing of identifying information about your
computer
• This lack of anonymity puts you at risk if you use BitTorrent or other
file-sharing technologies to download or share music, movies, TV
shows, and other content