Transcript Tracker

End-to-end Publishing Using Bittorrent
Bittorrent
•Bittorrent is a widely used peer-to-peer (P2P) protocol for distributing files, especially large ones
•It has a number of legal uses that set it apart from other P2P networks
Practical Applications
Distributing large files
Podcasting
Vlogging
Disk images
Legal distribution of movies (see
bittorrent.com)

Traditional vs. Bittorrent
•Traditional: one server serves many clients
•Bittorrent: many clients serve many clients
Terminology
Swarm – the set of clients downloading or uploading a given file through Bittorrent
Tracker – centralized server that clients contact to request lists of other clients connected to the swarm
Seed – a client that has a complete copy of the file
Peer (Leecher) – a client that does not have a complete copy of the file
Problem
Torrents that are less popular may
eventually “die” when there are no
longer any complete copies of the
file in the swarm
Everseed


•Permanent seed running on the same server as the tracker
•Guarantees that there will always be a complete copy of the file
Related Research
The creator of Bittorrent wrote a paper on the process of downloading a file using Bittorrent at http://www.bittorrent.org/protocol.html
Maintainers of various Bittorrent clients wrote http://wiki.theory.org/BitTorrentSpecification, which is like the official specification but far more in depth
Osprey (http://osprey.ibiblio.org/) seems to have pursued a similar idea, but has not made any visible progress

Explanation
•The .torrent metadata file provides info
about where to find the tracker and about
the file being distributed
•Client connects to tracker
•Tracker gives client a list of other clients
•Client then downloads file from other
clients (not a centralized server)
•Periodic update with tracker for new client
list
Goals
•Complete internet publishing solution
using Bittorrent
•Metadata file generator (.torrent)
•Tracker
•“Everseed”
•Web interface
.torrent File
•Official documentation on bittorrent.org
•Metadata on the file to be
downloaded (tracker URL, filename,
size, checksum hashes)
•Stored as “bencoded” strings,
integers, lists, dictionaries
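As a rough illustration of the metadata listed above, the sketch below builds the dictionary for a single-file torrent before it is bencoded. The key names (`announce`, `info`, `name`, `length`, `piece length`, `pieces`) follow the published metainfo format; the function name `build_metainfo`, the 256 KiB piece size, and the example tracker URL are assumptions made for illustration, not details from this project.

```python
import hashlib

def build_metainfo(data: bytes, filename: str, announce_url: str,
                   piece_length: int = 262144) -> dict:
    """Build the (not yet bencoded) metainfo dictionary for one file."""
    # "pieces" is the concatenation of the 20-byte SHA-1 digest
    # of every fixed-size piece of the file.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {
        "announce": announce_url,          # where to find the tracker
        "info": {
            "name": filename,              # suggested filename
            "length": len(data),           # file size in bytes
            "piece length": piece_length,  # bytes per piece
            "pieces": pieces,              # checksum hashes
        },
    }

# Hypothetical example: a 600,000-byte file splits into three pieces.
meta = build_metainfo(b"x" * 600000, "example.bin",
                      "http://tracker.example.org/announce")
```

A real generator would read the file from disk in chunks rather than holding it in memory; that detail is left out to keep the sketch short.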
Bencoding
•Integer: 6 => “i6e”
•String: “hello” => “5:hello”
•List: [“hello”, “world”] => “l5:hello5:worlde”
•Dictionary: {“hello”: “world”} => “d5:hello5:worlde”
Bencoding implementation
•Python, good string manipulation
•Structure of a .torrent file is a
dictionary containing string keys and
integer, string, list, and dictionary
values
•Recursion to encode/decode
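The recursive encode/decode approach can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `bencode`, `bdecode`, and `_decode` are assumed, and decoded strings are returned as `bytes` because .torrent files contain binary hashes.

```python
def bencode(value) -> bytes:
    """Recursively encode ints, strings, lists, and dictionaries."""
    if isinstance(value, bool):
        raise TypeError("booleans are not part of bencoding")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are strings and are written in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))

def _decode(data: bytes, i: int):
    """Decode one value starting at offset i; return (value, next offset)."""
    lead = data[i:i + 1]
    if lead == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":
        i += 1
        out = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            out.append(item)
        return out, i + 1
    if lead == b"d":
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            val, i = _decode(data, i)
            result[key] = val
        return result, i + 1
    if lead.isdigit():
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data at offset %d" % i)

def bdecode(data: bytes):
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value
```

For example, `bencode(["hello", "world"])` produces the list encoding shown on the Bencoding slide, and `bdecode` reverses it.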
Tracker
•Makes use of the bencoding algorithm
•Handles two types of requests:
“announce” and “scrape”
•Stores data on peers and torrents in a
SQLite database
•No performance issues
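The slides do not show the database layout, so the following is only a guess at what storing peers in SQLite might look like. The table name, column names, and helper functions (`open_db`, `record_announce`, `swarm_counts`) are all hypothetical.

```python
import sqlite3
import time

def open_db(path=":memory:"):
    """Open the tracker database and create the (assumed) peers table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS peers (
               info_hash  BLOB    NOT NULL,
               peer_id    BLOB    NOT NULL,
               ip         TEXT    NOT NULL,
               port       INTEGER NOT NULL,
               bytes_left INTEGER NOT NULL,
               last_seen  REAL    NOT NULL,
               PRIMARY KEY (info_hash, peer_id)
           )"""
    )
    return db

def record_announce(db, info_hash, peer_id, ip, port, bytes_left):
    """Insert a peer, or refresh its row if it has announced before."""
    db.execute(
        "INSERT OR REPLACE INTO peers VALUES (?, ?, ?, ?, ?, ?)",
        (info_hash, peer_id, ip, port, bytes_left, time.time()),
    )
    db.commit()

def swarm_counts(db, info_hash):
    """Return (seeds, leechers) for one torrent."""
    row = db.execute(
        "SELECT COALESCE(SUM(bytes_left = 0), 0), "
        "       COALESCE(SUM(bytes_left > 0), 0) "
        "FROM peers WHERE info_hash = ?",
        (info_hash,),
    ).fetchone()
    return row[0], row[1]
```

Keying the table on `(info_hash, peer_id)` means a repeated announce from the same client updates its row instead of adding a duplicate.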
Network performance
[Figure: Peer List Size vs. Time (seconds). Axes: Peer List Size, Time (seconds).]
Database performance
[Figure: Number of Peers Inserted vs. Time (seconds). Axes: Number of Peers Inserted, Time (seconds).]
Announce requests
•Used by a client to announce its presence in a Bittorrent swarm
•Client sends an HTTP GET request to the announce URL in the .torrent file
•Tracker parses the request and urldecodes data about the peer
•Tracker stores the data in the database and sends the appropriate response as a bencoded string in a text/plain document
•Client bdecodes the string and connects to other clients
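The request side of this exchange can be sketched with the standard library. The query keys (`info_hash`, `peer_id`, `port`, `uploaded`, `downloaded`, `left`) come from the Bittorrent specification; the function names and the example URL are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, unquote_to_bytes

def build_announce_url(announce_url, info_hash, peer_id, port,
                       uploaded=0, downloaded=0, left=0):
    """Client side: build the HTTP GET URL for an announce."""
    query = urlencode({
        "info_hash": info_hash,   # 20 raw bytes, percent-encoded
        "peer_id": peer_id,       # 20 raw bytes chosen by the client
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,             # 0 means the client is a seed
    })
    separator = "&" if "?" in announce_url else "?"
    return announce_url + separator + query

def parse_announce_query(url):
    """Tracker side: urldecode the query into a dict of raw bytes."""
    params = {}
    for pair in urlsplit(url).query.split("&"):
        key, _, value = pair.partition("=")
        # "+" stands for a space in query strings; decode it first.
        params[key] = unquote_to_bytes(value.replace("+", " "))
    return params

# Hypothetical values for demonstration only.
url = build_announce_url("http://tracker.example.org/announce",
                         info_hash=b" +\xff" + bytes(17),
                         peer_id=b"-XX0001-" + b"0" * 12,
                         port=6881, left=1024)
params = parse_announce_query(url)
```

Values are decoded to `bytes` rather than `str` because `info_hash` is binary and is generally not valid UTF-8.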

Scrape requests
•Used by a client to obtain info about the torrents the tracker is tracking
•Client sends an HTTP GET request to the scrape URL, found by transforming the announce URL
•Tracker urldecodes and parses the request
•Tracker fetches data about the torrent from the database
•Tracker returns a bencoded dictionary, which the client decodes
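The transformation from announce URL to scrape URL follows a convention described in the community specification: if the text after the last slash begins with "announce", that word is replaced with "scrape"; otherwise the tracker is assumed not to support scraping. A minimal sketch, with `scrape_url` as an assumed name:

```python
def scrape_url(announce_url):
    """Derive the scrape URL, or return None if the convention does not apply."""
    head, slash, tail = announce_url.rpartition("/")
    if not tail.startswith("announce"):
        return None
    return head + slash + "scrape" + tail[len("announce"):]
```

For instance, `http://tracker.example.org/announce` becomes `http://tracker.example.org/scrape`, and anything after the word "announce" in the final path segment is preserved.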
Smart Peer List Response
•Seeds often disconnect from other seeds
•The tracker can do something similar by filtering the peer list it returns
•Announce responses contain a list of random peers
•If a client is seeding, it doesn't need the IPs of other seeds
•Increased overall swarm performance
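One way the tracker could apply this idea is sketched below. The data layout (a list of dictionaries with `peer_id`, `ip`, `port`, and `left`), the function name `select_peers`, and the default limit of 50 peers are assumptions; the slides do not describe the actual selection code.

```python
import random

def select_peers(swarm, requester_id, requester_left, max_peers=50):
    """Pick a random peer list, hiding seeds from clients that are seeding."""
    # Never hand a client its own address.
    candidates = [p for p in swarm if p["peer_id"] != requester_id]
    if requester_left == 0:
        # The requester is a seed, so other seeds are of no use to it.
        candidates = [p for p in candidates if p["left"] > 0]
    return random.sample(candidates, min(max_peers, len(candidates)))
```

A leecher still receives a mix of seeds and other leechers, since it can download from both.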
Peer List Compression
•The peer list in the tracker's response to announce requests is normally ASCII encoded
•The peer list can be compressed to 6 bytes per peer: 4 bytes for the IP address, 2 bytes for the port
•Huge bandwidth savings, ~80%
•Greatly enhanced tracker performance
•Reduced tracker hardware requirements
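The 6-byte layout can be illustrated with the standard `socket` and `struct` modules. This sketch assumes IPv4 addresses and network (big-endian) byte order for the port; the function names are made up for the example.

```python
import socket
import struct

def compact_peers(peers):
    """Pack (ip, port) pairs into 6 bytes each: 4 for the IP, 2 for the port."""
    return b"".join(
        socket.inet_aton(ip) + struct.pack("!H", port)
        for ip, port in peers
    )

def expand_peers(blob):
    """Reverse compact_peers on the client side."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length must be a multiple of 6")
    return [
        (socket.inet_ntoa(blob[i:i + 4]),
         struct.unpack("!H", blob[i + 4:i + 6])[0])
        for i in range(0, len(blob), 6)
    ]
```

The resulting byte string is sent as a single bencoded string in place of the usual list of per-peer dictionaries, which is where the bandwidth saving comes from.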
Test Client
•Concurrent development of a test
Bittorrent client written in Python
•Can send both announce and scrape
requests
•Key-value pairs are easily configurable
Testing
•Generalized method of handling
exceptions in the initialization methods
•Increased use of try/except statements to
improve robustness
•Testing with incorrect or missing data
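A sketch of how try/except handling might guard against incorrect or missing data in an announce request is shown below. The `failure reason` key is the specification's way of reporting an error to the client; the function name `validate_announce`, the choice of required keys, and the specific checks are assumptions rather than the project's actual tests.

```python
REQUIRED_KEYS = ("info_hash", "peer_id", "port")

def validate_announce(params):
    """Return None if the request looks valid, else a failure dictionary."""
    try:
        for key in REQUIRED_KEYS:
            if key not in params:
                raise KeyError(key)
        if len(params["info_hash"]) != 20:
            raise ValueError("info_hash must be 20 bytes")
        port = int(params["port"])
        if not 0 < port < 65536:
            raise ValueError("port out of range")
    except KeyError as exc:
        return {"failure reason": "missing key: %s" % exc.args[0]}
    except (TypeError, ValueError) as exc:
        return {"failure reason": str(exc)}
    return None
```

The tracker would bencode the returned dictionary and send it back instead of a peer list, so a malformed request produces a readable error rather than an unhandled exception.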
Summary
•Python
•Benefits of P2P technology
•“Everseed” concept
•.torrent files and bencoding
•Tracker