Transcript Tracker

NETE4631
Network Information Systems (NISs):
Peer-to-Peer (P2P)
Suronapee, PhD
[email protected]
1
P2P
Peer-to-peer:


“direct” connections between peers

Peers are all equal - both a sender and a receiver of a content
P2P core principle


Self-organizing


Large collection of resources


Millions of simultaneous users, voluntary participation
Scalability

2
no central management, peers are completely independent
scalability with respect to number of nodes
P2P principle
P2P is an overlay network (of internet)


a virtual network on top of the underlying IP network
Overlay graph
3
Overlay network
4
Peer-to-Peer (P2P) Systems
Old ideas


1979 - USENET news service (still in use)
Popular around 1999






Napster, Kazaa and Gnutella for sharing files, music..
‘01: Skype launched (Kazaa) ‘06’, ’10’: Acquired by eBay, Microsoft
‘01: BitTorrent launched – heavily used for file and music sharing Still
very popular today for sharing multimedia content
BitTorrent – 30% of internet traffic (mid 2000s)
Skype – 663M users (2010), 700 M minutes a day
Problem: Free Riders - only consume, not contribute

5
Current State of P2P
P2P networks going strong, all over the world


Currently P2P accounts for almost 70% of network traffic

P2P networks currently mostly used for illegal sharing of
copyrighted material
 Music, videos, software, …

Content providers not so happy

6
Sue companies making P2P software (e.g., Napster), sue
software developers (Winny), sue users sharing material
P2P Application
P2P principle applicable to many kinds of systems


Content distribution

Most current P2P targeted at one application: File sharing



Users share files (e.g., music, video, software) and others download
Also often illegally shared (except BitTorrent)
Example


From Acadamic


Chord
Communication

7
BitTorrent, Napster, Gnutella, KaZaA
Skype
Napster
Napster launched in 1999 by Shawn Fanning




The term “P2P was coined by Napster.
In 2000:,25% of traffic out of Uni. of Wisconsin Madison, 60M users
Centralized real-time directory, distributed files, mostly MP3 music;
Based in USA; lawsuits put it out of business



RIAA sues Napster, asking $100K per download
Indirectly helping users to infringe copyright
Currently, paid service



8
Pay % to songwriters and music companies as copyright required
Napster protocol is open, people free to develop
Napster

Connect to Napster server

Upload list of music files
that you want to share

Server stores no files

Maintain a list of
Structure
<filename, ip_address, portnum>
9
Napster search




Send server keywords to search with
Server returns a list of hosts – <ip_address, portnum> tuples – to
client
Client pings each host in the list to find transfer rates
Client fetches file from best host
10
Napster Problem

Centralized server a source of congestion

Centralized server single point of failure

Napster.com declared to be responsible for users’
copyright violation “Indirect infringement”

Next system: Gnutella
11
Gnutella




Eliminate the servers
Client search and retrieve amongst themselves
Clients act as servers too, called servents
In 2000, release by AOL, 88K users by ’03’
12
How a peer join a network

To join the network,


peer needs the address of another peer that is
currently a member
New peer sends connect message to existing peer


13
GNUTELLA CONNECT
Reply is simply “OK”
Gnutella search


Gnutella routes different messages within the
overlay graph
Gnutella protocol has 5 main message types
Query (search)




14
QueryHit (response to query)
Ping (to probe network for other peers)
Pong (reply to ping, contains address of another
peer)
Push (used to initiate file transfer)
Gnutella Message Header Format
15
Flooding query message

Query message
16
How do search results come back?
17
Avoiding excessive traffic






To avoid duplicate transmissions, each peer maintains a list of
recently received messages
Query forwarded to all neighbors except peer from which
received
Each Query (identified by DescriptorID) forwarded only once
QueryHit routed back only to peer from which Query
received with same DescriptorID
Duplicates with same DescriptorID and Payload descriptor
(msg type) are dropped
QueryHit with DescriptorID for which Query not seen is
dropped
18
After receiving QueryHit messages

Requestor chooses “best” QueryHit responder


Initiates HTTP request directly to responder’s ip+port
Responder then replies with file packets after this
message:
19
Dealing with Firewalls

Requestor sends Push to responder asking for file
transfer

Responder establishes a TCP connection at
ip_address, port specified. Sends
Requestor then sends GET to responder (as
before) and file is transferred as explained earlier

20
PING-PONG



Peers initiate Ping’s periodically
Ping’s flooded out like Query’s, Pong’s routed along
reverse path like QueryHit’s
Pong replies used to update set of neighboring peers

21
To keep neighbor lists fresh in spite of peers joining, leaving and
failing
Problem

Flooding a query is extremely inefficient



Gnutella’s network management not efficient




Wastes lot of network and peer resources
Repeated searches with same keywords Solution:
Periodic PING/PONGs consume lot of resources
Ping/Pong constituted 50% traffic
Modem-connected hosts do not have enough bandwidth
for passing Gnutella traffic
Another solution:

22
FastTrack System
FastTrack




Hybrid between Gnutella and Napster
Takes advantage of “healthier” participants in the system
Underlying technology in Kazaa, KazaaLite, Grokster
Like Gnutella, but with some peers designated as
supernodes
23
FastTrack (2)



A supernode stores a directory listing a subset of nearby
(<filename,peer pointer>), similar to Napster servers
Supernode membership changes over time
Any peer can become (and stay) a supernode, provided it
has earned enough reputation



Kazaalite: participation level (=reputation) of a user between 0
and 1000, initially 10, then affected by length of periods of
connectivity and total number of uploads
More sophisticated Reputation schemes invented, especially
based on economics (See P2PEcon workshop)
A peer searches by contacting a nearby supernode
24
Strength

Combines good points from Napster and Gnutella




Efficient searching under each supernode
Flooding restricted to supernodes only
Result: Efficient searching with “low” resource usage
Most popular network


25
Lot of content, lot of users
Currently most file sharing networks adopted this architecture
BitTorrent

Developed by Bram Cohen in 2001


Written in Python, available on many platforms
BitTorrent is a new approach for sharing large files


distributed directories, distributed files
Each file divided as chunks



Each chunk contains 32 KB – 256 KB
Each chunks can traverse different paths
BitTorrent widely used also for legal content


26
For example, Linux distributions, software patches, Official movie
Currently lots of illegal content on BitTorrent too…
Topology of BitTorrent

Overlay graph




(1) physical
(2) neighboring peer
(3) peering relationship
A tracker



a server which tracks the currently active clients
serves as a centralized directory
Topology can be changed regularly

27
Tracker factors: content, distance, peer churn, randomization
BitTorrent: Players

Three entities needed to start distribution of a file

Terminology:



28
A “torrent” file: the metadata about the file
Seed: Client with a complete copy of the file
Leecher: Client still downloading the file
BitTorrent Start Up

New client gets torrent-file and gets peer list from
tracker
29
BitTorrent Operation
30
Summary of BitTorrent operation





A new peer A receives a.torrent file from one of the BitTorrent web servers,
including the name, size, and number of chunks of a particular file, together with
the IP address and port number of the corresponding tracker.
It then registers with the right tracker. It will also periodically send keep-alive
messages to the tracker.
The tracker sends to peer A a list of potential peers (peer set = 50 peers).
Peer A selects a subset (following the tit-for-tat and randomization rules) and
establishes connections with these five peers.
Peer A downloads chunks from peers in peer set and provides them with its own
chunks (possible to parallel)



Chunks typically 256 KB
Starting with the rarest chunks.
Every now and then, each peer updates its peer list.
31
Peering construction methods



Tracker suggests a set of 50 peers
Let new peer picks 5 peers (at
this time!) for exchanging chunks
Exchanging contents evenly
between them (Rarely chunk first)

Peer serves 4 peers in peer set simultaneously (tit-for-tat)




Seeks 4 best downloaders in last time slot if it’s a seed
Seeks 4 best uploaders in last time slot if it’s a leecher
The fifth peer selected at 50% randomly (randomization)
Choking: Limit number of neighbors to which concurrent
uploads or download <= a number
32
Strength

Tit-for-Tat

A peer serves peers that serve it


Rarely chunk first

Prefer early download of blocks that are least replicated
among neighbors


Encourages cooperation, discourage free-riding
Avoid the problem that most of peers have most of the chuck but all
must wait for the few rare chunks
Randomization

33
avoids unfairness of little upload capacity nodes
Weakness

File needs to be quite large



256 KB chunks
Rarest first needs large number of chunks
Everyone must contribute

34
Low-bandwidth clients have a disadvantage
How can BitTorrent be free?

It leverages peer uplink capacities to send chunks of files
to each other without deploying many media servers.
P2P is used for sharing content in BitTorrent.

Scalable?


35
Add many nodes as the network scale up without a bottleneck
P2P versus client-server architecture

Client-server architecture



Each client requests data from the server
Not help each other
P2P


36
Peer is both a sender and a receiver of a content
each peer helps each other in a distributed manner

Data transmission is distributed

Although control plane for signaling is centralized
Summary

Most existing P2P networks built on searching, however
Searching does not scale in same way




Either centralized system with all its problems
Distributed system with all its problems
Hybrid systems cannot guarantee discovery either
Alternatively, use addressing instead of searching


Distributed hash tables (DHTs) - efficient searching and object
location in P2P network
Example

37
Chord, CAN, Plaxton, Pastry, Tapestry
Reference


Kangasharju: Peer-to-Peer Networks
Brinton, Christopher; Chiang, Mung (2013-06-10).
Networks Life: 20 Questions and Answers