Transcript Tracker
NETE4631
Network Information Systems (NISs):
Peer-to-Peer (P2P)
Suronapee, PhD
[email protected]
1
P2P
Peer-to-peer:
“direct” connections between peers
Peers are all equal - both a sender and a receiver of a content
P2P core principle
Self-organizing
Large collection of resources
Millions of simultaneous users, voluntary participation
Scalability
2
no central management, peers are completely independent
scalability with respect to number of nodes
P2P principle
P2P is an overlay network (of internet)
a virtual network on top of the underlying IP network
Overlay graph
3
Overlay network
4
Peer-to-Peer (P2P) Systems
Old ideas
1979 - USENET news service (still in use)
Popular around 1999
Napster, Kazaa and Gnutella for sharing files, music..
‘01: Skype launched (Kazaa) ‘06’, ’10’: Acquired by eBay, Microsoft
‘01: BitTorrent launched – heavily used for file and music sharing Still
very popular today for sharing multimedia content
BitTorrent – 30% of internet traffic (mid 2000s)
Skype – 663M users (2010), 700 M minutes a day
Problem: Free Riders - only consume, not contribute
5
Current State of P2P
P2P networks going strong, all over the world
Currently P2P accounts for almost 70% of network traffic
P2P networks currently mostly used for illegal sharing of
copyrighted material
Music, videos, software, …
Content providers not so happy
6
Sue companies making P2P software (e.g., Napster), sue
software developers (Winny), sue users sharing material
P2P Application
P2P principle applicable to many kinds of systems
Content distribution
Most current P2P targeted at one application: File sharing
Users share files (e.g., music, video, software) and others download
Also often illegally shared (except BitTorrent)
Example
From Acadamic
Chord
Communication
7
BitTorrent, Napster, Gnutella, KaZaA
Skype
Napster
Napster launched in 1999 by Shawn Fanning
The term “P2P was coined by Napster.
In 2000:,25% of traffic out of Uni. of Wisconsin Madison, 60M users
Centralized real-time directory, distributed files, mostly MP3 music;
Based in USA; lawsuits put it out of business
RIAA sues Napster, asking $100K per download
Indirectly helping users to infringe copyright
Currently, paid service
8
Pay % to songwriters and music companies as copyright required
Napster protocol is open, people free to develop
Napster
Connect to Napster server
Upload list of music files
that you want to share
Server stores no files
Maintain a list of
Structure
<filename, ip_address, portnum>
9
Napster search
Send server keywords to search with
Server returns a list of hosts – <ip_address, portnum> tuples – to
client
Client pings each host in the list to find transfer rates
Client fetches file from best host
10
Napster Problem
Centralized server a source of congestion
Centralized server single point of failure
Napster.com declared to be responsible for users’
copyright violation “Indirect infringement”
Next system: Gnutella
11
Gnutella
Eliminate the servers
Client search and retrieve amongst themselves
Clients act as servers too, called servents
In 2000, release by AOL, 88K users by ’03’
12
How a peer join a network
To join the network,
peer needs the address of another peer that is
currently a member
New peer sends connect message to existing peer
13
GNUTELLA CONNECT
Reply is simply “OK”
Gnutella search
Gnutella routes different messages within the
overlay graph
Gnutella protocol has 5 main message types
Query (search)
14
QueryHit (response to query)
Ping (to probe network for other peers)
Pong (reply to ping, contains address of another
peer)
Push (used to initiate file transfer)
Gnutella Message Header Format
15
Flooding query message
Query message
16
How do search results come back?
17
Avoiding excessive traffic
To avoid duplicate transmissions, each peer maintains a list of
recently received messages
Query forwarded to all neighbors except peer from which
received
Each Query (identified by DescriptorID) forwarded only once
QueryHit routed back only to peer from which Query
received with same DescriptorID
Duplicates with same DescriptorID and Payload descriptor
(msg type) are dropped
QueryHit with DescriptorID for which Query not seen is
dropped
18
After receiving QueryHit messages
Requestor chooses “best” QueryHit responder
Initiates HTTP request directly to responder’s ip+port
Responder then replies with file packets after this
message:
19
Dealing with Firewalls
Requestor sends Push to responder asking for file
transfer
Responder establishes a TCP connection at
ip_address, port specified. Sends
Requestor then sends GET to responder (as
before) and file is transferred as explained earlier
20
PING-PONG
Peers initiate Ping’s periodically
Ping’s flooded out like Query’s, Pong’s routed along
reverse path like QueryHit’s
Pong replies used to update set of neighboring peers
21
To keep neighbor lists fresh in spite of peers joining, leaving and
failing
Problem
Flooding a query is extremely inefficient
Gnutella’s network management not efficient
Wastes lot of network and peer resources
Repeated searches with same keywords Solution:
Periodic PING/PONGs consume lot of resources
Ping/Pong constituted 50% traffic
Modem-connected hosts do not have enough bandwidth
for passing Gnutella traffic
Another solution:
22
FastTrack System
FastTrack
Hybrid between Gnutella and Napster
Takes advantage of “healthier” participants in the system
Underlying technology in Kazaa, KazaaLite, Grokster
Like Gnutella, but with some peers designated as
supernodes
23
FastTrack (2)
A supernode stores a directory listing a subset of nearby
(<filename,peer pointer>), similar to Napster servers
Supernode membership changes over time
Any peer can become (and stay) a supernode, provided it
has earned enough reputation
Kazaalite: participation level (=reputation) of a user between 0
and 1000, initially 10, then affected by length of periods of
connectivity and total number of uploads
More sophisticated Reputation schemes invented, especially
based on economics (See P2PEcon workshop)
A peer searches by contacting a nearby supernode
24
Strength
Combines good points from Napster and Gnutella
Efficient searching under each supernode
Flooding restricted to supernodes only
Result: Efficient searching with “low” resource usage
Most popular network
25
Lot of content, lot of users
Currently most file sharing networks adopted this architecture
BitTorrent
Developed by Bram Cohen in 2001
Written in Python, available on many platforms
BitTorrent is a new approach for sharing large files
distributed directories, distributed files
Each file divided as chunks
Each chunk contains 32 KB – 256 KB
Each chunks can traverse different paths
BitTorrent widely used also for legal content
26
For example, Linux distributions, software patches, Official movie
Currently lots of illegal content on BitTorrent too…
Topology of BitTorrent
Overlay graph
(1) physical
(2) neighboring peer
(3) peering relationship
A tracker
a server which tracks the currently active clients
serves as a centralized directory
Topology can be changed regularly
27
Tracker factors: content, distance, peer churn, randomization
BitTorrent: Players
Three entities needed to start distribution of a file
Terminology:
28
A “torrent” file: the metadata about the file
Seed: Client with a complete copy of the file
Leecher: Client still downloading the file
BitTorrent Start Up
New client gets torrent-file and gets peer list from
tracker
29
BitTorrent Operation
30
Summary of BitTorrent operation
A new peer A receives a.torrent file from one of the BitTorrent web servers,
including the name, size, and number of chunks of a particular file, together with
the IP address and port number of the corresponding tracker.
It then registers with the right tracker. It will also periodically send keep-alive
messages to the tracker.
The tracker sends to peer A a list of potential peers (peer set = 50 peers).
Peer A selects a subset (following the tit-for-tat and randomization rules) and
establishes connections with these five peers.
Peer A downloads chunks from peers in peer set and provides them with its own
chunks (possible to parallel)
Chunks typically 256 KB
Starting with the rarest chunks.
Every now and then, each peer updates its peer list.
31
Peering construction methods
Tracker suggests a set of 50 peers
Let new peer picks 5 peers (at
this time!) for exchanging chunks
Exchanging contents evenly
between them (Rarely chunk first)
Peer serves 4 peers in peer set simultaneously (tit-for-tat)
Seeks 4 best downloaders in last time slot if it’s a seed
Seeks 4 best uploaders in last time slot if it’s a leecher
The fifth peer selected at 50% randomly (randomization)
Choking: Limit number of neighbors to which concurrent
uploads or download <= a number
32
Strength
Tit-for-Tat
A peer serves peers that serve it
Rarely chunk first
Prefer early download of blocks that are least replicated
among neighbors
Encourages cooperation, discourage free-riding
Avoid the problem that most of peers have most of the chuck but all
must wait for the few rare chunks
Randomization
33
avoids unfairness of little upload capacity nodes
Weakness
File needs to be quite large
256 KB chunks
Rarest first needs large number of chunks
Everyone must contribute
34
Low-bandwidth clients have a disadvantage
How can BitTorrent be free?
It leverages peer uplink capacities to send chunks of files
to each other without deploying many media servers.
P2P is used for sharing content in BitTorrent.
Scalable?
35
Add many nodes as the network scale up without a bottleneck
P2P versus client-server architecture
Client-server architecture
Each client requests data from the server
Not help each other
P2P
36
Peer is both a sender and a receiver of a content
each peer helps each other in a distributed manner
Data transmission is distributed
Although control plane for signaling is centralized
Summary
Most existing P2P networks built on searching, however
Searching does not scale in same way
Either centralized system with all its problems
Distributed system with all its problems
Hybrid systems cannot guarantee discovery either
Alternatively, use addressing instead of searching
Distributed hash tables (DHTs) - efficient searching and object
location in P2P network
Example
37
Chord, CAN, Plaxton, Pastry, Tapestry
Reference
Kangasharju: Peer-to-Peer Networks
Brinton, Christopher; Chiang, Mung (2013-06-10).
Networks Life: 20 Questions and Answers