Transcript Lecture 10

INTERNET TECHNOLOGIES
Week 10 Peer to Peer Paradigm
1
Introduction
• Client-server paradigm will be discussed next week
[Figure labels: Server; Simultaneous Server/Clients; Clients]
• Peer-to-peer paradigm this week.
2
Introduction
• First instance of peer-to-peer file sharing dates back
to December 1987
• Wayne Bell created WWIVnet
• Still exists:
• http://bbs.filenet.wwiv.net
• Other systems now exist.
3
P2P Networks
• Internet users that are ready to share their resources
become peers and form a network
• When a peer in the network has a file to share, it
makes it available to the rest of the peers
• An interested peer can connect itself to the
computer where the file is stored and download it.
4
Centralised Network
• Hybrid P2P Network
• Directory system (listing peers and what they offer)
located on a central server (client-server paradigm)
• Storage and downloading occurs via P2P paradigm
• Peer queries central server
• Server sends IP address of nodes holding files
• Peer then downloads files from those nodes
• Directory constantly updated as nodes join and leave
network.
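
The query steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Napster or any real system was implemented: the class name CentralDirectory, its methods, the peer addresses and the file names are all hypothetical.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical central directory: maps a file name to the peers offering it."""

    def __init__(self):
        # file name -> set of IP addresses of peers that hold the file
        self._holders = defaultdict(set)

    def join(self, peer_ip, files):
        """A peer joins the network and lists the files it offers."""
        for name in files:
            self._holders[name].add(peer_ip)

    def leave(self, peer_ip):
        """A peer leaves, so it is removed from every listing."""
        for peers in self._holders.values():
            peers.discard(peer_ip)

    def query(self, name):
        """Return the addresses of peers holding the file (empty list if none)."""
        return sorted(self._holders.get(name, set()))


directory = CentralDirectory()
directory.join("10.0.0.1", ["song.mp3", "notes.txt"])   # made-up peers and files
directory.join("10.0.0.2", ["song.mp3"])

holders = directory.query("song.mp3")            # both peers are listed
directory.leave("10.0.0.1")
holders_after_leave = directory.query("song.mp3")  # only the second peer remains
```

The download itself would then happen directly between peers; only the lookup goes through the server, which is why the server is the single point of failure.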
5
Centralised Network
• Maintenance of directory very simple
• Drawbacks
• Directory vulnerable to attack
• Whole system fails if servers go down
• Original Napster used a centralised network
• Made it liable for copyright breaches
• New Napster is a legal, paid music site.
6
Figure 29.1: Centralised network
7
Decentralised Network
• Peers arrange themselves into an overlay network
• Logical network on top of the physical network
• Can be classified as
• Unstructured Networks
• Structured Networks.
8
Unstructured Network
• Nodes linked randomly
• Queries need to flood network
• Can result in high traffic, i.e. not efficient
• Examples include
• Gnutella
• Freenet.
9
Gnutella
• Unstructured decentralised P2P network
• Directory randomly distributed between nodes
• Node A sends query (request for file location) to a known
neighbour node (eg W)
• If Node W knows location of requested data
• Sends location of data back to Node A
• If Node W doesn’t know
• Sends queries to all its known neighbours
• Eventually info gets back to A (if it exists) and
Node A can get copy of file.
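
A minimal sketch of this flooding behaviour, assuming a small made-up overlay in which A only knows W. The node names other than A and W, the file name and the address are hypothetical. The sketch ignores details of the real protocol (for example, real Gnutella limits how far a query can travel) and simply counts how many nodes have to be asked.

```python
from collections import deque


def flood_query(neighbours, index, start, wanted):
    """Flood a query outwards from `start`, one hop at a time.

    neighbours: node -> list of known neighbour nodes
    index:      node -> {file name: location} for the files that node knows about
    Returns (location or None, number of nodes that were asked).
    """
    seen = {start} | set(neighbours[start])
    queue = deque(neighbours[start])
    asked = 0
    while queue:
        node = queue.popleft()
        asked += 1
        if wanted in index.get(node, {}):
            return index[node][wanted], asked    # answer travels back to start
        for nxt in neighbours[node]:             # node does not know: ask its neighbours
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None, asked


# Hypothetical overlay: A knows only W; only Z knows where the file is.
neighbours = {
    "A": ["W"],
    "W": ["A", "X", "Y"],
    "X": ["W", "Z"],
    "Y": ["W"],
    "Z": ["X"],
}
index = {"Z": {"song.mp3": "10.0.0.9"}}

found = flood_query(neighbours, index, "A", "song.mp3")
not_found = flood_query(neighbours, index, "A", "other.mp3")
```

In this toy overlay every other node is asked before the answer comes back, which illustrates why flooding generates so much traffic.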
10
Gnutella
• Queries flood the network and can cause a large
amount of traffic
• NB each node must have at least one neighbour
• On initial software install, a list of peers is included
• Later the commands 'ping' and 'pong' are used to check
whether nodes are 'alive'
• Unstructured networks do not scale well
• Gnutella uses a tiered system (ultra nodes and
leaves) as well as Query Routing Protocol and
Dynamic Querying to reduce overhead.
11
Structured Network
• Predefined set of rules to link nodes
• Queries are resolved effectively and efficiently
• Distributed Hash Table (DHT) most common
technique used
• Domain Name System (DNS)
• BitTorrent.
12
Distributed Hash Table (DHT)
• Distributes data among a set of nodes according to
some predefined rules
• Each peer in a DHT-based network becomes
responsible for a range of data items
• DHT-based networks allow each peer to have partial
knowledge about whole network
• Avoids flooding overhead found in unstructured
P2P networks.
13
Address Space
• Each data item and responsible peer mapped to a
point in a large address space of size 2^m
• Uses modular arithmetic
• Points in address space distributed evenly on a
circle with 2^m points (from 0 to 2^m – 1)
• Most DHT implementations use m = 160
(2^160 ≈ 1.5 × 10^48 points)
• Textbook uses m = 5 (2^5 = 32 points) in examples for
simplification.
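
The arithmetic on the ring can be checked with a short Python sketch. The helper name clockwise_distance and the sample points 30 and 3 are illustrative choices, not taken from the textbook.

```python
m = 5
size = 2 ** m                  # 32 points, numbered 0 to 31


def clockwise_distance(a, b, m):
    """Number of steps from point a to point b going clockwise on a ring of 2**m points."""
    return (b - a) % (2 ** m)


# Moving 5 steps clockwise from point 30 wraps around past 31 back to 0, 1, 2, 3.
wrapped = (30 + 5) % size
steps = clockwise_distance(30, 3, m)

# Size of the address space when m = 160 (a 49-digit number, roughly 1.5 x 10**48).
big = 2 ** 160
print(size, wrapped, steps)
print(f"{big:.2e}")
```

Because all arithmetic is done modulo 2^m, there is no "end" of the address space: the point after 2^m – 1 is 0.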
14
Figure 29.2: Address space
15
Hashing Identifiers
• Peers added to address space ring
• Usually use a hash function to encode IP address
• A hash function is any function that maps digital
data of arbitrary size to digital data of
fixed size
• node ID = hash (Peer IP address)
• Name of object (eg filename) also hashed and added
to address space ring
• key = hash (Object name)
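
As a rough illustration, both identifiers can be produced with the same function. The sketch below assumes SHA-1 (which gives 160 bits, matching m = 160) and reduces the result modulo 2^m for smaller rings; the slides do not say which hash function is used, and the helper name ring_id is hypothetical.

```python
import hashlib


def ring_id(text, m=160):
    """Map a string (peer IP address or object name) to a point on a ring of 2**m points."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()   # 20 bytes = 160 bits
    return int.from_bytes(digest, "big") % (2 ** m)


node_id = ring_id("110.34.56.20")      # node ID = hash(peer IP address)
key = ring_id("Liberty")               # key = hash(object name)
small_key = ring_id("Liberty", m=5)    # same idea on the 32-point example ring
```

The values produced here will not match the textbook's example numbers, since the textbook picks small identifiers by hand for readability.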
16
Storing Objects
• Two strategies
• Direct
• Object itself stored on the peer whose ID is closest to key
• Indirect
• Original peer keeps object; a reference to the object is
stored on the peer whose ID is closest to key
• Most common strategy.
17
Example 29.1
• For Figure 29.3, assume several peers already joined
• Node N5 (IP address 110.34.56.20) has file 'Liberty' to
share with peers
• Node makes hash of filename 'Liberty' to get key = 14
• Closest node to key 14 is node N17
• N5 creates reference containing the filename (key), its IP
address, and the port number etc, then sends reference to be
stored in node N17
• i.e. file stored in N5, key of file is k14 (a point in the DHT
ring), but reference to file stored in node N17.
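
The example can be sketched as follows. Several things here are assumptions rather than facts from the slide: the full set of node IDs (only N5 and N17 are named in the example), the port number, and the reading of "closest" as the first node at or clockwise after the key, as in Chord.

```python
def responsible_node(key, node_ids, m):
    """Return the first node ID found at or clockwise after `key` on a ring of 2**m points."""
    size = 2 ** m
    return min(node_ids, key=lambda n: (n - key) % size)


# Hypothetical ring with m = 5; only N5 and N17 come from the example itself.
node_ids = [2, 5, 10, 17, 25, 30]
references = {n: {} for n in node_ids}     # what each node stores on behalf of others

key = 14                                   # hash of 'Liberty' as given in the example
owner = responsible_node(key, node_ids, 5)

# Indirect strategy: N5 keeps the file, the responsible node keeps only a reference.
references[owner][key] = {
    "name": "Liberty",
    "ip": "110.34.56.20",   # address of N5, the peer that holds the file
    "port": 5000,           # hypothetical port number
}
print(f"reference for k{key} stored at N{owner}")
```

A peer looking for 'Liberty' would hash the name to get 14, have its query routed to the responsible node, read the reference there, and then download the file directly from N5.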
18
Figure 29.3: Example 29.1
19
Distributed Hash Table (DHT)
• Main function is to route a query to node responsible
for storing reference to an object
• Different routing strategies are used by different
systems
• All rely on nodes that have only partial knowledge of the
ring forwarding each query to a node closer to the
responsible node
• All implementations need to handle departures and
arrivals of peers in their networks.
20
P2P Networks
• Three P2P protocols that use DHT
• Chord protocol
• Simple and elegant approach to routing queries
• Pastry protocol
• More complex than Chord
• Kademlia protocol
• Similar to Pastry, but uses a different distance
metric.
21
Chord
• Published by Stoica et al. in 2001
• Used in several applications
• Collaborative File System (CFS)
• ConChord
• Distributive Domain Name System (DDNS).
22
Pastry
• Another popular protocol in the P2P paradigm
• Designed by Rowstron and Druschel in 2001
• Uses DHT
• Some fundamental differences between Pastry and
Chord in identifier space and routing process.
23
Pastry
• Used in some applications
• PAST
• Distributed file system
• SCRIBE
• Decentralised publish/subscribe system.
24
Kademlia
• Another DHT peer-to-peer network
• Designed by Maymounkov and Mazières in 2002
• Similar to Pastry, routes messages based on the
distance between nodes
• Address space based on a binary tree
• Interpretation of the distance metric uses bitwise
XOR function to measure distances.
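
The XOR metric is easy to try out in Python. The 5-bit identifiers below are made up for illustration, and the helper names are hypothetical.

```python
def xor_distance(a, b):
    """Distance between two identifiers: their bitwise XOR read as an integer."""
    return a ^ b


def closest_node(key, node_ids):
    """Node whose identifier has the smallest XOR distance to the key."""
    return min(node_ids, key=lambda n: xor_distance(n, key))


node_ids = [2, 5, 10, 17, 25, 30]      # hypothetical 5-bit node identifiers
key = 14                               # binary 01110

d_10 = xor_distance(key, 10)           # 01110 XOR 01010 = 00100
d_17 = xor_distance(key, 17)           # 01110 XOR 10001 = 11111
nearest = closest_node(key, node_ids)
print(d_10, d_17, nearest)
```

Identifiers that share a long common prefix are close under this metric, which is what ties it to the binary tree view of the address space. Note that 17 is numerically nearer to 14 than 10 is, yet it is far away in XOR terms because the two differ in their first bit.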
25
Kademlia
26
BitTorrent
• Designed by Bram Cohen (2001) for sharing large
files among a set of peers
• Trackerless versions locate peers using a Kademlia-based DHT
• Sharing different from other file-sharing protocols
• Instead of one peer allowing another peer to
download the whole file, a group of peers take part
in process to give all peers in the group a copy of file
• File sharing a collaborative process called a torrent.
27
BitTorrent with a Tracker
• Original BitTorrent
• Another entity in a torrent, called 'the tracker'
• Central server tracks seeds and leeches in swarm
• Seeds
• Peers with whole file
• Leeches
• Peers with part of the file (still downloading the rest).
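
A toy model of the tracker's bookkeeping, assuming peers simply report how many pieces of the file they hold. This is not the real BitTorrent tracker protocol; the class, its methods and the peer addresses are hypothetical.

```python
class Tracker:
    """Hypothetical tracker: records how many pieces each peer in the swarm holds."""

    def __init__(self, total_pieces):
        self.total_pieces = total_pieces
        self.swarm = {}                    # peer address -> number of pieces held

    def announce(self, peer, pieces_held):
        """A peer reports (or updates) its progress."""
        self.swarm[peer] = pieces_held

    def seeds(self):
        """Peers that hold the whole file."""
        return sorted(p for p, n in self.swarm.items() if n == self.total_pieces)

    def leeches(self):
        """Peers that hold only part of the file."""
        return sorted(p for p, n in self.swarm.items() if n < self.total_pieces)


tracker = Tracker(total_pieces=8)
tracker.announce("10.0.0.1", 8)            # has every piece
tracker.announce("10.0.0.2", 3)            # still downloading
tracker.announce("10.0.0.3", 0)            # just joined

before = (tracker.seeds(), tracker.leeches())
tracker.announce("10.0.0.2", 8)            # finishes the download and becomes a seed
after = (tracker.seeds(), tracker.leeches())
```

Because all of this state lives on one server, the swarm depends on that server staying reachable.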
28
Figure 29.12: Example of a torrent
29
Trackerless BitTorrent
• Original BitTorrent design
• If tracker fails, new peers cannot connect to
network and updating is interrupted
• New implementations of BitTorrent eliminate need
for centralised tracker.
30
End
31