Transcript p2p

Peer-to-Peer Networks
Outline
Overview
Gnutella
Structured Overlays
BitTorrent
CS 640
1
Peer-to-Peer Networks Overview
• A peer-to-peer (P2P) network allows a community of
users to pool their resources
– Content
– Storage
– CPU,…
• A P2P network is both decentralized and self-organizing
– Just like the Internet itself!
• Why do we care about these networks?
– It is challenging to achieve decentralization and scalability at the
same time.
CS 640
2
Gnutella
• One of the first decentralized P2P networks for file sharing
• No central registry of objects.
• Example topology of a Gnutella P2P network
• Edges of the graph correspond to the relationship “A and B know
each other”
CS 640
3
Gnutella cont.
• The simple idea in Gnutella is to distribute the
method for finding data
– Great idea!
– Lots of fun architectural possibilities!
• Gnutella is a distributed search protocol with a
decentralized model
– Clients can issue/view query results
– Clients can serve/request data
– Clients accept queries and respond with matches from
their local data store
CS 640
4
Gnutella Protocol
• Protocol defines method of client communication
– Set of descriptors used for communicating data
– Set of rules governing inter-client exchange of descriptors
• Descriptors
–
–
–
–
Ping: active discovery of hosts on a network
Pong: response to Ping includes client address and metadata
Query: Ask for an object
Query Response: response to Query includes info necessary to get data
• A Gnutella client connects to network by establishing a connection
with another client on the network
– Finding another client is not part of Gnutella spec.
• Host cache services are the typical way this is done
CS 640
5
Gnutella Protocol
• New client then creates connection to the Gnutella client and
thereby becomes part of the network
– Gnutella client can reject the connect request
– Successful new client can then send/receive descriptors
• Pings/pongs are then sent to establish network
– No specification as to how much/often to probe
– Network data can/is cached
• Message routing should be well behaved
– Ping/Query descriptors should be sent to all directly connected clients
– Pong/Query Response descriptors should be sent back along same path
– TTL is mechanism to limit distance
• File downloads via HTTP/1.0 protocol via direct connect
CS 640
6
Gnutella Protocol
• New client then creates connection to the Gnutella client and
thereby becomes part of the network
– Gnutella client can reject the connect request
– Successful new client can then send/receive descriptors
• Pings/pongs are then sent to establish network
– No specification as to how much/often to probe
– Network data can/is cached
• Message routing should be well behaved
– Ping/Query descriptors should be sent to all directly connected clients
– Pong/Query Response descriptors should be sent back along same path
– TTL is mechanism to limit distance
• File downloads via HTTP/1.0 protocol via direct connect
CS 640
7
Gnutella’s Downside
• Flooding does not scale well!
• Alternatives:
– Forward queries randomly or according to probability of
success based on past results
– Proactively replicate objects to make them easier to find
– OR
– Structured Overlays
CS 640
8
Structured Overlays
• Designed to conform to a particular graph
structure that allows reliable and efficient object
location.
• However, with the cost of additional complexity in
overlay construction and maintenance.
• 2 questions to answer:
– How do we map objects to nodes that should serve them?
– How do we find an object?
CS 640
9
Mapping Objects to Nodes
• Hashing is used to map objects to n nodes
• Regular hashing:
Hash(x,n){
Return x % n
}
• Challenge:
– What if a node joins the network?
– What if a node leaves the network?
– How do we choose the proper n?
CS 640
10
Consistent Hashing
• Hash a set of objects x uniformly across a large
ID space
• Each object is maintained on the node whose ID is
closest
• Advantages:
– Distributes objects fairly evenly across nodes
– Only a small number of objects have to move when a node
joins or leaves
CS 640
11
Consistent Hashing cont.
• Example
0
14
Bucket/Node
4
12
8
CS 640
12
How to find an object?
• Idea: Route the query message to the node that
has the object
– Each node maintains a routing table and the IP addresses
of a small set of numerically larger and smaller node IDs.
– Forward the query message to the node that is closer
than you to the destination node
– This is repeated until you get to destination
CS 640
13
How to find an object: Cont.
• Structured Overlays provide a logarithmic bound
on the number of routing hops required to locate
an object
• However, Nodes might be arbitrarily far away
from each other in the Internet!
• Optimizations:
– Route to the physically closest node when possible
– Replicate data across the nodes
CS 640
14