Peer-to-Peer Overlay Networks

Outline
• Overview of P2P overlay networks
• Applications of overlay networks
• Classification of overlay networks
– Structured overlay networks
– Unstructured overlay networks
– Overlay multicast networks
Overview of P2P overlay networks
• What are P2P systems?
– P2P refers to applications that take advantage of
resources (storage, cycles, content, human presence)
available at the end systems of the Internet.
• What are overlay networks?
– Overlay networks refer to networks that are
constructed on top of another network (e.g. IP).
• What is a P2P overlay network?
– Any overlay network that is constructed by the
Internet peers in the application layer on top of the IP
network.
Overview of P2P overlay networks
• P2P overlay network properties
– Efficient use of resources
– Self-organizing
• All peers organize themselves into an application layer network on top of IP.
– Scalability
• Consumers of resources also donate resources
• Aggregate resources grow naturally with utilization
– Reliability
• No single point of failure
• Redundant overlay links between the peers
• Redundant data sources
– Ease of deployment and administration
• The nodes are self-organized
• No need to deploy servers to satisfy demand
• Built-in fault tolerance, replication, and load balancing
• No changes needed in the underlying IP network
Applications of P2P overlay networks
• P2P file sharing
– Napster, Gnutella, KaZaA, eMule, eDonkey, BitTorrent, etc.
• Application layer multicasting
• P2P media streaming
• Content distribution
• Distributed caching
• Distributed storage
• Distributed backup systems
• Grid computing
Classification of overlay networks
• Structured overlay networks
– Are based on Distributed Hash Tables (DHT)
– The overlay network assigns keys to data items and
organizes its peers into a graph that maps each data
key to a peer.
• Unstructured overlay networks
– The overlay networks organize peers in a random
graph in flat or hierarchical manners.
• Overlay multicast networks
– The peers organize themselves into an overlay tree
for multicasting.
Structured overlay networks
• Overlay topology construction is based on NodeIDs that
are generated by using Distributed Hash Tables (DHT).
• In this category, the overlay network assigns keys to
data items and organizes its peers into a graph that
maps each data key to a peer.
• This structured graph enables efficient discovery of data
items using the given keys.
• Storing the objects in the networks is based on
• The structure guarantees that an object is located in O(log n) hops.
• Examples: Content Addressable Network (CAN),
Chord, Pastry.
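The key-to-peer mapping described above can be sketched as follows. This is a minimal illustration in the style of Chord's successor rule, not the algorithm of any specific system: the 16-bit identifier space, the peer names, and the helper names `ring_id` and `responsible_peer` are assumptions made for the example. It shows only which peer owns a key, not the O(log n) routing used to reach that peer.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 16  # assumed small identifier space, chosen for readability


def ring_id(name: str) -> int:
    """Hash a peer name or data key onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


def responsible_peer(key: str, peers: list[str]) -> str:
    """Return the peer whose ID is the first one at or after the key's ID."""
    ring = sorted((ring_id(p), p) for p in peers)
    ids = [node_id for node_id, _ in ring]
    index = bisect_left(ids, ring_id(key)) % len(ring)  # wrap around the ring
    return ring[index][1]


# Hypothetical peers and key, for illustration only.
peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
owner = responsible_peer("hey-jude.mp3", peers)
```

Because every peer applies the same hash function, any peer can compute the owner of a key without consulting a central directory.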
Unstructured P2P overlay networks
• An unstructured system is composed of peers
joining the network with some loose rules, without
any prior knowledge of the topology.
• The network uses flooding or random walks as the
mechanism to send queries across the overlay
with a limited scope.
• When a peer receives the flood query, it sends a
list of all content matching the query to the
originating peer.
• Examples: FreeNet, Gnutella, KaZaA, BitTorrent
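Limited-scope flooding can be sketched as a breadth-first search that stops after a fixed number of hops. The graph, the file placement, and the function name `flood_query` are hypothetical; real systems carry a TTL field in each message rather than running a central loop.

```python
from collections import deque


def flood_query(graph, files, origin, wanted, ttl):
    """Flood a query from `origin`; return peers holding `wanted` within `ttl` hops."""
    hits = []
    seen = {origin}                      # peers that already saw this query
    frontier = deque([(origin, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if peer != origin and wanted in files.get(peer, set()):
            hits.append(peer)            # this peer would reply to the originator
        if remaining == 0:
            continue                     # scope exhausted: do not forward further
        for neighbor in graph[peer]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits


# Hypothetical overlay: A-B-C-D in a chain, plus A-E.
graph = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": ["A"]}
files = {"D": {"song"}, "E": {"song"}}

near = flood_query(graph, files, "A", "song", ttl=1)   # only E is within one hop
far = flood_query(graph, files, "A", "song", ttl=3)    # D is reached as well
```

A small scope keeps traffic low but can miss content that exists farther away, which is the trade-off behind the limited scope.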
Unstructured P2P File Sharing Networks
• Centralized Directory based P2P systems
• Pure P2P systems
• Hybrid P2P systems
Unstructured P2P File Sharing Networks
• Centralized Directory based P2P systems
– All peers are connected to a central entity
– Peers establish connections between each
other on demand to exchange user data (e.g.
MP3-compressed data)
– The central entity is necessary to provide the
service
– The central entity is some kind of index/group
database
– The central entity serves as the lookup/routing table
– Examples: Napster, BitTorrent
Unstructured P2P File Sharing Networks
• Pure P2P systems
– Any terminal entity can be removed without loss of functionality
– No central entities employed in the overlay
– Peers establish connections between each other randomly
• To route request and response messages
• To insert request messages into the overlay
– Examples: Gnutella, FreeNet
Unstructured P2P File Sharing Networks
• Hybrid P2P systems
– Main characteristic, compared to pure P2P: introduction of another dynamic hierarchical layer
– Election process to select and assign superpeers
– Superpeers: high degree (degree >> 20, depending on network size)
– Leaf nodes: connected to one or more superpeers (degree < 7)
– Example: KaZaA
[Figure: overlay of superpeers and leaf nodes]
P2P: centralized directory
Original “Napster” design
1) When a peer connects, it informs the central server of its:
– IP address
– content
2) Alice queries for “Hey Jude”
3) Alice requests file from Bob
[Figure: centralized directory server and peers, including Alice and Bob; arrows numbered 1, 2, 3 mark the steps above]
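The three steps of the centralized-directory design can be sketched as below. The class `DirectoryServer`, its method names, and the address `bob:6699` are hypothetical and only illustrate the idea; this is not Napster's actual protocol.

```python
class DirectoryServer:
    """Central index that maps a content title to the peers holding it."""

    def __init__(self):
        self._index = {}  # title -> set of peer addresses

    def register(self, address, titles):
        """Step 1: a connecting peer reports its address and content."""
        for title in titles:
            self._index.setdefault(title, set()).add(address)

    def unregister(self, address):
        """Remove a departing peer from every entry."""
        for holders in self._index.values():
            holders.discard(address)

    def query(self, title):
        """Step 2: return the addresses of peers holding `title`."""
        return sorted(self._index.get(title, set()))


directory = DirectoryServer()
directory.register("bob:6699", ["Hey Jude", "Let It Be"])  # step 1 (Bob)
holders = directory.query("Hey Jude")                      # step 2 (Alice)
# Step 3: Alice contacts holders[0] directly; the server is not involved
# in the file transfer itself.
```

Only the lookup goes through the server, so the server carries the index load while the transfer bandwidth comes from the peers.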
P2P: problems with centralized directory
• Single point of failure
• Performance bottleneck
• Copyright infringement
File transfer is decentralized, but locating content is highly centralized.
Query flooding: Gnutella
• Fully distributed
– No central server
• Public domain protocol
• Many Gnutella clients implementing the protocol
Overlay network: graph
• Edge between peers X and Y if there’s a TCP connection
• All active peers and edges form the overlay network
• An edge is not a physical link
• A given peer will typically be connected with < 10 overlay neighbors
Gnutella: protocol
• Query message sent over existing TCP connections
• Peers forward Query message
• QueryHit sent over reverse Query path
• Scalability: limited scope flooding
• File transfer: HTTP
[Figure: Query messages flooding across the overlay, with QueryHit messages returning along the reverse path]
Gnutella: Peer joining
1. Joining peer X must find some other peer in the
Gnutella network: use list of candidate peers
2. X sequentially attempts to make TCP connections with
peers on list until connection setup with Y
3. X sends Ping message to Y; Y forwards Ping
message.
4. All peers receiving Ping message respond with
Pong message
5. X receives many Pong messages. It can then
set up additional TCP connections
Peer leaving: see homework problem!
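The five joining steps can be sketched as a small simulation. The function `join`, its parameters `ping_ttl` and `max_neighbors`, and the peer names are assumptions made for illustration; real Gnutella messages carry their own TTL and are routed hop by hop rather than by a central loop.

```python
def join(overlay, online, newcomer, candidates, ping_ttl=2, max_neighbors=3):
    """Connect `newcomer` to the overlay; return the peers that answered with Pong."""
    # Steps 1-2: try the candidate list in order until one peer accepts.
    first = next((peer for peer in candidates if peer in online), None)
    if first is None:
        return []
    overlay.setdefault(newcomer, set()).add(first)
    overlay[first].add(newcomer)

    # Steps 3-4: the Ping spreads hop by hop; every peer it reaches sends a Pong.
    pongs, frontier = [first], [first]
    reached = {newcomer, first}
    for _ in range(ping_ttl - 1):
        next_frontier = []
        for peer in frontier:
            for neighbor in sorted(overlay[peer]):
                if neighbor not in reached and neighbor in online:
                    reached.add(neighbor)
                    pongs.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier

    # Step 5: use the Pongs to open a few additional connections.
    for peer in pongs[1:max_neighbors]:
        overlay[newcomer].add(peer)
        overlay[peer].add(newcomer)
    return pongs


# Hypothetical overlay Y-Z-W; candidate V is offline, so X connects via Y.
overlay = {"Y": {"Z"}, "Z": {"Y", "W"}, "W": {"Z"}}
online = {"Y", "Z", "W"}
pongs = join(overlay, online, "X", candidates=["V", "Y"], ping_ttl=2)
```

With `ping_ttl=2` the Ping reaches Y and Z but not W, so X ends up with two neighbors.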
Exploiting heterogeneity: KaZaA
• Each peer is either a group
leader or assigned to a
group leader.
– TCP connection between
peer and its group leader.
– TCP connections between
some pairs of group leaders.
• Group leader tracks the
content in all its children.
[Figure: overlay of ordinary peers and group-leader peers, showing the neighboring relationships in the overlay network]
KaZaA: Querying
• Each file has a hash and a descriptor
• Client sends keyword query to its group leader
• Group leader responds with matches:
– For each match: metadata, hash, IP address
• If group leader forwards query to other group
leaders, they respond with matches
• Client then selects files for downloading
– HTTP requests using hash as identifier sent to peers
holding desired file
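The query path through a group leader can be sketched as follows. The class `GroupLeader`, its methods, the one-hop forwarding rule, and the sample addresses are assumptions for illustration, and SHA-1 stands in for whatever file hash the real client uses.

```python
import hashlib


class GroupLeader:
    """Tracks the files shared by its child peers and answers keyword queries."""

    def __init__(self):
        self.index = []       # (descriptor, content hash, peer address) per file
        self.neighbors = []   # other group leaders this leader is connected to

    def add_child_file(self, address, descriptor, data):
        content_hash = hashlib.sha1(data).hexdigest()  # stand-in hash function
        self.index.append((descriptor, content_hash, address))

    def local_matches(self, keyword):
        return [entry for entry in self.index if keyword.lower() in entry[0].lower()]

    def query(self, keyword):
        """Answer from the local index, then from directly connected leaders."""
        matches = self.local_matches(keyword)
        for leader in self.neighbors:   # one-hop forwarding only (assumption)
            matches.extend(leader.local_matches(keyword))
        return matches


leader_a, leader_b = GroupLeader(), GroupLeader()
leader_a.neighbors.append(leader_b)
leader_a.add_child_file("10.0.0.5", "Beatles - Hey Jude", b"studio recording")
leader_b.add_child_file("10.0.0.9", "Hey Jude (live)", b"live recording")

results = leader_a.query("hey jude")
# The client would now pick a match and request the file from its address,
# using the hash as the identifier.
```

Since each match carries the hash, the client can fetch the file by hash from the holding peer without involving the group leader in the transfer.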
KaZaA tricks
• Limitations on simultaneous uploads
• Request queuing
• Incentive priorities
• Parallel downloading
Internet P2P Traffic Statistics
• Between 50 and 65 percent of all download traffic is
P2P related.
• Between 75 and 90 percent of all upload traffic is P2P
related.
• And it seems that more people are using P2P today
• So what do people download?
– 61.4 percent video
– 11.3 percent audio
– 27.2 percent games/software/etc.
• Source: http://torrentfreak.com/peer-to-peer-trafficstatistics/
Overlay Multicasting
• Motivation
– IP multicast has not been widely deployed over the Internet
due to some fundamental problems in congestion
control, flow control, security, group management,
etc.
– For the new emerging applications such as
multimedia streaming, internet multicast service is
required.
– Solution: Overlay Multicasting
• Overlay multicasting (or application layer multicasting) is
increasingly being used to overcome the problem of non-ubiquitous deployment of IP multicast across heterogeneous
networks.
Overlay Multicasting
• Main idea
– Internet peers organize themselves into an
overlay tree on top of the Internet.
– Packet replication and forwarding are
performed by peers in the application layer
by using IP unicast service.
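Packet replication along an overlay tree can be sketched as below. The tree, the peer names, and the function `multicast` are hypothetical; the point is that each peer sends one unicast copy per child, so no single peer has to serve the whole group.

```python
def multicast(tree, source):
    """Deliver one packet down the overlay tree.

    Returns (receivers, sends), where sends[peer] is the number of unicast
    copies that peer had to transmit.
    """
    delivered, sends = [], {}
    stack = [source]
    while stack:
        peer = stack.pop()
        children = tree.get(peer, [])
        sends[peer] = len(children)      # one IP unicast copy per child
        for child in children:
            delivered.append(child)
            stack.append(child)
    return delivered, sends


# Hypothetical overlay tree rooted at source S.
tree = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"]}
receivers, sends = multicast(tree, "S")
```

Here five receivers are served although the source itself transmits only two copies; with plain unicast from the source it would have to send five.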
Overlay Multicasting
• Overlay multicasting benefits
– Easy deployment
• It is self-organized
• It is based on IP unicast service
• No protocol support is required from the Internet
routers.
– Scalability
• It is scalable with multicast groups and the number of
members in each group.
– Efficient resource usage
• Uplink resources of the Internet peers are used for multicast
data distribution.
• It is not necessary to use dedicated infrastructure and
bandwidths for massive data distribution in the Internet.
Overlay Multicasting
• Classification of overlay multicast
approaches
– DHT based
– Tree based
– Mesh-tree based
Overlay Multicasting
• DHT based
– The overlay tree is constructed on top of a DHT-based
P2P routing infrastructure such as Pastry, CAN,
Chord, etc.
– Example: Scribe, in which the overlay tree is
constructed on a Pastry network by using a multicast
routing algorithm (similar to core based tree (CBT)).
Overlay Multicasting
• Tree based
– Group members self-organize into a tree by explicitly
picking a parent for each new group member.
– Nodes on the tree may establish and maintain control links to
one another in addition to the links provided by the data tree. As
such, the tree, with these additional control links, constitutes the
control topology in a tree-based design.
– This approach is simple and is capable of building efficient data
delivery trees.
– The tree building algorithm must prevent loops and handle tree
partition as the failure of a single node may cause a partition of
the overlay topology.
– Examples: ALMA, ALMI, OMNI, NICE, ZIGZAG, BTP, Overcast,
…
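The two requirements on the tree building algorithm, loop prevention and partition handling, can be sketched with a parent map. The functions `would_create_loop` and `remove_failed` and the policy of reattaching orphans to the failed peer's parent are assumptions for illustration, not the method of any protocol listed above.

```python
def would_create_loop(parent_of, node, new_parent):
    """True if making `new_parent` the parent of `node` would close a cycle."""
    current = new_parent
    while current is not None:
        if current == node:
            return True          # `node` is an ancestor of `new_parent`
        current = parent_of.get(current)
    return False


def remove_failed(parent_of, failed):
    """Drop a failed non-root peer and reattach its children to its parent."""
    grandparent = parent_of.pop(failed)
    for node, parent in parent_of.items():
        if parent == failed:
            parent_of[node] = grandparent   # heal the partition


# Hypothetical tree: S is the root, A is S's child, B and C are A's children.
parent_of = {"S": None, "A": "S", "B": "A", "C": "A"}

loop = would_create_loop(parent_of, "A", "C")      # C is below A, so this loops
no_loop = would_create_loop(parent_of, "C", "B")   # B is not below C
remove_failed(parent_of, "A")                      # B and C move up to S
```

Without the repair step, the failure of A would cut B and C off from the source, which is the partition problem noted above.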
Overlay Multicasting
• Mesh-tree based
– The mesh-tree approach is a two-step design of the overlay topology.
– It is common for group members to first organize themselves, in a
distributed manner, into an overlay control topology called the mesh. A routing
protocol runs across this control topology and defines a unique overlay
path to each and every member.
– Data distribution trees rooted at any member are then built across this
mesh based on some multicast routing protocol, e.g. DVMRP.
– Compared to the tree-only design, the mesh-tree approach is more complex.
– However, it has the advantages of avoiding replication of group management
functions across multiple (per-source) trees, providing more resilience to
failure of members, and leveraging standard routing algorithms, thus
simplifying overlay construction and maintenance, as loop avoidance
and detection are built-in mechanisms in routing algorithms.
– Examples: Narada, Kudos, Scattercast, Yoid