Peer to Peer Network with Application
Download
Report
Transcript Peer to Peer Network with Application
Peer-to-Peer (P2P) networks
and applications
What is P2P?
“the sharing of computer resources and
services by direct exchange of
information”
What is P2P?
“P2P is a class of applications that take
advantage of resources – storage, cycles,
content, human presence – available at the
edges of the Internet. Because accessing
these decentralized resources means
operating in an environment of unstable and
unpredictable IP addresses P2P nodes must
operate outside the DNS system and have
significant, or total autonomy from central
servers”
What is P2P?
“A distributed network architecture may
be called a P2P network if the participants
share a part of their own resources. These
shared resources are necessary to provide
the service offered by the network. The
participants of such a network are both
resource providers and resource
consumers”
What is P2P?
Various definitions seem to agree on
sharing of resources
direct communication between equals (peers)
no centralized control
What is a peer?
“…an entity with capabilities similar
to other entities in the system.”
Client/Server Architecture
Well known,
powerful, reliable
server is a data
source
Clients request data
from server
Server
Internet
Very successful
model
WWW (HTTP), FTP,
Web services, etc.
Client
Client
Client
* Figure from http://project-iris.net/talks/dht-toronto-03.ppt
Client
Client/Server Limitations
Scalability is hard to achieve
Presents a single point of failure
Requires administration
Unused resources at the network edge
P2P systems try to address these
limitations
P2P Architecture
All nodes are both
clients and servers
Provide and consume
data
Any node can initiate a
connection
Node
No centralized data
Internet
source
“The ultimate form of
democracy on the
Internet”
“The ultimate threat to
copy-right protection
on the Internet”
Node
Node
Node
* Content from http://project-iris.net/talks/dht-toronto-03.ppt
Node
P2P Network Characteristics
Clients are also servers and routers
Nodes contribute content, storage, memory, CPU
Nodes are autonomous (no administrative
authority)
Network is dynamic: nodes enter and leave the
network “frequently”
Nodes collaborate directly with each other (not
through well-known servers)
Nodes have widely varying capabilities
P2P Goals and Benefits
Efficient use of resources
Scalability
Nodes self-organize
Built-in fault tolerance, replication, and load balancing
Increased autonomy
not easy in a centralized system
Replicas
Geographic distribution
No single point of failure
Ease of administration
No central information, communication and computation bottleneck
Aggregate resources grow naturally with utilization
Reliability
Unused bandwidth, storage, processing power at the “edge of the network”
Anonymity – Privacy
Dynamism
highly dynamic environment
ad-hoc communication and collaboration
What is Peer-to-Peer (P2P)?
P2P Applications
File sharing (Napster, Gnutella, Kazaa)
Multiplayer games (Unreal Tournament, DOOM)
Collaborative applications (ICQ, shared whiteboard)
Distributed computation (Seti@home)
Ad-hoc networks
P2P System Taxonomy
Historic
Data-centric
Computation-centric
User-centric
Network-centric
Platforms
Computation-centric
User-centric
sendMessage
receiveMessage
sendMessage
receiveMessage
User-centric (Common
implementation)
sendMessage
receiveMessage
sendMessage
receiveMessage
Network-centric
Network-centric
Platforms
Gnutella
Find Peers
Instant Messaging
…
Send Messages
P2P Goals/Benefits
Cost sharing
Resource aggregation
Improved scalability/reliability
Increased autonomy
Anonymity/privacy
Dynamism
Ad-hoc communication
P2P Challenges
Decentralization
Scalability and Performance
Anonymity
Fairness
Dynamism
Security
Transparency
Fault Resilience and Robustness
Peer-to-Peer Content Sharing
Popular file sharing P2P Systems
Napster, Gnutella, Kazaa, Freenet
Large scale sharing of files.
User A makes files (music, video, etc.) on their
computer available to others
User B connects to the network, searches for
files and downloads files directly from user A
Issues of copyright infringement
Research Areas
Peer discovery and group management
Data placement and searching
Reliable and efficient file exchange
Security/privacy/anonymity/trust
Design Concerns
Group Management
Per-node state
Load balancing
Fault tolerance/resiliency
Search
Bandwidth usage
Time to locate item
Success rate
Fault tolerance/resiliency
Approaches
Centralized
Unstructured
Structured (Distributed Hash Tables)
Centralized index
original “Napster” design
1) when peer connects, it
informs central server:
Bob
centralized
directory server
1
peers
IP address
content
2) Alice queries for “Hey
Jude”
3) Alice requests file from
Bob
1
3
1
2
1
Alice
Centralized model
Bob
Alice
file transfer is
decentralized, but
locating content is
highly centralized
Judy
Jane
Centralized
Benefits:
Low per-node state
Limited bandwidth
usage
Short location time
High success rate
Fault tolerant
Drawbacks:
Single point of failure
Limited scale
Possibly unbalanced
load
copyright infringement
Bob
Judy
Alice
Jane
Napster
program for sharing files over the Internet
a “disruptive” application/technology?
history:
5/99: Shawn Fanning (freshman, Northeasten U.) founds
Napster Online music service
12/99: first lawsuit
3/00: 25% UWisc traffic Napster
2000: est. 60M users
2/01: US Circuit Court of
Appeals: Napster knew users
violating copyright laws
7/01: # simultaneous online users:
Napster 160K, Gnutella: 40K, Morpheus: 300K
Napster: how does it work
Application-level, client-server protocol over point-topoint TCP
Four steps:
Connect to Napster server
Upload your list of files (push) to server.
Give server keywords to search the full list with.
Select “best” of correct answers. (pings)
Napster
1. File list is
uploaded
napster.com
users
Napster
2. User
requests
search at
server.
napster.com
Request
and
results
user
Napster
3. User pings
hosts that
apparently
have data.
Looks for best
transfer rate.
napster.com
pings
pings
user
Napster
napster.com
4. User retrieves
file
Retrieves
file
user
Napster
Central Napster server
Can ensure correct results
Fast search
Bottleneck for scalability
Single point of failure
Susceptible to denial of service
• Malicious users
• Lawsuits, legislation
Hybrid P2P system – “all peers are equal but some are more equal
than others”
Search is centralized
File transfer is direct (peer-to-peer)
Unstructured
fully distributed
no central server
used by Gnutella
Each peer indexes the
files it makes available
for sharing (and no
other files)
overlay network: graph
edge between peer X
and Y if there’s a TCP
connection
all active peers and
edges form overlay net
edge: virtual (not
physical) link
given peer typically
connected with < 10
overlay neighbors
Gnutella: Query flooding
Query message
sent over existing TCP
connections
peers forward
Query message
QueryHit
sent over
reverse
Query
path
QueryHit
File transfer:
HTTP
Query
QueryHit
Gnutella: Peer joining
joining peer Alice must find another peer in
Gnutella network: use list of candidate peers
2. Alice sequentially attempts TCP connections with
candidate peers until connection setup with Bob
3. Flooding: Alice sends Ping message to Bob; Bob
forwards Ping message to his overlay neighbors
(who then forward to their neighbors….)
peers receiving Ping message respond to Alice
with Pong message
4. Alice receives many Pong messages, and can then
setup additional TCP connections
1.
Unstructured
Carl
Jane
Scalability:
limited scope
flooding
Bob
Alice
Judy
Unstructured
Carl
Jane
Gnutella model
Benefits:
Limited per-node state
Fault tolerant
Drawbacks:
High bandwidth usage
Long time to locate item
No guarantee on success rate
Possibly unbalanced load
Bob
Alice
Judy
Gnutella
Searching by flooding:
If you don’t have the file
you want, query 7 of your
neighbors.
If they don’t have it, they
contact 7 of their
neighbors, for a maximum
hop count of 10.
Requests are flooded, but
there is no tree structure.
No looping but packets may
be received twice.
Reverse path forwarding
* Figure from http://computer.howstuffworks.com/file-sharing.htm
Gnutella
fool.* ?
TTL = 2
Gnutella
X
fool.her
IPX:fool.her
TTL = 1
TTL = 1
TTL = 1
Gnutella
Y
fool.me
fool.you
IPY:fool.me
fool.you
Gnutella
IPY:fool.me
fool.you
Gnutella: strengths and weaknesses
pros:
flexibility in query processing
complete decentralization
simplicity
fault tolerance/self-organization
cons:
severe scalability problems
susceptible to attacks
Pure P2P system
Gnutella: initial problems and fixes
2000: avg size of reachable network only 400-800
hosts. Why so small?
modem users: not enough bandwidth to provide search
routing capabilities: routing black holes
Fix: create peer hierarchy based on capabilities
previously: all peers identical, most modem black holes
preferential connection:
• favors routing to well-connected peers
• favors reply to clients that themselves serve large number
of files: prevent freeloading
Structured
001
012
FreeNet, Chord, CAN,
Tapestry, Pastry model
212 ?
212 ?
332
212
305
Structured
001
012
FreeNet, Chord, CAN,
Tapestry, Pastry model
Benefits:
Manageable per-node state
Manageable bandwidth usage
and time to locate item
Guaranteed success
Drawbacks:
Possibly unbalanced load
Harder to support fault
tolerance
212 ?
212 ?
332
212
305
Unstructured vs Structured
P2P
The systems we described do not offer any
guarantees about their performance (or even
correctness)
Structured P2P
Scalable guarantees on numbers of hops to answer
a query
Maintain all other P2P properties (load balance,
self-organization, dynamic nature)
Approach: Distributed Hash Tables (DHT)
Distributed Hash Tables (DHT)
Distributed version of a hash table data structure
Stores (key, value) pairs
The key is like a filename
The value can be file contents, or pointer to location
Goal: Efficiently insert/lookup/delete (key, value) pairs
Each peer stores a subset of (key, value) pairs in the system
Core operation: Find node responsible for a key
Map key to node
Efficiently route insert/lookup/delete request to this node
Allow for frequent node arrivals/departures
DHT Desirable Properties
Keys should mapped evenly to all nodes in
the network (load balance)
Each node should maintain information
about only a few other nodes (scalability,
low update cost)
Messages should be routed to a node
efficiently (small number of hops)
Node arrival/departures should only affect
a few nodes
DHT Routing Protocols
DHT is a generic interface
There are several implementations of this interface
Chord [MIT]
Pastry [Microsoft Research UK, Rice University]
Tapestry [UC Berkeley]
Content Addressable Network (CAN) [UC Berkeley]
SkipNet [Microsoft Research US, Univ. of Washington]
Kademlia [New York University]
Viceroy [Israel, UC Berkeley]
P-Grid [EPFL Switzerland]
Freenet [Ian Clarke]
Basic Approach
In all approaches:
keys are associated with globally unique IDs
integers of size m (for large m)
key ID space (search space) is uniformly populated
- mapping of keys to IDs using (consistent) hashing
a node is responsible for indexing all the keys in a
certain subspace (zone) of the ID space
nodes have only partial knowledge of other node’s
responsibilities
Improvements: SuperPeers
KaZaA model
Hybrid centralized and unstructured
Advantages and disadvantages?
Hierarchical Overlay
between centralized
index, query flooding
approaches
each peer is either a
super node or assigned to
a super node
TCP connection between
peer and its super node.
TCP connections between
some pairs of super nodes.
Super node tracks content
in its children
ordinary peer
group-leader peer
neighoring relationships
in overlay network
Kazaa (Fasttrack network)
Hybrid of centralized Napster and decentralized Gnutella
hybrid P2P system
Super-peers act as local search hubs
Each super-peer is similar to a Napster server for a small
portion of the network
Super-peers are automatically chosen by the system based on
their capacities (storage, bandwidth, etc.) and availability
(connection time)
Users upload their list of files to a super-peer
Super-peers periodically exchange file lists
You send queries to a super-peer for files of interest
File Distribution: Server-Client vs P2P
Question : How much time to distribute file
from one server to N peers?
us: server upload
bandwidth
Server
us
File, size F
dN
uN
u1
d1
u2
ui: peer i upload
bandwidth
d2
Network (with
abundant bandwidth)
di: peer i download
bandwidth
File distribution time: server-client
server sequentially
sends N copies:
NF/us time
client i takes F/di
time to download
Server
F
us
dN
u1 d1 u2
d2
Network (with
abundant bandwidth)
uN
Time to distribute F
to N clients using = dcs = max { NF/us, F/min(di) }
i
client/server approach
increases linearly in N
(for large N)
File distribution time: P2P
server must send one
Server
F
u1 d1 u2
d2
copy: F/us time
us
client i takes F/di time
Network (with
dN
to download
abundant bandwidth)
uN
NF bits must be
downloaded (aggregate)
fastest possible upload rate: us + Sui
dP2P = max { F/us, F/min(di) , NF/(us + Sui) }
i
Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us
Minimum Distribution Time
3.5
P2P
Client-Server
3
2.5
2
1.5
1
0.5
0
0
5
10
15
20
N
25
30
35
File distribution: BitTorrent
P2P file distribution
tracker: tracks peers
participating in torrent
obtain list
of peers
trading
chunks
peer
torrent: group of
peers exchanging
chunks of a file
BitTorrent (1)
file divided into 256KB chunks.
peer joining torrent:
has no chunks, but will accumulate them over time
registers with tracker to get list of peers,
connects to subset of peers (“neighbors”)
while downloading, peer uploads chunks to other
peers.
peers may come and go
once peer has entire file, it may (selfishly) leave or
(altruistically) remain
BitTorrent (2)
Pulling Chunks
at any given time,
different peers have
different subsets of
file chunks
periodically, a peer
(Alice) asks each
neighbor for list of
chunks that they have.
Alice sends requests
for her missing chunks
rarest first
Sending Chunks: tit-for-tat
Alice sends chunks to four
neighbors currently
sending her chunks at the
highest rate
re-evaluate top 4 every
10 secs
every 30 secs: randomly
select another peer,
starts sending chunks
newly chosen peer may
join top 4
“optimistically unchoke”
BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With higher upload rate,
can find better trading
partners & get file faster!
P2P: searching for information
Index in P2P system: maps information to peer location
(location = IP address & port number)
.
Instant messaging
File sharing (eg e-mule)
Index maps user
Index dynamically
names to locations.
tracks the locations of
files that peers share.
When user starts IM
application, it needs to
Peers need to tell
inform index of its
index what they have.
location
Peers search index to
Peers search index to
determine where files
determine IP address
can be found
of user.
P2P Case study: Skype
Skype clients (SC)
inherently P2P: pairs
of users communicate.
proprietary
Skype
login server
application-layer
protocol (inferred via
reverse engineering)
hierarchical overlay
with SNs
Index maps usernames
to IP addresses;
distributed over SNs
Supernode
(SN)
Anonymity
Napster, Gnutella, Kazaa don’t provide
anonymity
Users know who they are downloading from
Others know who sent a query
Freenet
Designed to provide anonymity among other
features
Freenet
Keys are mapped to IDs
Each node stores a cache of keys, and a routing table
for some keys
routing table defines the overlay network
Queries are routed to the node with the most similar
key
Data flows in reverse path of query
Impossible to know if a user is initiating or forwarding a query
Impossible to know if a user is consuming or forwarding data
Keys replicated in (some) of the nodes along the path
Freenet routing
N02
N03
K117 ?
N04
TTL = 8
N01
N05
K124 N02
K317 N08
K613 N05
N06
N07
N08
Freenet routing
K514 N01
sorry...
N02
N03
TTL = 7
K117 ?
N04
N01
N05
TTL = 6
K124 N02
K317 N08
K613 N05
N06
N07
N08
Freenet routing
K514 N01
N02
N03
TTL = 7
K117 ?
N04
N01
N05
TTL = 6
K124 N02
K317 N08
K613 N05
N06
K100 N07
K222 N06
K617 N05
N07
N08
TTL = 5
Freenet routing
K514 N01
N02
N03
TTL = 7
K117 ?
N04
N01
N05
TTL = 6
K124 N02
K317 N08
K613 N05
N06
117
K100 N07
K222 N06
K617 N05
N07
N08
TTL = 5
117
Freenet routing
K514 N01
N02
N03
TTL = 7
K117 ?
N04
N01
N05
TTL = 6
K124 N02
K317 N08
K613 N05
117
N06
117
K117
K100
K222
K617
N07
N07
N06
N05
117
N08
TTL = 5
N07
Freenet routing
K514 N01
N02
N03
N04
N01
K117
K124
K317
K613
117
N05
N07
N02
N08
N05
N06
117
K117
K100
K222
K617
N07
N07
N06
N05
117
N08
N07
Freenet routing
K514 N01
N02
N03
N04
N01
K117
K124
K317
K613
117
N05
N08
N02
N08
N05
N06
117
K117
K100
K222
K617
N08
N07
N06
N05
117
N07
N08
Inserts are performed similarly – they are unsuccessful queries
Freenet strengths and weaknesses
pros:
complete decentralization
fault tolerance/self-organization
anonymity
scalability (to some degree)
cons:
questionable efficiency & performance
rare keys disappear from the system
improvement of performance requires high overhead
(maintenance of additional information for routing)
P2P Review
Two key functions of P2P systems
Sharing content
Finding content
Sharing content
Direct transfer between peers
• All systems do this
Structured vs. unstructured placement of data
Automatic replication of data
Finding content
Centralized (Napster)
Decentralized (Gnutella)
Probabilistic guarantees (DHTs)
Issues with P2P
Free Riding (Free Loading)
Two types of free riding
• Downloading but not sharing any data
• Not sharing any interesting data
On Gnutella
• 15% of users contribute 94% of content
• 63% of users never responded to a query
– Didn’t have “interesting” data
No ranking: what is a trusted source?
“spoofing”