P2P_Computing

Download Report

Transcript P2P_Computing

Peer-to-Peer Computing
An Introduction
Ding Choon Hoong
Grid Computing and Distributed Systems (GRIDS) Lab.
The University of Melbourne
Melbourne, Australia
www.gridbus.org
WW Grid
Outline





What is Peer-to-Peer Computing?
P2P Topologies
Example P2P Applications
Some key issues
Conclusion
What is peer-to-peer (P2P)
computing?

Webster definition


Peer: one that is of equal standing with another
Computing between equals
Resource Sharing

Exploit idle resources available in the edges


Exploit plentiful resources among network
edges


E.g. network bandwidth
Federated cooperation among companies


E.g. CPU idle cycles, unused storage space, spare network
bandwidth,…
Sharing unavailable resources (e.g. databases)
……
Client-Server vs. P2P

Client-Server paradigm



The client is a dumb device
The server performs all computation, stores data, and
handle the control
Simple architecture, but introduces:

Performance bottlenecks, single point of failure, etc.
Each peer in P2P can be



Client
Server
Intermediate: relaying requests/responses
History of P2P





Origin of P2P dates back to ARPANET
Early P2P applications/servers is Usenet and
DNS
1990s: Shift in paradigm to client-server
1999: Napster => explosion of P2P usage
2000s: Gnutella, Kazaa, Audiogalaxy, etc.
Cluster, Grid, P2P:
Characteristics
Characteristic
Cluster
Grid
P2P
Population
Commodity
Computers
High-end computers
Edge of network
(desktop PC)
Ownership
Single
Multiple
Multiple
Discovery
Membership
Services
Centralised Index &
Decentralised Info
Decentralized
User Management
Centralised
Decentralised
Decentralised
Resource management
Centralized
Distributed
Distributed
Allocation/Scheduling
Centralised
Decentralised
Decentralised
Inter-Operability
VIA based?
No standards yet
No standards
Single System Image
Yes
No
No
Scalability
100s
1000?
Millions? [@Home]
Capacity
Guaranteed
Varies, but high
Varies
Throughput
Medium
High
Very High
Speed(Lat. Bandwidth)
Low, high
High, Low
High, Low
Types of P2P applications





Instant messaging
Managing and sharing information
Collaboration
Distributed Services
…more to come?
Generic P2P Topologies

Centralized Topology
Generic P2P Topologies (cont)

Ring Topology
Generic P2P Topologies (cont)

Hierarchical Topology
Generic P2P Topologies (cont)

Decentralized Topology
Generic P2P Topologies (cont)

Hybrid Topology

Centralized and Ring
Topology
Generic P2P Topologies (cont)

Hybrid Topology

Centralized and Centralized
Topology
Generic P2P Topologies (cont)

Hybrid Topology

Centralized and
Decentralized Topology
Example P2P Applications




SETI@home
Napster
Gnutella
FastTrack
SETI@home
SETI@home uses the National Astronomy
and Ionospheric Center's 305 meter
telescope at Arecibo, Puerto Rico.
A screenshot of the SETI@home
client program.
•2.4 mil volunteers as of Oct. 2000
Napster




Centralized
MP3 file sharing
Clients/Peers hold the files
Servers holds catalog and broker
relationships



Clients upload IP address, music file shared, and requests
Clients request locations where requests can be met
File transfer is P2P – proprietary protocol
Napster (cont)
Napster
Client
Napster Index
Server
Assigned
Index Server
Napster
Client
Napster Server
Cluster
Napster
Client
Query
Reply
2
3
Direct File
Transfer
Napster
Connection Host
Connect
1
4
Napster
Client
Napster
Client
Gnutella



Completely decentralized – no servers with
catalogs
Shares any files
Gnutella node ---- SERVENT


Issue the query and view search result
Accept the query from other SERVENTs and check the
match against its database and response with corresponding
result
Gnutella (cont)

Joining the network:



The new node connects to a well-known SERVENT
Then sends a PING message to discover other nodes
PONG message are sent in reply from hosts offering
connections with the new node
Direct connection are then made
Gnutella (cont)

Searching a file:


A node broadcasts its QUERY to all its peers who in turn
broadcasts to their peers
Nodes route back QUERYHITS along the QUERY path back
to the sender containing the location detail
To download the files a direct connection is made using
details of the host in the QUERYHIT message
Gnutella (cont)



Gnutella broadcasts its messages.
To prevent flooding -TTL is introduced.
To prevent forwarding same mesg. twice each servent maintains a list of recently seen
mesg.
Gnutella (cont)
F
GnuCache
(3)
(2)
(1)
C
(2)
G
(1)
A
B
(4)
(3)
(2)
(3)
(2)
(1)
J
E
(3)
(3)
H
I
(2)
(1)
D
User A connects to the GnuCache to get the list of available servents already
connected in the network
GnuCache sends back the list to the user A
User A sends the request message GNUTELLA CONNECT to the user B
User B replies with the GNUTELLA OK message granting user A to join the network
Gnutella (cont)

Typical query scenario:





A sends a query message to its neighbor, B
B first checks that the message is not an old one
Then checks for a match with its local data
If there is a match, it sends the queryHit message back to
user A
B then decrements TTL by 1 and forwards the query
message to users C, D, and E
C, D, and E performs the same steps as user B and forwards
the query message further to users F, G, H, and I
Gnutella (cont)

Problems


Broadcast mesg. congests the network
Lost of reply packets (dynamic environment)
FastTrack


Hybrid between centralized and
decentralized
Has 2 tiers of control:

Ordinary nodes that connect to super nodes in a centralized
fashion
Super nodes that connect to each other in a decentralized
manner
FastTrack (cont)
FastTrack (cont)

Joining the network? - Bootstrapping node
Querying?

Problems (Like Gnutella)



Broadcast mesg. between Super Nodes
Lost of reply packets
Some key issues

Scalability




Availability



Networks can grow to millions of nodes
Challenge in achieving efficient peer and resource discovery
High amount of query/response traffic
Potential for commercial content provision
Such services require high availability and accessibility
Anonymity

What is the right level of anonymity?
Some key issues (cont)

Security


Due to open nature, have to assume environment is hostile
Concerns include:




Privacy and anonymity
File authenticity
Threats like worms and virus
Fault Resilience

The system must still be able to function even though
several important nodes goes off-line.
Some key issues (cont)

Standards and Interoperability



Lack of standards lead to poor interoperability between
applications
Can be improved by using common protocols
Copyright / Access Control



Classic case of Napster being shut down
Other applications have learned to get around the law
Possibility of paid access in future
Some key issues (cont)

Quality of Service (QoS)



Complexity of Queries



Metrics to be used is not clearly defined
Tradeoff between achieving QoS and costs
Must be able to support query languages of varying degree
of expressiveness
Simple keywords to SQL-like searches
Search Mechanism

Different search algorithms are used to reduced search
time and maximize search space

Load Balancing

existence of hot-spots (overloaded nodes) due to:





uneven node distribution throughout logical space
uneven object distribution among nodes
uneven demand distribution among objects
query and routing hot-spots
Self-organization


Ability to adapt itself to the dynamic nature of the
Internet
Depends on the architecture of the system
Conclusion




Different P2P network topologies
Examples of different P2P applications
Key issues related to P2P
Further reading:
http://www.gridbus.org/~raj/papers/P2PbasedContentShar
ing.pdf