Peer-to-Peer Algorithms and Systems

Download Report

Transcript Peer-to-Peer Algorithms and Systems

Peer-to-Peer Algorithms and System
CS600 Assignment #5
Nov. 22. 2006
Byungchul Park
[email protected]
DPNM Lab., Dept. of CSE, POSTECH
- 1/25 -
DP&NM Lab.
CSE, POSTECH
Presentation Outline
 History of P2P
 Motivation
 Definition of P2P
 Most popular P2P application - Skype
 Overlay Network
– Unstructured P2P
– Structured P2P
 Comparison of P2P models
 Application Services & technology
 Open Issues
 Conclusion
- 2/25 -
DP&NM Lab.
CSE, POSTECH
History of P2P
 Origin of Internet
– ARPANET was a P2P system (not client/server model)
– FTP : killer application of early Internet
– Usenet and DNS
 Revolution of network model – Napster
– Developed by Shawn Fanning
– MP3 file sharing
– Centralized P2P concept
 Obstacles of P2P
– Firewall
– Dynamic IP
– NAT
 Success of Skype
- 3/25 -
DP&NM Lab.
CSE, POSTECH
Motivation
 Requirements of Internet application
– Scalability
– Security & Reliability
– Flexibility & QoS
• Client/Server model can not satisfies the requirements
=>DOS attack, Bottleneck
 P2P Network
– Self-organizing
– More flexible than Client/Server model
- 4/25 -
DP&NM Lab.
CSE, POSTECH
Definition of P2P [1]
 Significant autonomy from central servers
 Exploits resources at the edges of the Internet
– Storage and contents
– Computing power
– Connectivity and presence
 Resources at edge have intermittent connectivity, being
added & removed
- 5/25 -
DP&NM Lab.
CSE, POSTECH
Skype [12]
 One of the most popular VoIP
application
 Skype provide VoIP and IM services
that works behind NAT and firewalls
 50 million users
 Valued at $2.6 billon
 Encrypted protocol & proprietary
network
 How P2P VoIP traffic in Skype
differs from other P2P traffic and
from traffic in traditional voicecommunication?
 Resource consumption?
- 6/25 -
DP&NM Lab.
CSE, POSTECH
Skype Overview
 Skype’s three services
–
–
–
–
VoIP : two-way audio stream, conferences of up to 4user
IM (instant messaging) : small text message
File-transfer
Paid service : initiate & receive calls via regular telephone
number through VoIP-PSTN gateway
 Related to KaZaA – Supernode-based P2P network
 Inherent Structure
 Supernode : plenty of
spare bandwidth &
publicly reachable
- 7/25 -
DP&NM Lab.
CSE, POSTECH
Overlay Network
 Figure 1. from [1]
- 8/25 -
DP&NM Lab.
CSE, POSTECH
Overlay Network (cont)
 Virtual edge
– TCP connection or a pointer to an IP address
 Maintenance of Overlay networks (typical)
–
–
–
–
Periodically ping to make sure neighbor is still alive
verify liveness while messaging
If neighbor goes down, may establish new edge
New node needs to bootstrap
- 9/25 -
DP&NM Lab.
CSE, POSTECH
Overlay Network (cont)
 Design flexibility
– Topology
– Protocol
– Messaging over TCP, UDP,
ICMP
 Underlying physical net is
transparent to developer
- 10/25 -
DP&NM Lab.
CSE, POSTECH
Unstructured P2P [2]
 Unstructured P2P
– No relations between peer and contents
– Searching : query flooding to central server or neighbor peer
 Centralized Server model
– Central server keeps peer information
– Scalability problem and copyright problem
 Pure P2P
– No central server (or directory)
– Searching : query flooding to neighbor peer
– excessive traffic generation
 Hybrid P2P
– Super node
– Hierarchical architecture
- 11/25 -
DP&NM Lab.
CSE, POSTECH
Unstructured P2P Applications
 Napster (Centralized model) [6]
 Application-level, clientserver point-to-point TCP
 Steps:
–
–
–
–
connect to Napster
upload list of files
give server search query
receive “best” of correct
answers (ping)
- 12/25 -
DP&NM Lab.
CSE, POSTECH
Unstructured P2P Applications (cont)
 Gnutella [7,8]
 Focus: decentralized method of searching for files
– Central directory server no longer problem (bottleneck)
– Harder control of the system
 Each application instances serves to:
– store/serve files
– route queries to its neighbors
– respond to request queries
 Limited scope query
– if you don’t have the file you want, query 7 of your neighbors.
– if they don’t have it, they contact 7 of their neighbors, for a
maximum hop count of 10.
– reverse path forwarding for responses (not files)
- 13/25 -
DP&NM Lab.
CSE, POSTECH
Unstructured P2P Applications (cont)
 Overlay management of Pure P2P
 New node uses bootstrap node to get IP addresses of
existing Gnutella nodes
 New node establishes neighboring relations by sending
join messages
- 14/25 -
DP&NM Lab.
CSE, POSTECH
Unstructured P2P Applications (cont)
 KaZaA & Skype [3]
 Every peer is either a
supernode or is assigned to
a supernode
 Each supernode knows
where the other
supernodes are (Mesh
structure)
 Cross between Napster
and Gnutella
- 15/25 -
DP&NM Lab.
CSE, POSTECH
Structured P2P
 Structured P2P
– Contents-peer mapping
– Content-addressing tech.
– Searching complexity : O(logN) -> scalability
 What is DHT?
– Distributed Hash Table
– files are associated with a key (produced, by hashing the file
name) and each node in the system is responsible for storing a
certain range of keys [3]
 Example
– Hash (“The Devil Wears Prada”) = 8045
– Pastry, Tapestry, Chord, CAN, Viceroy
- 16/25 -
DP&NM Lab.
CSE, POSTECH
Comparison of P2P models [5]
- 17/25 -
DP&NM Lab.
CSE, POSTECH
Application Service & Tech.
1.
2.
3.
4.
Information Sharing
File Sharing
Bandwidth efficiency
Computing power sharing
- 18/25 -
DP&NM Lab.
CSE, POSTECH
Information Sharing
 Presence information
– Basic operation of P2P application
– e.g. buddy list of Skype or MSN
 Document Management
– Most documents are stored with personal PCs
– P2P networking provides the repository that can coordinates
distributed local data
– e.g. NextPage-NXT 4 [9]
 Collaboration
– P2P networking provides collaboration service without central
management system
– e.g. Groove Virtual Office [10] (ad-hoc collaboration system)
• Instance messaging, File sharing
• Real-time DB synchronization, etc
- 19/25 -
DP&NM Lab.
CSE, POSTECH
File Sharing
 Representative P2P application service
 Over 70% of Internet traffic generated by P2P file sharing
 File searching mode
– Flooding model (e.g. Gnutella)
– Central directory model (e.g. Napster)
– Document routing model (e.g. Freenet)
 Document routing model
– A file saved on anonymous peer
– An identifier assigned to each file and peer
– A file saved on a peer that has similar identifier
- 20/25 -
DP&NM Lab.
CSE, POSTECH
Bandwidth Efficiency
 Client/Server model involved with bottleneck problem and
queuing delay
 Load balancing
– Multimedia streaming service and VoD service
– A contents copied to numerous peers
– e.g. PeerCast, Peer-to-Peer-Radio
 Improving download speed
– If file is found in multiple nodes, user can select parallel
downloading
– Most likely HTTP byte-range header used to request different
portions of the file from different nodes
– Automatic recovery when server peer stops sending file
- 21/25 -
DP&NM Lab.
CSE, POSTECH
Computing Power Sharing
 seti@home [11]
–
–
–
–
–
Search for ET intelligence
Central site collects radio telescope data
Data is divided into work chunks of 300Kbytes
User obtains client, which runs in background
Peer sets up TCP connection to central computer, downloads
chunk
– Peer does FFT on chunk, uploads results, gets new chunk
» Not peer to peer, but exploits
» resources at network edge
- 22/25 -
DP&NM Lab.
CSE, POSTECH
Open Issues
 Copyright issues
 Avoid Firewall or detection
– Dynamic port allocation
– NAT traversal
- 23/25 -
DP&NM Lab.
CSE, POSTECH
Conclusion
 P2P networking is a promising paradigm for services
operating at the edges of the network [4]
 Significant autonomy from central servers
 Scalability, Security & Reliability, Flexibility & QoS
 P2P networks still have a long way to go
- 24/25 -
DP&NM Lab.
CSE, POSTECH
References
 [1] K.W. Ross and Dan Rubenstein, "Tutorial on P2P Systems " presented at Infocom
2003, http://cis.poly.edu/~ross/p2pTheory/P2Preading.htm.
 [2] Peer-to-peer information systems: concepts and models, state-of-the-art, and future
systems, Karl Aberer, Manfred Hauswirth Tutorial at the 18th International Conference
on Data Engineering, February 26-March 1, 2002, San Jose, California.
 [3]Nathaniel Leibowitz, Matei Ripeanu, and Adam Wierzbicki, "Deconstructing the
Kazaa Network", 3rd IEEE Workshop on Internet Applications (WIAPP'03), June 2003.
 [4] Management of Peer-to-Peer Networks, K. Tutschku., http://www3.informatik.uniuerzburg.de/ITG/2001/vortrag/vortrag13.pdf.
 [5] 전자통신동향분석 제 21권 제 5호, ETRI, 2006년 10월
 [6] Napster, “Napster Messages,” http://opennap.sourceforge.net/napster.txt, 2000. M.
Ripeanu, I. Foster, and A. Iamnitchi, “Mapping
 [7]the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and
Implications for System Design,” IEEE Internet Computing Journal, Vol.6, No.1, 2002.
 [8] M. Ripeanu, “Peer-to-Peer Architecture Case Study: Gnutella Network,” In Proc. of
IEEE 1st Int’l Conf. on Peer-to-Peer Computing, 2001.
 [9] Next Page Inc, http://www.nextpage.com/pdfs/collateral/wp/nxt4 build
wp090903lo.pdf, 2004.
 [10] Groove Networks, http://www.groove.net, 2004.
 [11] D. Anderson, SETI@home, chapter 5, O’Reilly, Sebastopol, 2001, pp.67–76.
 [12] Saikat Guha, Neil Daswani, Ravi Jain, “An Experimental Study of the Skype Peerto-Peer VoIP System”, IPTPS 2006.
- 25/25 -
DP&NM Lab.
CSE, POSTECH