A lecture on the evolution of P2P networks.

DSRG Presentation
●
Presenter: Bradley Momberger
●
Date: April 29, 2004
●
Topic: peer-to-peer network survey
Overview
●
P2P* basics
●
Structure of p2p filesharing networks
–
Features
–
Search query structure
–
Drawbacks
●
Other p2p systems
* P2P = “peer-to-peer”
P2P Basics
●
What is P2P?
–
Internet: a large P2P network at the hardware level
(using TCP/IP), but software and services are
client/server (HTTP, SMTP, NNTP, FTP, etc.)
–
Clients “ask for” a service which a server
“provides”
–
P2P exists when:
●
Neither side of a connection is client or server
–
DCC on IRC, traffic type is the same in each direction
●
Each side has both client and server capability
–
SMB, ICMP echo service, mobile ad hoc routing, etc.
A P2P Application: Filesharing
●
SMB on Windows
–
One of the archetypal examples of filesharing
–
“Share” a drive or folder to make it accessible in
a workgroup or domain on the local network.
–
Access other users' shared folders at any time,
given proper credentials.
–
SMB can also be used in the client/server
model, such as with toaster.wpi.edu
Pre-Napster era
●
Music file sharing grew exponentially in
popularity from c.1996-1997
–
Coincided with popularized use of MPEG-1
Layer 3 audio compression (MP3)
–
First MP3 offerings were usually on websites.
●
Cease & desist letter campaign from RIAA shortly
followed.
–
After websites were taken down, MP3s offered
on FTP sites
●
Can't search FTP sites through traditional engines.
●
FTP search engines (like oth.net) were unreliable &
sites were highly transient or overloaded.
Napster
●
Napster was first released to the public in
mid-1999
–
Indexed a user's collection of MP3 files and
uploaded index to central server
–
A network of official Napster servers existed,
but users could connect to and search only one
at a time.
–
The Napster protocol was reverse engineered
and became the OpenNap project.
●
OpenNap still exists today with applications such
as WinMX (which also supports its own protocol)
●
Some networks still follow the Napster model.
Napster Search Structure
●
Search queries on filenames and ID3 tags
(track information contained inside MP3 files)
were sent to central server
●
Server processed query and returned results to
user, including locations of represented files
●
Users connected directly with other users for
file transfer requests
–
This was the P2P aspect of Napster
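The division of labor described on this slide can be sketched as follows. This is a minimal illustration, not the Napster protocol itself: the `IndexServer` class, its method names, and the host addresses are all hypothetical, and only filenames are indexed (ID3 tags are left out for brevity).

```python
from collections import defaultdict


class IndexServer:
    """Hypothetical central index: maps filename keywords to (host, filename)."""

    def __init__(self):
        self._index = defaultdict(set)

    def upload_index(self, host, filenames):
        # A client uploads the list of files it shares; only metadata is stored.
        for name in filenames:
            cleaned = name.lower().replace(".", " ").replace("_", " ")
            for word in cleaned.split():
                self._index[word].add((host, name))

    def remove_host(self, host):
        # Drop every entry belonging to a host that disconnected.
        for word in list(self._index):
            self._index[word] = {e for e in self._index[word] if e[0] != host}

    def search(self, query):
        # Return the entries that match every keyword in the query.
        words = query.lower().split()
        if not words:
            return []
        matches = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(matches)


server = IndexServer()
server.upload_index("10.0.0.5:6699", ["Artist_One - Blue Song.mp3"])
server.upload_index("10.0.0.9:6699", ["Artist_Two - Blue Moon.mp3"])
results = server.search("blue song")
```

The server only ever returns locations; in this model the seeker would then open a direct connection to the returned host to fetch the file, which is the peer-to-peer step.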
Napster: P2P in 3 nodes
●
Essentially, file trading over Napster requires
at least 3 nodes in the network.
–
Seeker
–
Host
–
Index Server.
Drawbacks of Napster
●
Single point of attack
–
A Napster-protocol network can be completely
undone by:
●
Power failure
●
Attack
●
Government intrusion
–
The network cannot operate without the central
server.
●
An easy fallback is to let users of the network host
the servers instead of the company, the “many
targets” approach used by Direct Connect and
WinMX.
●
A better solution exists.
Decentralized p2p
●
The original Internet was designed to
operate even if large portions of it were
destroyed.
●
Later generations of p2p were built around
the same goal, namely, to be immune from
outside attack.
●
Decentralized, or “pure” p2p apps are
sometimes called “second-generation” p2p
because they were a generational shift away
from the Napster paradigm.
Gnutella
●
The original Gnutella client was released in
early 2000 by Nullsoft, the AOL subsidiary
responsible for Winamp.
–
Just as quickly, AOL forced its removal from
download.
●
The Gnutella protocol was reverse engineered
by several organizations and reimplemented in a
number of interoperable clients.
●
Gnutella is an example of decentralized, or
“pure” p2p.
Gnutella Search Structure
●
A node on a Gnutella network may start a
search query, transmit it, or answer it.
–
Since a Gnutella node is usually connected to
more than one other node, a query that comes
in on one channel should be transmitted to the
others.
–
Queries are given a TTL (time-to-live) of five to
seven hops, after which they are discarded.
–
Answers to recent queries will move backwards
along the same path as the query itself.
–
File transfer is again done directly between two
nodes, with a “push” provision for firewalls.
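The query handling described above (forward on every other channel, decrement the TTL, send answers back along the query path) can be sketched roughly as below. The `Node` class and its method names are hypothetical and do not follow the Gnutella wire format. Forwarding is done by recursive calls, so the order in which nodes see a query is an artifact of this sketch, and the “push” provision for firewalls is not modeled.

```python
class Node:
    """Hypothetical Gnutella-style node; names and structure are illustrative."""

    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []
        self.seen = {}   # query id -> neighbor the query arrived from
        self.hits = []   # results collected when this node is the seeker

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def search(self, query_id, keyword, ttl=7):
        self.seen[query_id] = None           # None marks this node as the origin
        for n in self.neighbors:
            n.on_query(query_id, keyword, ttl, sender=self)

    def on_query(self, query_id, keyword, ttl, sender):
        if query_id in self.seen:            # duplicate query: drop it
            return
        self.seen[query_id] = sender         # remember the reverse path
        for f in sorted(self.files):
            if keyword in f:
                self.on_hit(query_id, (self.name, f))
        if ttl > 1:                          # forward on every other channel
            for n in self.neighbors:
                if n is not sender:
                    n.on_query(query_id, keyword, ttl - 1, sender=self)

    def on_hit(self, query_id, hit):
        back = self.seen.get(query_id)
        if back is None:
            self.hits.append(hit)            # the answer reached the seeker
        else:
            back.on_hit(query_id, hit)       # retrace the query path


a, b, c = Node("a"), Node("b"), Node("c")
d = Node("d", files=["blue_song.mp3"])
a.connect(b)
b.connect(c)
c.connect(d)
a.search("q1", "song", ttl=3)
```

In this chain the host is three hops from the seeker, so a TTL of 3 is just enough for the query to reach it; with a TTL of 2 the query is discarded at node `c`.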
Gnutella: P2P in 2 nodes
●
Without index servers, the original Gnutella
protocol was able to perform all of its functions
in as few as two nodes.
–
Seeker
–
Host
●
Here, two file hosts are shown on the network,
in addition to the seeker.
Gnutella: drawbacks
●
Because it made little distinction between users
with large and small network resources, the
original Gnutella network collapsed under its
own weight.
–
With a TTL of 7, on a saturated network each
node would service queries from between
75,000 and 1.2 billion nodes.
–
Clearly all but the best-provisioned nodes would
choke their bandwidth under these conditions.
●
With transient nodes, losing connection during
download usually means incomplete files.
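The slide does not state the assumptions behind its reach figures, so the sketch below does not try to reproduce them. It only illustrates why reach grows geometrically with the TTL, under the assumption of a cycle-free overlay in which every node has the same number of connections. The function name and parameters are hypothetical.

```python
def flood_reach(connections, ttl):
    """Upper bound on nodes reached by a flooded query in a cycle-free overlay."""
    reached = 0
    frontier = connections             # nodes reached on the first hop
    for _ in range(ttl):
        reached += frontier
        frontier *= connections - 1    # each node forwards on its other links
    return reached


for connections in (4, 8, 16):
    print(connections, flood_reach(connections, ttl=7))
```

Because the frontier is multiplied at every hop, a modest increase in connections per node changes the result by orders of magnitude, which is consistent with the wide range quoted on the slide.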
Gnutella: remedies
●
Ultrapeers
–
Concentration of query passing between a
smaller number of nodes with high bandwidth.
–
A leaf node connects to two or three ultrapeers,
while each ultrapeer connects to several
hundred leaves.
–
Immediate relief of the problems of network
saturation.
●
Swarming
–
Addresses lost transfer connections by allowing a
file to be downloaded piecemeal from multiple
hosts.
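Swarming can be illustrated with a small simulation. This is a sketch under stated assumptions, not any client's actual download logic: hosts are modeled as dictionaries of piece index to bytes, an offline host is modeled as `None`, and the function name is hypothetical.

```python
def swarm_download(num_pieces, hosts):
    """Fetch each piece from any host that still has it.

    `hosts` maps a host name to a dict of {piece_index: bytes}; a host that
    has dropped off the network is represented by None.
    """
    pieces = {}
    for index in range(num_pieces):
        for name, store in hosts.items():
            if store is not None and index in store:
                pieces[index] = store[index]
                break                   # piece found, move on to the next one
    missing = [i for i in range(num_pieces) if i not in pieces]
    if missing:
        raise IOError(f"no reachable host has pieces {missing}")
    return b"".join(pieces[i] for i in range(num_pieces))


hosts = {
    "host_a": {0: b"pe", 1: b"er"},                  # first half only
    "host_b": None,                                  # lost connection
    "host_c": {1: b"er", 2: b"-to", 3: b"-peer"},    # overlapping second part
}
data = swarm_download(4, hosts)
```

No single host in this example holds the whole file, and one has disappeared, yet the download completes because every piece is available somewhere.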
Gnutella: derivatives
●
Gnutella2 (Shareaza, Morpheus, giFT)
–
Created by the developers of Shareaza,
Gnutella2 adds new features to the protocol for
better network and download performance.
●
FastTrack (KaZaA, Grokster, iMesh)
–
The most popular p2p network in the world.
●
eDonkey (eDonkey2000, eMule)
–
Not a direct derivative, but network structure is
much the same. Root nodes in eDonkey
routinely have 30,000 leaf nodes (up to 500,000!)
as opposed to Gnutella2's 300-500.
Freenet
●
A p2p network protocol designed for
anonymity and freedom of the press over
anything else.
●
Files are hosted by the entire network instead
of individual nodes.
●
Users host a portion of file fragments in an
encrypted local store with no knowledge of
what is contained therein.
●
Freenet can contain files, websites, regularly
updated material, message boards, even other
protocols.
Freenet: network structure
●
Freenet servers connect to each other in a
manner similar to Gnutella, except the
addresses of other nodes are not
communicated.
●
There is no search feature; files are indexed by
keys. A key request is passed from node to
node with no information identifying the
source.
●
Matching file fragments are returned along the
query path. Commonly accessed fragments
are stored on the path.
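The request-and-cache behavior described on this slide can be sketched as below. This is a toy model, not Freenet's implementation: the `FreenetNode` class, the `hops_to_live` parameter, and the `visited` set used for loop avoidance are assumptions, neighbors are simply tried in list order, and encryption of the local store is not modeled.

```python
class FreenetNode:
    """Toy node: stores fragments by key and knows only its direct neighbors."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.neighbors = []

    def request(self, key, hops_to_live, visited=None):
        # `visited` stands in for per-request loop detection; the request
        # carries no field identifying the original requester.
        visited = set() if visited is None else visited
        visited.add(self.name)
        if key in self.store:
            return self.store[key]
        if hops_to_live <= 0:
            return None
        for neighbor in self.neighbors:
            if neighbor.name in visited:
                continue
            data = neighbor.request(key, hops_to_live - 1, visited)
            if data is not None:
                self.store[key] = data   # cache the fragment on the return path
                return data
        return None


a, b, c = FreenetNode("a"), FreenetNode("b"), FreenetNode("c")
a.neighbors.append(b)
b.neighbors.extend([a, c])
c.neighbors.append(b)
c.store["key1"] = b"fragment"
result = a.request("key1", hops_to_live=5)
```

After the request completes, node `b` also holds a copy of the fragment, so a later request for the same key is answered one hop closer to the requester.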
Freenet: p2p in 1 node?
●
Locally, the server and client functions of
Freenet are separate applications.
–
The client makes a file request to the local
server by key, which fills it or passes it on to
other nodes.
–
The current generation of Freenet has a Web
interface and a command-line Java client.
Freenet: drawbacks
●
Lack of complete anonymity
–
While Freenet provides a reasonable level of
anonymity, it can be broken through concerted
detection.
●
Lack of search capability
–
The inability to search keys stems directly from
the anonymity features of Freenet. While there
is no inherent search, external tools allow the
user to look up keys based on keywords.
●
Inefficiency
–
This is practically impossible to remedy.
Freenet: similar projects
●
GNUnet
–
“Official” censorship-resistant p2p application
from the GNU project
●
MNet
–
The developers of MNet consider it to be a
“distributed file store.”
●
MuteNet
–
Distributed p2p application with search, but
with file transfer relayed over search paths,
masked host addresses, and encrypted
communication between nodes.
Other p2p networks
●
Ad hoc p2p networks
–
BitTorrent
●
P2P server load balancing
–
Akamai
●
P2P Internet radio streams
–
Peercast, Allcast, Streamer
●
P2P flow of information
–
Wiki
BitTorrent
●
An ad hoc p2p network centered on the
propagation of a single file.
–
A BitTorrent network has three types of nodes
●
The index server, which maintains and distributes a
list of connected users.
●
Seeders, which have the complete file being
transferred.
●
Leechers, who have some or no parts of the file.
–
BitTorrent networks are meant to be transient
●
A file is expected to be most popular to download
for a short time after initial release.
●
A BitTorrent network works best during this period,
with the largest number of active users.
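The three node types can be sketched as follows. This is an illustration of the roles only, not the BitTorrent protocol: the `Tracker` class, its methods, and the idea that the index server records exactly which pieces each peer holds are assumptions made for the example.

```python
class Tracker:
    """Hypothetical index server: tracks which peers are connected."""

    def __init__(self, total_pieces):
        self.total_pieces = total_pieces
        self.peers = {}   # peer address -> set of piece indexes it holds

    def announce(self, address, pieces_held):
        # Register or update the peer, then hand back the list of other peers.
        self.peers[address] = set(pieces_held)
        return sorted(a for a in self.peers if a != address)

    def leave(self, address):
        self.peers.pop(address, None)

    def role(self, address):
        held = self.peers[address]
        return "seeder" if len(held) == self.total_pieces else "leecher"


tracker = Tracker(total_pieces=4)
tracker.announce("peer1:6881", range(4))          # has the complete file
others = tracker.announce("peer2:6881", [])       # has nothing yet
roles = [tracker.role("peer1:6881"), tracker.role("peer2:6881")]
```

A leecher that finishes downloading and announces again with every piece is reported as a seeder, which is how a swarm keeps serving the file while interest lasts.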
Akamai
●
P2P load balancing between Web servers
–
For each client that accesses an Akamai enabled
Web site, the request is redirected to one of a
large pool of servers.
–
Example:
>nslookup www.microsoft.com
Name:      www2.microsoft.akadns.net
Addresses: 207.46.249.252, 207.46.250.222, 207.46.250.252,
           207.46.134.221, 207.46.144.188, 207.46.156.252,
           207.46.244.188, 207.46.245.156
Aliases:   www.microsoft.com, www.microsoft.akadns.net
–
Use of Akamai servers allowed Microsoft's Web
site to survive the DDoS attack caused by
proliferation of the MyDoom worm.
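Redirecting each request to one server out of a pool can be sketched with a simple rotation. The slides do not say how a server is actually selected, so round-robin is an assumption here, and `make_redirector` is a hypothetical helper. The pool reuses three of the addresses from the lookup output above.

```python
from itertools import cycle

POOL = ["207.46.249.252", "207.46.250.222", "207.46.250.252"]


def make_redirector(pool):
    """Return a function that picks the next server for each request."""
    rotation = cycle(pool)
    return lambda: next(rotation)


pick_server = make_redirector(POOL)
first_four = [pick_server() for _ in range(4)]
```

Each server in the pool receives roughly an equal share of requests, so no single address has to absorb the full load.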
P2P Internet radio
●
Internet radio is usually client/server in nature.
–
Shoutcast and RealMedia Server, for example,
publish audio streams on the Internet.
–
The server must send a separate stream to each
client, a bandwidth-intensive process.
●
P2P solutions, such as Peercast and Streamer,
allow audio stream publication with less
bandwidth.
–
Nodes accepting an audio stream also relay the
stream to other nodes.
–
Convenient method to skirt statutory royalties,
since the real number of listeners is unknown.
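The bandwidth saving from relaying can be sketched by building a relay tree and counting how many streams the publishing server has to upload. The `relay_tree` function, the `fanout` limit, and the listener names are hypothetical; how Peercast or Streamer actually arrange their relays is not described in the slides.

```python
from collections import deque


def relay_tree(listeners, fanout):
    """Assign each listener a parent in breadth-first order.

    Returns {node: [children]}; "source" is the publishing server. Every node
    relays to at most `fanout` children (an assumed per-node upload limit).
    """
    children = {"source": []}
    open_slots = deque(["source"])      # nodes that can still accept a child
    for listener in listeners:
        parent = open_slots[0]
        children[parent].append(listener)
        children[listener] = []
        open_slots.append(listener)
        if len(children[parent]) == fanout:
            open_slots.popleft()        # parent is full
    return children


listeners = [f"listener{i}" for i in range(10)]
tree = relay_tree(listeners, fanout=2)
source_streams_p2p = len(tree["source"])          # streams the server uploads
source_streams_client_server = len(listeners)     # one stream per client
```

In the client/server case the server's upload grows with the audience, while in the relay tree it stays fixed at the fan-out and the remaining streams are carried by the listeners themselves.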
Wiki
●
Wiki is a method for creating information
resources.
–
Main features
●
Open contribution by anyone
●
Attack resistance by archiving of each revision of
each page.
●
Wiki is not p2p in terms of the network
structure. However, the flow of information
being any-to-any means that Wiki is a p2p
information network.
–
The Wiki software is hosted on a Web server,
but anyone may be an editor to the content.
Summary
●
P2P is...
–
Decentralization of services and data
●
Allows for network fault tolerance and efficient
propagation of commonly accessed data.
–
A blanket term used for different approaches to
sharing resources.
●
Operates quite often in tandem with the
client/server model.
–
An exciting and underdeveloped field with
applications across the realm of computer
science.
●
Database systems are no exception!
Questions
?
More information
●
http://en.wikipedia.org/wiki/Peer-to-peer
●
http://www.darkridge.com/~jpr5/doc/gnutella.html
●
http://www.freenetproject.org/
●
http://en.wikipedia.org/wiki/GnuFU
●
http://wikibooks.org/wiki/P2P_File_sharing