Transcript 04gnutella

Introduction
Widespread unstructured P2P network

Currently between 200,000 & 300,000 hosts
Ideal as a research test bed

Large scale network demonstrates the need
for scalable P2P protocols
A Gnutella client has 4-10 TCP connections to
other peers
UDP is used for signaling traffic; to gain the
benefits of server-based networks,
an "ultra-peer" state was created
Introduction (Cont.)
"Ultra-peer" status is self-assigned by powerful peers and
provides some extra functionality compared to ordinary
nodes
There exist many freely available Gnutella clients
Some of the most popular are:




Limewire
Bearshare
Morpheus
Shareaza
It has the fastest-growing number of users
It has a very pleasant GUI and also connects to eDonkey and
BitTorrent
Its Main Features
This protocol underlies much of the current
file-sharing activity on the Internet.
It is based on TCP/IP and HTTP!
A file sharing network (fsn) is a bunch of
machines that exchange files using Gnutella.
To connect to a Gnutella network, you need
the IP address of one single machine that is
already part of the network.
Gnutella
Peer-to-peer indexing and searching service.
Peer-to-peer point-to-point file downloading
using HTTP.
A gnutella node needs a server (or a set of
servers) to “start-up”… gnutellahosts.com
provides a service with reliable initial
connection points
But introduces a new single point of failure!
Gnutella vs. Napster
Like Napster, distributed file storage and
transmission
Added the ability to distribute file discovery


Ask your direct peers who else they know
Query those machines directly
Concepts of Unstructured
Services
There are many interesting ideas being explored:

Breaking shared files into many parts to both increase
bandwidth (parallel I/O) and increase security of content
as no one site can access files without cooperation from
its peers
This type of technology makes censorship very hard.


MojoNation has a load balancing and scheduling algorithm
in the form of micro payments to reward those who
contribute most to the community of peers.
Gnutella - which is a family of related products -- is
usually described as a P2P search engine as its interface is
nearer that of a search engine than a Web file system
Characteristics

Gnutella is a distributed system for file
sharing
provide means for network discovery
provide means for file searching and sharing


Defines a network at the application level
Employs the concept of peer-to-peer
all hosts are equal (symmetry)
there is no central point

anonymous search, but reveal the IP addresses
when downloading
connection
Once you establish connection to the first
servent, you announce your presence.
The first servent will pass on that message to
all the servents that it is connected to, and so
on.
These servents all reply with data about
themselves


how many files it is sharing
how many kilobytes the files take up
This already adds up to a lot of traffic!
Gnutella File Sharing model



Users register files with network neighbors
Search across the network to find files to copy
Does not require a centralized broker (as Napster)
[Figure: peers Bob, Carol, Ted, and Alice. The query "Where is Final Fantasy 4?" is passed from peer to peer; the answer "Carol has it" comes back; Final Fantasy 4 is then copied from Carol.]
Resource Discovery
Decentralized File-sharing
Model
Peers have same capability and responsibility
The communication between peers is symmetric
There is no central directory server
Index on the metadata of shared files is stored locally among all
peers



Gnutella
FreeServe
MojoNation
Resource Discovery
Decentralized (Cont.)
every user acts as a client, a server or
both (servent)
User connects to framework and becomes a
member of the community, allowing others
to connect through him/her
Users speak directly to other users with no
intermediate or central authority
No one entity controls the information
that passes through the community
Resource Discovery
Advantages and Disadvantages
Advantages:



Inherent scalability
Avoidance of “single point of litigation”
problem
Fault Tolerance
Disadvantages:


Slow information discovery
More query traffic on the network
Unstructured Decentralized
Services
There are some 200 available Napster clones to support this area
http://www.ultimateresourcesite.com/napster/main.htm
Currently the most popular is Imesh [http://www.imesh.com], which
has some 2 million users and can share any type of file.
Some of the best known file sharing systems are



MojoNation [http://www.mojonation.net]
Freenet [http://freenet.sourceforge.net/]
Gnutella [http://gnutella.wego.com/]
These three are not server-based like Napster but rather support
waves of software agents, expressing resource availability and
interest, that propagate among informal dynamic networks of peers
DFS Variations
DFS: Distributed File Sharing

| | FTP | NFS | Web | Napster | Gnutella | Freenet |
|---|---|---|---|---|---|---|
| Author | | | | Shawn Fanning | Gene Kan @ AOL | Ian Clarke |
| Purpose | Remote file sharing | Local file sharing | Remote file sharing (portal) | File-sharing community (portal) | Decentralized file sharing community | Decentralized anonymous file sharing |
| Moderated? | Yes | Yes | Yes | Yes | No | No |
| Access control? | Yes | Yes | No | No | No | No |
| Search | Server-based | Server-based | Server-based | Server-based | p2p | p2p |
| File transfer | Client/server | Client/server | Client/server | p2p | p2p | p2p |
| File transfer protocol | ftp | nfs | http, caching | proprietary | http | Proprietary, encrypted, caching |
P2P File Sharing Benefits
Cost sharing
Resource aggregation
Improved scalability/reliability
Anonymity/privacy
Dynamism
Management/Placement
Challenges
Per-node state
Bandwidth usage
Search time
Fault tolerance/resiliency
Gnutella in Details
Share any type of files
(not just music)
Decentralized search
unlike Napster
You ask your neighbors
for files of interest
Neighbors ask their
neighbors, and so on

TTL field quenches
messages after a
number of hops
Users with matching
files reply to you
Figure from http://computer.howstuffworks.com/file-sharing.htm
The Gnutella protocol (v0.4)
PING – Notify a peer of your existence
PONG – Reply to a PING request
QUERY – Find a file in the network
RESPONSE (QUERYHIT) – Give the location of a file
PUSHREQUEST (PUSH) – Request a servent behind a
firewall to push a file out to a client.
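Every message type listed above travels behind the same fixed-size descriptor header. The sketch below packs and unpacks such a header, assuming the v0.4 layout of a 16-byte descriptor ID, one byte each for payload type, TTL, and hops, and a 4-byte little-endian payload length. The function names (pack_header, unpack_header, new_ping) are illustrative and not part of any client. The v0.4 specification calls RESPONSE "QueryHit" and PUSHREQUEST "Push".

```python
import os
import struct

# Payload descriptor codes as listed in the Gnutella v0.4 specification.
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# 16-byte descriptor ID, payload type, TTL, hops, payload length.
# "<" selects little-endian byte order with no padding (23 bytes in total).
HEADER = struct.Struct("<16sBBBI")


def pack_header(descriptor_id, payload_type, ttl, hops, payload_length):
    """Serialize one descriptor header to bytes."""
    return HEADER.pack(descriptor_id, payload_type, ttl, hops, payload_length)


def unpack_header(data):
    """Parse the first 23 bytes of a message into a dict of header fields."""
    descriptor_id, payload_type, ttl, hops, length = HEADER.unpack(
        data[:HEADER.size]
    )
    return {
        "descriptor_id": descriptor_id,
        "payload_type": payload_type,
        "ttl": ttl,
        "hops": hops,
        "payload_length": length,
    }


def new_ping(ttl=7):
    """Build a PING header: random ID, zero hops, empty payload."""
    return pack_header(os.urandom(16), PING, ttl, 0, 0)
```

A servent forwarding a message would decrement ttl and increment hops before re-sending the header; that step is left out of the sketch.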
Joining Gnutella Network
The new node connects to a
well known ‘Anchor’ node.
Then sends a PING message to
discover other nodes.
PONG messages are sent in
reply from hosts offering new
connections with the new node.
Direct connections are then
made to the newly discovered
nodes.
Gnutella Network
[Figure: a new node sends a PING to anchor node A; the PING is passed on through the network and PONG replies are returned.]
Properties of the Flooding
Searching by flooding:
If you don’t have the file you want, query 7 of
your partners.
If they don’t have it, they contact 7 of their
partners, for a maximum hop count of 10.
Requests are flooded, but there is no tree
structure.
No looping but packets may be received twice
Note: Play gnutella animation at:
http://www.limewire.com/index.jsp/p2p
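The flooding behaviour described above (bounded hop count, no tree structure, duplicates possible) can be simulated on a small graph. This is a minimal sketch, not client code: the graph is a plain adjacency dict, the function name flood is hypothetical, and a node is assumed to drop any copy of a message it has already seen.

```python
from collections import deque


def flood(graph, source, ttl):
    """Flood a message from source for at most ttl hops.

    Returns (first_seen, messages): first_seen maps each reached node to the
    hop count at which it first received the message, and messages counts
    every transmission, including duplicates that the receiver drops.
    """
    first_seen = {source: 0}
    messages = 0
    queue = deque([(source, None, ttl)])  # (node, sender, remaining TTL)
    while queue:
        node, sender, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: the message is not forwarded further
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # never send a message back where it came from
            messages += 1
            if neighbor in first_seen:
                continue  # duplicate copy: received again, then dropped
            first_seen[neighbor] = first_seen[node] + 1
            queue.append((neighbor, node, remaining - 1))
    return first_seen, messages
```

On a graph with a cycle, messages exceeds the number of reached nodes, which is the "packets may be received twice" effect.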
Query flooding
Gnutella
no hierarchy
use bootstrap node to learn
about others
join message
Send query to neighbors
Neighbors forward query to all
attached neighbors (floods)
If queried peer has object, it
sends message back to
querying peer
query
join
More on query flooding
Pros
peers have similar
responsibilities: no
group leaders
highly decentralized
no peer maintains
directory info
Cons
excessive query traffic
query radius: may not
have content when
present
bootstrap node still
required
maintenance of overlay
network
About the Flooding
There is nothing that stops a servent from flooding its
network region with messages.
Cost of maintaining Network
Cost of searching file
Breadth-First Search (BFS)
[Figure: BFS over the overlay. Legend: source, forward query, processed query, found result, forward response.]
Resource Discovery
Pros and Cons
Benefits:

Peers speak directly with no central authority
Nobody owns the Gnutella Network and nobody can shut it down
No central point of failure
Limited per-node state
Isolated node failure can quickly and
automatically be worked around

Free loading

Scalability
Drawbacks:

Searches are less effective and can be slow

Bandwidth intensive
Gnutella network evolving to include “controlled decentralization”
(limewire, bearshare, toadnode)

Searching for a File
A node broadcasts its QUERY
to all its peers who in turn
broadcast to their peers.
Nodes route QUERYHITs
along the QUERY path back to
the sender containing file
location details.
To download files a direct
connection is made using
details of the host in the
QUERYHIT messages.
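The reverse-path routing described above can be sketched with a per-servent table that remembers which neighbor each query arrived from. This is an in-memory toy under stated assumptions: the Servent class and its method names are hypothetical, direct method calls stand in for network messages, and the hit is assumed to carry the same ID as the query it answers.

```python
class Servent:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []
        self.route_back = {}  # query ID -> neighbor the query arrived from
        self.hits = []        # hits delivered to this servent as originator

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def query(self, query_id, keyword, ttl):
        self.route_back[query_id] = None  # None marks "I am the originator"
        for neighbor in self.neighbors:
            neighbor.on_query(query_id, keyword, ttl, self)

    def on_query(self, query_id, keyword, ttl, sender):
        if query_id in self.route_back:
            return  # already seen this query: drop the duplicate
        self.route_back[query_id] = sender
        if keyword in self.files:
            self.on_hit(query_id, keyword, self.name)
        if ttl > 1:
            for neighbor in self.neighbors:
                if neighbor is not sender:
                    neighbor.on_query(query_id, keyword, ttl - 1, self)

    def on_hit(self, query_id, keyword, holder):
        if query_id not in self.route_back:
            return  # no routing entry for this query: drop the hit
        previous = self.route_back[query_id]
        if previous is None:
            self.hits.append((keyword, holder))  # reached the originator
        else:
            previous.on_hit(query_id, keyword, holder)  # one hop back
```

The originator then opens a direct connection to the holder named in the hit; the download itself is outside this sketch.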
Gnutella Network
[Figure: QUERY messages are broadcast from node to node; HIT (QUERYHIT) messages are routed back along the QUERY path.]
The Cooperation Spectrum
Free Riding
File sharing networks rely on users sharing data
Two types of free riding


Downloading but not sharing any data
Not sharing any interesting data
On Gnutella



15% of users contribute 94% of content
63% of users never responded to a query
Didn’t have “interesting” data
Data from E. Adar and B.A. Huberman (2000), “Free Riding on Gnutella”
Example: GNUTELLA
Summary of Gnutella's
Features
Decentralized



No single point of failure
Not as susceptible to denial of service
Cannot ensure correct results
Flooding queries

Search is now distributed but still not scalable
Initial Problems and Fixes
Freeloading: WWW sites offering search/retrieval
from Gnutella network without providing file sharing
or query routing

Block file-serving to browser-based non-file-sharing
users
Prematurely terminated downloads:




Software bugs
long download times over modems
modem users run Gnutella peer only briefly (Napster
problem also!) or any user becomes overloaded
fix: peer can reply “I have it, but I am busy. Try again
later”
Initial Problems and Fixes 2
2000: avg size of reachable network only 400-800
hosts

Why so small?
modem users: not enough bandwidth to provide search
routing capabilities: routing black holes
Fix: create peer hierarchy based on capabilities


previously: all peers identical, most modem blackholes
connection preferencing:
favors routing to well-connected peers
favors reply to clients that themselves serve large number of
files: prevent freeloading

Limewire gateway functions as Napster-like central
server on behalf of other peers
for searching purposes
Gnutella Enhancements
Pings/Pongs can consume up to 50% of
bandwidth
Solutions:



Pong Limiting
Pong Caching
Ping Multiplexing
http://www.limewire.com/index.jsp/pingpong
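One plausible reading of pong caching is that a servent keeps recently received PONGs and answers incoming PINGs from that cache instead of forwarding every PING. The sketch below illustrates that idea only. The PongCache class, its size limit, and its expiry time are assumptions for illustration and are not taken from the LimeWire scheme referenced above.

```python
import time
from collections import OrderedDict


class PongCache:
    """Small cache of recent PONGs, keyed by (ip, port)."""

    def __init__(self, max_entries=10, max_age=3.0, clock=time.monotonic):
        self.max_entries = max_entries  # assumed limit on cached PONGs
        self.max_age = max_age          # assumed lifetime in seconds
        self.clock = clock
        self._pongs = OrderedDict()     # address -> (info, time stored)

    def store(self, address, info):
        """Remember a PONG; evict the oldest entry when the cache is full."""
        self._pongs.pop(address, None)
        self._pongs[address] = (info, self.clock())
        while len(self._pongs) > self.max_entries:
            self._pongs.popitem(last=False)

    def answer_ping(self):
        """Return the PONGs that are still fresh.

        An empty result means the caller has nothing cached and must
        forward the PING to its neighbors as before.
        """
        now = self.clock()
        return {
            address: info
            for address, (info, stored) in self._pongs.items()
            if now - stored <= self.max_age
        }
```

The injectable clock exists only to make the expiry behaviour easy to exercise without sleeping.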
Gnutella enhancements 2
Cache query responses
Results
Evolving Protocol

Gnutella Developer
Forum
UltraPeers
Alternative query routing
algorithms
Can Heterogeneity Make Gnutella
Scale?
Ideas


Replace query flooding with multiple random
walks
Proactive replication
#replicas proportional to sqrt(request rate)

Result: Two orders of magnitude improvement
in terms of query-time, per node load and
message traffic
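The square-root replication rule above can be made concrete with a small allocation function. This is a sketch under assumptions: the name sqrt_replication, the dict-based interface, and the fixed replica budget are all illustrative, and fractional replica counts are left unrounded.

```python
import math


def sqrt_replication(request_rates, total_replicas):
    """Split a replica budget so each item's share is proportional to
    the square root of its request rate."""
    roots = {name: math.sqrt(rate) for name, rate in request_rates.items()}
    total = sum(roots.values())
    return {
        name: total_replicas * root / total for name, root in roots.items()
    }
```

With request rates of 100, 25, and 1 and a budget of 160 replicas, the square roots are 10, 5, and 1, so the items receive 100, 50, and 10 replicas. A 100-fold difference in popularity becomes a 10-fold difference in replica count.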
Can Heterogeneity Make Gnutella
Scale? 2
Gnutella assumption:




All peers are equal
Not true! Heterogeneity among P2P peers
(dial-up users vs. college users)
Evolve topology to match node capacities
Use random walks over this topology
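A search by multiple random walks, as opposed to flooding, can be sketched as follows. This is a simplified illustration: each walker picks its next hop uniformly at random, whereas the approach outlined above walks over a topology adapted to node capacities. The function name and its parameters are hypothetical.

```python
import random


def random_walk_search(graph, holders, source, walkers=4, max_steps=20,
                       rng=None):
    """Launch several independent random walkers from source.

    graph maps each node to a non-empty list of neighbors, and holders is
    the set of nodes storing the wanted file. Returns the shortest
    successful walk as a list of nodes, or None if every walker gave up.
    """
    rng = rng or random.Random()
    best = None
    for _ in range(walkers):
        path = [source]
        for _ in range(max_steps):
            if path[-1] in holders:
                break  # this walker has found the file
            path.append(rng.choice(graph[path[-1]]))
        if path[-1] in holders and (best is None or len(path) < len(best)):
            best = path
    return best
```

Each walker sends at most max_steps messages, so the total cost is bounded by walkers * max_steps regardless of network size, in contrast to the exponential growth of a flood.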
Can Heterogeneity Make Gnutella
Scale? 3
Solution outline




c_i: capacity of node i; in[j,i]: messages from j->i; out[i,j]:
messages i->j
Init in[i,j]=out[i,j]=0, OutMax[i,j]=c_i/d_i
Update according to the messages received/sent
Check if overloaded
If so, redirect a high-input neighbor to a neighbor with high
OutMax (spare capacity)
Intuitively, take yourself out of the loop
If no such node can be found, ask the neighbor to throttle back
Result: Average query length reduces from 70 to 2-9
hops, depending on topology
Measurement Results
Who is sharing what? (August 2000)

| The top | Share | As percent of whole |
|---|---|---|
| 333 hosts (1%) | 1,142,645 | 37% |
| 1,667 hosts (5%) | 2,182,087 | 70% |
| 3,334 hosts (10%) | 2,692,082 | 87% |
| 5,000 hosts (15%) | 2,928,905 | 94% |
| 6,667 hosts (20%) | 3,037,232 | 98% |
| 8,333 hosts (25%) | 3,082,572 | 99% |
Problems With Gnutella
Protocol scalability


Message broadcast technique imposes limitations on
the network size
$\text{packets per message} = \sum_{i=0}^{TTL} \text{noPeers}^{i}$
In November 2000 dial-up bandwidth
barrier reached
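Reading the formula above as the sum of noPeers^i for i from 0 to TTL, the cost of one broadcast can be computed directly. The sketch below does that; the function name is hypothetical, and the figures of 7 peers and 10 hops are borrowed from the flooding description earlier in these slides as an assumption. The sum treats the flood as a tree with no shared neighbors, so it is better read as a worst-case estimate than as a measured value.

```python
def packets_per_message(no_peers, ttl):
    """Sum of no_peers**i for i = 0..ttl, following the formula above."""
    return sum(no_peers ** i for i in range(ttl + 1))
```

For no_peers = 7 and ttl = 10 this gives 329,554,457 packets for a single message, far more than the 200,000 to 300,000 hosts quoted for the network, which illustrates why plain broadcast limits network size.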
Overlay network efficiency


Random selection of peers results in inefficient use of
the underlying network
Redundant traffic generated on the Internet
Heterogeneous connection
qualities of the Gnutella
35% have upstream bottleneck bandwidth of
at least 100 Kbps
only 8% have at least 10 Mbps bandwidth
22% have bandwidth 100 Kbps or less
Number of Shared Files
Why Look at Gnutella
Widespread unstructured P2P network

Currently between 200,000 & 300,000 hosts

2006: still heavily in use by about 2 million users

Gnutella clients (among others):
LimeWire
Morpheus
BearShare
OpenCola
Shareaza


It has the fastest-growing number of users
It has a very pleasant GUI and also connects to eDonkey
and BitTorrent
Ideal as a research test bed

Large scale network demonstrates the need for scalable P2P
protocols
Limewire: Improvement on Gnutella
Creation of a peer hierarchy based on capabilities


previously: all peers identical, most modem blackholes
connection preferencing:
favors routing to well-connected peers
favors reply to clients that themselves serve large
number of files: prevent freeloading
Limewire gateway functions as Napster-like central
server on behalf of other peers

for searching purposes
Limewire
The Limewire P2P file sharing program connects to
the Gnutella P2P network
Limewire client software is widely recognized for its
clean user interface that does not contain adware
Sometimes billed as the "fastest file sharing
program"
Limewire claims to offer relatively good search and
download performance
Free Limewire software downloads are available for
Windows, Linux and Macintosh operating systems
Limewire Pro pay clients also exist
BearShare
The BearShare P2P file sharing program is a
popular free software client for the Gnutella
P2P network
Both free and pay downloads of BearShare
file sharing programs exist
Shareaza
Shareaza is an up-and-coming P2P file sharing
program
This client offers an extremely powerful search
engine capable of connecting to multiple popular P2P
networks including eDonkey, BitTorrent and Gnutella
Shareaza file sharing software includes intelligence
for detecting fake and/or corrupted files
The free Shareaza download also contains no ads or
spyware
As the installed base of Shareaza client users grows,
expect Shareaza to become an even better P2P file
sharing program
Anonymous?
The person you are getting the file from knows who
you are

That’s not anonymous.
Other protocols exist where the owner of the files
doesn’t know the requester.
Peer-to-peer anonymity exists
Summary
peer-to-peer networking: applications connect to peer applications
focus: decentralized method of searching for files
each application instance serves to:



store selected files
route queries (file searches) from and to its neighboring
peers
respond to queries (serve file) if file stored locally
Gnutella history:



3/14/00: release by AOL, almost immediately withdrawn
too late: 23K users on Gnutella at 8 am this AM
many iterations to fix poor initial design (poor design turned
many people off)
What we care about:




How much traffic does one query generate?
How many hosts can it support at once?
What is the latency associated with querying?
Is there a bottleneck?