Gnutella2: A Better Gnutella?

Download Report

Transcript Gnutella2: A Better Gnutella?

Gnutella2: A Better
Gnutella?
COMP 5204: Data Networks
Julie Thorpe
School of Computer Science
Carleton University
Introduction

Gnutella and Gnutella2 are P2P application protocols.

Gnutella has an interesting history – essentially a
reverse engineered Beta version.
 Changes

governed by Gnutella Developers Forum (GDF).
Gnutella2 is a completely different protocol than
Gnutella, claiming it's what Gnutella should have been.
 The
GDF reject this claim, and refuse to call Gnutella2 by its
name – they instead call it “Mike's Protocol”.

Unclear whether Gnutella2 better than Gnutella.
 My
project goal is to compare these protocols to determine
which is theoretically better.
Presentation Outline

Gnutella review

Gnutella's problems

Gnutella2 review

Comparison:
 Network
architecture
 Searching
algorithms
 Cooperation
incentives
 Security

Concluding remarks
Gnutella Review (1)

Purely decentralized, simple protocol for file
sharing.

Runs over TCP/IP connections.

New node enters Gnutella network by connecting
to a known server (sends Ping), and when server
responds (sends Pong) they are now connected
peers.

Learns of other nodes if server forwards Ping to its
peers (and gets a Pong back in response).
Gnutella Review (2)

Has no way to advertise
files.

Peers find files by “flooding”
with query requests (stop
after TTL hops).

Query responses are
routed back along the same
path as the request arrived
upon.

Addressed by GUIDs.
Gnutella's Problems

Scalability.

Performance.

Lack of cooperation incentives.

Abuse by servents (Gnutella client/server program).
Gnutella's Problems - Scalability
• For a node to be reached by a query, it (and all other
nodes on the path to it) must be forwarded the
request!
• The reach is determined by n (# connections to other
hosts) and TTL (# hops each request is permitted to
take):
TTL
 (n  1)
t 1
.n
t 1
• Assumption: nodes all have the same n and TTL.
• Ritter [1] estimated that for reasonable parameters, to
achieve a reach of 106 nodes, Gnutella nodes must
have a bandwidth between 19.2 and 64 Gbps!
Gnutella's Problems - Performance

Messages are forwarded
through many other peers on the
network.

Connection speeds are
effectively restricted to the
bandwidth of the slowest peer
along the route.

If A and B have high-speed
connections, and C has a
modem connection, the
download rate between A and B
is limited to C's speed.
Gnutella's Problems – Lack of
Cooperation Incentives

One study found that over 70% of users shared no
files, and 50% of all responses are returned by the
top 1% [2] .

Implications of free-riders on Gnutella:
 Increased
search horizon (farthest set of hosts reachable
by a search request, directly related to its TTL).
 The
top 1% that is providing most files reaches
connection saturation.

This generally a difficult problem to solve for purely
decentralized systems.
Gnutella's Problems – Abuse by
Servents

Since it's a protocol, implementations can implement
the Gnutella protocol as they please (in theory).

Servents can act selfishly to improve their performance
 Increasing
TTL to increase search horizon (generating
geometrically higher # messages).
 Frequent
network).
re-querying (generating more messages, degrading
Gnutella2

GDF recommended (in version
0.6) “ultrapeers” to improve
Gnutella's performance and
scalability.
 Gnutella2
enforces a variation of
this to improve performance and
scalability.

Decentralized, 2-tier hierarchy
of peers (“leaf nodes” and
“hubs”).

Other important differences will
come out in comparison.
Comparison: Gnutella vs. Gnutella2

Network architecture

Searching algorithms

Cooperation incentives

Security
Gnutella's Network Architecture

Recall it is completely
decentralized.
Gnutella2's Network Architecture

Decentralized, 2-tier.

This architecture is
recommended for Gnutella in
v0.6.

New node enters by connecting
to a known hub (almost identical
to Gnutella's handshake).

Hubs typically accept 300-500
leaves, and connect to 5-30
other hubs.

Leaves typically connect to 3
hubs.
Comparison – Network Architecture

Gnutella's purely decentralized is much simpler, but
not as scalable.

No real difference between Gnutella's v0.6
“ultrapeer” structure and Gnutella2's “hub”
structure.
 Either
of these strategies should reduce searching traffic
as explained in searching algorithms.
Gnutella's Searching Algorithm

Recall that using the purely decentralized version, packets are
flooded throughout the network.

If the v0.6 ultrapeer recommendation is implemented, searching
is optimized using Query hash tables(QHTs).
 A QHT
is maintained by each node, and describes the
content it is sharing.
 An
ultrapeer maintains an aggregate of its leaf's QHTs and its
own QHT.
 Searches
are performed by forwarding a query to an
ultrapeer, who checks its aggregate QHT for a match.

If there is a match, the query is forwarded to the
appropriate leaf, otherwise the query is forwarded to
neighbouring ultrapeers by “flooding”.
Gnutella2's Searching Algorithm

Ultrapeers are called “hubs”.

Uses a QHT like Gnutella, but if a hub cannot match a
query to its aggregate QHT, it checks a set of caches:
 Each
hub maintains a cached copy of each neighbouring
hub's aggregate QHT.
 Upon
a search miss, a hub will try to match the query against
its cached copies of its neighbours QHTs.
 If
the query matches, it will forward the query once, and the
node that receives the query processes it and directly sends
the result back to the client.
 If
no match is made, the searching client will continue at
another untried hub.
Comparison – Searching Algorithms

Both Gnutella with Ultrapeers and Gnutella2
significantly reduce the number of messages.
 Should

increase performance and scalability.
Gnutella2's method further reduces the number of
messages sent for a query:
 Less
query request messages due to caching neighbour's
QHTs.
 Less
response messages since Gnutella2's method allows
the responding node to directly contact the requesting node,
rather than sending the message back through the path to
get there.
Comparison: Cooperation Incentives

Neither Gnutella nor Gnutella2 specify cooperation
incentives.

Implementations often will not allow connections to
network unless you share something, and by default
make downloads shared.

The problem of cooperation incentives in a
decentralized environment is interesting, since nodes
can avoid connecting to those that have a profile of
their behaviour.
Gnutella's Security

Gnutella's query messages are routed through peers,
and the query does not contain the querying node's IP
address, but a Globally Unique Identifier (GUID).
 Provides
anonymity by masking requester's identity.

Denial-of-service (DOS) attacks are possible by
flooding the network with many requests with a fake
GUID.

Another node could be similarly DOS'ed if a GUID for
one of their request GUIDs is known.

Response IP addresses could be spoofed and
malicious content provided.
Gnutella2's Security

Gnutella2 does not use GUIDs for queries
 Sends
the response directly back to the requesting node.

The QHTs do not contain information about the content
stored on a neighbouring node, providing privacy.

Queries make use of query keys (to verify the query
return address is that of the original sender).
 Prevents
malicious users from sending out queries for the
purpose of flooding the network with spoofed requests.
 Search
clients only permitted to query a hub after obtaining a
“query key”, which are unique to each (hub, search client
return address) from it to include in the transmission.
Comparison - Security

Both Gnutella with ultrapeers and Gnutella2 provide
privacy through their caching.

Both Gnutella and Gnutella2 are suceptible to spoofed
response IP addresses.

For Gnutella2:
 Gnutella
does not provide authentication of nodes for
querying, thus it is susceptible to request flooding attacks.
 Gnutella
ultrapeers cannot block certain hosts (do not have
query keys or unique request addresses).

For Gnutella:
 Gnutella2
does not provide anonymous queries.
Concluding Remarks

Gnutella has some serious flaws (scalability,
performance, lack of cooperation incentives and
servent abuse).
 Gnutella2
solves all but cooperation incentives.
 Gnutella
with ultrapeers solves scalability and performance,
but the searching algorithm and caching is less sophisticated.

Gnutella2 has many more features outside of this
comparison, primarily being more extensible (yet
specific) to support applications other than file sharing.

Although they are different protocols, Gnutella2 is in
essence, an improved version of Gnutella with
ultrapeers.
Open Problems

Is it possible to create useful cooperation incentives in
a large, truly distributed environment like Gnutella
where peers may reconnect to different hubs upon
connection?
References
1. Jordan Ritter, “Why Gnutella Can't Scale. No, Really.”, http://www.tch.org/gnutella.htm.
2. Eytan Adar and Bernardo A. Huberman, “Free-Riding on Gnutella”,
http://www.firstmonday.org/issues/issue5_10/adar/index.html.
3. Farhad Manjoo, “Gnutella Bandwidth Bandits”, Aug. 8, 2002.
4. RFC-Gnutella 0.6, http://rfc-gnutella.sourceforge.net/developer/testing/index.html.
5. Anurag Singla and Christopher Rohrs, “Ultrapeers: Another Step Towards Gnutella
Scalability”, Version 1.0, http://rfc-gnutella.sourceforge.net/src/Ultrapeers_1.0.html
6. Gnutella vs. Gnutella2, Part 1, http://www.mp3newswire.net/stories/2003/gnutella.html.
7. The Gnutella2 Developers Network. http://www.gnutella2.com.
8. “LimeWire: Network Improvements”,
http://www.limewire.com/developer/net_improvements.html
9. P2P Networking Technologies. URL: http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/Intro.html.