Gnutella2: A Better Gnutella?
Download
Report
Transcript Gnutella2: A Better Gnutella?
Gnutella2: A Better
Gnutella?
COMP 5204: Data Networks
Julie Thorpe
School of Computer Science
Carleton University
Introduction
Gnutella and Gnutella2 are P2P application protocols.
Gnutella has an interesting history – essentially a
reverse engineered Beta version.
Changes
governed by Gnutella Developers Forum (GDF).
Gnutella2 is a completely different protocol than
Gnutella, claiming it's what Gnutella should have been.
The
GDF reject this claim, and refuse to call Gnutella2 by its
name – they instead call it “Mike's Protocol”.
Unclear whether Gnutella2 better than Gnutella.
My
project goal is to compare these protocols to determine
which is theoretically better.
Presentation Outline
Gnutella review
Gnutella's problems
Gnutella2 review
Comparison:
Network
architecture
Searching
algorithms
Cooperation
incentives
Security
Concluding remarks
Gnutella Review (1)
Purely decentralized, simple protocol for file
sharing.
Runs over TCP/IP connections.
New node enters Gnutella network by connecting
to a known server (sends Ping), and when server
responds (sends Pong) they are now connected
peers.
Learns of other nodes if server forwards Ping to its
peers (and gets a Pong back in response).
Gnutella Review (2)
Has no way to advertise
files.
Peers find files by “flooding”
with query requests (stop
after TTL hops).
Query responses are
routed back along the same
path as the request arrived
upon.
Addressed by GUIDs.
Gnutella's Problems
Scalability.
Performance.
Lack of cooperation incentives.
Abuse by servents (Gnutella client/server program).
Gnutella's Problems - Scalability
• For a node to be reached by a query, it (and all other
nodes on the path to it) must be forwarded the
request!
• The reach is determined by n (# connections to other
hosts) and TTL (# hops each request is permitted to
take):
TTL
(n 1)
t 1
.n
t 1
• Assumption: nodes all have the same n and TTL.
• Ritter [1] estimated that for reasonable parameters, to
achieve a reach of 106 nodes, Gnutella nodes must
have a bandwidth between 19.2 and 64 Gbps!
Gnutella's Problems - Performance
Messages are forwarded
through many other peers on the
network.
Connection speeds are
effectively restricted to the
bandwidth of the slowest peer
along the route.
If A and B have high-speed
connections, and C has a
modem connection, the
download rate between A and B
is limited to C's speed.
Gnutella's Problems – Lack of
Cooperation Incentives
One study found that over 70% of users shared no
files, and 50% of all responses are returned by the
top 1% [2] .
Implications of free-riders on Gnutella:
Increased
search horizon (farthest set of hosts reachable
by a search request, directly related to its TTL).
The
top 1% that is providing most files reaches
connection saturation.
This generally a difficult problem to solve for purely
decentralized systems.
Gnutella's Problems – Abuse by
Servents
Since it's a protocol, implementations can implement
the Gnutella protocol as they please (in theory).
Servents can act selfishly to improve their performance
Increasing
TTL to increase search horizon (generating
geometrically higher # messages).
Frequent
network).
re-querying (generating more messages, degrading
Gnutella2
GDF recommended (in version
0.6) “ultrapeers” to improve
Gnutella's performance and
scalability.
Gnutella2
enforces a variation of
this to improve performance and
scalability.
Decentralized, 2-tier hierarchy
of peers (“leaf nodes” and
“hubs”).
Other important differences will
come out in comparison.
Comparison: Gnutella vs. Gnutella2
Network architecture
Searching algorithms
Cooperation incentives
Security
Gnutella's Network Architecture
Recall it is completely
decentralized.
Gnutella2's Network Architecture
Decentralized, 2-tier.
This architecture is
recommended for Gnutella in
v0.6.
New node enters by connecting
to a known hub (almost identical
to Gnutella's handshake).
Hubs typically accept 300-500
leaves, and connect to 5-30
other hubs.
Leaves typically connect to 3
hubs.
Comparison – Network Architecture
Gnutella's purely decentralized is much simpler, but
not as scalable.
No real difference between Gnutella's v0.6
“ultrapeer” structure and Gnutella2's “hub”
structure.
Either
of these strategies should reduce searching traffic
as explained in searching algorithms.
Gnutella's Searching Algorithm
Recall that using the purely decentralized version, packets are
flooded throughout the network.
If the v0.6 ultrapeer recommendation is implemented, searching
is optimized using Query hash tables(QHTs).
A QHT
is maintained by each node, and describes the
content it is sharing.
An
ultrapeer maintains an aggregate of its leaf's QHTs and its
own QHT.
Searches
are performed by forwarding a query to an
ultrapeer, who checks its aggregate QHT for a match.
If there is a match, the query is forwarded to the
appropriate leaf, otherwise the query is forwarded to
neighbouring ultrapeers by “flooding”.
Gnutella2's Searching Algorithm
Ultrapeers are called “hubs”.
Uses a QHT like Gnutella, but if a hub cannot match a
query to its aggregate QHT, it checks a set of caches:
Each
hub maintains a cached copy of each neighbouring
hub's aggregate QHT.
Upon
a search miss, a hub will try to match the query against
its cached copies of its neighbours QHTs.
If
the query matches, it will forward the query once, and the
node that receives the query processes it and directly sends
the result back to the client.
If
no match is made, the searching client will continue at
another untried hub.
Comparison – Searching Algorithms
Both Gnutella with Ultrapeers and Gnutella2
significantly reduce the number of messages.
Should
increase performance and scalability.
Gnutella2's method further reduces the number of
messages sent for a query:
Less
query request messages due to caching neighbour's
QHTs.
Less
response messages since Gnutella2's method allows
the responding node to directly contact the requesting node,
rather than sending the message back through the path to
get there.
Comparison: Cooperation Incentives
Neither Gnutella nor Gnutella2 specify cooperation
incentives.
Implementations often will not allow connections to
network unless you share something, and by default
make downloads shared.
The problem of cooperation incentives in a
decentralized environment is interesting, since nodes
can avoid connecting to those that have a profile of
their behaviour.
Gnutella's Security
Gnutella's query messages are routed through peers,
and the query does not contain the querying node's IP
address, but a Globally Unique Identifier (GUID).
Provides
anonymity by masking requester's identity.
Denial-of-service (DOS) attacks are possible by
flooding the network with many requests with a fake
GUID.
Another node could be similarly DOS'ed if a GUID for
one of their request GUIDs is known.
Response IP addresses could be spoofed and
malicious content provided.
Gnutella2's Security
Gnutella2 does not use GUIDs for queries
Sends
the response directly back to the requesting node.
The QHTs do not contain information about the content
stored on a neighbouring node, providing privacy.
Queries make use of query keys (to verify the query
return address is that of the original sender).
Prevents
malicious users from sending out queries for the
purpose of flooding the network with spoofed requests.
Search
clients only permitted to query a hub after obtaining a
“query key”, which are unique to each (hub, search client
return address) from it to include in the transmission.
Comparison - Security
Both Gnutella with ultrapeers and Gnutella2 provide
privacy through their caching.
Both Gnutella and Gnutella2 are suceptible to spoofed
response IP addresses.
For Gnutella2:
Gnutella
does not provide authentication of nodes for
querying, thus it is susceptible to request flooding attacks.
Gnutella
ultrapeers cannot block certain hosts (do not have
query keys or unique request addresses).
For Gnutella:
Gnutella2
does not provide anonymous queries.
Concluding Remarks
Gnutella has some serious flaws (scalability,
performance, lack of cooperation incentives and
servent abuse).
Gnutella2
solves all but cooperation incentives.
Gnutella
with ultrapeers solves scalability and performance,
but the searching algorithm and caching is less sophisticated.
Gnutella2 has many more features outside of this
comparison, primarily being more extensible (yet
specific) to support applications other than file sharing.
Although they are different protocols, Gnutella2 is in
essence, an improved version of Gnutella with
ultrapeers.
Open Problems
Is it possible to create useful cooperation incentives in
a large, truly distributed environment like Gnutella
where peers may reconnect to different hubs upon
connection?
References
1. Jordan Ritter, “Why Gnutella Can't Scale. No, Really.”, http://www.tch.org/gnutella.htm.
2. Eytan Adar and Bernardo A. Huberman, “Free-Riding on Gnutella”,
http://www.firstmonday.org/issues/issue5_10/adar/index.html.
3. Farhad Manjoo, “Gnutella Bandwidth Bandits”, Aug. 8, 2002.
4. RFC-Gnutella 0.6, http://rfc-gnutella.sourceforge.net/developer/testing/index.html.
5. Anurag Singla and Christopher Rohrs, “Ultrapeers: Another Step Towards Gnutella
Scalability”, Version 1.0, http://rfc-gnutella.sourceforge.net/src/Ultrapeers_1.0.html
6. Gnutella vs. Gnutella2, Part 1, http://www.mp3newswire.net/stories/2003/gnutella.html.
7. The Gnutella2 Developers Network. http://www.gnutella2.com.
8. “LimeWire: Network Improvements”,
http://www.limewire.com/developer/net_improvements.html
9. P2P Networking Technologies. URL: http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/Intro.html.