Napster & Gnutella

Download Report

Transcript Napster & Gnutella

Napster & Gnutella
An Overview
1
About Napster
Distributed application allowing users to search and exchange MP3
files.
Written by Shawn Fanning in mid 1999, and remained operational
for two years.
Most Innovative Software (Computer Shopper, Dec 2000)
Best Cyber Tech 2000 (Time Magazine)
Sparked off a host of lawsuits for copyright infringement.
2
Napster Principles

The actual MP3 files are
stored on client machines.

A central server holds metainformation about online
Index
Server
clients along with the files held
by them.

Clients contact Napster Server
to locate files, and contact
each other to download them
3
Remarks

Simple but elegant

Central index server is a bottleneck and a point
of failure

Replication / chaining of servers eased the
bottleneck to some extent.
4
What is KaZaA?
Extension of Napster - uses the FastTrack technology. Powerful nodes
(supernodes) act as local index servers, and client queries are propagated
to other supernodes. Step in the right direction. Second generation P2P
Supernode
Supernode
Supernode
5
What is Gnutella?
?
?
?
Gnutella is a protocol
for distributed search
• peer-to-peer comm
• decentralized model
• No third party lookup
Two stages :
1.
2.
Join Network … later
Use Network
1. Discover other peers
2. Search other peers
6
The Jargon
Each servent is both a client
and a server
Servent: A Gnutella node.
2 Hops
Hops: a hop is a
pass through an
intermediate node
1 Hop
Horizon/TTL: how many hops a
packet can go before it dies
(default setting is 7 in Gnutella)
7
Gnutella Messages
•Ping: used to actively discover hosts on the network. A servent receiving a Ping
descriptor is expected to respond with one or more Pong descriptors.
•Pong: the response to a Ping. Each Pong packet contains a Globally Unique Identifier
(GUID) plus address of servent and information regarding the amount of data it is making
available to the network)
•Query: the primary mechanism for searching the distributed network. A servent receiving
a Query descriptor will respond with a QueryHit if a match is found against its local data
set.
•QueryHit: the response to a Query: contains IP address, GUID and search results
•Push: allows downloading from firewalled servents
8
Gnutella Scenario
Step 0: Join the network
Step 1: Determining who is on the network
• "Ping" packet is used to announce your presence on the network.
• Other peers respond with a "Pong" packet.
• Also forwards your Ping to other connected peers
• A Pong packet also contains:
• an IP address
• port number
• amount of data that peers is sharing
• Pong packets come back via same route
Step 2: Searching
•Gnutella "Query" ask other peers if they have the file you desire (and have an
acceptably fast network connection).
• A Query packet might ask, "Do you have any content that matches the string
‘Homer"?
• Peers check to see if they have matches & respond (if they have any matches) &
send packet to connected peers
• Continues for TTL
Step 3: Downloading
• Peers respond with a “QueryHit” (contains contact info)
• File transfers use direct connection using HTTP protocol’s GET method
• When there is a firewall a "Push" packet is used – reroutes via Push path
9
Remarks
Simple idea - But lacks scalability, since
bandwidth is wasted.
Sometimes, objects that are present may not
be located due to limited TTL.
Various improved search strategies have been
proposed.
10
Efficient search: Gia approach
To improve the scalability of Gnutella on a heterogeneous
Platform, Gia used the following approach:
Flooding was replaced by (biased) random walk
Topology adaptation: high-capacity nodes maintain
more clues via one hop replication. They should be
easily reachable
Flow Control to prevent overloading
11
Search via random walk
Gnutella is a protocol
for distributed search
• peer-to-peer
• decentralized model
Two stages :
1.
2.
Join Network … later
Use Network
1. Discover other peers
2. Search other peers via random walk
12
Topology control
Goals are
(1) high capacity nodes have high degrees and
(2) low capacity nodes are within a short distance
from the high capacity nodes.
S = satisfaction level.
S=0 means dissatisfied (# neighbors < max # according to its capacity),
S=1 means satisfied. To increase the level of satisfaction, each host
picks a node from its cache as neighbor if it is not already a
neighbor, or not marked dead.
13
Topology Control
{To add a neighborY}
If # of neighbors + 1 < max then accept Y as a neighbor
{To drop an existing neighbor for the sake of adding Y}
pick the highest degree current neighbor Z;
If Y has higher capacity or Y has fewer neighbors than Z
then
drop Z, accept Y
else
reject Y
{This ensures we do not drop poorly connected neighbors}
14
One hop replication
Each node keeps track of the indices of the files
at its immediate neighbors. As a result, high capacity/
high degree nodes can provide useful response to a
a large number of queries.
15
Biased random walk
P=2/10
P=5/10
P=3/10
How does it help?
16
Analysis of random walk
One can deploy k random walkers to expedite the search
p =
number of nodes that have the desired object
O =
overhead, i.e. distance traveled by the walkers
to complete the search (that can be successful
or unsuccessful)
T=
TTL (time to live for each walker)
Expected overhead =
1-(1-p)T-1 + (1-p)T-1 .k
p
17
Analysis of random walk
Probability of success = 1-(1-p)
Expected overhead =
kT
1-(1-p)T-1 + (1-p)T-1 . k
p
As k increases, the probability of success increases,
the overhead increases, but the delay decreses!
There is a tradeoff here.
18