Transcript P2P Search

P2P Search
COP5711
P2P Search Techniques

Centralized P2P systems


Decentralized & unstructured P2P systems


e.g. Gnutella
Hybrid - partially decentralized


e.g. Napster, SETI@home
e.g., Freenet
Structured P2P systems

DHT

CAN
2
P2P Network

P2P network is an overlay
network built on top of a real
physical network (e.g., Internet)

In a P2P network, peers are
network nodes connected by
virtual or logical links

A logical link is a path through
many physical links in the
underlying network
3
Napster: Publish a File
Users upload their IP address and
music titles they wish to share
192.1.2.3
Napster server
(Central Catalog)
4
Napster: Query for a File

Users search for peers to download
desired files
192.1.2.3
Central
Napster server
5
Napster: Transfer Requested File
File transfer is P2P, using a proprietary
protocol
192.1.2.3
Central
Napster server
6
Disadvantage of Centralized
Directory

Performance bottleneck

Single point of failure
Can we do it without a directory ?
7
Decentralized P2P - Gnutella

No catalog

Pings network to locate Gnutella
peers

File requests are broadcast to peers
 Flooding

or breadth-first research
When provider is located, the file is
transferred via HTTP
8
Gnutella: Join the Network
Peers are
Internet
edges
Special peer
maintained by
Gnutella
Who are
my
neighbors ?
9
Gnutella: Broadcast Request to Peers
xyz.mp3 ?
10
Gnutella: Flood the Request
(Breadth-first research)
I have
it.
11
Gnutella: Reply with the File
(via HTTP)
I have
it.
12
Gnutella - Disadvantages

Network flooding - unnecessary
network traffic

Using TTL - some files might not
be found

Alternatively,
 using
ultranodes (or supernodes)
 using depth-first search, i.e., Freenet
13
Morpheus, Kazaa
Flooding only the Supernodes
Supernode
Layer
Center
Index for
its cluster
as
A
h
rH
Pee
ly: “ X”
file
s
E
I
Rep
o ha
Wh
ry: “
Que file X”
F
D
C
G
H
Cluster
Cluster
Download file X from Peer H
B
Cluster
14
Using Ultranodes

Queries flood only the network of
ultranodes

Other peer nodes shielded from query
traffic

Combine the benefits of centralized
and decentralized search;

Take advantage of the heterogeneity in
peer capabilities;
15
Freenet - Depth-First Search
Download file X from Peer E
Query: “Who has file X”
A
E
Pe
ht
ig
m
C le X
er fi
Pe v e
E
ha
er
Pe ”
: “ le X
y
pl s fi
Re ha
mi g
: “I
ply
ht h
e fi
file
le X
X
”
Pe e
rD
m
hav
e fil ight
eX
Rep
ly :
has “Peer
E
file
X”
a ve
hav
B
er E
Re
C
D
16
Freenet – File not Found
A
I have
file X
E
Pe
ht
ig
m
C le X
er fi
Pe a v e
h
er E
NOT FOUND !
mi g
F
ht h
a ve
file
Pe e
rD
m
hav
e fil ight
eX
X
C
B
D

The requested file not found due to a poor
routing decision made at peer D

In this case, query backs out of the dead-end,
and tries another peer in depth-first manner
17
Using Distributed Directory

Data objects are everywhere

Distribute subsets of the data directory
among peers

If we can find the relevant sub-directory,
we can locate the data object
Directory
Sub-directory
Data
Objects
18
How to Bound Search Space ?
Basic Idea - Hashing
Publish (H(y))
P2P Network
Object “y”
Objects have
hash keys
Join (H(x))
Peer “x”
H(y)
H(x)
y
Hash key
Peer nodes also
x have hash keys
in the same
hash space
Place location information about an object at the peer
with closest hash keys (i.e., a distributed directory)
19
Viewed as a Distributed Hash Table
0
Hash table
2128-1
Peer nodes
• Each peer node is responsible for a range of the
hash table, according to the peer hash key
• Location information about Objects are placed in the
peer with the closest key (information redundancy)
20
How to Find an Object ?
Looks for a peer /w the corresponding peer hash key
A
peer knows its logical neighbors
 Find peer X based on multihop routing
 X knows who has the object
Hash
table
Peer
node
0
2128-1
X
Peer Y
has the file
21
Dynamic Hash Table (DHT)
in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
22
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
23
DHT in action: put()
K V
K V
K V
K V
K V
Want to
share a
file
K V
K V
K V
K V
insert(K1,V1)
K V
K V
Operation: Route message, “I have the file,” to node holding key K1
24
DHT in action: put()
(K1,V1)
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
Operation: take key as input; route messages to node holding key
25
DHT in action: get()
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
retrieve (K1)
Operation: Retrieve message V1 at node holding key K1
26
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
Retrieve file according to V1
27
Still Flooding

Still flood the network although
intermediate nodes do not need to search

Can we avoid flooding ?
28
CAN – Content Addressable Network

Each peer is
responsible for one
zone, i.e., stores all
(key, value) pairs of
the zone

Each peer knows the
neighbors of its zone

Random assignment
of peers to zones at
startup – split zone if
not empty

Dimensional-ordered
multihop routing
29
CAN: Object Publishing
node I::publish(K,V)
I
30
CAN: Object Publishing
x=a
node I::publish(K,V)
I
(1) a = hx(K)
31
CAN: Object Publishing
x=a
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
I
y=b
32
CAN: Object Publishing
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
I
J
(2) route (K,V) -> J
33
CAN: Object Publishing
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
I
J
(K,V)
(2) route (K,V) -> J
(3) J stores (K,V)
34
CAN: Object Retrieval
node I::retrieve(K)
(1) a = hx(K)
b = hy(K)
J
(K,V)
(2) route “retrieve(K)” to J
that is in charge of (a,b)
I
35
Maintenance

Inform neighbors that you are alive at
discrete time interval t

If your neighbor does not send alive
message in time t, takeover its zone
36
P2P Benefits

Efficient use of resources
 Use
unused bandwidth, storage, and processing
power at the edge of the network

Scalability
 Consumers

of resources also donate resources
Reliability
 Replicas,
geographic distribution  No single point of
failure

Ease of administration
 Self
organized nodes
 Built-in reliability and load balancing
37
Some Prototypes at UCF

iSEE (Internet-scale Sensor Exploration
Environement)
 Publishing
 Browsing

real-time sensor data
and querying real-time sensor data
P2P Video Streaming for VoD and Live
Broadcast Applications
38