Transcript P2P Search
P2P Search
COP5711
P2P Search Techniques
Centralized P2P systems
Decentralized & unstructured P2P systems
e.g. Gnutella
Hybrid - partially decentralized
e.g. Napster, SETI@home
e.g., Freenet
Structured P2P systems
DHT
CAN
2
P2P Network
P2P network is an overlay
network built on top of a real
physical network (e.g., Internet)
In a P2P network, peers are
network nodes connected by
virtual or logical links
A logical link is a path through
many physical links in the
underlying network
3
Napster: Publish a File
Users upload their IP address and
music titles they wish to share
192.1.2.3
Napster server
(Central Catalog)
4
Napster: Query for a File
Users search for peers to download
desired files
192.1.2.3
Central
Napster server
5
Napster: Transfer Requested File
File transfer is P2P, using a proprietary
protocol
192.1.2.3
Central
Napster server
6
Disadvantage of Centralized
Directory
Performance bottleneck
Single point of failure
Can we do it without a directory ?
7
Decentralized P2P - Gnutella
No catalog
Pings network to locate Gnutella
peers
File requests are broadcast to peers
Flooding
or breadth-first research
When provider is located, the file is
transferred via HTTP
8
Gnutella: Join the Network
Peers are
Internet
edges
Special peer
maintained by
Gnutella
Who are
my
neighbors ?
9
Gnutella: Broadcast Request to Peers
xyz.mp3 ?
10
Gnutella: Flood the Request
(Breadth-first research)
I have
it.
11
Gnutella: Reply with the File
(via HTTP)
I have
it.
12
Gnutella - Disadvantages
Network flooding - unnecessary
network traffic
Using TTL - some files might not
be found
Alternatively,
using
ultranodes (or supernodes)
using depth-first search, i.e., Freenet
13
Morpheus, Kazaa
Flooding only the Supernodes
Supernode
Layer
Center
Index for
its cluster
as
A
h
rH
Pee
ly: “ X”
file
s
E
I
Rep
o ha
Wh
ry: “
Que file X”
F
D
C
G
H
Cluster
Cluster
Download file X from Peer H
B
Cluster
14
Using Ultranodes
Queries flood only the network of
ultranodes
Other peer nodes shielded from query
traffic
Combine the benefits of centralized
and decentralized search;
Take advantage of the heterogeneity in
peer capabilities;
15
Freenet - Depth-First Search
Download file X from Peer E
Query: “Who has file X”
A
E
Pe
ht
ig
m
C le X
er fi
Pe v e
E
ha
er
Pe ”
: “ le X
y
pl s fi
Re ha
mi g
: “I
ply
ht h
e fi
file
le X
X
”
Pe e
rD
m
hav
e fil ight
eX
Rep
ly :
has “Peer
E
file
X”
a ve
hav
B
er E
Re
C
D
16
Freenet – File not Found
A
I have
file X
E
Pe
ht
ig
m
C le X
er fi
Pe a v e
h
er E
NOT FOUND !
mi g
F
ht h
a ve
file
Pe e
rD
m
hav
e fil ight
eX
X
C
B
D
The requested file not found due to a poor
routing decision made at peer D
In this case, query backs out of the dead-end,
and tries another peer in depth-first manner
17
Using Distributed Directory
Data objects are everywhere
Distribute subsets of the data directory
among peers
If we can find the relevant sub-directory,
we can locate the data object
Directory
Sub-directory
Data
Objects
18
How to Bound Search Space ?
Basic Idea - Hashing
Publish (H(y))
P2P Network
Object “y”
Objects have
hash keys
Join (H(x))
Peer “x”
H(y)
H(x)
y
Hash key
Peer nodes also
x have hash keys
in the same
hash space
Place location information about an object at the peer
with closest hash keys (i.e., a distributed directory)
19
Viewed as a Distributed Hash Table
0
Hash table
2128-1
Peer nodes
• Each peer node is responsible for a range of the
hash table, according to the peer hash key
• Location information about Objects are placed in the
peer with the closest key (information redundancy)
20
How to Find an Object ?
Looks for a peer /w the corresponding peer hash key
A
peer knows its logical neighbors
Find peer X based on multihop routing
X knows who has the object
Hash
table
Peer
node
0
2128-1
X
Peer Y
has the file
21
Dynamic Hash Table (DHT)
in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
22
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
23
DHT in action: put()
K V
K V
K V
K V
K V
Want to
share a
file
K V
K V
K V
K V
insert(K1,V1)
K V
K V
Operation: Route message, “I have the file,” to node holding key K1
24
DHT in action: put()
(K1,V1)
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
Operation: take key as input; route messages to node holding key
25
DHT in action: get()
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
retrieve (K1)
Operation: Retrieve message V1 at node holding key K1
26
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
Retrieve file according to V1
27
Still Flooding
Still flood the network although
intermediate nodes do not need to search
Can we avoid flooding ?
28
CAN – Content Addressable Network
Each peer is
responsible for one
zone, i.e., stores all
(key, value) pairs of
the zone
Each peer knows the
neighbors of its zone
Random assignment
of peers to zones at
startup – split zone if
not empty
Dimensional-ordered
multihop routing
29
CAN: Object Publishing
node I::publish(K,V)
I
30
CAN: Object Publishing
x=a
node I::publish(K,V)
I
(1) a = hx(K)
31
CAN: Object Publishing
x=a
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
I
y=b
32
CAN: Object Publishing
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
I
J
(2) route (K,V) -> J
33
CAN: Object Publishing
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
I
J
(K,V)
(2) route (K,V) -> J
(3) J stores (K,V)
34
CAN: Object Retrieval
node I::retrieve(K)
(1) a = hx(K)
b = hy(K)
J
(K,V)
(2) route “retrieve(K)” to J
that is in charge of (a,b)
I
35
Maintenance
Inform neighbors that you are alive at
discrete time interval t
If your neighbor does not send alive
message in time t, takeover its zone
36
P2P Benefits
Efficient use of resources
Use
unused bandwidth, storage, and processing
power at the edge of the network
Scalability
Consumers
of resources also donate resources
Reliability
Replicas,
geographic distribution No single point of
failure
Ease of administration
Self
organized nodes
Built-in reliability and load balancing
37
Some Prototypes at UCF
iSEE (Internet-scale Sensor Exploration
Environement)
Publishing
Browsing
real-time sensor data
and querying real-time sensor data
P2P Video Streaming for VoD and Live
Broadcast Applications
38