Centralized P2P systems

Download Report

Transcript Centralized P2P systems

P2P Search
COP6731
Advanced Database Systems
P2P Computing

Powerful personal computer
Share computing resources
P2P Computing

Advantages:

Shared infrastructure costs

Highly scalable

No SPOF

censorship-resistance
P2P Search Techniques

Centralized P2P systems


Decentralized & unstructured P2P systems


e.g. Gnutella
Hybrid - partially decentralized


e.g. Napster, SETI@home
e.g., Freenet
Structured P2P systems

DHT systems (CAN/Chord/Pastry/Tapestry)

Skip-list based systems
Napster

MP3 file sharing with a centralized
catalog

Peers hold files

Napster Inc’s servers hold catalog

File transfer is P2P, using a
proprietary protocol
Napster: Publish a File
Users upload their IP address and
music titles they wish to share
Central
Napster server
192.1.2.3
Napster: Query for a File

Users search for peers to download
desired files
192.1.2.3
Central
Napster server
Napster: Transfer Requested File
File transfer is P2P, using a
proprietary protocol
192.1.2.3
Central
Napster server
Disadvantage of Centralized Directory

Performance bottleneck

Single point of failure
Can we do it without a directory ?
Gnutella



No catalog
Pings network to locate Gnutella
peers
File requests are broadcast to peers


Flooding or breadth-first research
When provider is located, the file is
transferred via HTTP
Gnutella: Issue a Request
xyz.mp3 ?
Gnutella: Flood the Request
Gnutella: Reply with the File
Gnutella - Disadvantages



Network flooding - unnecessary
network traffic
Using TTL - some files might not
be found
Alternatively,


using ultranodes (or supernodes)
using depth-first search, i.e.,
Freenet
Morpheus, Kazaa
Supernode
Layer
Center
Index for
its cluster
as
A
h
rH
Pee
ly: “ e X”
fil
s
E
I
Rep
o ha
Wh
ry: “
Que file X”
F
D
C
G
H
Cluster
Cluster
Download file X from Peer H
B
Cluster
Using Ultranodes




Queries flood only the network of
ultranodes
Other peer nodes shielded from query
traffic
Combine the benefits of centralized
and decentralized search;
Take advantage of the heterogeneity
in peer capabilities;
Freenet - Depth-First Search
Download file X from Peer E
Query: “Who has file X”
A
E
Em
: “I
ply
t ha
s fi
le X
”
le X
Pee
rD
h a s mi g h t
file
X
Rep
ly :
has “Peer
E
file
X”
igh
e fi
hav
B
er
Pe
Re
ht
ig
m
C eX
er fil
Pe has
E
er
Pe ”
: “ le X
y
pl s fi
Re ha
C
D
Freenet – File not Found
Download file X from Peer E
A
E
Em
NOT FOUND !
er
Pe
t
igh
m
C eX
er fil
Pe has
I HAVE FILE X !
igh
F
t ha
s fi
Pee
rD
h a s mi g h t
file
X
le X
C
B
D


The requested file not found due to a poor
routing decision made at peer D
In this case, query backs out of the deadend, and tries another peer in depth-first
manner
Structured P2P Systems

DHT-based




Skip-list based


Chord / Pastry / Tapestry: hashbased into single dimensional space
CAN: hash-based into multidimensional space
P-grid: hash-based into virtual
binary search tree
Skipgraph / SkipNet
Index Tree-based

BATON
DHT Design Goals

An “overlay” network with:








Flexible mapping of keys to physical nodes
 Data Independence
Small network diameter
Small degree (fan-out)
Local routing decisions
Robustness to churn
Routing flexibility
Proximity
A “storage” or “memory” mechanism with


No guarantees on persistence
Maintenance via soft state
Metrics

Searching/Lookup



Number of hops in searching
Number of messages
Database related metrics:
Total disk I/O
 Response Time
 Accuracy


Maintenance


Number of hops
Number of messages
How to Bound Search Space ?
Work on
placement!
Network
Basic Idea - Hashing
Publish (H(y))
P2P Network
Object “y”
Objects have
hash keys
Join (H(x))
Peer “x”
H(y)
H(x)
y
Hash key
Peer nodes also
x have hash keys
in the same
hash space
Place object to the peer with closest hash keys
Viewed as a Distributed Hash
Table
0
Hash table
Peer
nodes
Each is responsible for a range of the hash table,
according to the peer hash key
Objects
Note
that are placed in the peer with the closest key
peers are
Internet
edges
Internet
2128-1
How to Find an Object?
Hash
table
0
2128-1
Peer
node
Want to keep onlyone hop to
a few entries! find the object
Simplest idea:
Everyone knows everyone else!
Using Distributed Hash Table (DHT)


Hash
table
Peer
node
0
A peer only needs to know its logical
neighbors
Search based on multihop routing
2128-1
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
Operation: take key as input; route messages to node holding key
DHT in action: put()
K V
K V
K V
K V
K V
K V
K V
K V
K V
insert(K1,V1)
K V
K V
Operation: take key as input; route messages to node holding key
DHT in action: put()
K V
K V
K V
K V
K V
K V
K V
K V
K V
insert(K1,V1)
K V
K V
Operation: take key as input; route messages to node holding key
DHT in action: put()
(K1,V1)
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
Operation: take key as input; route messages to node holding key
DHT in action: get()
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
retrieve (K1)
Operation: take key as input; route messages to node holding key
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
retrieve (K1)
CAN – Content Addressable Network




Each peer is
responsible for one
zone, i.e., stores all
(key, value) pairs of
the zone
Each peer knows the
neighbors of its zone
Random assignment
of peers to zones at
startup
Dimensional-ordered
multihop routing
CAN: Object Publishing
node I::publish(K,V)
I
CAN: Object Publishing
x=a
node I::publish(K,V)
(1) a = hx(K)
I
CAN: Object Publishing
x=a
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
y=b
I
CAN: Object Publishing
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
(2) route (K,V) -> J
I
J
CAN: Object Publishing
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
(2) route (K,V) -> J
(3) J stores (K,V)
I
J
(K,V)
CAN: Object Retrieval
node I::retrieve(K)
J
(1) a = hx(K)
b = hy(K)
(2) route “retrieve(K)” to J
that is in charge of (a,b)
I
(K,V)
Some Research Topics

Content-based Image Retrieval in
P2P

Location Management in P2P

Security Considerations for DHT

P2P Backup

Wireless P2P