Centralized P2P systems
Download
Report
Transcript Centralized P2P systems
P2P Search
COP6731
Advanced Database Systems
P2P Computing
Powerful personal computer
Share computing resources
P2P Computing
Advantages:
Shared infrastructure costs
Highly scalable
No SPOF
censorship-resistance
P2P Search Techniques
Centralized P2P systems
Decentralized & unstructured P2P systems
e.g. Gnutella
Hybrid - partially decentralized
e.g. Napster, SETI@home
e.g., Freenet
Structured P2P systems
DHT systems (CAN/Chord/Pastry/Tapestry)
Skip-list based systems
Napster
MP3 file sharing with a centralized
catalog
Peers hold files
Napster Inc’s servers hold catalog
File transfer is P2P, using a
proprietary protocol
Napster: Publish a File
Users upload their IP address and
music titles they wish to share
Central
Napster server
192.1.2.3
Napster: Query for a File
Users search for peers to download
desired files
192.1.2.3
Central
Napster server
Napster: Transfer Requested File
File transfer is P2P, using a
proprietary protocol
192.1.2.3
Central
Napster server
Disadvantage of Centralized Directory
Performance bottleneck
Single point of failure
Can we do it without a directory ?
Gnutella
No catalog
Pings network to locate Gnutella
peers
File requests are broadcast to peers
Flooding or breadth-first research
When provider is located, the file is
transferred via HTTP
Gnutella: Issue a Request
xyz.mp3 ?
Gnutella: Flood the Request
Gnutella: Reply with the File
Gnutella - Disadvantages
Network flooding - unnecessary
network traffic
Using TTL - some files might not
be found
Alternatively,
using ultranodes (or supernodes)
using depth-first search, i.e.,
Freenet
Morpheus, Kazaa
Supernode
Layer
Center
Index for
its cluster
as
A
h
rH
Pee
ly: “ e X”
fil
s
E
I
Rep
o ha
Wh
ry: “
Que file X”
F
D
C
G
H
Cluster
Cluster
Download file X from Peer H
B
Cluster
Using Ultranodes
Queries flood only the network of
ultranodes
Other peer nodes shielded from query
traffic
Combine the benefits of centralized
and decentralized search;
Take advantage of the heterogeneity
in peer capabilities;
Freenet - Depth-First Search
Download file X from Peer E
Query: “Who has file X”
A
E
Em
: “I
ply
t ha
s fi
le X
”
le X
Pee
rD
h a s mi g h t
file
X
Rep
ly :
has “Peer
E
file
X”
igh
e fi
hav
B
er
Pe
Re
ht
ig
m
C eX
er fil
Pe has
E
er
Pe ”
: “ le X
y
pl s fi
Re ha
C
D
Freenet – File not Found
Download file X from Peer E
A
E
Em
NOT FOUND !
er
Pe
t
igh
m
C eX
er fil
Pe has
I HAVE FILE X !
igh
F
t ha
s fi
Pee
rD
h a s mi g h t
file
X
le X
C
B
D
The requested file not found due to a poor
routing decision made at peer D
In this case, query backs out of the deadend, and tries another peer in depth-first
manner
Structured P2P Systems
DHT-based
Skip-list based
Chord / Pastry / Tapestry: hashbased into single dimensional space
CAN: hash-based into multidimensional space
P-grid: hash-based into virtual
binary search tree
Skipgraph / SkipNet
Index Tree-based
BATON
DHT Design Goals
An “overlay” network with:
Flexible mapping of keys to physical nodes
Data Independence
Small network diameter
Small degree (fan-out)
Local routing decisions
Robustness to churn
Routing flexibility
Proximity
A “storage” or “memory” mechanism with
No guarantees on persistence
Maintenance via soft state
Metrics
Searching/Lookup
Number of hops in searching
Number of messages
Database related metrics:
Total disk I/O
Response Time
Accuracy
Maintenance
Number of hops
Number of messages
How to Bound Search Space ?
Work on
placement!
Network
Basic Idea - Hashing
Publish (H(y))
P2P Network
Object “y”
Objects have
hash keys
Join (H(x))
Peer “x”
H(y)
H(x)
y
Hash key
Peer nodes also
x have hash keys
in the same
hash space
Place object to the peer with closest hash keys
Viewed as a Distributed Hash
Table
0
Hash table
Peer
nodes
Each is responsible for a range of the hash table,
according to the peer hash key
Objects
Note
that are placed in the peer with the closest key
peers are
Internet
edges
Internet
2128-1
How to Find an Object?
Hash
table
0
2128-1
Peer
node
Want to keep onlyone hop to
a few entries! find the object
Simplest idea:
Everyone knows everyone else!
Using Distributed Hash Table (DHT)
Hash
table
Peer
node
0
A peer only needs to know its logical
neighbors
Search based on multihop routing
2128-1
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
Operation: take key as input; route messages to node holding key
DHT in action: put()
K V
K V
K V
K V
K V
K V
K V
K V
K V
insert(K1,V1)
K V
K V
Operation: take key as input; route messages to node holding key
DHT in action: put()
K V
K V
K V
K V
K V
K V
K V
K V
K V
insert(K1,V1)
K V
K V
Operation: take key as input; route messages to node holding key
DHT in action: put()
(K1,V1)
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
Operation: take key as input; route messages to node holding key
DHT in action: get()
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
retrieve (K1)
Operation: take key as input; route messages to node holding key
DHT in action
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
K V
retrieve (K1)
CAN – Content Addressable Network
Each peer is
responsible for one
zone, i.e., stores all
(key, value) pairs of
the zone
Each peer knows the
neighbors of its zone
Random assignment
of peers to zones at
startup
Dimensional-ordered
multihop routing
CAN: Object Publishing
node I::publish(K,V)
I
CAN: Object Publishing
x=a
node I::publish(K,V)
(1) a = hx(K)
I
CAN: Object Publishing
x=a
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
y=b
I
CAN: Object Publishing
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
(2) route (K,V) -> J
I
J
CAN: Object Publishing
node I::publish(K,V)
(1) a = hx(K)
b = hy(K)
(2) route (K,V) -> J
(3) J stores (K,V)
I
J
(K,V)
CAN: Object Retrieval
node I::retrieve(K)
J
(1) a = hx(K)
b = hy(K)
(2) route “retrieve(K)” to J
that is in charge of (a,b)
I
(K,V)
Some Research Topics
Content-based Image Retrieval in
P2P
Location Management in P2P
Security Considerations for DHT
P2P Backup
Wireless P2P