10-3_p2p CAN_cs218


Peer-to-Peer Networks and Distributed Hash Tables
2006
Peer-to-peer networking

• File sharing:
- Files are stored at the end-user machines (peers) rather than at a server (client/server); files are transferred directly between peers.
• Leverage:
- P2P is a way to leverage vast amounts of computing power, storage, and connectivity from personal computers (PCs) distributed around the world.
• Q: What are the new technical challenges?
• Q: What new services/applications are enabled?
• Q: Is it just "networking at the application level"?
• Everything old is new again?
Napster

• Napster -- free music over the Internet
• Key idea: share the content, storage, and bandwidth of individual (home) users
• Model: each user stores a subset of files; each user has access to (can download) files from all users in the system
• Application-level, client-server protocol (index server) over point-to-point TCP
• How does it work -- four steps:
- Connect to the Napster index server
- Upload your list of files (push) to the server
- Give the server keywords to search the full list with
- Select the "best" of the correct answers (pings)
Napster: Example

[Figure: machines m1-m6 each store one file (A-F); the central index server maps each file to the machine storing it (A -> m1, ..., E -> m5, F -> m6). A client asks the server where file E is ("E?"), learns that m5 has it, and downloads E directly from m5 over the Internet.]
Napster characteristics

• Advantages:
- Simplicity; easy to implement sophisticated search engines on top of the index system
• Disadvantages:
- Centralized index server:
• single logical point of failure
• can load-balance among servers using DNS rotation
• potential for congestion
• Napster "in control" (freedom is an illusion)
- No security:
• passwords in plain text
• no authentication
• no anonymity
Main Challenge

• Find where a particular file is stored
• Scale: up to hundreds of thousands or millions of machines
- 7/2001: # of simultaneous online users: Napster 160K, Gnutella 40K, Morpheus 300K
• Dynamicity: machines can come and go at any time

[Figure: a set of machines (A-F); one of them issues the query "E?" to locate a file.]
Gnutella

• Peer-to-peer networking: peer applications
• Focus: decentralized method of searching for files
• How to find a file: flood the request (see the sketch after this slide)
- Send the request to all neighbors
- Neighbors recursively multicast the request
- Eventually a machine that has the file receives the request and sends back the answer
• Advantages:
- Totally decentralized, highly robust
• Disadvantages:
- Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
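Gnutella-style flooding, sketched below in Python under simplifying assumptions: the overlay graph, node names (m1-m6), and file placement are invented to match the example on the next slide, queries are deduplicated with a visited set instead of message IDs, and the traversal is sequential rather than truly parallel flooding.

```python
# Flood a query through the overlay until the TTL expires or a node with the file is found.
# Illustrative only: the topology and file placement are invented (m1-m6, files E and F).

NEIGHBORS = {
    "m1": ["m2", "m3"],
    "m2": ["m1"],
    "m3": ["m1", "m4", "m5"],
    "m4": ["m3"],
    "m5": ["m3", "m6"],
    "m6": ["m5"],
}
FILES = {"m5": {"E"}, "m6": {"F"}}        # which files each node stores

def flood_query(node, wanted, ttl, visited=None):
    """Return a node that stores `wanted`, or None if the flood dies out."""
    visited = visited if visited is not None else set()
    if ttl <= 0 or node in visited:
        return None
    visited.add(node)
    if wanted in FILES.get(node, set()):
        return node                        # the answer travels back along the reverse path
    for neighbor in NEIGHBORS[node]:       # forward the request to all neighbors
        hit = flood_query(neighbor, wanted, ttl - 1, visited)
        if hit is not None:
            return hit
    return None

print(flood_query("m1", "E", ttl=4))       # -> 'm5'
```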
Gnutella: Example

• Assume: m1's neighbors are m2 and m3; m3's neighbors are m4 and m5; ...

[Figure: the query "E?" issued at m1 is flooded to its neighbors and forwarded recursively (m1 -> m2, m3; m3 -> m4, m5; ...) until it reaches m5, which stores file E and sends back the answer.]
Gnutella

• What we care about:
- How much traffic does one query generate?
- How many hosts can it support at once?
- What is the latency associated with querying?
- Is there a bottleneck?
• Late 2000: only 10% of downloads succeeded
- 2001: more than 25% of downloads successful (is this success or failure?)
BitTorrent

• BitTorrent (BT) is a new-generation P2P system that makes downloads faster
- The file to be distributed is split up into pieces, and an SHA-1 hash is calculated for each piece (see the sketch after this slide)
• Swarming: parallel downloads among a mesh of cooperating peers
- Scalable: capacity increases as the number of peers/downloaders increases
- Efficient: it utilizes a large amount of the available network bandwidth
• Tracker
- A central server keeping a list of all peers participating in the swarm (handles peer discovery)
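A minimal sketch of the per-piece hashing step described above; the 256 KiB piece size is an assumption (the piece size is chosen by whoever creates the torrent), and hashlib provides the SHA-1 implementation.

```python
import hashlib

PIECE_SIZE = 256 * 1024   # assumed piece size; torrents commonly use 256 KiB to a few MiB

def piece_hashes(path, piece_size=PIECE_SIZE):
    """Split a file into fixed-size pieces and return the SHA-1 hash of each piece.

    A downloader recomputes these hashes to verify every piece it receives
    from other peers before passing it on.
    """
    hashes = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_size)
            if not piece:
                break
            hashes.append(hashlib.sha1(piece).hexdigest())
    return hashes
```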
BitTorrent: a picture

[Figure: a tracker and several uploader/downloader peers; the peers exchange pieces directly with one another in a mesh, while the tracker only keeps track of who is in the swarm.]
Freenet

• Additional goals beyond file location:
- Provide publisher anonymity and security
- Resistance to attacks: a third party shouldn't be able to deny access to a particular file (data item, object), even if it compromises a large fraction of machines
• Architecture:
- Each file is identified by a unique identifier
- Each machine stores a set of files and maintains a "routing table" to route the individual requests
Data Structure

• Each node maintains a table (stack) of entries with three fields (see the sketch after this slide):
- id: the file identifier
- next_hop: another node that stores the file id
- file: the file identified by id, if it is stored on the local node
• Forwarding:
- Each message contains the file id it is referring to
- If the file id is stored locally, then stop
- If not, search for the "closest" id in the table and forward the message to the corresponding next_hop
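A minimal sketch of that per-node table, assuming numeric file identifiers and taking "closest" to mean the smallest absolute difference (the slide does not define the distance metric); the example entries are invented but loosely mirror n1's table in the query example later on.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    id: int                  # file identifier
    next_hop: str            # node believed to store (or lead toward) this id
    file: Optional[bytes]    # file contents if stored on the local node, else None

def closest_entry(table, target_id):
    """Pick the entry whose id is closest to the target (assumed metric: absolute difference)."""
    return min(table, key=lambda e: abs(e.id - target_id))

# One node's table (contents invented, loosely following n1 in the query example below):
table_n1 = [Entry(4, "n1", b"f4"), Entry(12, "n2", None), Entry(5, "n3", None)]
print(closest_entry(table_n1, 10).next_hop)   # -> 'n2': id 12 is closest to 10
```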
Query

• API: file = query(id);
• Upon receiving a query for document id:
- Check whether the queried file is stored locally
• If yes, return it
• If not, forward the query message (see the sketch after this slide)
• Notes:
- Each query is associated with a TTL that is decremented each time the query message is forwarded; to obscure the distance to the originator:
• the TTL can be initialized to a random value within some bounds
• when TTL = 1, the query is forwarded only with some finite probability
- Each node maintains state for all outstanding queries that have traversed it, which helps to avoid cycles
- When the file is returned, it is cached along the reverse path
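A self-contained sketch of the query path under the same assumptions (numeric ids, "closest" = smallest absolute difference); the per-node tables are invented, loosely following the Query Example figure, and caching along the reverse path is omitted.

```python
import random

# Per-node tables (invented, loosely following the Query Example figure):
# node -> {file_id: (next_hop, contents or None if not stored locally)}
TABLES = {
    "n1": {4: ("n1", b"f4"), 12: ("n2", None), 5: ("n3", None)},
    "n2": {9: ("n3", b"f9"), 13: ("n4", None)},
    "n3": {3: ("n1", b"f3"), 14: ("n4", None), 5: ("n3", None)},
    "n4": {14: ("n5", b"f14"), 10: ("n5", None)},
    "n5": {10: ("n5", b"f10"), 3: ("n1", b"f3")},
}

def query(node, file_id, ttl, seen=None):
    """Answer locally if possible, otherwise forward toward the closest known id."""
    seen = set() if seen is None else seen
    if node in seen:                                  # outstanding-query state avoids cycles
        return None
    seen.add(node)
    table = TABLES[node]
    if file_id in table and table[file_id][1] is not None:
        return table[file_id][1]                      # stored locally: return it
    if ttl <= 1 and random.random() < 0.5:
        return None                                   # at TTL = 1, forward only with some probability
    closest = min(table, key=lambda fid: abs(fid - file_id))
    return query(table[closest][0], file_id, ttl - 1, seen)

print(query("n1", 10, ttl=random.randint(6, 12)))     # TTL starts at a random value -> b'f10'
```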
Query Example

[Figure: query(10) is issued at n1 and forwarded hop by hop toward the closest id in each node's table (n1 -> n2 -> ...) until it reaches n5, which stores f10; the file is then returned along the reverse path.]

• Note: the figure doesn't show file caching on the reverse path
Insert

• API: insert(id, file);
• Two steps:
- Search for the file to be inserted
- If it is not found, insert the file (see the sketch after this slide)
• Searching: like a query, but the nodes keep their per-query state after a collision is detected and the reply is sent back to the originator
• Insertion:
- Follow the forward path; insert the file at all nodes along the path
- A node probabilistically replaces the originator with itself, to obscure the true originator
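A minimal sketch of the insertion step along a forward path that the (failed) search has already discovered; path discovery and collision handling are omitted, and the 0.3 replacement probability is an arbitrary illustrative choice.

```python
import random

def insert_along_path(path, file_id, contents, originator):
    """Store (file_id, contents) at every node on the forward path.

    Each node, with some probability, rewrites the recorded originator to itself
    so that the true publisher is obscured from nodes further down the path.
    """
    stores = {}                       # node -> (file_id, contents, believed originator)
    for node in path:
        stores[node] = (file_id, contents, originator)
        if random.random() < 0.3:     # probabilistic originator replacement
            originator = node
    return stores

# Example (nodes invented): insert f10 along the path the failed search traversed.
print(insert_along_path(["n1", "n2", "n3", "n4"], 10, b"f10", originator="n1"))
```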
Insert Example

• Assume the query returned failure along the "blue" path; insert f10

[Figure: nodes n1-n5 with their routing tables before the insert; f10 will be inserted at every node along the path the failed query traversed.]
Insert Example

[Figure: insert(10, f10) starts at n1, which stores the entry (10, n1, f10) and forwards the insert to n2 with orig = n1.]
Insert Example

• n2 replaces the originator (n1) with itself

[Figure: n2 and n3 store the entry (10, n2, f10); the insert is forwarded onward with orig = n2.]
Insert Example

[Figure: the insert continues along the path; n4 and n5 now store the entry (10, n4, f10), i.e., the recorded originator has again been probabilistically replaced, this time by n4.]
Freenet Summary

• Advantages:
- Provides publisher anonymity
- Totally decentralized architecture, hence robust and scalable
- Resistant against malicious file deletion
• Disadvantages:
- Does not always guarantee that a file is found, even if the file is in the network
Solutions to the Location Problem

• Goal: make sure that an identified item (file) is always found
- Indexing scheme: used to map file names to their location in the system
- Requires a scalable indexing mechanism
• Abstraction: a distributed hash-table (DHT) data structure (see the sketch after this slide)
- insert(id, item);
- item = query(id);
- Note: item can be anything: a data object, document, file, pointer to a file, ...
• Proposals
- CAN, Chord, Kademlia, Pastry, Viceroy, Tapestry, etc.
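A minimal sketch of the abstraction as stated on the slide: a two-call interface over some node-location mechanism. The class name, the locate() method, and the nodes' store()/fetch() calls are placeholders of mine; each proposal listed above fills in locate() differently.

```python
from abc import ABC, abstractmethod

class DistributedHashTable(ABC):
    """The slide's abstraction: a hash table whose buckets live on many nodes.

    Concrete systems (CAN, Chord, Kademlia, Pastry, Tapestry, ...) differ mainly
    in how locate() maps an id to the node responsible for it.
    """

    @abstractmethod
    def locate(self, id):
        """Return the node responsible for `id` (system-specific routing)."""

    def insert(self, id, item):
        self.locate(id).store(id, item)       # store() is a placeholder node method

    def query(self, id):
        return self.locate(id).fetch(id)      # fetch() is a placeholder node method
```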
Internet-scale hash tables

• Hash tables
- essential building block in software systems
• Internet-scale distributed hash tables
- equally valuable to large-scale distributed systems?
- peer-to-peer systems
• Napster, Gnutella, Groove, FreeNet, MojoNation, ...
- large-scale storage management systems
• Publius, OceanStore, PAST, Farsite, CFS, ...
- mirroring on the Web
• Content-Addressable Network (CAN)
- scalable
- operationally simple
- good performance
Content Addressable Network (CAN): basic idea

• Interface
- insert(key, value)
- value = retrieve(key)
• key (id), value (item)

[Figure: a collection of nodes, each holding a slice of the (K,V) store; insert(K1,V1) is routed to the responsible node, which stores the pair, and a later retrieve(K1) issued at any other node is routed to that same node.]
CAN: basic idea

• Associate to each node and item a unique id in a d-dimensional Cartesian space
- key (id) -> point in the space; node -> zone of the space
• Goals
- Scales to hundreds of thousands of nodes
- Handles rapid arrival and failure of nodes
• Properties
- Routing table size O(d)
- Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes
CAN: solution

• Virtual d-dimensional Cartesian coordinate space
• The entire space is partitioned amongst all the nodes
- every node "owns" a zone in the overall space
• Abstraction
- can store data at "points" in the space
- can route from one "point" to another
• point = node that owns the enclosing zone
CAN Example: Two Dimensional Space

• The space is divided between the nodes
• Together, the nodes cover the entire space
• Each node covers either a square or a rectangular area with a side ratio of 1:2 or 2:1
• Example:
- Node n1:(1, 2) is the first node that joins, so it covers the entire space

[Figure: a 7x7 coordinate grid; n1's zone is the whole space.]
CAN Example: Two Dimensional Space

• Node n2:(4, 2) joins, and the space is divided between n1 and n2

[Figure: the space is split vertically; n1 owns the left half, n2 the right half.]
CAN Example: Two Dimensional Space

• Node n3:(3, 5) joins, and the space is divided between n1 and n3

[Figure: n1's left half is split horizontally; n1 keeps the lower part and n3 owns the upper part.]
CAN Example: Two Dimensional Space

• Nodes n4:(5, 5) and n5:(6, 6) join

[Figure: the right half is further subdivided; n2 keeps the lower-right region, while n4 and n5 share the upper-right region.]
Simple example: To store a pair (K1, V1)

• Key K1 is mapped onto a point P in the coordinate space using a uniform hash function
• The corresponding (key, value) pair is then stored at the node that owns the zone within which the point P lies
• Data stored in the CAN is addressed by name (i.e., key), not location (i.e., IP address)

Node I::insert(K,V)
(1) a = hx(K); b = hy(K)
(2) route (K,V) --> (a,b)
(3) the node owning (a,b) stores (K,V)

[Figure: node I hashes K to the point (x=a, y=b) and routes the pair there.]
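A minimal Python sketch of this insert path in two dimensions. The coordinate hash functions hx and hy are assumptions (derived here from an SHA-1 digest of the key), the zone layout is invented, and routing is collapsed into a direct lookup of the owning zone; hop-by-hop greedy routing is sketched separately after the routing slide further below.

```python
import hashlib

SPACE = 8.0   # the coordinate space is [0, 8) x [0, 8); the size is an arbitrary choice

def hash_to_point(key):
    """Map a key to a point (a, b) with two independent hash functions (hx, hy)."""
    digest = hashlib.sha1(key.encode()).digest()
    a = int.from_bytes(digest[:8], "big") / 2**64 * SPACE     # hx(K)
    b = int.from_bytes(digest[8:16], "big") / 2**64 * SPACE   # hy(K)
    return a, b

# zones: node -> (x_min, x_max, y_min, y_max); the layout is invented for illustration
ZONES = {"n1": (0, 4, 0, 4), "n3": (0, 4, 4, 8),
         "n2": (4, 8, 0, 4), "n4": (4, 6, 4, 8), "n5": (6, 8, 4, 8)}
STORE = {n: {} for n in ZONES}

def owner(point):
    """Return the node whose zone contains the point."""
    x, y = point
    return next(n for n, (x0, x1, y0, y1) in ZONES.items()
                if x0 <= x < x1 and y0 <= y < y1)

def insert(key, value):
    p = hash_to_point(key)          # (1) a = hx(K), b = hy(K)
    STORE[owner(p)][key] = value    # (2)+(3) route to (a, b); the zone owner stores (K, V)

insert("K1", "V1")
```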
CAN Example: Two Dimensional Space

• Each item is stored by the node that owns the zone containing its mapped point
• Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
• Items: f1:(2, 3); f2:(5, 0); f3:(2, 1); f4:(7, 5)

[Figure: the partitioned space with items f1-f4 placed at their mapped points inside the owning nodes' zones.]
Simple example: To retrieve key K1

• Any node can apply the same deterministic hash function to map K1 onto the point P and then retrieve the corresponding value from P
• If the point P is not owned by the requesting node, the request must be routed through the CAN infrastructure until it reaches the node in whose zone P lies

node J::retrieve(K)
(1) a = hx(K); b = hy(K)
(2) route "retrieve(K)" to (a,b)

[Figure: node J routes the retrieve request to the point (x=a, y=b), where the owning node returns (K, V).]
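The matching retrieve call, continuing the insert sketch above and reusing its hash_to_point, ZONES, STORE, and owner helpers (the same assumptions apply):

```python
def retrieve(key):
    """Any node can issue this: hash the key to a point and ask the zone owner for it."""
    p = hash_to_point(key)                 # (1) a = hx(K), b = hy(K)
    return STORE[owner(p)].get(key)        # (2) route "retrieve(K)" to (a, b)

print(retrieve("K1"))                      # -> 'V1' (stored by the insert sketch above)
```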
CAN: Query/Routing Example

• Each node knows its neighbors in the d-dimensional space
• Forward the query to the neighbor that is closest to the query id (see the sketch after this slide)
• Example: assume n1 queries f4
• A node only maintains state for its immediately neighboring nodes
• Can route around some failures

[Figure: the query for f4 issued at n1 is forwarded through neighboring zones toward the point where f4 is stored, in n5's zone.]
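A minimal sketch of the greedy forwarding rule under the same assumed zone layout as the insert sketch: each node knows only its zone-adjacent neighbors and hands the query to the neighbor whose zone center is closest to the target point. Using zone centers is a simplification of "closest to the query id", and the neighbor lists are derived by hand from the assumed layout.

```python
import math

# Assumed zone layout (same as the insert sketch): node -> (x_min, x_max, y_min, y_max)
ZONES = {"n1": (0, 4, 0, 4), "n3": (0, 4, 4, 8),
         "n2": (4, 8, 0, 4), "n4": (4, 6, 4, 8), "n5": (6, 8, 4, 8)}
NEIGHBORS = {"n1": ["n2", "n3"], "n2": ["n1", "n4", "n5"],
             "n3": ["n1", "n4"], "n4": ["n2", "n3", "n5"], "n5": ["n2", "n4"]}

def center(node):
    x0, x1, y0, y1 = ZONES[node]
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def owns(node, p):
    x0, x1, y0, y1 = ZONES[node]
    return x0 <= p[0] < x1 and y0 <= p[1] < y1

def route(start, target):
    """Greedy CAN routing: hop to the neighbor whose zone center is closest to the target."""
    path, node = [start], start
    while not owns(node, target):
        node = min(NEIGHBORS[node], key=lambda n: math.dist(center(n), target))
        path.append(node)
    return path

print(route("n1", (7, 5)))   # routing toward f4's point -> ['n1', 'n2', 'n5']
```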
CAN: node insertion

• Inserting a new node affects only a single other node and its immediate neighbors
1) discover some node "I" already in the CAN
2) pick a random point (p,q) in the space
3) I routes to (p,q) and discovers node J
4) split J's zone in half; the new node owns one half

[Figure: the new node contacts I, which routes to (p,q); J's zone is split between J and the new node.]
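A minimal sketch of step 4, the zone split, assuming rectangular zones stored as (x_min, x_max, y_min, y_max) tuples, always splitting the longer side, and giving the new node the half that contains the chosen random point; neighbor-table updates are omitted.

```python
def split_zone(zones, j, new_node, point):
    """Split node j's zone in half; the new node takes the half containing `point`."""
    x0, x1, y0, y1 = zones[j]
    if (x1 - x0) >= (y1 - y0):              # split the longer side (simplified rule)
        mid = (x0 + x1) / 2
        halves = [(x0, mid, y0, y1), (mid, x1, y0, y1)]
        new_half = halves[point[0] >= mid]
    else:
        mid = (y0 + y1) / 2
        halves = [(x0, x1, y0, mid), (x0, x1, mid, y1)]
        new_half = halves[point[1] >= mid]
    zones[j] = halves[0] if new_half is halves[1] else halves[1]
    zones[new_node] = new_half
    return zones

zones = {"n1": (0.0, 8.0, 0.0, 8.0)}             # n1 was the first node: it owns everything
split_zone(zones, "n1", "n2", point=(6.0, 2.0))  # n2 joins at a random point in the right half
print(zones)   # {'n1': (0.0, 4.0, 0.0, 8.0), 'n2': (4.0, 8.0, 0.0, 8.0)}
```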
CAN: Node Failure Recovery

• Simple failures
- know your neighbor's neighbors
- when a node fails, one of its neighbors takes over its zone
• More complex failure modes
- simultaneous failure of multiple adjacent nodes
- scoped flooding to discover neighbors
- hopefully, a rare event
• Only the failed node's immediate neighbors are required for recovery
Evaluation

• Scalability
• Low-latency
• Load balancing
• Robustness
CAN: scalability

• For a uniformly partitioned space with n nodes and d dimensions
- per node, the number of neighbors is 2d
- the average routing path is (d/4)*n^(1/d) hops (see the worked numbers after this slide)
- simulations show that the above results hold in practice
• Can scale the network without increasing per-node state
• Chord/Plaxton/Tapestry/Buzz
- log(n) neighbors with log(n) hops
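The worked numbers behind those formulas, for a few illustrative network sizes (the sizes are arbitrary choices):

```python
def can_neighbors(d):
    return 2 * d                       # per-node routing state

def can_path_length(n, d):
    return (d / 4) * n ** (1 / d)      # average hops for a uniformly partitioned space

for n in (16_000, 131_000, 1_000_000):
    print(n, can_neighbors(2), round(can_path_length(n, 2), 1),
             can_neighbors(10), round(can_path_length(n, 10), 1))
# e.g. for n = 1,000,000: d=2 -> 4 neighbors, ~500 hops; d=10 -> 20 neighbors, ~10 hops
```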
CAN: low-latency

• Problem
- latency stretch = (CAN routing delay) / (IP routing delay)
- application-level routing may lead to high stretch
• Solution
- increase dimensions and realities (reduces the path length)
- heuristics (reduce the per-CAN-hop latency)
• RTT-weighted routing
• multiple nodes per zone (peer nodes)
• deterministically replicate entries
CAN: low-latency

[Figure: latency stretch vs. number of nodes (16K, 32K, 65K, 131K) for #dimensions = 2, with and without the heuristics; the heuristics substantially reduce the stretch.]
CAN: low-latency

[Figure: the same comparison for #dimensions = 10; the stretch is much lower overall and again lower with the heuristics.]
CAN: load balancing

• Two pieces
- Dealing with hot-spots
• popular (key,value) pairs
• nodes cache recently requested entries
• an overloaded node replicates popular entries at its neighbors
- Uniform coordinate space partitioning
• uniformly spread (key,value) entries
• uniformly spread-out routing load
CAN: Robustness

• Completely distributed
- no single point of failure (this does not cover the pieces of the database lost when a node fails)
• Database recovery is not explored here (it would require multiple copies of the database)
• Resilience of routing
- can route around trouble
Strengths

• More resilient than flooding/broadcast networks
• Efficient at locating information
• Fault-tolerant routing
• Node & data high availability (w/ improvement)
• Manageable routing table size & network traffic
Weaknesses

• Impossible to perform a fuzzy search
• Susceptible to malicious activity
• Must maintain coherence of all the indexed data (network overhead, efficient distribution)
• Still relatively high routing latency
• Poor performance w/o the improvements
Suggestions

• Catalog and meta-indexes to support the search function
• An extension to handle mutable content efficiently for web hosting
• A security mechanism to defend against attacks
Ongoing Work

• Topologically-sensitive CAN construction
- Distributed Binning
• Goal
- bin nodes such that co-located nodes land in the same bin
• Idea
- a well-known set of landmark machines
- each CAN node measures its RTT to each landmark
- it orders the landmarks by increasing RTT (see the sketch after this slide)
• CAN construction
- place nodes from the same bin close together on the CAN
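A minimal sketch of the binning rule: a node's bin is simply the ordering of the landmarks by its measured RTTs. The landmark names and the ping() helper are placeholders; a real node would time probes to actual landmark hosts.

```python
import random

LANDMARKS = ["l1", "l2", "l3", "l4"]     # well-known landmark machines (placeholder names)

def ping(landmark):
    """Placeholder RTT measurement in ms; a real node would time a probe packet."""
    return random.uniform(5, 200)

def bin_of(node_rtts):
    """A node's bin is the permutation of landmarks ordered by increasing RTT."""
    return tuple(sorted(node_rtts, key=node_rtts.get))

rtts = {lm: ping(lm) for lm in LANDMARKS}
print(bin_of(rtts))   # e.g. ('l3', 'l1', 'l4', 'l2'): nodes with the same ordering share a bin
```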
Distributed Binning

• 4 landmarks (placed 5 hops away from each other)
• Compared against naive partitioning

[Figure: two panels, #dimensions = 2 and #dimensions = 4, plotting results for 256, 1K, and 4K nodes with and without binning.]
Ongoing Work (cont'd)

• CAN Security (Petros Maniatis - Stanford)
- spectrum of attacks
- appropriate counter-measures
• CAN Usage
- Application-level Multicast (NGC 2001)
- Grass-Roots Content Distribution
- Distributed Databases using CANs (J. Hellerstein, S. Ratnasamy, S. Shenker, I. Stoica, S. Zhuang)
Summary

• CAN
- an Internet-scale hash table
- a potential building block in Internet applications
• Scalability
- O(d) per-node state
- average routing path is (d/4)*n^(1/d) hops
• Low-latency routing
- simple heuristics help a lot
• Robust
- decentralized, can route around trouble