Secure Peer-to-Peer File Sharing
Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan
http://www.pdos.lcs.mit.edu/chord
MIT Laboratory for Computer Science
SFS: a secure global file system
[Figure: H21 handheld clients connect through the Oxygen network to an SFS server at MIT, accessed as /global/mit/kaashoek/sfs]
• One name space for all files
• Global deployment
• Security over untrusted networks
SFS results
• Research: how to do server authentication?
– Self-certifying pathnames (sketched below)
– Flexible key management
• Complete system available
– www.fs.net
– 65,000 lines of C++ code
– Toolkit for file system research
• System used inside and outside MIT
• Ported to iPAQ
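A minimal sketch of the self-certifying idea, assuming an illustrative encoding rather than SFS's exact format: the pathname embeds a hash of the server's public key, so a client can authenticate any server it reaches with no prior key distribution.

    import hashlib

    def self_certifying_path(hostname: str, server_pubkey: bytes) -> str:
        # The HostID is a hash of the server's public key, embedded in the name.
        hostid = hashlib.sha1(hostname.encode() + server_pubkey).hexdigest()[:32]
        return f"/sfs/{hostname}:{hostid}"

    def verify_server(path: str, hostname: str, presented_key: bytes) -> bool:
        # The client recomputes the HostID from the key the server presents;
        # a match proves the server owns the key the path names.
        return path == self_certifying_path(hostname, presented_key)

    path = self_certifying_path("sfs.fs.net", b"<server public key bytes>")
    print(verify_server(path, "sfs.fs.net", b"<server public key bytes>"))  # True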
New direction:
peer-to-peer file sharing
• How to build distributed systems without
centrally-managed servers?
• Many Oxygen technologies are peer-to-peer
– INS, SFS/Chord, Grid
• Chord is an elegant new primitive for building peer-to-peer applications
Peer-to-Peer Properties
• Main advantage: decentralized
– No single point of failure in the system
– More robust to random faults or adversaries
– No need for central administration
• Main disadvantage: decentralized
– All failures equally important---no “clients”
– Difficult to coordinate use of resources
– No opportunity for central administration
Peer-to-Peer Challenges
• Load balancing
– No node should be overloaded
• Coordination
– Agree globally on who is responsible for what
• Dynamic network/fault tolerance
– Readjust responsibility as peers come and go
• Scalability
– Resources per peer must be negligible
Peer-to-peer sharing example
[Figure: users across the Internet sharing files with each other]
• Internet users share music files
– Share disk storage and network bandwidth
– 10 Gb shared for 1 hour/day ≈ 400 Mb shared continuously (10 Gb / 24 ≈ 0.4 Gb)
Key Primitive: Lookup
[Figure: peers insert documents into, and find them in, the shared system; both operations reduce to lookup]
Chord: a P2P Routing Primitive
• Lookup is the key problem
– Given identifier, find responsible machine
• Lookup is not easy:
– Gnutella scales badly---too much lookup work
– Freenet is imprecise---lookups can fail
• Chord lookup provides:
– Good naming semantics and efficiency
– Elegant base for layered features
Chord Architecture
• Interface (sketched below):
– Lookup(ID) → IP-address
– ID might be node name, document ID, etc.
– Get IP address of node responsible for ID
– Application decides what to do with IP address
• Chord consists of
– Consistent hashing to assign IDs to nodes
– Efficient routing protocol to find right node
– Fast join/leave protocol
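A minimal Python sketch of this interface (hypothetical names; the actual implementation is the C++ system above). Chord only resolves IDs to addresses; the application layered on top decides what the address is for.

    class Chord:
        # The lookup primitive: identifier -> IP address of responsible node.
        def lookup(self, identifier: int) -> str:
            raise NotImplementedError  # supplied by the routing protocol

    def send(ip: str, op: str, *args):
        # Stub RPC; a real application contacts `ip` over the network.
        pass

    class DocumentStore:
        # Example application: insert/find documents at responsible nodes.
        def __init__(self, chord: Chord):
            self.chord = chord

        def insert(self, doc_id: int, data: bytes) -> None:
            send(self.chord.lookup(doc_id), "store", doc_id, data)

        def find(self, doc_id: int):
            return send(self.chord.lookup(doc_id), "fetch", doc_id)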
Chord Properties
• log(n) lookup messages and table space
– log(1,000,000) ≈ 20
• Well-defined location for each ID
– No search required
• Natural load balance
• Minimal join/leave disruption
• Does not store documents
– But document store layers easily on top
Assignment of Responsibility
Consistent Hashing
[Figure: identifier circle 0–63 with nodes at 6, 13, 18, 22, 31, 36, 42, 47, 51, 60; a document hashing to 49 is assigned to node 51]
• Each node picks a random point on the identifier circle
• Hash each document ID to a point on the same circle
• Assign each ID to its “successor”: the first node at or after it on the circle (so hash 49 goes to node 51, sketched below)
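A minimal sketch of the successor rule on the 6-bit circle (0–63) from the figure; the names are illustrative, not from the Chord codebase.

    import hashlib

    BITS = 6  # the 64-point identifier circle used in the figures

    def to_circle(name: str) -> int:
        # Hash a node or document name to a point on the circle.
        digest = hashlib.sha1(name.encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** BITS)

    def successor(identifier: int, nodes: list) -> int:
        # The first node at or clockwise after `identifier` on the circle.
        ring = sorted(nodes)
        return next((n for n in ring if n >= identifier), ring[0])  # wrap past 0

    nodes = [6, 13, 18, 22, 31, 36, 42, 47, 51, 60]
    print(successor(49, nodes))  # -> 51, as in the figure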
Load Balance
• Each node is responsible for the circle segment between it and the previous node
• But random node positions mean the previous node is close
• So no node is responsible for too much (checked numerically below)
[Figure: the segment for node 31 runs from node 22 to node 31]
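A quick numerical check of this claim (a simulation sketch, not from the talk): with N nodes at random positions, the largest segment is typically around ln(N) times the fair share, nowhere near the N times that a worst-case placement could produce.

    import random

    BITS, N = 32, 1000
    ring = sorted(random.randrange(2 ** BITS) for _ in range(N))
    # Each node's segment is the gap back to its predecessor on the circle.
    gaps = [(n - p) % (2 ** BITS) for p, n in zip([ring[-1]] + ring[:-1], ring)]
    fair_share = 2 ** BITS / N
    print(max(gaps) / fair_share)  # typically ~ln(N), about 7 here; never ~N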
Dynamic Network
• To know the appropriate successor, a node must know the identifiers of all nodes on the circle
• Requires lots of state per node
• And the state must be kept current
• Requires a huge number of messages when a node joins or leaves
Successor Pointers
• Each node keeps track of its successor on the circle
• To find objects, walk around the circle using successor pointers
• When a node joins, notify one node to update its successor
• Problem: slow! (see the sketch below)
[Figure: a lookup walking node to node around the circle of nodes 6, 13, 18, 22, 31, 36, 42, 47, 51, 60]
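A sketch of lookup by successor pointers on the ring from the figure: correct, but it visits every node along the way, so it takes O(n) hops.

    BITS = 6
    NODES = sorted([6, 13, 18, 22, 31, 36, 42, 47, 51, 60])
    SUCC = {n: NODES[(i + 1) % len(NODES)] for i, n in enumerate(NODES)}

    def in_interval(x: int, a: int, b: int) -> bool:
        # True if x lies in the clockwise interval (a, b] mod 2^BITS.
        return 0 < (x - a) % 2 ** BITS <= (b - a) % 2 ** BITS

    def walk_lookup(start: int, target: int):
        # Follow successor pointers until `target` falls in
        # (node, successor]; that successor owns the target.
        node, hops = start, 0
        while not in_interval(target, node, SUCC[node]):
            node, hops = SUCC[node], hops + 1
        return SUCC[node], hops

    print(walk_lookup(6, 49))  # -> (51, 7): right answer, O(n) hops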
Fingers
• Each node keeps carefully chosen “fingers”---shortcuts around the circle
• For a distant ID, a shortcut covers much of the distance
• Result:
– fast lookups
– small tables
[Figure: finger shortcuts jumping across the circle of nodes 6, 13, 18, 22, 31, 36, 42, 47, 51, 60]
Powers of 2
• Node at ID n stores fingers to the nodes at IDs n + 1/2, n + 1/4, n + 1/8, n + 1/16, … of the way around the circle
• log(n) fingers are needed for n nodes
• Key fact: from whatever node a lookup is currently at, some power-of-2 finger lands at least halfway to the target
• Distance to the target halves in each step
• log(n) steps suffice to reach the target
• log(1,000,000) ≈ 20 (see the finger sketch below)
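A sketch of the finger rule on the 64-point circle: n + 1/2 of the circle is n + 32, n + 1/4 is n + 16, and so on, so the fingers are the owners of the points n + 2^i (listed smallest-first here).

    BITS = 6
    NODES = sorted([6, 13, 18, 22, 31, 36, 42, 47, 51, 60])

    def owner(x: int) -> int:
        # First node at or after point x on the circle.
        x %= 2 ** BITS
        return next((n for n in NODES if n >= x), NODES[0])

    def fingers(n: int) -> list:
        # Shortcuts to the owners of n + 1, n + 2, n + 4, ..., n + 32.
        return [owner(n + 2 ** i) for i in range(BITS)]

    print(fingers(42))  # [47, 47, 47, 51, 60, 13]: log-many entries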
Chord Lookups
[Figure: a lookup hopping across the circle of nodes 6, 13, 18, 22, 31, 36, 42, 47, 51, 60 via fingers]
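A sketch of the full lookup shown in the figure: each hop takes the farthest finger that does not pass the target, so the remaining distance at least halves per hop.

    BITS = 6
    NODES = sorted([6, 13, 18, 22, 31, 36, 42, 47, 51, 60])

    def owner(x: int) -> int:
        # First node at or after point x on the circle.
        x %= 2 ** BITS
        return next((n for n in NODES if n >= x), NODES[0])

    def successor_of(n: int) -> int:
        return NODES[(NODES.index(n) + 1) % len(NODES)]

    def between(x: int, a: int, b: int) -> bool:
        # True if x lies in the clockwise open interval (a, b) mod 2^BITS.
        span = (b - a) % 2 ** BITS or 2 ** BITS
        return 0 < (x - a) % 2 ** BITS < span

    def find_successor(n: int, target: int) -> int:
        while True:
            s = successor_of(n)
            if target == s or between(target, n, s):
                return s  # target falls in (n, successor]: s owns it
            # hop to the farthest finger that stays short of the target
            fingers = [owner(n + 2 ** i) for i in range(BITS)]
            n = max((f for f in fingers if between(f, n, target)),
                    key=lambda f: (f - n) % 2 ** BITS, default=s)

    print(find_successor(6, 49))  # -> 51 after 3 finger hops, not a full walk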
Node Join Operations
• Integrate into the routing mechanism
– New node finds its successor (via lookup)
– Determines its fingers (more lookups)
– Total: O(log²(n)) time to join the network
• Takes responsibility for certain objects from its successor
– Upcall for application-dependent reaction
– E.g., may copy documents from the other node (see the sketch below)
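An in-memory sketch of these join steps (the real protocol finds the successor and fingers via lookups, as above; here the ring is a plain list and network RPCs are elided).

    BITS = 6

    def between(x: int, a: int, b: int) -> bool:
        # True if x lies in the clockwise interval (a, b] mod 2^BITS.
        return 0 < (x - a) % 2 ** BITS <= (b - a) % 2 ** BITS

    def on_transfer(key: int, old: int, new: int) -> None:
        # Upcall: the application reacts, e.g. copies the document.
        print(f"key {key}: node {old} -> node {new}")

    def join(new_id: int, ring: list, store: dict) -> None:
        ring.append(new_id)
        ring.sort()
        i = ring.index(new_id)
        pred, succ = ring[i - 1], ring[(i + 1) % len(ring)]
        store[new_id] = {}
        for key in list(store[succ]):
            if between(key, pred, new_id):      # now owned by the new node
                store[new_id][key] = store[succ].pop(key)
                on_transfer(key, succ, new_id)  # application upcall

    ring = [6, 22, 42, 60]
    store = {6: {3: "a"}, 22: {13: "b"}, 42: {31: "c", 40: "d"}, 60: {49: "e"}}
    join(31, ring, store)  # key 31 in (22, 31] moves from node 42 to node 31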
Fault Tolerance
• Node failures cause two problems:
– Lost data
– Corrupted routing (fingers cut off)
• Data solution: replicate
– Place copies of data at adjacent nodes (see the sketch below)
– If the successor fails, the next node becomes successor
• Finger solution: alternate paths
– If a finger is lost, use a different (shorter) finger
– Lookups remain fast
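A sketch of this replica placement rule: store each value at the key's successor and the next r-1 nodes on the circle, so the next copy is already in place if the successor fails.

    BITS = 6

    def replica_set(key: int, nodes: list, r: int = 3) -> list:
        # The key's successor plus the r-1 nodes after it on the circle.
        ring = sorted(nodes)
        i = next((j for j, n in enumerate(ring) if n >= key % 2 ** BITS), 0)
        return [ring[(i + k) % len(ring)] for k in range(r)]

    nodes = [6, 13, 18, 22, 31, 36, 42, 47, 51, 60]
    print(replica_set(49, nodes))  # [51, 60, 6]: survives losing node 51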
File sharing with Chord
[Figure: a client application (e.g. a browser) calls get(key) / put(k, v) on a key/value layer; the key/value layer on the client and on each server calls lookup(id) on Chord]
• Fault tolerance: store values at r successors (layer sketched below)
• Hot documents: cache values along the Chord lookup path
• Authentication: self-certifying names (SFS)
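A sketch of the key/value layer from the figure, with hypothetical names and the network RPCs stubbed out: get and put are one Chord lookup plus a message to the responsible servers (caching along the lookup path is not shown).

    import hashlib

    def rpc(ip: str, op: str, *args):
        # Stub; a real deployment sends these requests over the network.
        return None

    class KeyValueLayer:
        def __init__(self, lookup, r: int = 3):
            self.lookup = lookup  # id -> IPs of the r responsible servers
            self.r = r

        def _key_id(self, key: str) -> int:
            return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")

        def put(self, key: str, value: bytes) -> None:
            for ip in self.lookup(self._key_id(key))[: self.r]:
                rpc(ip, "store", key, value)  # replicate at r successors

        def get(self, key: str):
            for ip in self.lookup(self._key_id(key)):
                value = rpc(ip, "fetch", key)
                if value is not None:         # first live replica wins
                    return value
            raise KeyError(key)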
Chord Status
• Working Chord implementation
• SFSRO file system layered on top
• Prototype deployed at 12 sites around the world
• Understanding design tradeoffs
Open Issues
• Network proximity
• Malicious data insertion
• Malicious Chord table information
• Anonymity
• Keyword search and indexing
Chord Summary
• Chord provides distributed lookup
– Efficient, low-impact join and leave
• Flat key space allows flexible extensions
• Good foundation for peer-to-peer systems
http://www.pdos.lcs.mit.edu/chord