
Wide-area cooperative storage
with CFS
Overview
• CFS = Cooperative File System
• Peer-to-peer read-only file system
• Distributed hash table for block storage
• Lookup performed by Chord
Design Overview
• CFS clients contain 3 layers:
– A file system client
– Dhash storage layer
– Chord lookup layer
• CFS servers contain 2 layers:
– Dhash storage layer
– Chord lookup layer
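A minimal sketch of this layering in Python, using illustrative class and method names that are not the actual CFS API: clients stack a file-system interpreter on top of DHash and Chord, while servers run only the lower two layers.

```python
# Illustrative layering only; names and signatures are assumptions.

class ChordLayer:
    """Lookup layer: maps a block identifier to the responsible server."""
    def lookup(self, block_id: int) -> str:
        raise NotImplementedError  # returns the address of the block's successor

class DHashLayer:
    """Storage layer: stores and retrieves blocks, asking Chord where they live."""
    def __init__(self, chord: ChordLayer):
        self.chord = chord
    def get(self, block_id: int) -> bytes: ...
    def put(self, block_id: int, data: bytes) -> None: ...

class FileSystemClient:
    """Client-only layer: interprets DHash blocks as a read-only file system."""
    def __init__(self, dhash: DHashLayer):
        self.dhash = dhash

# A CFS server runs only DHashLayer and ChordLayer; the file-system layer
# exists only on clients.
```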
Overview con’t
• Disk blocks : DHash blocks :: Disk addresses : Block identifiers
• CFS file systems are read-only as far as
clients are concerned
– each file system can still be modified by its publisher
File System Layout
• The publisher inserts the file system's blocks into CFS, using a
content hash of each block as its identifier
• It then signs the root block with its private key
• It inserts the root block into CFS, using a hash of the
corresponding public key as its identifier
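A rough sketch of the publishing step, assuming Python's hashlib for SHA-1; the insert() and sign() callables are hypothetical stand-ins for the DHash insert call and the publisher's signing routine.

```python
import hashlib

def content_hash_id(block: bytes) -> bytes:
    # A content-hash block's CFS identifier is the SHA-1 hash of its content.
    return hashlib.sha1(block).digest()

def root_block_id(public_key: bytes) -> bytes:
    # The root block's identifier is derived from the publisher's public key,
    # so it stays the same across updates.
    return hashlib.sha1(public_key).digest()

def publish(fs_blocks, root_block, public_key, private_key, insert, sign):
    # insert(key, data) and sign(private_key, data) are hypothetical helpers.
    for block in fs_blocks:
        insert(content_hash_id(block), block)            # data blocks by content hash
    signed_root = sign(private_key, root_block) + root_block
    insert(root_block_id(public_key), signed_root)       # root block under the key hash
```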
Publisher Updates
• The publisher updates the file system's root block to
point to the new data
• CFS authenticates the update by checking that the
same key signed both the old and the new block
– Timestamps prevent replays of old data
– File systems are updated without changing the
root block's identifier
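A hedged sketch of the server-side check on a root-block update; the record fields and the verify() callable are assumptions for illustration.

```python
def accept_root_update(stored, proposed, verify) -> bool:
    # stored/proposed are hypothetical records with .public_key, .signature,
    # .timestamp and .data fields; verify(pub, sig, data) checks a signature.
    if proposed.public_key != stored.public_key:
        return False                      # must be signed with the same key as before
    if not verify(proposed.public_key, proposed.signature, proposed.data):
        return False                      # signature over the new root must be valid
    return proposed.timestamp > stored.timestamp   # reject replays of older roots
```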
CFS properties
• Decentralized control
• Scalability
• Availability
• Load balance
• Persistence
• Quotas
• Efficiency
Chord Layer
• Same Chord protocol as mentioned earlier
with a few modifications
– Server Selection
– Node ID authentication
Quick Chord Overview
• Consistent Hashing
– Node joins/leaves
• Successor lists
– O(N) lookup hops
• Finger tables
– O(log N) lookup hops
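A toy version of the Chord lookup the slides refer to, assuming each node keeps a successor pointer and a finger table; this is a sketch of the idea, not the full protocol with joins, leaves, and stabilization.

```python
def in_interval(x, a, b):
    """True if x lies in the half-open circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b               # the interval wraps around the ring

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self            # maintained as nodes join and leave
        self.fingers = []                # O(log N) Node references

    def closest_preceding_finger(self, key):
        for f in reversed(self.fingers):
            if in_interval(f.id, self.id, key) and f.id != key:
                return f
        return self

    def find_successor(self, key):
        node = self
        while not in_interval(key, node.id, node.successor.id):
            nxt = node.closest_preceding_finger(key)
            # Fall back to following successors (O(N)) if no finger helps;
            # with full finger tables this takes O(log N) hops.
            node = node.successor if nxt is node else nxt
        return node.successor
```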
Server Selection
• Chooses next node to contact from finger table
– Gets the closest node to the destination
– What about network latency?
• CFS adds measured latencies to the
finger table entries
– Calculated when acquiring finger table entries
– Reasoning: RPCs to different nodes incur varying
latency, so you want to choose the one that minimizes it
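A sketch of latency-aware next-hop choice, under the assumptions that each finger entry carries a measured RTT and that the cost model below (RTT plus a penalty proportional to the remaining ID distance) is a reasonable proxy; the exact weighting is not taken from the paper.

```python
RING_SIZE = 2 ** 160                       # SHA-1 identifier space

def id_distance(a, b):
    return (b - a) % RING_SIZE

def pick_next_hop(fingers, current_id, key):
    # fingers: list of (node_id, rtt_seconds) pairs from the finger table.
    best, best_score = None, None
    for node_id, rtt in fingers:
        if id_distance(current_id, node_id) >= id_distance(current_id, key):
            continue                       # this finger overshoots the key
        remaining = id_distance(node_id, key)
        score = rtt + 1e-9 * remaining     # illustrative trade-off, not CFS's formula
        if best_score is None or score < best_score:
            best, best_score = node_id, score
    return best
```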
Node ID authentication
• Idea: network security
• All Chord IDs must be of the form h(x)
– h = the SHA-1 hash function
– x = the node's IP address + virtual node index
• When a new node joins the system
– An existing node sends a message to the new node
– The new node's ID must match the hash of its claimed IP +
virtual node index to be accepted
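A small sketch of this check, assuming hashlib and a simple "ip:index" encoding of x (the real byte layout is not specified on the slide).

```python
import hashlib

def expected_node_id(ip: str, virtual_index: int) -> bytes:
    # Chord IDs must equal h(x) with h = SHA-1 and x = IP + virtual node index;
    # the "ip:index" string encoding here is an assumption.
    return hashlib.sha1(f"{ip}:{virtual_index}".encode()).digest()

def authenticate_join(claimed_id: bytes, claimed_ip: str, virtual_index: int) -> bool:
    # An existing node messages the new node at claimed_ip first (so the
    # address is really reachable), then checks that the ID matches the hash.
    return claimed_id == expected_node_id(claimed_ip, virtual_index)
```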
DHash Layer
• Handles
– Storing and retrieving blocks
– Distribution
– Replication
– Caching of blocks
• Uses Chord to locate blocks
• Key CFS design decision: split each file system into
blocks and distribute those blocks across many servers
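A minimal sketch of how DHash could use Chord to place and fetch blocks; store() and fetch() stand in for RPCs to the responsible server and are not the paper's actual interface.

```python
import hashlib

class DHash:
    def __init__(self, chord):
        self.chord = chord                 # provides find_successor(key_as_int)

    def put(self, data: bytes) -> bytes:
        key = hashlib.sha1(data).digest()
        server = self.chord.find_successor(int.from_bytes(key, "big"))
        server.store(key, data)            # hypothetical RPC to the block's successor
        return key

    def get(self, key: bytes) -> bytes:
        server = self.chord.find_successor(int.from_bytes(key, "big"))
        return server.fetch(key)           # hypothetical RPC
```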
Replication
• DHash replicates each block on the k servers
immediately after the block's successor
– Why? Even if the block's successor fails, the
block is still available
– Server independence is achieved because location on
the ring is determined by a hash of the IP address,
not by physical location
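A sketch of replica placement, under the assumption that the Chord layer exposes the successor list; the store() call is hypothetical.

```python
def place_replicas(chord, key: int, data: bytes, k: int = 6):
    primary = chord.find_successor(key)
    primary.store(key, data)
    # The k servers immediately following the successor hold the replicas.
    # Their ring positions come from hashing their IP addresses, so they are
    # unlikely to be physically related and tend to fail independently.
    for server in primary.successor_list[:k]:
        server.store(key, data)
```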
Replication con’t
• Could save space by storing coded pieces of
blocks, but storage space is not expected
to be a highly constrained resource
• The placement of replicas allows a client to
select the replica with the fastest download
– The result of a Chord lookup is the
immediate predecessor of the node holding block X
– This node's successor table contains latency entries
for the nearest Y nodes
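A sketch of fastest-replica selection, assuming the predecessor returned by the lookup exposes its successor list with per-entry latency measurements (field names are illustrative).

```python
def fetch_fastest_replica(predecessor, key, k: int = 6):
    # The predecessor's successor list covers the block's successor and its
    # k replicas, each annotated with a measured latency.
    candidates = predecessor.successor_list[:k + 1]
    fastest = min(candidates, key=lambda server: server.latency)
    return fastest.fetch(key)              # hypothetical RPC to the chosen replica
```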
Caching
• Caching blocks prevents overloading servers with
popular data
• Using Chord…
– Clients contact servers closer and closer to the desired
location
– Once the source, or an intermediate cached copy, is
found, all the servers just contacted receive the block to cache
• Replaces cached blocks in least-recently-used
order
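A sketch of a per-server block cache with least-recently-used replacement, using Python's OrderedDict; the byte-based capacity accounting is illustrative.

```python
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks = OrderedDict()          # key -> data, least recently used first

    def get(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)     # mark as recently used
            return self.blocks[key]
        return None                          # miss: forward the lookup as usual

    def put(self, key, data: bytes):
        if key in self.blocks:
            self.blocks.move_to_end(key)
            return
        self.blocks[key] = data
        self.used += len(data)
        while self.used > self.capacity:     # evict least-recently-used blocks
            _, evicted = self.blocks.popitem(last=False)
            self.used -= len(evicted)
```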
Load Balance
• Virtual servers: one real server acts as several
"virtual" servers
– The administrator can configure the number of virtual
servers based on the server's storage and network capacity
• Possible side effect: more hops in the Chord
lookup
– More nodes = more hops
– Solution: allow a machine's virtual servers to look at each
other's routing tables
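A tiny sketch of virtual servers: one physical machine joins the ring once per virtual node, so the share of the identifier space (and hence load) it owns grows with the configured count. The ID derivation mirrors the node-ID rule above; the string encoding is an assumption.

```python
import hashlib

def virtual_server_ids(ip: str, num_virtual: int):
    # A more capable machine is configured with more virtual servers and so
    # joins the Chord ring at more points.
    return [hashlib.sha1(f"{ip}:{i}".encode()).digest() for i in range(num_virtual)]

# Virtual servers on the same host can consult each other's routing tables
# locally, offsetting the extra lookup hops that more ring nodes would add.
```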
Quotas
• Controls the amount of data a publisher can inject
– Quotas based on reliably identifying publishers won't work
because that requires centralized administration
– CFS instead uses quotas based on the IP address of the publisher
– Each server imposes a 0.1% per-publisher limit, so as system
capacity grows, the total amount a publisher can store grows too
• Not easy to subvert this system because publishers
must respond to initial confirmation requests
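A sketch of per-server quota enforcement under these rules: each server tracks how much each publisher IP has stored on it and refuses writes beyond 0.1% of its own capacity, so a publisher's total allowance grows with the system.

```python
class QuotaTracker:
    def __init__(self, capacity_bytes: int, fraction: float = 0.001):
        self.limit = capacity_bytes * fraction   # 0.1% of this server's capacity
        self.used_by_ip = {}                     # publisher IP -> bytes stored here

    def accept(self, publisher_ip: str, block: bytes) -> bool:
        used = self.used_by_ip.get(publisher_ip, 0)
        if used + len(block) > self.limit:
            return False                         # over this publisher's quota
        self.used_by_ip[publisher_ip] = used + len(block)
        return True
```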
Updates and Deletion
• Only allows the publisher to modify data
– 2 conditions for acceptance
• Marked as a content-hash block → the supplied key = the SHA-1 hash
of the block's content
• Marked as a signed block → signed by the private key whose public
key's SHA-1 hash is the block's CFS key
• No explicit delete
– Publishers must refresh blocks if they want them stored
– CFS deletes blocks that have not been refreshed
recently
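A sketch of the two acceptance conditions, with verify() again standing in for a real signature check.

```python
import hashlib

def accept_block(key: bytes, data: bytes, signature=None, public_key=None,
                 verify=None) -> bool:
    if signature is None:
        # Content-hash block: the supplied key must be SHA-1 of the content.
        return key == hashlib.sha1(data).digest()
    # Signed block: the key must be SHA-1 of the public key, and the content
    # must be signed by the matching private key.
    return (key == hashlib.sha1(public_key).digest()
            and verify(public_key, signature, data))
```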
Experimental Results – Real Life
• 12 machines spread over the Internet: the US, the
Netherlands, Sweden, and South Korea
Lookup
• Range of servers
• 10,000 blocks
• 10,000 lookups for
random blocks
• The distribution is
roughly linear on a
log-scale plot
…so lookup cost is O(log N)
Load Balance
• Theoretical:
– 64 physical servers
– 1, 6, and 24 virtual
servers each
• Actual:
– 10,000 blocks
– 64 physical servers
– 6 virtual servers each
Caching
• Single block (1)
• 1,000 server system
• Average lookup hops without caching: 5.7
• Average with caching: 3.2
– (over 10 look-ups)
Storage Space Control
• Varying number of
virtual servers
• 7 physical servers
• 1, 2, 4, 8, 16, 32, 64,
and 128 virtual servers
• 10,000 blocks
Effect of Failure
• 1,000 blocks
• 1,000 server system
• Each block has 6
replicas
• A fraction of the servers
fail before the stabilization
algorithm is run
Effect of Failure con’t
• Same set-up as before
• X% of servers fail
Conclusion
• Highly scalable, available, secure read-only
file system
• Uses peer-to-peer Chord protocol for
lookup
• Uses replication and caching to achieve
availability and load balance
• Simple but effective protection against
inserting large amounts of malicious data