Erasure Codes

Project Mimir
• A Distributed Filesystem
• Uses Rateless Erasure Codes for
Reliability
• Uses Pastry's multicast system, Scribe, for
resource discovery and utilization
Erasure Codes
• An erasure code transforms a message of
n blocks into a longer message of more
than n blocks, such that the original
message can be recovered from a subset of
those blocks. The fraction of the blocks
required is called the rate, denoted r
(worked example below).
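For concreteness, a minimal sketch with hypothetical numbers (the block counts below are illustrative, not from the slides):

```python
# Hypothetical example: a message of n = 4 blocks encoded into 6 blocks.
n = 4              # original blocks
total = 6          # encoded blocks produced
required = 4       # blocks needed to recover the original (an optimal code)
rate = required / total
print(f"rate r = {rate:.2f}")   # r = 0.67: any 4 of the 6 encoded blocks suffice
```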
Optimal Erasure Codes
• An optimal erasure code lets the original n
blocks be recovered from any n of the
encoded blocks, with no extra blocks
needed.
Rateless Erasure Codes
• Also called Fountain Codes
• Are rateless because they can produce an
infinite stream of encoded data.
• Most rateless codes are sub-optimal: a file
of n blocks can be decoded from any m
encoded blocks where m ≥ (1+ε)n.
Luby Transform
• Low Density
• First rateless erasure code discovered
• Encodes blocks by randomly selecting a
degree d (1 ≤ d ≤ n) and then XORing
together d randomly chosen un-encoded
blocks.
• Decodes blocks by the XOR identity
(see the sketch below): if
– A XOR B = C, then
– A XOR C = B and
– B XOR C = A
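A minimal Python sketch of this encode/decode idea. It is illustrative only: the uniform degree choice and the simple peeling decoder below are stand-ins; a real LT code draws the degree from the robust soliton distribution.

```python
import random

def lt_encode_block(blocks, rng):
    """Produce one encoded block: pick a degree d, XOR d random source blocks.
    A real LT code draws d from the robust soliton distribution, not uniformly."""
    n = len(blocks)
    d = rng.randint(1, n)
    chosen = rng.sample(range(n), d)          # which source blocks went into this encoding
    out = bytes(len(blocks[0]))               # all-zero block of the right length
    for i in chosen:
        out = bytes(a ^ b for a, b in zip(out, blocks[i]))
    return chosen, out

def lt_decode(encoded, n):
    """Peeling decoder: recover source blocks from degree-1 encodings, then XOR
    them out of the remaining encodings, repeating until no progress is made."""
    recovered = [None] * n
    pending = [(set(idx), data) for idx, data in encoded]
    progress = True
    while progress:
        progress = False
        for idx, data in pending:
            if len(idx) == 1:                 # degree-1 block: its data IS a source block
                i = next(iter(idx))
                if recovered[i] is None:
                    recovered[i] = data
                    progress = True
        new_pending = []
        for idx, data in pending:
            for i in list(idx):
                if recovered[i] is not None:  # subtract (XOR out) known source blocks
                    data = bytes(a ^ b for a, b in zip(data, recovered[i]))
                    idx = idx - {i}
            if idx:
                new_pending.append((idx, data))
        pending = new_pending
    return recovered

rng = random.Random(1)
blocks = [bytes([65 + i]) * 4 for i in range(4)]           # b"AAAA", b"BBBB", b"CCCC", b"DDDD"
encoded = [lt_encode_block(blocks, rng) for _ in range(10)]
print(lt_decode(encoded, 4))   # usually all four blocks; decoding is probabilistic
```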
Encoded Block Identification
The decoder must be able to identify an encoded
block’s degree and which blocks were used to
encode it.
• Pass the seed used in the random number
generator to regenerate an identical encoding
(sketched after this list).
– Additional decoder overhead
– All encoders and decoders must use the same
random number generator
• Attach binary headers to encoded blocks where
each bit represents whether a specific block was
used in the encoding.
– Additional network overhead
– Header bit length equals n
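A minimal sketch of the seed-based option, assuming both sides agree on the PRNG (function names are illustrative):

```python
import random

def blocks_for_seed(seed, n):
    """Regenerate the (degree, source-block indices) choice from a shared seed.
    Works only if encoder and decoder use the exact same PRNG implementation."""
    rng = random.Random(seed)
    d = rng.randint(1, n)
    return rng.sample(range(n), d)

# Encoder: pick a seed per encoded block and transmit it alongside the payload.
seed = 12345
indices = blocks_for_seed(seed, n=16)

# Decoder: the same seed yields the identical index set, so no bit header is needed.
assert indices == blocks_for_seed(seed, n=16)
```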
Common Uses of LT Codes
• One-way communication protocols over
noisy channels
– File streaming (IPTV & other media)
– Long-distance communication
– Satellite communications
– Mobile phones
– High-latency networks
LT Codes in Mimir
• Encoded blocks are striped evenly across a
large network of computers.
• Generate xn encoded blocks; decoding
succeeds even when up to 1-(2/x) of all
network nodes fail.
• Distributed disk space equals real disk
space / x (worked numbers below)
– 50 computers * 100 GB each = 5 TB
– x = 4; distributed disk space = 1.25 TB
– 100% reliability while at least 25 computers are online
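The slide's numbers, written out:

```python
# Worked numbers from the slide: 50 machines with 100 GB each, expansion factor x = 4.
machines, per_machine_gb, x = 50, 100, 4

raw_capacity_gb = machines * per_machine_gb      # 5000 GB = 5 TB of raw disk
distributed_gb = raw_capacity_gb / x             # 1250 GB = 1.25 TB of distributed space
tolerated_failures = machines * (1 - 2 / x)      # up to 25 machines may fail
min_online = machines - tolerated_failures       # at least 25 must stay online

print(distributed_gb, tolerated_failures, min_online)   # 1250.0 25.0 25.0
```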
Challenges
• Cannot encode new data unless the file is
reconstructed first; a high-churn network
therefore requires a lot of computation
• Decoding is still probabilistic, although the
probability of success is extremely high:
greater than 99.99999% at the failure limit
Modifications to LT Code
• Modified the LT code to guarantee each block is
encoded an equal number of times (evenly
saturated); RobuStore also does this.
• Evenly distribute according to block degree
• Modified distribution (illustrated below)
– spiking
– n unique blocks with degree 1
– offset distribution
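A hypothetical illustration of even saturation plus degree-1 spiking; this is not the actual Mimir or RobuStore distribution, just a sketch of the idea.

```python
import random

def evenly_saturated_selection(n, degrees, rng):
    """For each encoded block, choose its source blocks so that every source block
    ends up in roughly the same number of encodings (even saturation)."""
    usage = [0] * n                        # how many encodings each source block appears in
    selections = []
    for d in degrees:
        # prefer the least-used source blocks, breaking ties randomly
        order = sorted(range(n), key=lambda i: (usage[i], rng.random()))
        chosen = order[:d]
        for i in chosen:
            usage[i] += 1
        selections.append(chosen)
    return selections

rng = random.Random(0)
# Spiking: start with n degree-1 encodings (one per source block), then higher degrees.
degrees = [1] * 8 + [rng.randint(2, 8) for _ in range(16)]
picks = evenly_saturated_selection(8, degrees, rng)
print(picks[:8])    # the first 8 picks cover each of the 8 source blocks exactly once
```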
Application level any/multicast
• Issues with Network level multicast
• Uses for multicast
– Publish Subscribe Architecture
– Resource Advertisement/ Discovery
– Mass Content Distribution
Issues with network level
multicast
• Difficult to set up
• Does not handle large numbers of
multicast and anycast groups
• Does not handle very dynamic networks
• Often will not work over the Internet
Content Based Publish
Subscribe
• Allows for expandable network
architectures
• Allows for conditional matching and event
notification
• Allows for fault tolerant networks
• Needs a Distributed Multidimensional
Matching algorithm to map the publish/
subscribe problem to one of
multidimensional indexing
Distributed Multidimensional
Matching
• Requires
– each attribute has a known domain
– a known finest granularity
– a known global order of the attributes
• The mapping
– a d-dimensional space S where d = number of attributes
– every attribute ai maps to a dimension di in S
– S is managed by a binary search tree that recursively
subdivides S into regions through (d-1)-dimensional hyperplanes
– each hyperplane divides a region in half.
– each region has a corresponding node n(r) in the search tree
DMM Continued
• Each region is addressed by a bit string
called a z-code and is associated with one
node in the tree (see the sketch below)
• a subscription s is stored at all leaf nodes
n(ri) in the search tree such that ri
intersects s.
• the information of each node in the tree is
stored at the peer p(r) in the DHT.
• Subscriptions are sent to the root peer and
flow down to the appropriate leaf nodes.
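A minimal sketch of how a z-code might be computed by recursive halving; the axis-cycling order and the depth parameter are assumptions for illustration.

```python
def z_code(point, bounds, depth):
    """Recursively halve the d-dimensional space, one dimension per level in the
    global attribute order; each bit records which half the point falls in."""
    d = len(point)
    lo = [b[0] for b in bounds]
    hi = [b[1] for b in bounds]
    bits = []
    for level in range(depth):
        axis = level % d                       # cycle through the attributes in order
        mid = (lo[axis] + hi[axis]) / 2        # the hyperplane splitting this region in half
        if point[axis] < mid:
            bits.append("0")
            hi[axis] = mid
        else:
            bits.append("1")
            lo[axis] = mid
    return "".join(bits)

# Two attributes, each with domain [0, 100); six levels of subdivision.
print(z_code((30, 80), [(0, 100), (0, 100)], depth=6))   # "011100"
```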
Resource Discovery (Topic
Based Matching)
• Manage dynamic distributed resources
• Nodes join groups when they have a
desired resource and leave when the
resource is no longer available
• Other nodes can request nearby resources
by anycasting/multicasting messages to
the appropriate group.
Implementation
• Each group has a key called groupId which maps
into the DHT's ID space.
• Create a group
– send a message to that group id
– the node nearest the id becomes the root of the
group's spanning tree
– the root then adds the requesting node
• Join (sketched below)
– send a message toward the root
– if an intermediate node is part of that spanning tree,
add the requesting node as a child and stop
– otherwise keep forwarding the message
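A minimal sketch of the create/join flow under simplified assumptions; the Node class, the route list, and the group id strings below are illustrative, not Scribe's actual API.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.children = {}    # group_id -> set of child node ids on the spanning tree
        self.member = {}      # group_id -> True if this node itself joined the group

def join(route, group_id):
    """route: the nodes a JOIN message passes through toward the group's root.
    Each hop records the previous hop as a child; forwarding stops at the first
    node already on the spanning tree, or at the root."""
    prev = route[0]                                       # the joining node
    for node in route[1:]:
        on_tree = bool(node.children.get(group_id)) or node.member.get(group_id, False)
        node.children.setdefault(group_id, set()).add(prev.node_id)
        if on_tree:
            return node                                   # tree reached: stop forwarding
        prev = node
    return route[-1]                                      # message reached the root

a, b, root = Node("A"), Node("B"), Node("ROOT")
root.member["storage-topic"] = True                      # root created the group
join([a, b, root], "storage-topic")                      # A joins via B; B and the root learn of it
join([Node("C"), b, root], "storage-topic")              # C's join stops at B, already on the tree
```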
Continued
• Leaving the group
– If a node has entries in the children table
• mark it as not a member and stop
– otherwise
• send a leave message to its parent in the tree
• the parent then removes the node from the children table
• Anycast (sketched below)
– Implemented as a DFS of the group tree
– Load balanced since different requests start
at different nodes
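A minimal sketch of anycast as a depth-first search of the group tree; the TreeNode class and can_serve callback are illustrative.

```python
class TreeNode:
    def __init__(self, name, is_member=False, children=None):
        self.name = name
        self.is_member = is_member          # does this node actually hold the resource?
        self.children = children or []

def anycast(start, can_serve):
    """Depth-first search from `start`; return the first group member that can serve
    the request. Starting the walk at different nodes spreads load across members."""
    stack = [start]
    while stack:
        node = stack.pop()
        if node.is_member and can_serve(node):
            return node
        stack.extend(node.children)
    return None

s1 = TreeNode("S1", is_member=True)
s2 = TreeNode("S2", is_member=True)
root = TreeNode("root", children=[s1, s2])
print(anycast(root, can_serve=lambda n: True).name)   # "S2": the first member the DFS reaches
```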
Multicast
• Like anycast but sends to all group
members
• Uses O(N) bandwidth
• Useful to request data (values) when most
members will have some data to
contribute.
Pastry/Scribe in Mimir
• Pastry's Id routing is used to provide a
network independent routing scheme.
• We have three Topics
– Metadata Controller Topic
• provides security and path to fileid mapping
– Storage Node Topic
• provides a way to list available resources (file
storage)
– Client Topic
• provides information on client nodes
File Storage Request
• Send a multicast request to the MDC topic
to add a file to the system
• The MDC sends the new file's id back to
the requesting client
• The client then multicasts a store request
message
• All storage nodes respond with ip/port data
• The client then connects directly to each
storage node and stripes the encoded
blocks over them evenly (sketched below).
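A minimal sketch of the final striping step; the addresses and the round-robin policy below are illustrative assumptions.

```python
def stripe_blocks(encoded_blocks, storage_nodes):
    """Assign encoded blocks to storage nodes round-robin so each node
    receives an (almost) equal share."""
    assignment = {node: [] for node in storage_nodes}
    for i, block in enumerate(encoded_blocks):
        assignment[storage_nodes[i % len(storage_nodes)]].append(block)
    return assignment

nodes = ["10.0.0.1:9000", "10.0.0.2:9000", "10.0.0.3:9000"]   # ip/port from the store responses
blocks = [f"block-{i}" for i in range(7)]
print(stripe_blocks(blocks, nodes))   # nodes get 3, 2, and 2 blocks respectively
```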
File Retrieval Request
• Multicast a GET FILE <FILE Id> message
to the storage nodes
• Storage nodes look up what data they
hold for that file and send it back to the
requesting id.
• The client then rebuilds the file as data is
being received (sketched below)
• If the file cannot be rebuilt, report an
error.
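A minimal sketch of the retrieval loop; the StorageNodeStub class and the try_decode callback are hypothetical stand-ins for the real node responses and the LT decoder.

```python
class StorageNodeStub:
    """Stand-in for a storage node answering GET FILE <file_id>."""
    def __init__(self, blocks):
        self._blocks = blocks
    def blocks_for(self, file_id):
        return self._blocks

def retrieve(file_id, storage_nodes, try_decode):
    """Feed encoded blocks into the decoder as responses arrive; stop as soon as
    the file can be rebuilt, otherwise report an error."""
    received = []
    for node in storage_nodes:                 # stand-in for asynchronous responses
        received.extend(node.blocks_for(file_id))
        result = try_decode(received)
        if result is not None:
            return result                      # rebuilt before every node replied
    raise IOError(f"could not rebuild file {file_id} from the available blocks")

nodes = [StorageNodeStub([b"a"]), StorageNodeStub([b"b", b"c"])]
# try_decode here pretends the file is rebuilt once three blocks have arrived.
print(retrieve("file-42", nodes, try_decode=lambda xs: b"".join(xs) if len(xs) >= 3 else None))
```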
Advantages of Pastry for Mimir
• The P2P network provides a reliable way to
handle a dynamic set of nodes and keep
communication open with no central
point of communication
• Multicast helps us quickly attempt to
retrieve and store files