Web Service Composition
Download
Report
Transcript Web Service Composition
Scalable ContentAddressable Networks
Prepared by
Kuhan Paramsothy
March 5, 2007
High-Level Overview
Hash tables (map keys to values) are heavily used in building
software applications
The concept of a Content-Addressable Network (CAN)
provides hash table-like functionality on Internet-like scales.
CAN is:
Scalable
Robust/Fault-tolerant
Self-organizing
Low-latency
ECE 1770 – Content-Addressable Networks
Hash Tables and CAN
A data structure that efficiently maps keys onto values
CANs are a form of distributed, Internet-scale hash tables.
ECE 1770 – Content-Addressable Networks
What CAN would do for us
CAN would improve peer-to-peer systems
Napster: the process of locating a file is centralized
Gnutella: decentralized the file location process (network self-organizes
into an application layer mesh)
Requests for files are done through flooding, not scalable, may not find
content
Conclusion: P2P systems need a scalable indexing mechanism
CAN would improve large data repositories
Expensive to scale the central repository, single point of failure
These systems need efficient insertion and retrieval
CAN would create large-scale name resolution services that
don’t use a naming scheme (ie. Not DNS)
No more location-dependent naming schemes
ECE 1770 – Content-Addressable Networks
Basic Operations Performed
On CANs
Basic Operations
Each CAN stores
1.
2.
A piece (called a zone) of the entire hash table
Holds information about a small number of adjacent zones in the table
Routing in a CAN
Insertion (of key,value pairs)
Lookup (of key,value pairs)
Deletion (of key,value pairs)
Done by intermediate CAN nodes towards the CAN node whose zone contains that
key
CAN Design is
Distributed (requires no centralized control or coordination)
Scalable (nodes hold only a small about of information that doesn’t grow with the
network)
Fault-tolerant (nodes can route around failures)
Doesn’t require a naming hierarchy
Is entirely Application Layer
ECE 1770 – Content-Addressable Networks
CAN Design
Centers around a
virtual d-dimensional
Cartesian coordinate
space on a d-torus
At any time, the entire
coordinate space is
dynamically partitioned
among all the nodes in
the system
Each node owns a
distinct zone
ECE 1770 – Content-Addressable Networks
CAN Design (2)
1.
2.
3.
To store a pair, key K1 is mapped to
P via a uniform hash function
The pair is then stored at the node
that owns the zone where P lies
To retrieve an entry corresponding
to K1, any node can apply the same
hash function to map K1 to P and
get the corresponding value
A node learns and maintains the IP
addresses of those nodes that hold
adjoining coordinate zones
Efficient routing is critical to a useful
CAN
ECE 1770 – Content-Addressable Networks
Routing in a CAN
Routing in a Content Addressable Networks works by following the straight line
path through the Cartesian space from source to destination coordinates.
A CAN node maintains a coordinate routing table that holds the IP address and
virtual coordinate zone of each of its immediate neighbors in the coordinate
space.
Average Path Length = (d/4)(n1/d)
Individual Nodes Have 2d Neighbors
Average Path Length Grows As O(n1/d)
ECE 1770 – Content-Addressable Networks
Construction of a CAN Overlay
The entire CAN space is divided amongst the nodes currently in the system
Incremental construction process takes three steps
The new node finds a node already in the CAN
Using the CAN routing mechanisms, finds a node whose zone will be split
The neighbors of the split zone must be notified so that routing can include the new
node
Bootstrapping: There are CAN bootstrap nodes associated to a DNS domain
name
Node Insertion Affects Only O(number of dimensions) existing nodes
ECE 1770 – Content-Addressable Networks
Maintenance of a CAN Overlay
Node Graceful Departure: node explicitly hands over its zone and the
associated (key,value) database to one of its neighbors
Node Abrupt Disappearance: An immediate takeover algorithm ensures
one of the “failed” node’s neighbors takes over the zone
Under normal conditions, a node sends periodic update messages to
each of its neighbors and a list of neighbors and their zone coordinates.
Prolonged absence of an update message from a neighbor signals it’s
failure
ECE 1770 – Content-Addressable Networks
Design Improvements
Basic CAN algorithm provides
Low per-node state (O(d) for a d-dimensional
space)
Short path lengths (O(dn1/d) hops for d
dimensions and n nodes)
The problem is that there are applicationlayer hops, not IP-layer hops
Latency of each hop might be substantial
ECE 1770 – Content-Addressable Networks
Design Improvements (2)
Improvement: Multi-dimensioned Coordinate Spaces
Improvement: Multiple Coordinate Spaces (a.k.a. Multiple Realities)
Increasing the dimensions of the CAN coordinate space reduces the routing path length and path latency for a small increase in the
size of the coordinate routing table
Path Length scales as O(d(n1/d))
Fault-tolerance improves
Maintain multiple independent coordinate spaces with each node in the system being assigned a different zone in the coordinate
space (each coordinate space is a reality)
Fault-tolerance improves
Low per-node state (O(d) for a d-dimensional space)
Short path lengths (O(dn1/d) hops for d dimensions and n nodes)
Which is better?
Increasing the dimensions
ECE 1770 – Content-Addressable Networks
Design Improvements (3)
Improvement: Better CAN Routing Metrics
Have each node measure the network-level round-trip-time RTT to each of
its neighbors. Then route messages accordingly.
Favors lower latency paths and avoids unnecessarily long hops
Improvement: Caching and Replication
A CAN node can maintain a cache of the data keys it recently accessed
A CAN node can replicate the data key at each of its neighboring nodes
Both schemes need an associated time-to-live field, to eventually expire
from the cache
ECE 1770 – Content-Addressable Networks
Related Systems
Domain Name System
CANs are more general than the DNS because DNS closely ties the naming
scheme to the manner in which a name is resolved to an IP address
Peer-to-Peer
A simple example is keys being analogous to a URL
Will improve robustness
Key difference is that content within the CAN can always be located by any
other node because there is a clear “home” (point) in the CAN for that
content and every other node knows what the home is how to reach it
ECE 1770 – Content-Addressable Networks
Discussion
Security?
Better or worse with CAN?
Any Other Design Improvement?
Is The Communication Overhead Significant?
ECE 1770 – Content-Addressable Networks
References
A Scalable Content-Addressable Network, Ratnasamy, University of
California – Berkeley, http://www.sigcomm.org/sigcomm2001/p13ratnasamy.pdf
ECE 1770 – Content-Addressable Networks