Web Service Composition

Download Report

Transcript Web Service Composition

Scalable ContentAddressable Networks
Prepared by
Kuhan Paramsothy
March 5, 2007
High-Level Overview



Hash tables (map keys to values) are heavily used in building
software applications
The concept of a Content-Addressable Network (CAN)
provides hash table-like functionality on Internet-like scales.
CAN is:




Scalable
Robust/Fault-tolerant
Self-organizing
Low-latency
ECE 1770 – Content-Addressable Networks
Hash Tables and CAN

A data structure that efficiently maps keys onto values

CANs are a form of distributed, Internet-scale hash tables.
ECE 1770 – Content-Addressable Networks
What CAN would do for us

CAN would improve peer-to-peer systems

Napster: the process of locating a file is centralized


Gnutella: decentralized the file location process (network self-organizes
into an application layer mesh)



Requests for files are done through flooding, not scalable, may not find
content
Conclusion: P2P systems need a scalable indexing mechanism
CAN would improve large data repositories


Expensive to scale the central repository, single point of failure
These systems need efficient insertion and retrieval
CAN would create large-scale name resolution services that
don’t use a naming scheme (ie. Not DNS)

No more location-dependent naming schemes
ECE 1770 – Content-Addressable Networks
Basic Operations Performed
On CANs

Basic Operations




Each CAN stores
1.
2.

A piece (called a zone) of the entire hash table
Holds information about a small number of adjacent zones in the table
Routing in a CAN


Insertion (of key,value pairs)
Lookup (of key,value pairs)
Deletion (of key,value pairs)
Done by intermediate CAN nodes towards the CAN node whose zone contains that
key
CAN Design is





Distributed (requires no centralized control or coordination)
Scalable (nodes hold only a small about of information that doesn’t grow with the
network)
Fault-tolerant (nodes can route around failures)
Doesn’t require a naming hierarchy
Is entirely Application Layer
ECE 1770 – Content-Addressable Networks
CAN Design


Centers around a
virtual d-dimensional
Cartesian coordinate
space on a d-torus
At any time, the entire
coordinate space is
dynamically partitioned
among all the nodes in
the system

Each node owns a
distinct zone
ECE 1770 – Content-Addressable Networks
CAN Design (2)
1.
2.
3.

To store a pair, key K1 is mapped to
P via a uniform hash function
The pair is then stored at the node
that owns the zone where P lies
To retrieve an entry corresponding
to K1, any node can apply the same
hash function to map K1 to P and
get the corresponding value
A node learns and maintains the IP
addresses of those nodes that hold
adjoining coordinate zones
Efficient routing is critical to a useful
CAN
ECE 1770 – Content-Addressable Networks
Routing in a CAN


Routing in a Content Addressable Networks works by following the straight line
path through the Cartesian space from source to destination coordinates.
A CAN node maintains a coordinate routing table that holds the IP address and
virtual coordinate zone of each of its immediate neighbors in the coordinate
space.
Average Path Length = (d/4)(n1/d)
Individual Nodes Have 2d Neighbors
Average Path Length Grows As O(n1/d)
ECE 1770 – Content-Addressable Networks
Construction of a CAN Overlay


The entire CAN space is divided amongst the nodes currently in the system
Incremental construction process takes three steps




The new node finds a node already in the CAN
Using the CAN routing mechanisms, finds a node whose zone will be split
The neighbors of the split zone must be notified so that routing can include the new
node
Bootstrapping: There are CAN bootstrap nodes associated to a DNS domain
name
Node Insertion Affects Only O(number of dimensions) existing nodes
ECE 1770 – Content-Addressable Networks
Maintenance of a CAN Overlay


Node Graceful Departure: node explicitly hands over its zone and the
associated (key,value) database to one of its neighbors
Node Abrupt Disappearance: An immediate takeover algorithm ensures
one of the “failed” node’s neighbors takes over the zone


Under normal conditions, a node sends periodic update messages to
each of its neighbors and a list of neighbors and their zone coordinates.
Prolonged absence of an update message from a neighbor signals it’s
failure
ECE 1770 – Content-Addressable Networks
Design Improvements

Basic CAN algorithm provides



Low per-node state (O(d) for a d-dimensional
space)
Short path lengths (O(dn1/d) hops for d
dimensions and n nodes)
The problem is that there are applicationlayer hops, not IP-layer hops

Latency of each hop might be substantial
ECE 1770 – Content-Addressable Networks
Design Improvements (2)

Improvement: Multi-dimensioned Coordinate Spaces




Improvement: Multiple Coordinate Spaces (a.k.a. Multiple Realities)





Increasing the dimensions of the CAN coordinate space reduces the routing path length and path latency for a small increase in the
size of the coordinate routing table
Path Length scales as O(d(n1/d))
Fault-tolerance improves
Maintain multiple independent coordinate spaces with each node in the system being assigned a different zone in the coordinate
space (each coordinate space is a reality)
Fault-tolerance improves
Low per-node state (O(d) for a d-dimensional space)
Short path lengths (O(dn1/d) hops for d dimensions and n nodes)
Which is better?

Increasing the dimensions
ECE 1770 – Content-Addressable Networks
Design Improvements (3)

Improvement: Better CAN Routing Metrics



Have each node measure the network-level round-trip-time RTT to each of
its neighbors. Then route messages accordingly.
Favors lower latency paths and avoids unnecessarily long hops
Improvement: Caching and Replication



A CAN node can maintain a cache of the data keys it recently accessed
A CAN node can replicate the data key at each of its neighboring nodes
Both schemes need an associated time-to-live field, to eventually expire
from the cache
ECE 1770 – Content-Addressable Networks
Related Systems

Domain Name System


CANs are more general than the DNS because DNS closely ties the naming
scheme to the manner in which a name is resolved to an IP address
Peer-to-Peer



A simple example is keys being analogous to a URL
Will improve robustness
Key difference is that content within the CAN can always be located by any
other node because there is a clear “home” (point) in the CAN for that
content and every other node knows what the home is how to reach it
ECE 1770 – Content-Addressable Networks
Discussion

Security?



Better or worse with CAN?
Any Other Design Improvement?
Is The Communication Overhead Significant?
ECE 1770 – Content-Addressable Networks
References

A Scalable Content-Addressable Network, Ratnasamy, University of
California – Berkeley, http://www.sigcomm.org/sigcomm2001/p13ratnasamy.pdf
ECE 1770 – Content-Addressable Networks