Towards a Novel Architecture
for Wide-Area Data Caching
and Replication
Thuraiappah Vaseeharan and Muthucumaru Maheswaran
Proc. of the International Conference on Internet Computing, 2000
Daehyun, Cho
DB Lab, Div. of CS, Dept. of EECS
2001. 10. 30
Contents








Introduction
Previous Work
Approach
Architecture
Example
Experiments
Conclusion and Future Work
Critiques
Introduction

To ensure fast and highly available access to
internet services



Optimally locating data objects and service provision
points in the network is critical.
Currently, caching and replication are widely used.
However, current caching and replication
techniques should be reappraised because of the
following:




Rapid growth of the internet
Increasing variety of clients demanding internet services
Phenomenon of hot-spots
Transient increases in user access patterns
Previous Work (Con’t)

Caching

Client-pull



Browser-level caching
per-site proxy servers
caching hierarchies



Caching hierarchies increase the client latency for popular documents
because the requests have to percolate through a large number of levels.
Some papers [WWW95, SDNE95] document the limitations
of client-pull approaches.
Server-push (Data Dissemination)

This has been shown in [SPDP95], but the authors are
not aware of any such system in widespread use.
Previous Work (Con’t)

Replication among servers at a single location


DNS-based request distribution
Sites are forced to provision for the peak demand.
Many resources are wasted during periods of average
demand.

Mirroring

Requires manual effort to set up and to maintain the
consistency of data.
The user has to select the mirror site to access.
Rent-A-Server (WebOS project [berkeley98])

Dynamically spawns server clones and replicates the server
data in response to changes in server load.
Approach

Motivation



To minimize access time and conserve bandwidth, the
dynamically replicated copies of the server must be
located close to hot-spots in client access patterns.
However, current approaches do not consider the client
access patterns in determining the locations at which to spawn
the server replicas.
The authors present a dynamic caching and replication
architecture that considers the temporal and
geographical spikes in user demand.


Distributed computation of client access patterns
Using the access statistics for replication/migration
Approach (Con’t)

Approach

Let’s use the network nodes

To compute the access patterns in a distributed fashion
and send them to the server periodically

The network nodes can actually see the flow of client requests.
Let’s endow objects with intelligence

To make their own migration/replication decisions based on
the access statistics
Architecture

Design Goals

Design of a caching/replication system that obtains the
user access patterns from the network to locate the data
objects


The network utilization and access latency are optimized.
Making caching transparent


Clients do not have to choose proxies.
Caches do not have to be configured manually in a
hierarchy.
Architecture (Con’t)

Using Active Networks

In the case of wide-area caching and replication, network
nodes are in an ideal position to determine the location
of hot-spots in the access patterns of objects retrieved
over the network.
Architecture (Con’t)

Caching/Replication Model
Architecture (Con’t)

Active Data Object (ADO)



A set of related files that may be transferred as a group.
Contains the intelligence to make its own migration/replication
decisions based on the access statistics obtained from
the network.
Active Node

Maintains state information about the accesses to
various data objects.
Has the intelligence to periodically update the server with
access statistics for individual objects.
Architecture (Con’t)

Process of caching/replication model





1. The arrival of an update triggers the ADO’s migration
routine.
2. The ADO’s migration routine analyzes the traffic information
and decides whether to migrate to the hot-spot region.
3. The server transfers the ADO to the
corresponding cache in the hot-spot region.
4. The caching server announces the presence of the
ADO to its associated active node.
5. Subsequent requests for the object are routed to the closer
server by the associated active node.
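
The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class names `ActiveDataObject` and `ActiveNode`, the `threshold` field, and the rule "migrate to the busiest region once its count reaches the threshold" are all hypothetical, and the actual transfer of files in step 3 is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ActiveNode:
    """Hypothetical active node: maps object names to a server holding a copy."""
    region: str
    registered: dict = field(default_factory=dict)

    def announce(self, object_name, server):
        # Step 4: the caching server announces the ADO to its active node.
        self.registered[object_name] = server

    def route(self, object_name, home_server):
        # Step 5: later requests go to the closer copy if one is registered.
        return self.registered.get(object_name, home_server)


@dataclass
class ActiveDataObject:
    """Hypothetical ADO: a named group of files plus its own migration rule."""
    name: str
    threshold: int  # assumed popularity threshold, not taken from the paper

    def on_update(self, stats):
        # Steps 1-2: an update arrives; pick the busiest region and decide
        # whether its demand justifies migration.
        if not stats:
            return None
        region = max(stats, key=stats.get)
        return region if stats[region] >= self.threshold else None


def handle_update(ado, stats, caches, nodes):
    """Run steps 1-4 for one update; return the region migrated to, or None."""
    region = ado.on_update(stats)
    if region is None:
        return None
    cache_server = caches[region]  # Step 3: choose that region's cache (transfer omitted)
    nodes[region].announce(ado.name, cache_server)
    return region


ado = ActiveDataObject(name="/index.html", threshold=10)
nodes = {"east": ActiveNode("east"), "west": ActiveNode("west")}
caches = {"east": "cache-east", "west": "cache-west"}
target = handle_update(ado, {"east": 3, "west": 12}, caches, nodes)
```

In this run only the "west" node learns about the copy, so requests passing through "east" would still be routed to the home server.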
Architecture (Con’t)

Argument on additional overhead for active
networking



Does the additional overhead of active networking
outweigh the performance gains made by
better replication?  No.
Requiring active processing only for requests for data
objects and updates of access statistics keeps the
network performance penalty small.
The actual transfer of the data object content between
client and server is performed by application-specific,
non-active protocols (e.g., HTTP, FTP).
Example


“ADO based Web Caching and Replication”
Implementation

Using Java applets



For identifying hot-spots in the client access patterns
For making the migration/replication decisions
ANTS


Capsule-based active networking toolkit written in Java
Capsules carry data and references to the code to be
executed at active nodes
Example (Con’t)

ANTS Capsule Types

Request Capsule

Sent from clients to the server during connection initiation.
Queries the active nodes it passes through.
Response Capsule

Sent to the client from the active node when a hit occurs.
Conveys the IP address of the ADO server holding the requested
object.
Information Capsule

Sent from the active node to the ADO server
when the demand for a data object held by the ADO server exceeds the
threshold of a server-specified popularity modulus.
Register Capsule

Sent from ADO servers to active nodes
to register an object held in the cache and
to refresh the object entry periodically.
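
As a rough illustration, the four capsule types can be modelled as plain records. The sketch below only captures who sends what to whom; real ANTS capsules are Java classes that also reference the code to run at each node. All field names, and the reading of the information-capsule trigger in `should_report`, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestCapsule:
    """Client -> home server; inspected by each active node on the path."""
    client_ip: str
    home_server_ip: str
    url: str


@dataclass(frozen=True)
class ResponseCapsule:
    """Active node -> client on a hit; names the server that holds the object."""
    url: str
    ado_server_ip: str


@dataclass(frozen=True)
class InformationCapsule:
    """Active node -> ADO server; reports demand that crossed the threshold."""
    url: str
    access_count: int


@dataclass(frozen=True)
class RegisterCapsule:
    """ADO server -> active node; registers or refreshes a cached object."""
    url: str
    ado_server_ip: str


def should_report(access_count, popularity_modulus):
    # Assumed reading of the trigger: report once the count exceeds the
    # server-specified value. The exact rule is not given in the slides.
    return access_count > popularity_modulus


request = RequestCapsule(client_ip="192.0.2.10",
                         home_server_ip="198.51.100.1",
                         url="/index.html")
```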
Example (Con’t)

Capsule Processing in an Active Node
Example (Con’t)

Capsule Processing (when an object is requested)

Common processing

1. When the user inputs a URL, the client resolves the server name.
2. The client sends the IP address and the URL to a local active
network daemon.
3. The local active network daemon creates a request capsule and
forwards it towards the home server.
4. When a request capsule arrives at a node, its forwarding routine
queries the activity cache for the requested URL.

If a match is not found,

5. The request capsule sets up an entry for the requested URL in
the activity cache and increments the access count.
6. The request capsule is forwarded towards the home server.

If a match is found,

5. A response capsule is sent back to the client with the IP
address of the server holding the requested URL.
6. The client proceeds to transfer the data.
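
The per-node part of this procedure (steps 4 to 6) might look like the sketch below. It is not ANTS code. The class `ActivityCache` is hypothetical, and it assumes that a "match" means a nearby server has registered a copy of the URL, with access counts kept separately from registrations; the slides do not spell out this distinction.

```python
class ActivityCache:
    """Hypothetical per-node state: access counts plus registered copies."""

    def __init__(self):
        self.access_count = {}  # url -> number of requests seen
        self.registered = {}    # url -> IP of a nearby server holding the url

    def process_request(self, url, client_ip, home_server_ip):
        """Return ("response", client_ip, server_ip) on a hit,
        or ("forward", home_server_ip, None) on a miss."""
        server_ip = self.registered.get(url)  # step 4: query the cache
        if server_ip is not None:
            # Match found: step 5, tell the client where the object is.
            return ("response", client_ip, server_ip)
        # No match: step 5, create or increment the entry; step 6, forward.
        self.access_count[url] = self.access_count.get(url, 0) + 1
        return ("forward", home_server_ip, None)


cache = ActivityCache()
first = cache.process_request("/a", "192.0.2.10", "198.51.100.1")
cache.registered["/a"] = "203.0.113.5"  # as if a register capsule had arrived
second = cache.process_request("/a", "192.0.2.10", "198.51.100.1")
```

Here the first request is counted and forwarded to the home server, while the second is answered by the node itself with the address of the registered copy.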
Example (Con’t)

Capsule Processing (which is performed periodically)


1. The active node sends information capsules with the access
statistics to the associated ADO servers.
2. The arrival of the information capsule triggers the ADO
control routine, which

analyzes the access statistics and
decides whether to migrate to a region of high demand.
3. The server transfers the ADO to the caching server.
4. The server sends a register capsule to the associated
active node to create an entry for the URL in the active node’s
activity cache.
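
The node-side bookkeeping for steps 1 and 4 could be sketched as follows. Several details are assumptions rather than facts from the paper: the class name `ActiveNodeState`, resetting the counters after each report, and letting a registration expire after `entry_ttl` seconds unless a register capsule refreshes it (the slides only say that entries are refreshed periodically).

```python
import time


class ActiveNodeState:
    """Hypothetical node state for the periodic steps."""

    def __init__(self, threshold, entry_ttl):
        self.threshold = threshold  # assumed caching threshold
        self.entry_ttl = entry_ttl  # assumed lifetime of a registration, in seconds
        self.access_count = {}      # url -> requests seen this period
        self.registered = {}        # url -> (server_ip, expiry_time)

    def collect_reports(self):
        """Step 1: return {url: count} for urls over the threshold, then reset."""
        reports = {u: c for u, c in self.access_count.items() if c > self.threshold}
        self.access_count.clear()
        return reports

    def register(self, url, server_ip, now=None):
        """Step 4: create or refresh the entry named by a register capsule."""
        now = time.time() if now is None else now
        self.registered[url] = (server_ip, now + self.entry_ttl)

    def lookup(self, url, now=None):
        """Return the registered server, or None if absent or expired."""
        now = time.time() if now is None else now
        entry = self.registered.get(url)
        if entry is None or entry[1] <= now:
            self.registered.pop(url, None)
            return None
        return entry[0]


node = ActiveNodeState(threshold=10, entry_ttl=60)
node.access_count = {"/hot": 25, "/cold": 2}
reports = node.collect_reports()
node.register("/hot", "203.0.113.5", now=0)
```

Passing `now` explicitly keeps the example deterministic; in a running node the default wall-clock time would be used instead.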
Experiments

Measurement



Client latency
Overheads associated with the active networking
Experimental Tool

Modified version of Web Polygraph

polyclt

Measures the client latency and the overheads.
Requests a single object repeatedly at a rate of 1 request/second.
polysrv

Responds with a document whose size is drawn from an exponential distribution
with a mean of 13 KB.
Simulates a wide-area link by delaying each response by a normally distributed
time with mean 10 seconds and standard deviation 3 seconds.
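
To make the server-side workload concrete, the sketch below draws response sizes and delays from the distributions described above, using Python's standard `random` module. It is not Web Polygraph code. The function name `sample_response` and the clamping of negative delays to zero are assumptions made for the example.

```python
import random


def sample_response(rng, mean_size_kb=13.0, mean_delay_s=10.0, std_delay_s=3.0):
    """Return (size_kb, delay_s) for one simulated server response."""
    # expovariate takes the rate, i.e. 1 / mean.
    size_kb = rng.expovariate(1.0 / mean_size_kb)
    # A normal draw can be negative; clamping at zero is an assumption here.
    delay_s = max(0.0, rng.gauss(mean_delay_s, std_delay_s))
    return size_kb, delay_s


rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample_response(rng) for _ in range(10_000)]
avg_size = sum(s for s, _ in samples) / len(samples)
avg_delay = sum(d for _, d in samples) / len(samples)
```

With this many samples the averages should land close to the configured means of 13 KB and 10 seconds.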
Experiments (Con’t)

Network Topology

Experiments


Caching threshold = 10
Number of clients = 1, 2, 5
Experiments (Con’t)

Result
Conclusion and Future Work

Conclusion


The authors propose an active-network-based architecture for
improving caching/replication that obtains
statistics from the network nodes to identify hot-spots in
client access patterns.
Future Work

Tackle the replication of dynamically generated
content.
Critiques

Good Points


An attempt to use network nodes to identify hot-spots in client
access patterns
Weak Points

This architecture cannot
achieve better performance
than classical caching does.
Classical caching may be …
Critiques (Con’t)

Weak Points (Con’t)


Previous work is too old (1995?).
The authors do not describe the problems of previous work well
in client-pull, server-push, and clustered-server approaches.
The authors have the wrong idea about the overhead of active
networking.
The authors do not consider cache consistency.
Poor experiments

The experiments do not demonstrate the merits of this work.
And the experiments look deceptive. (I think ..)
Poor writing (wrong words, wrong sentences ..)
Alternatives

Content Delivery Network (CDN)