Towards a Novel Architecture
for Wide-Area Data Caching
and Replication
Thuraiappah Vaseeharan and Muthucumaru Maheswaran
Proc. of the International Conference on Internet Computing, 2000
Daehyun, Cho
DB Lab, Div. of CS, Dept. of EECS
2001. 10. 30
Contents
Introduction
Previous Work
Approach
Architecture
Example
Experiments
Conclusion and Future Work
Critiques
Introduction
To ensure fast and highly available access to Internet services,
optimally locating data objects and service provision points in
the network is critical.
Caching and replication are currently in wide use.
However, current caching and replication techniques should be
reappraised for the following reasons:
Rapid growth of the Internet
Increasing variety of clients demanding Internet services
The hot-spot phenomenon
Transient spikes in user accesses
Previous Work
Caching
Client-pull
Browser-level caching
Per-site proxy servers
Caching hierarchies
Hierarchies increase the client latency for popular documents because
requests have to percolate through a large number of levels.
Some papers [WWW95, SDNE95] document the limitations
of client-pull approaches.
Server-push (Data Dissemination)
Server-push has been demonstrated in [SPDP95], but the authors are
not aware of any such system in widespread use.
Previous Work (Cont’d)
Replication among servers at a single location
DNS-based request distribution
Sites are forced to provision for peak demand.
Mirroring
Many resources are wasted during periods of average
demand.
Requires manual effort to set up and to maintain the
consistency of the data.
The user has to select which mirror site to access.
Rent-A-Server (WebOS project [berkeley98])
Dynamically spawns server clones and replicates the server
data in response to changes in server load.
Approach
Motivation
To minimize access time and conserve bandwidth, the
dynamically replicated copies of the server must be
located close to hot-spots in client access patterns.
However, current approaches do not consider the client
access patterns in determining the locations to spawn
the server replicas.
We present a dynamic caching and replication
architecture that considers the temporal and
geographical spikes in user demand.
Distributed computation of client access patterns
Using the access statistics for replication/migration
Approach (Cont’d)
Approach
Let’s use the network nodes
To compute the access patterns in a distributed fashion
and send them to the server periodically
The network nodes can actually see the flow of client requests.
Let’s endow objects with intelligence
So that each object makes its own migration/replication decisions
based on the access statistics
Architecture
Design Goals
Design a caching/replication system that obtains the
user access patterns from the network to place the data
objects
so that network utilization and access latency are optimized.
Making caching transparent
Clients do not have to choose proxies.
Caches do not have to be configured manually in a
hierarchy.
Architecture (Cont’d)
Using Active Networks
In wide-area caching and replication, network nodes are in an
ideal position to determine the location of hot-spots in the
access patterns of objects retrieved over the network.
Architecture (Cont’d)
Caching/Replication Model
Architecture (Cont’d)
Active Data Object (ADO)
A set of related files that may be transferred as a group
Contains the intelligence to make its own migration/replication
decisions based on the access statistics obtained from
the network.
Active Node
Maintains state information about the accesses to
various data objects.
Has the intelligence to periodically update the server with
access statistics for individual objects.
Architecture (Cont’d)
Process of the caching/replication model
1. The arrival of an update triggers the ADO’s migration
routine.
2. The ADO’s migration routine analyzes the traffic information
and decides whether to migrate to the hot-spot region.
3. The server transfers the ADO to the
corresponding cache in the hot-spot region.
4. The caching server announces the presence of the
ADO to its associated active node.
5. Subsequent requests for the object are routed to the closer
server by the associated active node.
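The five steps above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the class names `ActiveDataObject` and `ActiveNode`, the region labels, the threshold value, and the rule "replicate to the busiest region once its count reaches the threshold" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ActiveDataObject:
    """Hypothetical ADO: a group of files plus its own migration logic."""
    url: str
    home_server: str
    migration_threshold: int = 10  # assumed value, for illustration only
    replicas: Dict[str, str] = field(default_factory=dict)  # region -> cache

    def on_statistics_update(self, stats: Dict[str, int],
                             caches: Dict[str, str]) -> Optional[str]:
        """Steps 1-3: an update arrives, is analyzed, and may cause a transfer.

        stats maps a region to its access count; caches maps a region to the
        caching server located there. Returns the cache that receives the
        ADO, or None when no new replica is created.
        """
        if not stats:
            return None
        hot_region = max(stats, key=lambda region: stats[region])
        if stats[hot_region] < self.migration_threshold:
            return None  # no region is hot enough
        if hot_region in self.replicas or hot_region not in caches:
            return None  # already replicated there, or no cache available
        self.replicas[hot_region] = caches[hot_region]  # step 3: transfer
        return caches[hot_region]


@dataclass
class ActiveNode:
    """Hypothetical active node that redirects requests to known replicas."""
    region: str
    registered: Dict[str, str] = field(default_factory=dict)  # url -> cache

    def register(self, url: str, cache_server: str) -> None:
        """Step 4: the caching server announces the ADO to this node."""
        self.registered[url] = cache_server

    def route(self, url: str, home_server: str) -> str:
        """Step 5: route to the closer server when one is registered."""
        return self.registered.get(url, home_server)
```

In this sketch a node keeps routing to the home server until `register` is called, which mirrors the order of steps 4 and 5.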
Architecture (Cont’d)
Argument on the additional overhead of active
networking
Does the additional overhead of active networking
outweigh the performance gains made by
better replication? No.
Requiring active processing only for requests for data
objects and for updates of access statistics keeps the
network performance penalty small.
The actual transfer of the data object content between
client and server is performed by application-specific,
non-active protocols (e.g., HTTP, FTP).
Example
“ADO based Web Caching and Replication”
Implementation
Using Java applets
For identifying hot-spots in the client access patterns
For making the migration/replication decisions
ANTS
Capsule based active networking toolkit written in Java
Capsules carry data and references to the code to be
executed at active nodes
Example (Cont’d)
ANTS Capsule Types
Request Capsule
sent from clients to the server during connection initiation.
queries the active nodes it passes through.
Response Capsule
sent to the client from the active node when a hit occurs.
conveys the IP address of the ADO server holding the requested
object.
Information Capsule
sent from the active node to the ADO server
when the demand for a data object held by the ADO server exceeds the
threshold set by a server-specified popularity modulus.
Register Capsule
sent from ADO servers to active nodes
to register an object held in the cache.
to refresh the object entry periodically.
Example (Cont’d)
Capsule Processing in an Active Node
Example (Cont’d)
Capsule Processing (when an object is requested)
Common processing
1. When the user inputs a URL, the client resolves the server name.
2. The client sends the IP address and the URL to a local active
network daemon.
3. The local active network daemon creates a request capsule and
forwards it towards the home server.
4. When a request capsule arrives at a node, its forwarding routine
queries the activity cache for the requested URL.
If a match is not found,
5. The request capsule sets up an entry for the requested URL in
the activity cache and increments the access count.
6. The request capsule is forwarded towards the home server.
If a match is found,
5. A response capsule is sent back to the client with the IP
address of the server holding the requested URL.
6. The client proceeds to transfer the data.
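The per-node part of the request handling above (step 4 onward) might look like the following Python sketch. It is not ANTS code: the `ActivityCache` class, its method names, and the choice to count accesses only on a miss (one reading of the slide) are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


class ActivityCache:
    """Hypothetical per-node table of access counts and registered servers."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}   # url -> accesses seen at this node
        self.servers: Dict[str, str] = {}  # url -> IP of a server holding it

    def register(self, url: str, server_ip: str) -> None:
        """Record that server_ip holds a copy of url (register capsule)."""
        self.servers[url] = server_ip

    def handle_request(self, url: str) -> Tuple[str, Optional[str]]:
        """Process one request capsule arriving at this node.

        Returns ("respond", server_ip) on a hit, meaning a response capsule
        goes back to the client, or ("forward", None) on a miss, meaning the
        capsule continues towards the home server.
        """
        server_ip = self.servers.get(url)
        if server_ip is not None:
            return ("respond", server_ip)
        # Miss: create the entry if needed and increment the access count.
        self.counts[url] = self.counts.get(url, 0) + 1
        return ("forward", None)
```

After a hit, the client would fetch the object from the returned address with an ordinary protocol such as HTTP, which is outside this sketch.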
Example (Cont’d)
Capsule Processing (performed periodically)
1. The active node sends information capsules with the access
statistics to the associated ADO servers.
2. The arrival of an information capsule triggers the ADO
control routine, which
analyzes the access statistics.
decides whether to migrate to a region of high demand.
3. The server transfers the ADO to the caching server.
4. The server sends a register capsule to the associated
active node to create an entry for the URL in the active node’s
activity cache.
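Step 1 of the periodic processing can be illustrated with a small Python helper that decides which statistics an active node reports. The function name, the dictionary layout, and the default threshold of 10 (borrowed from the caching threshold used in the experiments) are assumptions for this sketch, and it does not model how counts are reset between reporting periods.

```python
from typing import Dict


def build_information_capsules(access_counts: Dict[str, int],
                               server_of: Dict[str, str],
                               threshold: int = 10) -> Dict[str, Dict[str, int]]:
    """Group the statistics an active node would report, per ADO server.

    access_counts maps a URL to the accesses seen at this node, and
    server_of maps a URL to the ADO server that holds it. Only URLs whose
    count exceeds the threshold are reported.
    """
    capsules: Dict[str, Dict[str, int]] = {}
    for url, count in access_counts.items():
        server = server_of.get(url)
        if server is None or count <= threshold:
            continue  # unknown home server, or not popular enough
        capsules.setdefault(server, {})[url] = count
    return capsules
```

Each key of the returned dictionary stands for one information capsule addressed to that server.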
Experiments
Measurement
Client latency
Overheads associated with the active networking
Experimental Tools
Modified version of Web Polygraph
polyclt
measures the client latency and the overheads.
requests a single object repeatedly at a rate of 1 request/second.
polysrv
responds with a document whose size is drawn from an exponential
distribution with a mean of 13 KB.
simulates a wide-area link by delaying each response by a normally
distributed time with mean 10 seconds and standard deviation 3 seconds.
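The server-side workload described above can be mimicked with Python's standard `random` module. This sketch is not Web Polygraph code; the function name, the reading of 13 KB as 13 * 1024 bytes, and the clamping of negative delays to zero are assumptions.

```python
import random
from typing import Tuple

MEAN_SIZE_BYTES = 13 * 1024  # 13 KB mean document size (assumed 1 KB = 1024 B)
DELAY_MEAN_S = 10.0          # mean of the simulated wide-area delay
DELAY_STDDEV_S = 3.0         # standard deviation of the simulated delay


def sample_response(rng: random.Random) -> Tuple[int, float]:
    """Draw one (document size in bytes, response delay in seconds) pair."""
    # expovariate takes the rate, which is 1 / mean.
    size = int(rng.expovariate(1.0 / MEAN_SIZE_BYTES))
    # A normal draw can be negative; clamp it because a delay cannot be.
    delay = max(0.0, rng.gauss(DELAY_MEAN_S, DELAY_STDDEV_S))
    return size, delay
```

Passing a seeded `random.Random` instance makes a run reproducible, which helps when comparing configurations such as different numbers of clients.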
Experiments (Cont’d)
Network Topology
Experiments
Caching threshold = 10
Number of clients = 1, 2, 5
Experiments (Cont’d)
Result
Conclusion and Future Work
Conclusion
Proposes an active-network-based architecture for
improving caching/replication that operates by obtaining
statistics from the network nodes to identify hot-spots in
client access patterns.
Future Work
Address the replication of dynamically generated
content.
Critiques
Good Points
An attempt to use network nodes to identify hot-spots in client
access patterns
Weak Points
This architecture cannot
achieve better performance
than classical caching does.
Classical caching may be …
Critiques (Cont’d)
Weak Points (Cont’d)
The previous work cited is too old. 1995?
The authors do not describe the problems of previous work well.
The authors have the wrong idea about the overhead of active
networking.
The authors do not consider cache consistency.
Poor experiments
with respect to client-pull, server-push, and clustered servers.
The experiments do not demonstrate the merits of this work.
Also, the experiments look deceptive. (I think ..)
Poor writing (wrong words, ungrammatical sentences, ..)
Alternatives
Content Delivery Network (CDN)