Analysis of Caching and Replication Strategies for Web


Analysis of Caching and
Replication Strategies for Web
Applications
Authors:
Swaminathan Sivasubramanian,
Guillaume Pierre,
Maarten van Steen.
Presented By
Sudarsan Maddi
Graduate Student
Topics We Will Cover




Introduction
Techniques to scale Web applications
Performance Analysis
Choosing the Right Strategy
Introduction


In this paper the authors present a qualitative
and quantitative analysis of replication and
caching techniques for hosting Web applications.
Their analysis shows that selecting the best
mechanism depends heavily on the data
workload and the application characteristics.
Introduction



Web sites are slow for many reasons;
one of the main reasons is the dynamic generation
of Web documents.
Web page caching: Fragments of the HTML
pages that the application generates are cached
to serve future requests.
Content-delivery networks such as Akamai do
this by deploying edge servers around the
Internet, thus reducing each request's network
latency.
Introduction

Limitations of page caching have given rise
to different approaches for scalable Web
applications, classified broadly into:





Application code replication
Cache database records
Cache query results
Entire Database replication
In this article the authors give an overview of
these scaling techniques, then compare and
analyze their features and performance.
Techniques to scale Web Applications

The techniques covered are:




Edge Computing
Data Replication
Content-Aware data Caching (CAC)
Content-Blind data Caching (CBC)
Edge Computing



In edge computing (EC), the application code is replicated at
multiple edge servers while the data stays centralized.
Akamai and ACDN use this technique.
Centralizing the data creates two problems:


If the edge servers are located worldwide, each data
access incurs WAN latency.
The central database becomes a performance
bottleneck as the load increases.
Data Replication


One solution to the problems of edge computing is to place the
data at each edge server.
Database replication (REPL) techniques can
help maintain identical copies at multiple
locations.
Data Replication


The problem with this approach arises when the
database is updated.
Every update must reach every copy, which creates heavy
network traffic and performance overhead.
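
To make the update cost concrete, here is a minimal sketch of full replication. It is not the authors' system: the `Replica` and `ReplicatedDatabase` classes, the edge names, and the sample rows are all hypothetical, and "network traffic" is modeled simply as a count of update messages.

```python
class Replica:
    """One full database copy held at an edge server (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ReplicatedDatabase:
    """Full replication: every write is pushed to every replica."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        self.update_messages = 0  # update messages sent over the network

    def read(self, replica_index, key):
        # Reads are answered locally by whichever replica the client reaches.
        return self.replicas[replica_index].rows.get(key)

    def write(self, key, value):
        # To keep the copies identical, one write fans out to all replicas.
        for replica in self.replicas:
            replica.rows[key] = value
            self.update_messages += 1


db = ReplicatedDatabase(["edge-1", "edge-2", "edge-3"])
db.write("item:1", {"price": 10})
db.write("item:1", {"price": 12})
print(db.read(2, "item:1"), db.update_messages)
```

In this toy model, reads stay local, but the number of update messages grows with the number of writes multiplied by the number of replicas.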
Content-Aware data Caching (CAC)



Instead of maintaining full copies of the database, CAC
systems cache database query results as the
application code issues them.
Query containment check: When the application running
at the edge server issues a query, the local
database checks whether it has enough data to answer the
query locally.
If the containment check is positive, the query is answered
locally; otherwise it is sent to the central database, and the
edge server inserts the result into its local database.
An Example of CAC



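CAC stores query results efficiently.
For example:
Query Q1: SELECT * FROM items WHERE price < 50
Query Q2: SELECT * FROM items WHERE price < 20
Every row returned by Q2 is also returned by Q1, so the
cache needs to store those rows only once.
Query template QT1:
“SELECT * FROM items WHERE price <”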
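
The containment idea behind Q1, Q2, and template QT1 can be sketched as follows. This is a simplified illustration, not the CAC implementation from the paper: the `ITEMS` table, `central_query`, and `TemplateCache` are hypothetical names, only the single template QT1 is handled, and cache invalidation on updates is ignored.

```python
# Hypothetical contents of the central "items" table.
ITEMS = [
    {"id": 1, "price": 10},
    {"id": 2, "price": 30},
    {"id": 3, "price": 45},
    {"id": 4, "price": 80},
]


def central_query(bound):
    """Stands in for running QT1 with the given bound at the central database."""
    return [row for row in ITEMS if row["price"] < bound]


class TemplateCache:
    """Edge-side cache for template QT1 only (price < bound)."""

    def __init__(self):
        self.rows = {}             # cached records, keyed by id (no duplicates)
        self.covered_bound = None  # largest bound fetched so far
        self.remote_calls = 0

    def query(self, bound):
        # Containment check: "price < bound" is contained in
        # "price < covered_bound" whenever bound <= covered_bound.
        if self.covered_bound is None or bound > self.covered_bound:
            self.remote_calls += 1
            for row in central_query(bound):
                self.rows[row["id"]] = row  # merge into the local records
            self.covered_bound = bound
        # Answer from the local records.
        matches = [r for r in self.rows.values() if r["price"] < bound]
        return sorted(matches, key=lambda r: r["id"])


cache = TemplateCache()
q1 = cache.query(50)  # miss: fetched from the central database
q2 = cache.query(20)  # contained in Q1: answered locally
print(len(q1), len(q2), cache.remote_calls)
```

Because both queries share one template, the check reduces to comparing a single parameter instead of comparing the new query against every cached query.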
Content-Aware data Caching (CAC)



The query containment check is
computationally expensive because it must
compare the new query with all previously
cached queries.
To reduce this cost, CAC makes use
of query templates. A query template is a parameterized
SQL query whose parameter values are
passed at runtime.
In CAC systems, update queries are always
executed at the central database.
Content-Blind data Caching (CBC)



Here, edge servers don't need to run a
database at all.
Instead, they store the results of remote
database queries independently.
Because query results aren't merged, the cache may store
redundant information, and a hit occurs
only if the application issues exactly the same query, so hit
rates are low.
Content-Blind data Caching (CBC)

CBC has some advantages over CAC:



It incurs very little computational load.
It caches query results as result sets instead of database
records, so it can return results immediately.
Inserting a new element into the cache doesn't
require a query rewrite.
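
A content-blind cache can be sketched as a plain lookup table keyed by the exact query text. The `ContentBlindCache` class, the `fake_central_database` function, and its canned results are hypothetical and only illustrate the behavior described on these slides.

```python
class ContentBlindCache:
    """Stores each query's result set under the exact query string."""

    def __init__(self, run_remote_query):
        self.run_remote_query = run_remote_query
        self.results = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        if sql in self.results:
            # Hit: return the stored result set as-is, with no query processing.
            self.hits += 1
            return self.results[sql]
        # Miss: forward to the remote database and store the result unmerged.
        self.misses += 1
        result = self.run_remote_query(sql)
        self.results[sql] = result
        return result


def fake_central_database(sql):
    """Hypothetical stand-in for the remote database (canned results)."""
    canned = {
        "SELECT * FROM items WHERE price<50": [(1, 10), (2, 30), (3, 45)],
        "SELECT * FROM items WHERE price<20": [(1, 10)],
    }
    return canned.get(sql, [])


cbc = ContentBlindCache(fake_central_database)
cbc.query("SELECT * FROM items WHERE price<50")  # miss
cbc.query("SELECT * FROM items WHERE price<20")  # miss, although the rows overlap
cbc.query("SELECT * FROM items WHERE price<50")  # exact repeat: hit
print(cbc.hits, cbc.misses)
```

The second query misses even though its rows are a subset of the first result, and row `(1, 10)` ends up stored twice. That is the redundancy and lower hit rate traded for very cheap lookups.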
Scalable Web hosting.
(a) edge computing, (b) content-aware caching,
(c) content-blind caching, and (d) data replication.
Performance Analysis

To compare the four techniques, the authors
used two different benchmark applications:


RUBBoS, a bulletin-board benchmark application that
models Slashdot.org,
http://jmob.objectweb.org/rubbos.html
TPC-W, an industry-standard e-commerce benchmark
that models an online book store such as Amazon.com,
http://pgfoundry.org/projects/tpc-w-php/
Performance Analysis



They measured the end-to-end client
latency, which is the sum of network latency
and internal latency.
For RUBBoS, the results show that CBC performed best
in terms of client latency, whereas EC
performed worst.
For TPC-W, REPL performed best
and EC was again the worst.
Performance Results
(a) RUBBoS benchmark
(b) TPC-W Browsing
(c) TPC-W Ordering
Choosing the Right Strategy




According to the authors, Web designers should
choose a scaling technique by carefully
analyzing their Web application's characteristics.
They suggest that the best strategy is the one
that minimizes the application's end-to-end client
latency.
This latency is affected by many parameters, such as hit
ratio, database query execution time, and application
server execution time.
To estimate these, they propose a concept called
virtual caches (VC).
Choosing the Right Strategy


A VC behaves just like a real cache, but it stores
only metadata, such as the list of objects in
the cache and their sizes. It therefore requires less
memory than a real cache.
With the help of VCs, we can obtain the
hit ratios and execution times for the servers and
use them to estimate end-to-end latency.
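
As a rough sketch of the idea, the class below tracks only keys and sizes and reports the hit ratio that a real cache of the same capacity would have achieved. Several details are assumptions not stated on the slides: the LRU eviction policy, the `VirtualCache` and `estimated_latency` names, the request trace, and the simple weighted-average latency model.

```python
from collections import OrderedDict


class VirtualCache:
    """Simulates a cache of a given capacity while storing only keys and sizes."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> size, oldest first (LRU assumed)
        self.hits = 0
        self.requests = 0

    def access(self, key, size):
        """Record one request; return True if a real cache would have hit."""
        self.requests += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        # Miss: remember the metadata, then evict least recently used
        # entries until the simulated contents fit the capacity again.
        self.entries[key] = size
        self.used += size
        while self.used > self.capacity and self.entries:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size
        return False

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


def estimated_latency(hit_ratio, hit_latency_ms, miss_latency_ms):
    """Assumed model: average of hit and miss latency, weighted by hit ratio."""
    return hit_ratio * hit_latency_ms + (1 - hit_ratio) * miss_latency_ms


vc = VirtualCache(capacity_bytes=300)
for key in ["a", "b", "a", "c", "a", "d", "b"]:
    vc.access(key, size=100)
print(vc.hits, vc.requests, vc.hit_ratio())
```

Running one such virtual cache per candidate strategy against the same request stream would give comparable hit ratios without having to deploy every strategy for real.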
Thank You.