Replication based on Objects Load under a Content Distribution
Download
Report
Transcript Replication based on Objects Load under a Content Distribution
March 15, 2004
Replication based on Objects Load
under a Content Distribution Network
George Pallis
Konstantinos Stamos
Athena Vakali
Dimitrios Katsaros
Antonis Sidiropoulos
Yannis Manolopoulos
Programming Languages & Software Engineering Lab
Department of Informatics
Aristotle Univ. of Thessaloniki, Greece
http://www.csd.auth.gr/~oswinds
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
1
March 15, 2004
INTRODUCTION
The Problem
• Congested lines, obsolete backbones, multimedia content, increasing
user populations Great Internet Traffic Jam
Solutions
• Increasing Bandwidth
• Web Caching
– temporary storage of objects closer to the consumer
• Web Prefetching
– the process of predicting future requests for Web objects and
bringing those objects into the cache in the background, before an
explicit request is made for them
• Content Distribution Networks (CDNs)
– moving the content to the “edge” of the Internet, closer to the
end-user
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
2
March 15, 2004
Content Distribution Network (CDN)
• A CDN (such as Akamai, Mirror Image etc.) is a
network of cache servers, called surrogate
servers, owned by the same Internet Service
Provider (ISP) that delivers content to users on
behalf of content providers.
• Surrogate servers are typically shared, delivering
content belonging to multiple Web sites.
• The networking functional components of a CDN
include user redirection services, distribution
services, accounting and billing
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
3
March 15, 2004
Content Distribution Network (2)
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
4
March 15, 2004
CDNs Challenging Issues
• Replica/Surrogate server placement
problem
– where should be located the surrogate servers?
• Content selection problem
– which content should be outsourced?
• Content replication problem
– which surrogate servers should replicate the
outsourced content?
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
6
March 15, 2004
Motivation
• We study the Content Replication Problem
– NP-Complete
– Existing heuristic methods for optimally replicating the
“outsourced content” in surrogate servers over a CDN
• Random
– Naive, unscalable approach
• Popularity
– Requires popularity statistics (e.g. users traffic)
• Greedy-single
– Requires popularity statistics, huge memory requirements
• Greedy-global
– Requires popularity statistics, huge memory requirements
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
7
March 15, 2004
Contribution
• We provide a novel strategy for optimally
placing outsourced objects in CDN’s
surrogate servers, integrating both the
network’s latency and the objects’ load.
• We develop
an analytic
simulation
environment to test the efficiency of the
proposed scheme.
– Using real and synthetically generated test data, we
show the robustness and efficiency of the proposed
method which can reap performance benefits better
than an analogous heuristic method.
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
8
March 15, 2004
The il2p (integration of latency and
load object placement) Algorithm
• Main idea
Considering that all the outsourced objects are
initially placed on an origin server, the content
replication problem is separated into two subproblems:
–Choice of the best surrogate server to
replicate an outsourced object (based on the
network’s latency)
–Arrangement priorities for outsourced objects
replication (based on the objects’ load)
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
9
March 15, 2004
The il2p Algorithm (2)
Choice of the best surrogate server to replicate an
outsourced object
• For each outsourced object, we select its optimal
surrogate server such that it minimizes:
N
p k i
K
cos t ( x)
i 1 k 1
N
j 1
Dik ( x)
j
• Dik(x) is the “distance” to a replica of object k from
surrogate server i under the placement x
• the distance reflects the latency (the elapsed time between
when a user issues a request and when it receives the
response)
• N is the number of surrogate servers, K is the number of
outsourced objects, λi is the request rate for surrogate
server i, and pκ is the probability that a client will
request the object k.
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
10
March 15, 2004
The il2p Algorithm (3)
Arrangement priorities for outsourced objects
replication
• From the objects assigned to a single server
we replicate the one which has the maximum
utility value.
Utility_Valuek=loadk*latencyk,where
loadk=access_ratek*sk
– latencyk is the latency that the object k produces
if it is replicated to the surrogate server which
has been determined by the previous step, loadk is
the total load due to object k and access_ratek is
defined as the number of accesses of object k per
unit time.
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
11
March 15, 2004
The il2p Algorithm: The Flowchart
CDN
Infrastructur
e
outsourced
objects
All the “outsourced objects” are stored in the origin
server and all the CDN’s surrogate servers are empty
For each outsourced object, we find which is the best
surrogate server in order to place it (produces the
minimum network latency (min Dik(x)) )
We select from all the pairs of
outsoursed object – surrogate server
that have been occurred in the previous step, the one with the maximum
utility value and thus place this object to that surrogate server
The final Placement
Surrogate
servers
become
full?
Yes
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
12
No
March 15, 2004
Simulation Testbed
• We use trace-driven simulations developing
an analytic simulation environment:
– a system model simulating the CDN infrastructure
(CDNsim)
– a network topology generator
– a Web site generator, modeling file sizes, linkage, etc.
– a client request stream generator capturing the main
characteristics of Web users' behavior
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
14
March 15, 2004
System Model
• We have implemented a simulation model for CDNs
(CDNsim) using the ParaSol library
– CDN networking issues are computed dynamically via the
simulation model
– Provides an implementation as close as possible to the working
TCP/IP protocol
– Uses Bloom filters for memory-efficient CDN simulation
• We consider a CDN infrastructure consisting of
20 surrogate servers
– All the surrogate servers have the same storage capacity
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
15
March 15, 2004
CDNsim: A Simulation Tool for Content
Distribution Networks
http://oswinds.csd.auth.gr/~cdnsim
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
16
March 15, 2004
Network Topology
• Using the GT-ITM internetwork topology
generator, we generate a random network
topology, called Waxman, with a total of
1008 nodes
• Using BGP routing data collected from a
set of
7 geographically-dispersed BGP
peers in April 2000, we construct an ASlevel Internet topology with a total of
3037 nodes
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
17
March 15, 2004
Web Site Generation
• Using the R-MAT tool, we construct Web
graphs (Web sites)
– The R-MAT produces realistic Web graphs capturing
the essence of each graph in only a few parameters
• We create two graphs with varying number
of nodes (objects)
– sparse-density graph (4000 nodes)
– moderate-density graph (3000 nodes)
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
18
March 15, 2004
Request Streams Generation
• Using a requests’ generator, we
generate clients’ transactions
• Given a Web site graph, we generate
transactions as sequences of page
traversals (random walks) upon the
site graph
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
19
March 15, 2004
Performance Evaluation
• Examined Methods
– Random: Assigns the outsourced objects to CDN’s
surrogate servers randomly subjected to the storage
constraints. Both the outsourced object and the
surrogate server are selected by uniform probability.
– Popularity: Each surrogate server stores the most
popular outsourced objects among its clients. The node
sorts the objects in decreasing order of popularity and
stores as many outsourced objects in this order as the
storage constraint allows.
– Lat-cdn: The outsourced objects are placed to surrogate
servers with respect to the total network’s latency,
without taking into account the objects’ popularity. Each
surrogate server stores the outsourced objects which
produce the maximum latency.
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
20
March 15, 2004
il2p for Typical Object Sizes
• Average Response Time for Moderate-density
Web site graphs (3000 objects)
The size of the cache is expressed in terms of the percentage of the
total number of bytes of the Web site
As the cache size increases, the average response time also increases
since the larger in size caches satisfy more requests
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
21
March 15, 2004
il2p for Typical Object Sizes (2)
• Average Response Time for Sparse-density Web
site graphs (4000 objects)
The difference in performance between il2p and the other three
heuristics is quite significant (in most cases around 5% absolute
improvement with respect to lat-cdn and consistently around 25%
absolute improvement with respect to other two heuristics )
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
22
March 15, 2004
il2p Limitations
• Average Response Time for Real Web Site
(Stanford Web site - 281903 Web objects)
We use a different scale for the cache sizes (compared with the previous
ones) due to the large amount of objects of the Stanford Web site
The response times are too small because the majority of objects of
Stanford Web site have very small sizes
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
23
March 15, 2004
Conclusion
• We addressed the content replication
problem for CDNs.
• Our goal is to find an efficient placement
so that when clients fetch objects from
the nearest surrogate server, the average
response time is minimized.
• For the future, we plan to investigate the
content replication problem in CDNs
integrating both caching and replication.
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
24
March 15, 2004
Thank you for
your attention
2nd International Workshop on Challenges
in Web Information Retrieval and Integration (WIRI)
27