placement6-notes

Download Report

Transcript placement6-notes

On the Placement of Web
Server Replicas
Lili Qiu, Microsoft Research
Venkata N. Padmanabhan, Microsoft Research
Geoffrey M. Voelker, UCSD
IEEE INFOCOM’2001, Anchorage, AK, April 2001
1
Outline





Overview
Related work
Our approach
Simulation methodology & results
Summary
2
Motivation

Growing interests in Web
server replicas

replica
replica

Internet
replica
replica
replica


Forms of Web server replicas

Clients
Content
Providers
Exponential growth in Web usage
Content providers want to offer
better service at lower cost
Solution: replication

Mirror sites
Content Distribution Networks
(CDNs)


CDN: a network of servers
Examples: Akamai, Digital Island
3
Placement of Web Server Replicas

Problem specification

Among a set of N potential sites, pick K sites as replicas
to minimize users’ latency or bandwidth usage
Internet
Clients
Content
Providers
4
Related Work



Placement of Web proxies [LGI+99]
Cache location [KRS00]
Placement of Internet instrumentation
[JJJ+00]
5
Our Approach


Model Internet as a graph
Parameterize the graph using measured inputs



# requests generated from each region
Distance between different regions
Map the placement problem onto a graph
optimization problem

Assumption:


Each client uses a single replica that is closest to it
Solve graph optimization problem

Using various approximation algorithms
6
Minimum K-median Problem

8
4
2
5
2
6
3
10
4
6

7

3
2
5
Given a complete graph
G=(V,E), d(j), c(i,j)
d(j): # requests
c(i,j): distance between node
i and j


8



Latency
or hop counts
or other metric to be
optimized
Find a subset V’ V with
|V’| = K s.t. it minimizes
vV minwV’ d(v)c(v,w)
NP-hard problem
7
Placement Algorithms

Tree based algorithm [LGG+99]



Random


Assume the underlying topologies are trees, and
model it as a dynamic programming problem
O(N3M2) for choosing M replicas among N potential
places
Pick the best among several random assignments
Hot spot

Place replicas near the clients that generate the
largest load
8
Placement Algorithms (Cont.)

Greedy algorithm




Calculate costs of assigning clients to replicas
Select replica with lowest cost
Adjust costs based upon assignment, repeat until
done
Super-Optimal algorithm

Lagrangian relaxation + subgradient method
9
Simulation Methodology

Network topology

Randomly generated topologies


Real Internet network topology


AS level topology obtained using BGP routing data
from a set of seven geographically dispersed BGP
peers
Web Workload

Real server traces


Using GT-ITM Internet topology generator
MSNBC, ClarkNet, NASA Kennedy Space Center
Performance Metric

Relative performance: costpractical/costsuper-optimal
10
Simulation Methodology
(Cont.)

Simulate a network of N
nodes (100  N  3000)

Cluster clients using network
aware clustering [KW00]


IP addresses with the same
address prefix belong to a
cluster
A small number of popular
clusters account for most
requests



Top 10, 100, 1000, 3000
clusters account for about
24%, 45%, 78%, and 94% of
the requests respectively
Pick the top N clusters
Map them to different nodes
11
Simulation Methodology
(Cont.)




Random trees
Random graphs
AS-level topologies
Sensitivity to the error in the input
12
Random Tree Topologies
Tree-based algorithm performs well as expected.
Greedy algorithm performs equally as well.
13
Random Graph Topologies
The greedy and hot-spot algorithms
out-perform the tree-based algorithm.
14
Large Random Graph Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
15
AS-level Internet Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
16
Effects of Imperfect Knowledge
about Input Data


Predicted workload (using moving window average)
Perfect topology information
Within 5% degradation when using predicted workload
17
Effects of Imperfect Knowledge
about Input Data (Cont.)


Predicted workload (using moving window average)
Noisy topology information

Perturb the distance between two nodes i and j by up to a
factor of 2
Within 15% degradation when using
predicted workload and noisy topology information
18
Summary



One of the first experimental studies on placement of
Web server replicas
Knowledge about client workload and topology is needed
for provisioning replicas
The greedy algorithm performs very well


Within a factor of 1.1 – 1.5 of the super-optimal
Insensitive to noise


The hot spot algorithm performs nearly as well


Stay within a factor of 2 of the super-optimal when the
salted error is a factor of 4
Within a factor of 1.6 – 2 of the super-optimal
Obtaining input data


Moving window average for load prediction
Using BGP router data to obtain topology information
19
Conclusion

Recommend using the greedy
algorithm for deciding the placement
of Web server replicas
20
Acknowledgement



Craig Labovitz
Yin Zhang
Ravi Kumar
21
Comments on greedy
algorithm performance


Worst-case performance: unbounded
Bad example

A full homogeneous binary tree with n=2i leaves
and n caches
0
optimal cost = 0
0
0
greedy cost = (n-1)*d
d

d
d
d
However, the worst-case scenario seems
unlikely to occur in real and random
topologies
22
Simulation Results in
Random Tree Topologies
23
Random Tree Topologies
Tree-based algorithm performs well as expected.
Greedy algorithm performs equally as well.
24
Random Graph Topologies
The greedy and hot-spot algorithms
out-perform the tree-based algorithm.
25
Large Random Graph Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
26
AS-level Internet Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
27
Simulation Results in
Real Internet Topologies
28
Obtaining Input Data

Workload


The number of requests generated by popular
client clusters
Stable


Placement algorithm can use moving window average
for predicting load with negligible impact on
performance
Network topology




Propagation delay
Hop count
AS hop count
Internet weather map
29
Placement of Web Server Replicas

Goal

replica
replica
Internet
replica
replica
replica

Minimum K-median
problem

Clients
Placing K replicas to
minimize users’ latency
or bandwidth usage
Content
Providers

Select K servers to
minimize the sum of
assignment costs
NP-hard problem
30