Transcript job-talk

Efficient and Adaptive Replication
using Content Clustering
Yan Chen
EECS Department
UC Berkeley
Motivation
• The Internet has evolved to become a
commercial infrastructure for service delivery
– Web delivery, VoIP, streaming media …
• Challenges for Internet-scale services
– Scalability: 600M users, 35M Web sites, 2.1Tb/s
– Efficiency: bandwidth, storage, management
– Agility: dynamic clients/network/servers
– Security, etc.
• Focus on content delivery - Content
Distribution Network (CDN)
– About 4 billion Web pages in total, with a daily growth of 7M pages
– Annual traffic growth of 200% projected for the next 4 years
How CDN Works
New Challenges for CDN
• Large multimedia files ― Efficient replication
• Dynamic content ― Coherence support
• Network congestion/failures ― Scalable network monitoring
Existing CDNs Fail to Address these
Challenges
[Diagram] Existing CDNs:
• No coherence for dynamic content
• Unscalable network monitoring: O(M × N), where M = # of client groups, N = # of server farms
• Non-cooperative, inefficient replication
SCAN: Scalable Content Access Network
• Provisioning (replica placement)
– Access/deployment mechanisms: {non-cooperative, cooperative} × {pull, push}
– Existing CDNs: non-cooperative pull; SCAN: cooperative push
• Coherence support: unicast, IP multicast, or app-level multicast on a P2P DHT
• Granularity: per Website, per cluster, or per object
• Network monitoring: ad hoc pair-wise monitoring O(M×N) vs. tomography-based monitoring O(M+N)
SCAN
[Diagram] SCAN overview: coherence support for dynamic content; scalable network monitoring, O(M+N) (M: # of client groups, N: # of server farms); cooperative clustering-based replication (e.g., an object pushed to servers s1, s4, s5).
Evaluation of Internet-scale Systems
[Diagram] Iterate between algorithm design, analytical evaluation, and realistic simulation.
• Realistic evaluation requires:
– Network topology
– Web workload
– Network end-to-end latency measurement
Network Topology and Web Workload
• Network Topology
– Pure-random, Waxman & transit-stub synthetic topology
– An AS-level topology from 7 widely-dispersed BGP peers
• Web Workload
Web workload (avg – min – max):
– MSNBC: Aug–Oct 1999, 10–11am; # requests 1.5M–642K–1.7M; # clients 129K–69K–150K; # client groups 15.6K–10K–17K
– NASA: Jul–Aug 1995, all day; # requests 79K–61K–101K; # clients 5940–4781–7671; # client groups 2378–1784–3011
– Aggregate MSNBC Web clients with BGP prefix
» BGP tables from a BBNPlanet router
– Aggregate NASA Web clients with domain names
– Map the client groups onto the topology
Network E2E Latency Measurement
• NLANR Active Measurement Project data set
– 111 sites in America, Asia, Australia, and Europe
– Round-trip time (RTT) between every pair of hosts every minute
– 17M measurements daily
– Raw data: Jun. – Dec. 2001, Nov. 2002
• Keynote measurement data
– Measure TCP performance from about 100 worldwide agents
– Heterogeneous core network: various ISPs
– Heterogeneous access network:
» Dial up 56K, DSL and high-bandwidth business connections
– Targets
» 40 most popular Web servers + 27 Internet Data Centers
– Raw data: Nov. – Dec. 2001, Mar. – May 2002
Clustering Web Content for
Efficient Replication
Overview
• Existing CDNs use non-cooperative replication, which is inefficient
• Paradigm shift: cooperative push
– Where to push – greedy algorithms can achieve close to
optimal performance [JJKRS01, QPV01]
– But what content should be pushed?
– At what granularity?
• Clustering of objects for replication
– Close-to-optimal performance with small overhead
• Incremental clustering
– Push content before it is accessed: improves availability during flash crowds
Outline
• Architecture
• Problem formulation
• Granularity of replication
• Incremental clustering and replication
• Conclusions
• Future Research
Conventional CDN: Non-cooperative Pull
[Diagram] Client 1's request is redirected via its local DNS server and the CDN name server to a nearby local CDN server; the CDN server receives the GET request (step 3), pulls the object from the Web content server on a cache miss (step 4), and the response is returned (steps 5-6). Each ISP's local CDN server pulls and caches independently: inefficient replication.
SCAN: Cooperative Push
[Diagram] The CDN name server replies with the IP address of a nearby replica server or the Web server (step 2); the client sends its GET request there (step 3), falling back to the Web content server only if no replica exists yet, and receives the response (step 4). Content (e.g., replica s2) is pushed cooperatively, which significantly reduces the # of replicas and the update cost.
Comparison between
Conventional CDNs and SCAN
– Average retrieval latency (ms): conventional CDNs 79.2, SCAN 77.9
– Number of object replicas deployed: conventional CDNs 121,016, SCAN 5,000
– Number of update messages: conventional CDNs 1,349,655, SCAN 54,564
Problem Formulation
• How to use cooperative push for replication to
reduce
– Clients’ average retrieval cost
– Replica location computation cost
– Amount of replica directory state to maintain
• Subject to certain total replication cost (e.g., #
of object replicas)
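As a rough formalization (a sketch in my own notation, not taken from the slides): with client groups c, objects o, request rates λ, and replica sets R_o,

```latex
% Hypothetical formalization (notation mine, not from the talk):
% choose replica sets R_o to minimize the demand-weighted retrieval cost,
% subject to a bound R_total on the total number of object replicas.
\min_{\{R_o\}} \; \sum_{c} \sum_{o} \lambda_{c,o} \,
      \min_{s \in R_o} \mathrm{dist}(c, s)
\qquad \text{s.t.} \qquad \sum_{o} |R_o| \le R_{\mathrm{total}}
```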
Outline
• Architecture
• Problem formulation
• Granularity of replication
• Incremental clustering and replication
• Conclusions
• Future Research
Replica Placement: Per Site vs. Per Object
[Diagram] Per-Website replication places all of a site's objects (1-4) together at each replica location, whereas per-object replication places each object independently.
[Plot] Average retrieval cost vs. average number of replicas per object (1-9), comparing per-Website and per-object replication.
• 60-70% average retrieval cost reduction for the per-object scheme
• But per-object replication is too expensive to manage!
Overhead Comparison
Replication scheme: computation cost | replica directory state to maintain
– Per Website: O(R) | O(R)
– Per Object: O(R × M) | O(R × M)
Where R: # of replicas per object, M: total # of objects in the Website
To compute on average 10 replicas/object for the top 1000 objects takes several days on a normal server!
Overhead Comparison
Replication scheme: computation cost | replica directory state to maintain
– Per Website: O(R) | O(R)
– Per Cluster: O(R × K) | O(R × K + M)
– Per Object: O(R × M) | O(R × M)
Where R: # of replicas per object, K: # of clusters, M: total # of objects in the Website (M >> K)
Clustering Web Content
• General clustering framework
– Define the correlation distance between objects
– Cluster diameter: the max distance between any
two members
» Worst correlation in a cluster
– Generic clustering: minimize the max diameter of
all clusters
• Correlation distance definition based on
– Spatial locality
– Temporal locality
– Popularity
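A minimal sketch of such a framework (Python; my own illustration, not the talk's exact algorithm), assuming a pairwise correlation-distance function cor_dist(a, b): farthest-point seeding followed by nearest-center assignment, which keeps the maximum cluster diameter small when the distance is a metric.

```python
# Sketch only: generic clustering over an arbitrary correlation distance.
# cor_dist(a, b) is any of the spatial/temporal/popularity distances below.
def cluster_objects(objects, cor_dist, num_clusters):
    # Farthest-point seeding: pick cluster centers that are far apart.
    centers = [objects[0]]
    while len(centers) < min(num_clusters, len(objects)):
        farthest = max(objects,
                       key=lambda o: min(cor_dist(o, c) for c in centers))
        centers.append(farthest)
    # Assign each object to its closest center; the worst correlation
    # (diameter) inside each cluster is then kept small.
    clusters = {i: [] for i in range(len(centers))}
    for o in objects:
        best = min(range(len(centers)), key=lambda i: cor_dist(o, centers[i]))
        clusters[best].append(o)
    return list(clusters.values())
```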
Spatial Clustering
• Object spatial access vector
– e.g., the blue object is accessed 1, 0, 2, and 0 times by client groups 1-4, giving the access vector (1, 0, 2, 0)
• Correlation distance between two objects defined as
– Euclidean distance
– Vector (cosine) similarity: cor_dist(A, B) = 1 − (Σ_{i=1..k} A_i B_i) / (√(Σ_{i=1..k} A_i²) · √(Σ_{i=1..k} B_i²))
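A small sketch of the two spatial distances above (Python; the example vectors are illustrative):

```python
import math

# Each object is represented by its spatial access vector: entry i is the
# number of accesses from client group i (the blue object above is (1, 0, 2, 0)).
def euclidean_dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_cor_dist(a, b):
    # cor_dist(A, B) = 1 - (A . B) / (|A| |B|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm

print(cosine_cor_dist((1, 0, 2, 0), (2, 0, 4, 0)))  # 0.0: same access pattern, different volume
```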
Clustering Web Content (cont’d)
• Temporal clustering
– Divide traces into multiple individuals' access sessions [ABQ01]
– In each session, cor_dist(A, B) = 1 − co_occurrence(A, B) / (occurrence(A) + occurrence(B))
– Average over multiple sessions in one day
• Popularity-based clustering
– cor_dist(A, B) = | access_freq(A) − access_freq(B) |
– Or even simpler: sort the objects by access frequency and put the first N/K elements into the first cluster, etc.
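A sketch of the simpler popularity-based variant just described (Python; access_freq is an assumed object-to-count mapping, and the example numbers are illustrative):

```python
# Sort objects by access frequency and cut the ranking into K clusters of
# roughly N/K objects each.
def popularity_clusters(access_freq, num_clusters):
    ranked = sorted(access_freq, key=access_freq.get, reverse=True)
    size = max(1, len(ranked) // num_clusters)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

freq = {"a": 900, "b": 850, "c": 40, "d": 35, "e": 2, "f": 1}
print(popularity_clusters(freq, 3))  # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```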
Performance of Cluster-based Replication
• Use a greedy algorithm for replication (a placement sketch appears below)
• Spatial clustering with Euclidean distance and popularity-based clustering perform the best
[Plots] Average retrieval cost and computational cost (hours) vs. number of clusters (1-1000), for spatial clustering (Euclidean distance), spatial clustering (cosine similarity), temporal clustering, and popularity-based clustering.
– A small # of clusters (only 1-2% of the # of objects) can achieve close to per-object performance, with much less overhead
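For reference, a sketch of the kind of greedy placement referred to above (in the spirit of the greedy algorithms cited earlier [JJKRS01, QPV01], not their exact implementations); the argument names demand and dist are illustrative:

```python
# Greedily place replicas one at a time, each time at the candidate server
# that most reduces the demand-weighted retrieval cost.
def greedy_placement(candidate_servers, client_groups, demand, dist, num_replicas):
    placed = []
    # Cost each client group currently pays via its closest placed replica.
    best_cost = {c: float("inf") for c in client_groups}
    for _ in range(num_replicas):
        def cost_if_added(s):
            return sum(demand[c] * min(best_cost[c], dist[c][s]) for c in client_groups)
        best = min((s for s in candidate_servers if s not in placed), key=cost_if_added)
        placed.append(best)
        for c in client_groups:
            best_cost[c] = min(best_cost[c], dist[c][best])
    return placed
```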
Outline
• Architecture
• Problem formulation
• Granularity of replication
• Incremental clustering and replication
• Conclusions
• Future Research
Static clustering and replication
• Two daily traces: training trace and new trace
Methods: traces used for clustering | replication | evaluation
– Static 1: Training | Training | New
– Static 2: Training | New | New
– Optimal: New | New | New
• Static clustering performs poorly beyond a week
[Plot] Average retrieval cost on the new traces (8/3, 8/4, 8/5, 8/10, 8/11, 9/27, 9/28, 9/29, 9/30, 10/1) for static clustering 1, static clustering 2, and reclustering/re-replication (optimal): the retrieval cost of static clustering almost doubles the optimal!
Incremental Clustering
• Generic framework
1. If a new object o matches an existing cluster c, add o to c and replicate o to the existing replicas of c
2. Else create a new cluster for o and replicate it (a sketch of this framework follows below)
• Two types of incremental clustering
– Online: without any access logs
» High availability
– Offline: with access logs
» Close-to-optimal performance
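A minimal sketch of this generic framework (Python; Cluster, matches, replicate_to, and pick_new_sites are placeholders that the talk instantiates with semantic or access-log information):

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    objects: list = field(default_factory=list)
    replica_sites: list = field(default_factory=list)

def incremental_cluster(new_object, clusters, matches, replicate_to, pick_new_sites):
    # 1. If the new object matches an existing cluster, join it and push the
    #    object to that cluster's existing replicas.
    for c in clusters:
        if matches(new_object, c):
            c.objects.append(new_object)
            replicate_to(new_object, c.replica_sites)
            return c
    # 2. Otherwise start a new cluster and replicate it.
    c = Cluster(objects=[new_object], replica_sites=pick_new_sites(new_object))
    replicate_to(new_object, c.replica_sites)
    clusters.append(c)
    return c
```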
Online Incremental Clustering
• Predict access patterns based on semantics
• Simplify to popularity prediction
• Groups of objects with similar popularity? Use hyperlink structures!
– Groups of siblings
– Groups of the same hyperlink depth (smallest # of links from root)
[Diagram] Example: Object 1 links to objects 2, 3, 4; Object 2 links to objects 5, 6; Object 4 links to objects 3, 7.
Online Popularity Prediction
• Measure the divergence of object popularity within a group:
access freq span = std_dev(access_frequency) / average(access_frequency)
• Experiments
– Crawl http://www.msnbc.com with hyperlink depth 4,
then group the objects
– Use corresponding access logs to analyze the correlation
• Groups of siblings have better correlation
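A one-line sketch of this divergence measure (Python; it is the coefficient of variation of the group's access frequencies, with illustrative numbers):

```python
import statistics

def access_freq_span(frequencies):
    # std_dev(access_frequency) / average(access_frequency)
    mean = statistics.mean(frequencies)
    return statistics.pstdev(frequencies) / mean if mean else float("inf")

print(access_freq_span([100, 120, 90]))  # small span: group predicts popularity well
print(access_freq_span([5, 900, 12]))    # large span: poor predictor
```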
Semantics-based Incremental Clustering
• Put a new object into the existing cluster with the largest number of its siblings
– In case of a tie, choose the cluster w/ more replicas
• Simulation on MSNBC daily traces
– 8-10am trace: static popularity clustering + replication
– At 10am: M new objects → online incremental clustering + replication
– Evaluated with the 10-12am trace: each new object receives O(10^3) requests
[Diagram] Example: the new object's siblings are spread across the existing clusters; it joins the cluster that contains the most of them.
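A sketch of this assignment rule (Python; the cluster objects and sibling set are assumed inputs, reusing the Cluster placeholder from the earlier framework sketch):

```python
# The new object joins the existing cluster containing the most of its
# siblings; ties are broken toward the cluster with more replicas.
def assign_new_object(new_obj, siblings, clusters):
    def score(c):
        return (len(siblings & set(c.objects)), len(c.replica_sites))
    best = max(clusters, key=score)
    best.objects.append(new_obj)
    return best  # caller then pushes new_obj to best.replica_sites
```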
Online Incremental Clustering and Replication Results
[Plot] Average retrieval cost of new objects under no replication, random replication of new objects, and online incremental clustering & replication: online incremental clustering achieves 1/8 the cost of no replication and 1/5 the cost of random replication.
Online Incremental Clustering and Replication Results
[Plot] With complete reclustering & re-replication (oracle) added for comparison: online incremental clustering & replication has double the oracle's retrieval cost, but only 4% of its replication cost.
Conclusions
• Cooperative, clustering-based replication
• Cooperative push: only 4 - 5% replication/update
cost compared with existing CDNs
• Clustering reduces the management/computational overhead by two orders of magnitude
– Spatial clustering and popularity-based clustering
recommended
• Incremental clustering to adapt to emerging
objects
– Hyperlink-based online incremental clustering for high
availability and performance improvement
Tie Back to SCAN
• Self-organize replicas into an app-level multicast tree for update dissemination
• Scalable overlay network monitoring
– O(M+N) instead of O(M×N), given M client groups
and N servers
• For more info: http://www.cs.berkeley.edu/~yanchen/resume.html#Publications
Outline
• Architecture
• Problem formulation
• Granularity of replication
• Incremental clustering and replication
• Conclusions
• Future Research
Future Research (I)
• Measurement-based Internet study and
protocol/architecture design
– Use inference techniques to develop Internet behavior
models
» Network operators reluctant to reveal internal network configs
– Root cause analysis: large, heterogeneous data mining
» Leverage graphics/visualization for interactive mining
– Apply deeper understanding of Internet behaviors for
reassessment/design of protocol/architecture
– E.g., Internet bottleneck – peering links? How and Why?
Implications?
Future Research (II)
• Network traffic anomaly characterization,
identification and detection
– Many unknown flow-level anomalies revealed from real
router traffic analysis (AT&T)
– Profile traffic patterns of new applications (e.g., P2P)
–> benign anomalies
– Understand the causes, patterns and prevalence of
other unknown anomalies
– Apply malicious patterns for intrusion detection
– E.g., fight against Sapphire/Slammer Worm
– Leverage Forensix for auditing and querying
Backup Materials
Tomography-based Network Monitoring
[Diagram] A path from host A to host B with loss rate p traverses links with loss rates l_0, l_1, l_2, so 1 − p = (1 − l_0)(1 − l_1)(1 − l_2). The M×N end-to-end paths are encoded as a 0/1 routing matrix over the O(M+N) links.
• Given O(M+N) end hosts, a power-law degree topology implies O(M+N) links
• Transform to the topology matrix
• Pick O(M+N) paths to compute the link loss rates
• Use the link loss rates to compute the loss rates of other paths
Path Loss Rate Inference
[Diagram] Topology transformation: real links with no measurable branching are merged into virtual links.
• Ideal case: rank of the path matrix = # of links (K)
• Rank deficiency is solved through topology transformation
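As a toy illustration of the inference step (Python/NumPy; my own example, while the talk's system additionally selects O(M+N) basis paths and handles rank deficiency via the topology transformation above): taking logarithms turns the multiplicative path-loss relation into a linear system over the links.

```python
import numpy as np

# G is the 0/1 routing matrix (rows: measured paths, columns: links).
G = np.array([[1, 1, 0],    # path 1 traverses links 0 and 1
              [0, 1, 1],    # path 2 traverses links 1 and 2
              [1, 0, 1]])   # path 3 traverses links 0 and 2
true_link_loss = np.array([0.01, 0.10, 0.02])

# "Measured" path data: b_i = log(1 - p_i) = sum over links on path i of log(1 - l_j).
b = G @ np.log(1 - true_link_loss)

# Infer per-link log-loss by least squares, then recover link loss rates.
x, *_ = np.linalg.lstsq(G, b, rcond=None)
print(np.round(1 - np.exp(x), 3))        # ~ [0.01, 0.10, 0.02]

# Loss rate of any unmeasured path follows from its row in the matrix.
new_path = np.array([1, 1, 1])
print(1 - np.exp(new_path @ x))          # inferred loss of a path over all three links
```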
Future Research (I)
• Internet behavior modeling and protocol /
architecture design
– Use inference techniques to develop Internet behavior
models
– Root cause analysis: large, heterogeneous data mining
» Leverage graphics/visualization for interactive mining
– Leverage SciClone Cluster for parallel network
tomography
– Apply deeper understanding of Internet behaviors for
reassessment/design of protocol/architecture
– E.g., Internet bottleneck – peering links? How and Why?
Implications?
Tomography-based Network Monitoring
• Observations
– # of lossy links is small, but they dominate E2E loss
– Loss rates are stable (on the order of hours to days)
– Routing is stable (on the order of days)
• Identify the lossy links and only monitor a few
paths to examine lossy links
• Make inference for other paths
[Diagram] End hosts and routers connected by normal links and lossy links.
SCAN
[Diagram] SCAN overview (repeated): coherence for dynamic content; scalable network monitoring O(M+N); cooperative clustering-based replication (object pushed to s1, s4, s5).
Problem Formulation
• Subject to certain total replication cost (e.g., # of URL replicas)
• Find a scalable, adaptive replication strategy to reduce avg access cost
SCAN: Scalable Content Access Network
[Diagram] CDN applications (e.g., streaming media) built on top of: provisioning via cooperative clustering-based replication; coherence via update multicast tree construction; network distance/congestion/failure estimation; user behavior/workload monitoring; and network performance monitoring. (red: my work, black: out of scope)
Evaluation of Internet-scale System
• Analytical evaluation
• Realistic simulation
– Network topology
– Web workload
– Network end-to-end latency measurement
• Network topology
– Pure-random, Waxman & transit-stub synthetic
topology
– A real AS-level topology from 7 widely-dispersed BGP
peers
Web Workload
Web workload (avg – min – max):
– MSNBC: Aug–Oct 1999, 10–11am; # requests 1.5M–642K–1.7M; # clients 129K–69K–150K; # client groups 15.6K–10K–17K
– NASA: Jul–Aug 1995, all day; # requests 79K–61K–101K; # clients 5940–4781–7671; # client groups 2378–1784–3011
– World Cup: May–Jul 1998, all day; # requests 29M–1M–73M; # clients 103K–13K–218K; # client groups N/A
• Aggregate MSNBC Web clients with BGP prefix
– BGP tables from a BBNPlanet router
• Aggregate NASA Web clients with domain names
• Map the client groups onto the topology
Simulation Methodology
• Network Topology
– Pure-random, Waxman & transit-stub synthetic topology
– An AS-level topology from 7 widely-dispersed BGP peers
• Web Workload
Web workload (avg – min – max):
– MSNBC: Aug–Oct 1999, 10–11am; # requests 1.5M–642K–1.7M; # clients 129K–69K–150K; # client groups 15.6K–10K–17K
– NASA: Jul–Aug 1995, all day; # requests 79K–61K–101K; # clients 5940–4781–7671; # client groups 2378–1784–3011
– Aggregate MSNBC Web clients with BGP prefix
» BGP tables from a BBNPlanet router
– Aggregate NASA Web clients with domain names
– Map the client groups onto the topology
Online Incremental Clustering
• Predict access patterns based on semantics
• Simplify to popularity prediction
• Groups of URLs with similar popularity? Use hyperlink structures!
– Groups of siblings
– Groups of the same hyperlink depth: smallest # of links from root
Challenges for CDN
• Over-provisioning for replication
– Provide good QoS to clients (e.g., latency bound, coherence)
– Small # of replicas with small delay and bandwidth
consumption for update
• Replica Management
– Scalability: billions of replicas if replicating per URL
» O(104) URLs/server, O(105) CDN edge servers in O(103) networks
– Adaptation to dynamics of content providers and customers
• Monitoring
– User workload monitoring
– End-to-end network distance/congestion/failures monitoring
» Measurement scalability
» Inference accuracy and stability
SCAN Architecture
• Leverage Decentralized Object Location and Routing
(DOLR) - Tapestry for
– Distributed, scalable location with guaranteed success
– Search with locality
• Soft state maintenance of dissemination tree (for each
object)
[Diagram] Data plane: dynamic replication/update and content management; data flows from the source to replicas (always updated) and to caches (adaptive coherence). Network plane: Web servers, SCAN servers, and clients connected through a Tapestry mesh for request location.
Wide-area Network Measurement
and Monitoring System (WNMMS)
• Select a subset of SCAN servers to be monitors
• E2E estimation for distance, congestion, and failures
[Diagram] Clusters A, B, and C on the network plane, containing monitors, SCAN edge servers, and clients; distance is measured from each host to its monitor and among monitors.
Dynamic Provisioning
• Dynamic replica placement
– Meeting clients’ latency and servers’ capacity constraints
– Close-to-minimal # of replicas
• Self-organize replicas into an app-level multicast tree
– Small delay and bandwidth consumption for update multicast
– Each node only maintains states for its parent & direct
children
• Evaluated based on simulation of
– Synthetic traces with various sensitivity analysis
– Real traces from NASA and MSNBC
• Publication
– IPTPS 2002
– Pervasive Computing 2002
Effects of the Non-Uniform Size of URLs
• Replication cost constraint measured in bytes
• Similar trends exist
– Per-URL replication outperforms per-Website replication dramatically
Diagram of Internet Iso-bar
[Diagram] End hosts grouped into clusters A, B, and C around landmarks.
Diagram of Internet Iso-bar
[Diagram] Clusters A, B, and C with landmarks, monitors, and end hosts; distance probes go from each monitor to its hosts and among monitors.
Real Internet Measurement Data
• NLANR Active Measurement Project data set
– 119 sites in the US (106 after filtering out mostly offline sites)
– Round-trip time (RTT) between every pair of hosts every
minute
– Raw data: 6/24/00 – 12/3/01
• Keynote measurement data
– Measure TCP performance from about 100 agents
– Heterogeneous core network: various ISPs
– Heterogeneous access network:
» Dial up 56K, DSL and high-bandwidth business connections
– Targets
» Web site perspective: 40 most popular Web servers
» 27 Internet Data Centers (IDCs)
Related Work
• Internet content delivery systems
– Web caching
» Client-initiated
» Server-initiated
– Pull-based Content Delivery Networks (CDNs)
– Push-based CDNs
• Update dissemination
– IP multicast
– Application-level multicast
• Network E2E Distance Monitoring Systems
Web Proxy Caching
[Diagram] In each ISP, the client sends a GET request (step 1) to a proxy cache server (located via the local DNS server); the proxy fetches the object from the Web content server if needed and returns the response (step 4).
Conventional CDN: Non-cooperative Pull
[Diagram] Full request flow: after DNS redirection, the client's GET request (step 5) goes to the local CDN server, which issues a GET request to the Web content server on a cache miss (step 6) and returns the response (steps 7-8). Each ISP's local CDN server pulls independently: inefficient replication.
SCAN: Cooperative Push
[Diagram] Full request flow: after DNS redirection, the client's GET request (step 5) goes to a nearby replica server, or to the Web content server only if no replica exists yet, and the response is returned (step 6). Cooperative push significantly reduces the # of replicas and the update cost.
Internet Content Delivery Systems
Properties compared across Web caching (client-initiated), Web caching (server-initiated), pull-based CDNs (Akamai), push-based CDNs, and SCAN:
– Efficiency (# of caches or replicas): no cache sharing among proxies / cache sharing / no replica sharing among edge servers / replica sharing / replica sharing
– Scalability for request redirection: pre-configured in browser / Bloom filters to exchange replica locations / centralized CDN name server / centralized CDN name server / decentralized P2P location
– Coherence support: no / no / yes / no / yes
– Network-awareness: no / no / yes, but unscalable monitoring system / no / yes, scalable monitoring system
Previous Work:
Update Dissemination
• No inter-domain IP multicast
• Application-level multicast (ALM) unscalable
– Root maintains states for all children (Narada,
Overcast, ALMI, RMX)
– Root handles all “join” requests (Bayeux)
– Root splitting is a common solution, but suffers from consistency overhead
Design Principles
• Scalability
– No centralized point of control: P2P location services,
Tapestry
– Reduce management states: minimize # of replicas, object
clustering
– Distributed load balancing: capacity constraints
• Adaptation to clients’ dynamics
– Dynamic distribution/deletion of replicas with regard to clients' QoS constraints
– Incremental clustering
• Network-awareness and fault-tolerance (WNMMS)
– Distance estimation: Internet Iso-bar
– Anomaly detection and diagnostics
Comparison of Content Delivery
Systems (cont’d)
Properties compared across the same five systems (Web caching client-initiated / Web caching server-initiated / pull-based CDNs (Akamai) / push-based CDNs / SCAN):
– Distributed load balancing: no / yes / yes / no / yes
– Dynamic replica placement: yes / yes / yes / no / yes
– Network-awareness: no / no / yes, but unscalable monitoring system / no / yes, scalable monitoring system
– No global network topology assumption: yes / yes / yes / no / yes