A Scalable, Adaptive, Network-aware
Infrastructure for Efficient Content
Delivery
Yan Chen
Ph.D. Status Talk
EECS Department
UC Berkeley
Motivation
• The Internet has evolved to become a
commercial infrastructure for service delivery
– Web delivery, VoIP, streaming media …
• Challenges for Internet-scale services
– Scalability: 600M users, 35M Web sites, 28Tb/s
– Efficiency: bandwidth, storage, management
– Agility: dynamic clients/network/servers
– Security, etc.
• Focus on content delivery - Content
Distribution Network (CDN)
– 4 billion Web pages in total, with daily growth of 7M pages
– Annual growth of 200% projected for the next 4 years
How CDN Works
New Challenges for CDN
• Large multimedia files ― Efficient replication
• Dynamic content ― Coherence support
• Network congestion/failures ― Scalable network monitoring
Existing CDNs Fail to Address these
Challenges
[Figure: an existing CDN shown failing all three challenges: no coherence
for dynamic content, unscalable O(M×N) network monitoring, and
inefficient non-cooperative replication]
SCAN: Scalable Content Access Network
• Provisioning (replica placement): access/deployment mechanisms

        Non-cooperative   Cooperative
  Pull  Existing CDNs
  Push                    SCAN

• Coherence support: update dissemination via unicast, app-level
multicast, or IP multicast, at per-Website, per-cluster, or per-object
granularity; SCAN uses app-level multicast at per-cluster granularity
• Network monitoring: ad hoc pair-wise monitoring costs O(M×N);
SCAN's tomography-based monitoring costs O(M+N)
[Figure: SCAN addresses all three challenges: coherence for dynamic
content, scalable O(M+N) network monitoring, and cooperative
clustering-based replication, with replicas pushed to servers s1, s4, s5]
Outline
• Introduction
• Research Methodology
• SCAN Mechanisms and Status
– Cooperative clustering-based replication
– Coherence support
– Scalable network monitoring
• Research Plan
• Conclusions
Design and Evaluation of
Internet-scale Systems
[Figure: iterate among algorithm design, analytical evaluation, and
realistic simulation]
Realistic simulation requires:
• Network topology
• Web workload
• Network end-to-end latency measurement
Network Topology and Web Workload
• Network Topology
– Pure-random, Waxman & transit-stub synthetic topology
– An AS-level topology from 7 widely-dispersed BGP peers
• Web Workload
Web Site    Period        Duration  # Requests       # Clients        # Client groups
                                    (avg–min–max)    (avg–min–max)    (avg–min–max)
MSNBC       Aug–Oct/1999  10–11am   1.5M–642K–1.7M   129K–69K–150K    15.6K–10K–17K
NASA        Jul–Aug/1995  All day   79K–61K–101K     5940–4781–7671   2378–1784–3011
World Cup   May–Jul/1998  All day   29M–1M–73M       103K–13K–218K    N/A
– Aggregate MSNBC Web clients with BGP prefix
» BGP tables from a BBNPlanet router
– Aggregate NASA Web clients with domain names
– Map the client groups onto the topology
Network E2E Latency Measurement
• NLANR Active Measurement Project data set
– 111 sites in America, Asia, Australia, and Europe
– Round-trip time (RTT) between every pair of hosts every minute
– 17M measurements daily
– Raw data: Jun. – Dec. 2001, Nov. 2002
• Keynote measurement data
– Measure TCP performance from about 100 worldwide agents
– Heterogeneous core network: various ISPs
– Heterogeneous access network:
» Dial up 56K, DSL and high-bandwidth business connections
– Targets
» 40 most popular Web servers + 27 Internet Data Centers
– Raw data: Nov. – Dec. 2001, Mar. – May 2002
Outline
• Introduction
• Research Methodology
• SCAN Mechanisms and Status
– Cooperative clustering-based replication
– Coherence support
– Scalable network monitoring
• Research Plan
• Conclusions
Cooperative Clustering-based
Replication
• Cooperative push: only 4–5% of the replication/update
cost of existing CDNs
• Clustering reduces the management/computational
overhead by two orders of magnitude
– Spatial clustering and popularity-based clustering
recommended (a sketch of the latter follows this slide)
• Incremental clustering to adapt to emerging
objects
– Hyperlink-based online incremental clustering for high
availability and performance improvement
– Offline incremental clustering performs close to optimal
• Publication
– ICNP 2002
– IEEE J-SAC 2003 (extended version)
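To make the clustering idea concrete, here is a minimal sketch of one plausible reading of popularity-based clustering: rank URLs by request count and cut the ranked list into groups of roughly equal aggregate popularity, so each group can be replicated as a unit. The input `url_hits` and the equal-mass cut rule are illustrative assumptions, not the exact algorithm from the publications above.

```python
# Hypothetical sketch of popularity-based clustering: `url_hits` maps each
# URL to its request count; the equal-mass cut rule is an assumption.
def popularity_clusters(url_hits, k):
    """Partition URLs into k clusters of roughly equal total popularity."""
    ranked = sorted(url_hits, key=url_hits.get, reverse=True)
    total = sum(url_hits.values())
    clusters, current, mass = [], [], 0
    for url in ranked:
        current.append(url)
        mass += url_hits[url]
        # Close a cluster once it holds about 1/k of all requests.
        if mass >= total / k and len(clusters) < k - 1:
            clusters.append(current)
            current, mass = [], 0
    clusters.append(current)          # remainder forms the last cluster
    return clusters
```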
Coherence Support
• Leverages Tapestry, a decentralized object location and
routing (DOLR) system
• Dynamic replica placement
• Self-organizes replicas into an app-level multicast
tree (see the sketch after this slide)
– Small delay and bandwidth consumption for update multicast
– Each node only maintains state for its parent & direct
children
• Evaluated based on simulation of
– Synthetic traces with various sensitivity analysis
– Real traces from NASA and MSNBC
• Publication
– IPTPS 2002
– Pervasive Computing 2002
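The per-node state described above (parent plus direct children only) can be sketched as follows; the class and method names are hypothetical rather than taken from the SCAN implementation, and a real node would refresh this soft state over the network instead of holding object references.

```python
# Minimal sketch of a soft-state dissemination tree node. Each replica
# keeps state only for its parent and direct children, so updates can be
# multicast down the tree without any node knowing the full membership.
class ReplicaNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)   # the only state added at the parent

    def disseminate(self, update):
        """Apply an update locally, then push it one level down the tree."""
        self.apply(update)
        for child in self.children:
            child.disseminate(update)

    def apply(self, update):
        print(f"{self.name} applied update {update!r}")

# Usage: root = ReplicaNode("source"); s1 = ReplicaNode("s1", root);
# s4 = ReplicaNode("s4", s1); root.disseminate("new object version")
```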
Network Distance Estimation
• Proposed Internet Iso-bar: a scalable overlay
distance monitoring system
• Procedures
1. Cluster hosts that perceive similar performance to a
small set of sites (landmarks)
2. For each cluster, select a monitor for active and
continuous probing
3. Estimate the distance between any pair of hosts using
inter- and intra-cluster distances (sketched below)
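A minimal sketch of the three steps, assuming `landmark_rtts` maps each host to a NumPy vector of RTTs to the landmarks; the nearest-landmark clustering rule and the additive distance estimate are illustrative stand-ins for the paper's actual similarity metric.

```python
import numpy as np

def cluster_hosts(landmark_rtts):
    """Step 1: group hosts that perceive similar performance to the
    landmarks (here, simply by which landmark each host sees as closest)."""
    clusters = {}
    for host, rtts in landmark_rtts.items():
        clusters.setdefault(int(np.argmin(rtts)), []).append(host)
    return clusters

def pick_monitors(landmark_rtts, clusters):
    """Step 2: in each cluster, elect as monitor the host whose RTT vector
    is closest to the cluster mean; it probes on behalf of all members."""
    monitors = {}
    for cid, members in clusters.items():
        mean = np.mean([landmark_rtts[h] for h in members], axis=0)
        monitors[cid] = min(
            members, key=lambda h: np.linalg.norm(landmark_rtts[h] - mean))
    return monitors

def estimate(a, b, cid_of, intra, inter):
    """Step 3: estimate d(a, b) from measured monitor-to-host (intra) and
    monitor-to-monitor (inter) distances."""
    ca, cb = cid_of[a], cid_of[b]
    return intra[a] + (0 if ca == cb else inter[ca][cb]) + intra[b]
```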
Diagram of Internet Iso-bar
[Figure: end hosts grouped into clusters A, B, and C around landmarks]
Diagram of Internet Iso-bar
[Figure: the same clusters, each with an elected monitor; distance probes
run from each monitor to its hosts and among the monitors]
Internet Iso-bar
• Evaluated with NLANR AMP and Keynote data
– 90% of estimates have relative error below 0.5
» e.g., for a 60ms latency, 45ms < prediction < 90ms
– Good stability for distance estimation
• Publications
– ACM SIGMETRICS Performance Evaluation Review
(PER), September issue, 2002.
– Journal of Computer Resource Management,
Computer Measurement Group, Spring Edition,
2002.
Outline
• Introduction
• Research Methodology
• SCAN Mechanisms and Status
– Cooperative clustering-based replication
– Coherence support
– Scalable network monitoring
• Research Plan
• Conclusions
Research Plan
• Focus on congestion/failures estimation (4
months)
– Apply topology information, e.g. lossy link detection
with network tomography
– Cluster and choose monitors based on the lossy links
– Dynamic node join/leave for P2P systems
– More comprehensive evaluation
» Simulate with large network
» Deploy on PlanetLab, and operate at a finer granularity
• Write up thesis (4 months)
Tomography-based Network Monitoring
• Observations
– The number of lossy links is small, yet they dominate E2E loss
– Loss rates are stable (on the order of hours to days)
– Routing is stable (on the order of days)
• Identify the lossy links, then monitor only the few
paths needed to cover them
• Infer the loss rates of all other paths (sketched below)
[Figure: an example topology of end hosts and routers, with normal and
lossy links marked]
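A hedged sketch of the inference idea: with a routing matrix G (rows are paths, columns are links), it suffices to measure a subset of paths whose rows span G's row space and infer the rest, which is where the O(M+N) saving comes from. Log-transformed success rates make path loss additive over links. The rank-revealing QR selection below is my illustrative choice, not necessarily the dissertation's algorithm.

```python
import numpy as np
from scipy.linalg import qr

def select_basis_paths(G, tol=1e-10):
    """Pick a subset of paths whose rows span G's row space, via
    rank-revealing QR with column pivoting on G^T."""
    _, R, piv = qr(G.T, pivoting=True)
    rank = int(np.sum(np.abs(np.diag(R)) > tol))
    return piv[:rank]                       # indices of the paths to monitor

def infer_all_paths(G, basis_idx, basis_loss):
    """Infer every path's loss rate from the measured basis paths,
    working in x = -log(1 - loss), which is additive over links."""
    xb = -np.log1p(-np.asarray(basis_loss, dtype=float))
    x_links, *_ = np.linalg.lstsq(G[basis_idx], xb, rcond=None)
    return 1.0 - np.exp(-(G @ x_links))     # loss rates on all paths
```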
Conclusions
• Cooperative, clustering-based replication
– Cooperative push: only 4–5% of the replication/update cost
of existing CDNs
– Clustering reduces the management/computational
overhead by two orders of magnitude
» Spatial clustering and popularity-based clustering recommended
– Incremental clustering to adapt to emerging objects
» Hyperlink-based online incremental clustering for high
availability and performance improvement
• Self-organize replicas into an app-level multicast tree
for update dissemination
• Scalable overlay network monitoring
– O(M+N) instead of O(M×N), given M client groups and N
servers
Backup Materials
SCAN
[Figure (repeated): SCAN provides coherence for dynamic content, scalable
O(M+N) network monitoring, and cooperative clustering-based replication,
with replicas pushed to servers s1, s4, s5]
Problem Formulation
• Subject to a given total replication cost (e.g., # of URL replicas)
• Find a scalable, adaptive replication strategy that minimizes the
average access cost (a greedy baseline is sketched below)
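One natural baseline for this formulation is a greedy heuristic: under a budget of extra replicas, repeatedly place whichever (object, server) replica yields the largest drop in request-weighted access cost. All inputs here (`dist`, `demand`, `origin`) are illustrative; this is not the replication algorithm from the publications above.

```python
def greedy_place(objects, servers, clients, dist, demand, origin, budget):
    """dist[c][s]: client c's distance to server s; demand[c][o]: c's
    request rate for object o; origin[o]: the server that always holds o."""
    placed = {o: {origin[o]} for o in objects}

    def access_cost(o, replicas):
        # Total distance clients pay for o, each using its nearest replica.
        return sum(demand[c][o] * min(dist[c][s] for s in replicas)
                   for c in clients)

    for _ in range(budget):
        best_gain, best = 0, None
        for o in objects:
            base = access_cost(o, placed[o])
            for s in servers:
                if s in placed[o]:
                    continue
                gain = base - access_cost(o, placed[o] | {s})
                if gain > best_gain:
                    best_gain, best = gain, (o, s)
        if best is None:                  # no further placement helps
            break
        placed[best[0]].add(best[1])
    return placed
```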
SCAN: Scalable Content Access Network
[Figure: SCAN component stack]
• CDN applications (e.g., streaming media)
• Provisioning: cooperative clustering-based replication
• Coherence: update multicast tree construction
• Network distance/congestion/failure estimation
• User behavior/workload monitoring
• Network performance monitoring
(red: my work, black: out of scope)
Evaluation of Internet-scale System
• Analytical evaluation
• Realistic simulation
– Network topology
– Web workload
– Network end-to-end latency measurement
• Network topology
– Pure-random, Waxman & transit-stub synthetic
topology
– A real AS-level topology from 7 widely-dispersed BGP
peers
Web Workload
Web Site    Period        Duration  # Requests       # Clients        # Client groups
                                    (avg–min–max)    (avg–min–max)    (avg–min–max)
MSNBC       Aug–Oct/1999  10–11am   1.5M–642K–1.7M   129K–69K–150K    15.6K–10K–17K
NASA        Jul–Aug/1995  All day   79K–61K–101K     5940–4781–7671   2378–1784–3011
World Cup   May–Jul/1998  All day   29M–1M–73M       103K–13K–218K    N/A
• Aggregate MSNBC Web clients with BGP prefix
– BGP tables from a BBNPlanet router
• Aggregate NASA Web clients with domain names
• Map the client groups onto the topology
Simulation Methodology
• Network Topology
– Pure-random, Waxman & transit-stub synthetic topology
– An AS-level topology from 7 widely-dispersed BGP peers
• Web Workload
Web Site    Period        Duration  # Requests       # Clients        # Client groups
                                    (avg–min–max)    (avg–min–max)    (avg–min–max)
MSNBC       Aug–Oct/1999  10–11am   1.5M–642K–1.7M   129K–69K–150K    15.6K–10K–17K
NASA        Jul–Aug/1995  All day   79K–61K–101K     5940–4781–7671   2378–1784–3011
– Aggregate MSNBC Web clients with BGP prefix
» BGP tables from a BBNPlanet router
– Aggregate NASA Web clients with domain names
– Map the client groups onto the topology
Online Incremental Clustering
• Predict access patterns based on semantics
• Simplify to popularity prediction
• Groups of URLs with similar popularity? Use hyperlink
structures! (see the sketch below)
– Groups of siblings
– Groups at the same hyperlink depth: the smallest # of
links from the root
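The hyperlink-depth grouping can be sketched with a plain BFS from the site root: the first time a URL is reached gives its smallest number of links from the root, and URLs at equal depth form one group. The adjacency map `links` is a hypothetical input.

```python
from collections import deque

def cluster_by_hyperlink_depth(root, links):
    """Group URLs by their smallest hyperlink distance from the root.
    links: {url: [urls it links to]} (illustrative input)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        for nxt in links.get(url, []):
            if nxt not in depth:          # first visit = smallest depth
                depth[nxt] = depth[url] + 1
                queue.append(nxt)
    clusters = {}
    for url, d in depth.items():
        clusters.setdefault(d, []).append(url)
    return clusters                       # {depth: [urls at that depth]}
```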
Challenges for CDN
• Over-provisioning for replication
– Provide good QoS to clients (e.g., latency bound, coherence)
– Small # of replicas with small delay and bandwidth
consumption for update
• Replica Management
– Scalability: billions of replicas if replicating per URL
» O(10^4) URLs/server × O(10^5) CDN edge servers in O(10^3)
networks ≈ O(10^9) replicas
– Adaptation to dynamics of content providers and customers
• Monitoring
– User workload monitoring
– End-to-end network distance/congestion/failures monitoring
» Measurement scalability
» Inference accuracy and stability
SCAN Architecture
• Leverage Decentralized Object Location and Routing
(DOLR) - Tapestry for
– Distributed, scalable location with guaranteed success
– Search with locality
• Soft state maintenance of dissemination tree (for each
object)
[Figure: SCAN's data plane and network plane: a Web server (the data
source) pushes replicas (always updated) and caches (adaptive coherence)
onto SCAN servers across a Tapestry mesh; client requests are resolved
through Tapestry's request location]
Wide-area Network Measurement
and Monitoring System (WNMMS)
• Select a subset of SCAN servers to be monitors
• E2E estimation of distance, congestion, and failures
[Figure: the network plane partitioned into clusters A, B, and C of
monitors, SCAN edge servers, and clients; distance is measured from each
host to its monitor and among the monitors]
Dynamic Provisioning
• Dynamic replica placement
– Meeting clients’ latency and servers’ capacity constraints
– Close-to-minimal # of replicas
• Self-organize replicas into an app-level multicast tree
– Small delay and bandwidth consumption for update multicast
– Each node only maintains states for its parent & direct
children
• Evaluated based on simulation of
– Synthetic traces with various sensitivity analysis
– Real traces from NASA and MSNBC
• Publication
– IPTPS 2002
– Pervasive Computing 2002
Effects of the Non-Uniform Size of URLs
[Figure: four plots of replication performance under a byte-based cost
constraint]
• Replication cost constraint: bytes
• Similar trends exist
– Per-URL replication outperforms per-Website replication
dramatically
Diagram of Internet Iso-bar
[Figure (repeated): end hosts grouped into clusters A, B, and C around
landmarks]
Diagram of Internet Iso-bar
[Figure (repeated): the same clusters with elected monitors; distance
probes run from each monitor to its hosts and among the monitors]
Real Internet Measurement Data
• NLANR Active Measurement Project data set
– 119 sites in the US (106 after filtering out mostly offline sites)
– Round-trip time (RTT) between every pair of hosts every
minute
– Raw data: 6/24/00 – 12/3/01
• Keynote measurement data
– Measure TCP performance from about 100 agents
– Heterogeneous core network: various ISPs
– Heterogeneous access network:
» Dial up 56K, DSL and high-bandwidth business connections
– Targets
» Web site perspective: 40 most popular Web servers
» 27 Internet Data Centers (IDCs)
Related Work
• Internet content delivery systems
– Web caching
» Client-initiated
» Server-initiated
– Pull-based Content Delivery Networks (CDNs)
– Push-based CDNs
• Update dissemination
– IP multicast
– Application-level multicast
• Network E2E Distance Monitoring Systems
Web Proxy Caching
[Figure: Web proxy caching across two ISPs, each with clients, a local
DNS server, and a proxy cache server; a client's GET request (step 1) is
served through its proxy cache server, which fetches from the Web content
server as needed and returns the response (step 4)]
Pull-based CDN
[Figure: pull-based CDN; each ISP hosts a local CDN server and a local
DNS server, with a central CDN name server. The client's GET request
(step 5) reaches its local CDN server, which forwards it to the Web
content server on a cache miss (step 6); responses flow back in steps
7–8]
Push-based CDN
[Figure: push-based CDN; content is replicated in advance, so the
client's GET request (step 5) goes to its local CDN server, falling back
to the Web content server only if no replica has been pushed yet, and the
response returns in step 6]
Internet Content Delivery Systems
Comparing Web caching (client-initiated), Web caching (server-initiated),
pull-based CDNs (Akamai), push-based CDNs, and SCAN:
• Efficiency (# of caches or replicas): no cache sharing among proxies /
cache sharing / no replica sharing among edge servers / replica sharing /
replica sharing
• Scalability for request redirection: pre-configured in browser / Bloom
filters to exchange replica locations / centralized CDN name server /
centralized CDN name server / decentralized P2P location
• Coherence support: no / no / yes / no / yes
• Network-awareness: no / no / yes, but with an unscalable monitoring
system / no / yes, with a scalable monitoring system
Previous Work:
Update Dissemination
• No inter-domain IP multicast
• Application-level multicast (ALM) unscalable
– Root maintains states for all children (Narada,
Overcast, ALMI, RMX)
– Root handles all “join” requests (Bayeux)
– Splitting the root is a common remedy, but it incurs
consistency overhead
Design Principles
• Scalability
– No centralized point of control: P2P location services,
Tapestry
– Reduce management states: minimize # of replicas, object
clustering
– Distributed load balancing: capacity constraints
• Adaptation to clients’ dynamics
– Dynamic distribution/deletion of replicas with regard to
clients' QoS constraints
– Incremental clustering
• Network-awareness and fault-tolerance (WNMMS)
– Distance estimation: Internet Iso-bar
– Anomaly detection and diagnostics
Comparison of Content Delivery
Systems (cont’d)
Comparing the same five systems, Web caching (client-initiated), Web
caching (server-initiated), pull-based CDNs (Akamai), push-based CDNs,
and SCAN:
• Distributed load balancing: no / yes / yes / no / yes
• Dynamic replica placement: yes / yes / yes / no / yes
• Network-awareness: no / no / yes, but with an unscalable monitoring
system / no / yes, with a scalable monitoring system
• No global network topology assumption: yes / yes / yes / no / yes
Network-awareness (cont’d)
• Loss/congestion prediction
– Maximize true positives and minimize false positives
• Orthogonal loss/congestion paths discovery
– Without underlying topology
$$\mathrm{orthogonality}(AC, BC) = 1 - \frac{E(AC \cdot BC) - E(AC)\,E(BC)}{\sqrt{\left(E(AC^2) - E^2(AC)\right)\left(E(BC^2) - E^2(BC)\right)}}$$
– How stable is such orthogonality?
» Degradation of orthogonality over time
• Reactive and proactive adaptation for SCAN
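To make the metric above concrete, here is a hedged NumPy rendering: given two loss-rate time series observed on paths AC and BC, it returns one minus their empirical correlation, so fully correlated paths score 0 and uncorrelated ("orthogonal") paths score 1. The sampling setup is illustrative.

```python
import numpy as np

def orthogonality(ac, bc):
    """1 - correlation of the loss observed on paths AC and BC."""
    ac = np.asarray(ac, dtype=float)
    bc = np.asarray(bc, dtype=float)
    cov = np.mean(ac * bc) - np.mean(ac) * np.mean(bc)
    var_ac = np.mean(ac ** 2) - np.mean(ac) ** 2
    var_bc = np.mean(bc ** 2) - np.mean(bc) ** 2
    return 1.0 - cov / np.sqrt(var_ac * var_bc)
```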