Content Distribution Networks

Costin Raiciu
Advanced Topics in Distributed Systems
Fall 2012
Problem: making the web go faster
• Use case: type ebay.com in your browser
– The site takes a while to load
– How long do you wait before you give up and try
another site?
40% of customers will wait no more than 3 seconds
for a webpage to load [Forrester Consulting]
Why would a site take long to load?
• Let’s assume the content is generated quickly
by the servers
• A webpage will have many small objects and
perhaps a few larger ones
• What network conditions will affect download
time?
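One way to get a feel for the question above is a back-of-the-envelope model. The sketch below is only an illustration under strong simplifying assumptions that are not from the slides: a single persistent connection, one request per round trip, no pipelining, no TCP slow start, and no DNS or handshake cost. The function name and all numbers are hypothetical.

```python
def fetch_time(num_objects, object_bytes, rtt_s, bandwidth_bps):
    """Estimate the time to fetch a page object by object.

    Assumed model: each object costs one round trip plus its
    transfer time at full link bandwidth.
    """
    per_object = rtt_s + (object_bytes * 8) / bandwidth_bps
    return num_objects * per_object


# Hypothetical page: 50 objects of 10 KB each on a 10 Mbit/s link.
far = fetch_time(50, 10_000, rtt_s=0.200, bandwidth_bps=10_000_000)
near = fetch_time(50, 10_000, rtt_s=0.020, bandwidth_bps=10_000_000)
print(f"far server (200 ms RTT): {far:.2f} s")
print(f"near server (20 ms RTT): {near:.2f} s")
```

Under these assumptions the far server needs about 10.4 s and the near one about 1.4 s, even though the bandwidth is identical: for many small objects the round-trip time dominates.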
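The Internet
[Figure: a client looks up www.site.com at the authoritative DNS server and then downloads the webpage from the origin server www.site.com; the client, the DNS server and the origin server sit in different Autonomous Systems (AS).]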
How do we fix the web?
• Change the Internet architecture – rather
difficult to deploy
• More pragmatic choice: target static web
content
– Basic idea: bring content closer to the clients
• Design principles
– Design for reliability and scalability (Akamai has
120K servers today)
– Limit the need for human management
CDNs are really popular
• Quick check
The Basic Idea of CDNs:
Cache static content near the user
[Figure: the origin server www.site.com and several Autonomous Systems (AS).]
CDN Operation Summary
[Figure: a client behind a DNS resolver looks up www.site.com; the diagram shows the authoritative DNS server, the CDN nameserver, edge server E and the origin server www.site.com, each in its own Autonomous System (AS).]
1. www.site.com delegates its zone to the CDN nameserver
2. The CDN nameserver uses the resolver’s IP to find the edge server
that is nearest to the customer (geo-location)
3. Edge server E serves the file from local storage if it has it
4. Otherwise, E fetches the file from the origin server
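Steps 3 and 4 amount to a cache-or-fetch rule. A minimal sketch, assuming an in-memory store and a caller-supplied fetch function (the class name, `fake_origin`, and the counter are hypothetical and only serve the illustration; expiry and invalidation are left out):

```python
class EdgeServer:
    """Toy edge server: serve from local storage, else fetch from origin."""

    def __init__(self, fetch_from_origin):
        self._store = {}                          # URL -> cached body
        self._fetch_from_origin = fetch_from_origin
        self.origin_fetches = 0                   # how often we went to the origin

    def serve(self, url):
        if url not in self._store:                # step 4: not stored locally
            self._store[url] = self._fetch_from_origin(url)
            self.origin_fetches += 1
        return self._store[url]                   # step 3: serve the local copy


def fake_origin(url):
    # Stand-in for an HTTP request to the origin server.
    return f"<contents of {url}>"


edge = EdgeServer(fake_origin)
edge.serve("www.site.com/logo.png")
edge.serve("www.site.com/logo.png")
print(edge.origin_fetches)   # 1: the second request is served locally
```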
Overview of the Akamai CDN [Akamai]
Edge server functionality
• Located at ISPs worldwide - close to customers
• Maintain metadata for each served file
• Metadata specified by:
– XML files delivered using Akamai infrastructure
– Akamai specific HTTP Response Headers
Metadata stored for each file
• Origin server location
• Content path
• Cache control – how long should we store replica, how
is it invalidated?
• Cache indexing – how should a URL be used to create
a key for that object?
• Access control – who is allowed to view this file?
• Response to origin server failure
• HTTP Header alteration – rewrite headers including
cookies to deal with different browsers etc.
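To make the cache indexing item concrete, here is a small sketch of per-file metadata and a key function. It is not Akamai's format: the field names and the rule "ignore selected query parameters" are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlencode, urlsplit


@dataclass(frozen=True)
class FileMetadata:
    origin_host: str                           # origin server location
    content_path: str                          # content path
    ttl_seconds: int                           # cache control: replica lifetime
    ignored_params: frozenset = frozenset()    # cache indexing rule (assumed)


def cache_key(url, meta):
    """Derive a cache key, dropping query parameters that do not change the content."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in meta.ignored_params
    )
    query = urlencode(kept)
    return f"{parts.netloc.lower()}{parts.path}" + (f"?{query}" if query else "")


meta = FileMetadata("www.site.com", "/img/", 3600, frozenset({"session"}))
a = cache_key("http://www.site.com/img/a.png?session=1&size=large", meta)
b = cache_key("http://WWW.site.com/img/a.png?size=large&session=2", meta)
print(a)
print(a == b)   # True: both requests share one cached object
```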
Mapping System
• A scoring system creates an up-to-date
topological map of Internet
– Divides IP addresses into equivalence classes
– Computes connectivity between these classes
• Implementation:
– Run and parse ping and traceroutes in real time
– Parse BGP data and logs
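The slides do not say how the equivalence classes are formed. As a purely illustrative stand-in, the sketch below groups client addresses by a fixed-length prefix; the /24 choice and the function names are assumptions, and a real system would derive classes from measurements and BGP data.

```python
import ipaddress
from collections import defaultdict


def equivalence_class(ip, prefix_len=24):
    """Assumed rule: addresses sharing a prefix belong to the same class."""
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


def group_by_class(ips, prefix_len=24):
    classes = defaultdict(list)
    for ip in ips:
        classes[equivalence_class(ip, prefix_len)].append(ip)
    return dict(classes)


groups = group_by_class(["192.0.2.17", "192.0.2.200", "198.51.100.5"])
for network, members in groups.items():
    print(network, members)
```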
Mapping System
• The Real Time Mapping system creates the
actual maps used to direct users to the best
Akamai edge servers
• Runs in two steps:
a) Map to cluster: select a preferred edge server
cluster for each equivalence class of users
b) Map to server: a low level map sends the user to
a specific server within the cluster
goal: maintain locality within clusters
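The two steps can be sketched as two small functions. The latency table, cluster names and server names are hypothetical, and hashing the URL to pick a server is an assumed way of keeping requests for the same object on the same machine; the slides do not specify the low level map.

```python
import hashlib

# Hypothetical scores: latency (ms) from each user class to each cluster.
LATENCY_MS = {
    "class-A": {"cluster-eu": 12, "cluster-us": 95},
    "class-B": {"cluster-eu": 110, "cluster-us": 18},
}
SERVERS = {
    "cluster-eu": ["eu-1", "eu-2", "eu-3"],
    "cluster-us": ["us-1", "us-2"],
}


def map_to_cluster(user_class):
    """Step a): pick the preferred cluster for an equivalence class."""
    scores = LATENCY_MS[user_class]
    return min(scores, key=scores.get)


def map_to_server(cluster, url):
    """Step b): pick a server so that the same URL always lands on the same one."""
    servers = SERVERS[cluster]
    digest = hashlib.sha256(url.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


cluster = map_to_cluster("class-A")
print(cluster, map_to_server(cluster, "www.site.com/logo.png"))
```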
Implementing the mapping system
using DNS
1. The first request goes to generic TLD servers, which return Akamai
Top Level Name Servers (TLNS) as authorities, generally with long
DNS TTLs. The Akamai TLNS are globally distributed, using a
mixture of IP Anycast and large clusters.
2. The next query, to an Akamai TLNS, returns delegations with
shorter DNS TTLs to a number of Akamai Low Level Name Servers
(LLNS). The Akamai LLNS are typically located in close network
proximity to the resolving name server.
3. The final query, to an Akamai LLNS, returns edge server IP
addresses based on both the cluster assignment and the low level
map described above. These answers have very short TTLs so that
changes to the mapping assignments (such as in response to
failures or shifts in demand) can be rapidly distributed to end
users.
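The effect of the three TTL tiers can be illustrated with a toy resolver cache. The TTL values below are hypothetical (the text only says long, shorter and very short), and the class is a simplified stand-in, not a real DNS implementation.

```python
class ResolverCache:
    """Minimal TTL cache, as a resolver might keep for DNS answers."""

    def __init__(self):
        self._entries = {}                 # name -> (answer, expiry time)

    def put(self, name, answer, ttl, now):
        self._entries[name] = (answer, now + ttl)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None or now >= entry[1]:
            return None                    # missing or expired
        return entry[0]


# Hypothetical TTLs in seconds for the three levels.
TTL = {"tlns": 86400, "llns": 3600, "edge": 20}

cache = ResolverCache()
for level, ttl in TTL.items():
    cache.put(level, f"{level}-answer", ttl, now=0)


def stale_levels(now):
    """Levels the resolver must query again at time `now`."""
    return [level for level in TTL if cache.get(level, now) is None]


print(stale_levels(10))      # []
print(stale_levels(60))      # ['edge']
print(stale_levels(7200))    # ['llns', 'edge']
```

With these numbers only the edge answer is refreshed frequently, so a changed mapping reaches clients within seconds while the upper levels are rarely contacted.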
Akamai’s Transport protocol
• The communications between any two Akamai
servers can be optimized to overcome the
inefficiencies of BGP routing
• Goals:
– Accelerate non-cacheable content
– Accelerate apps that check origin server for freshness
• Techniques:
– Path optimizations
– Protocol enhancements
Path Optimization
• Build an Internet overlay
– Use end-to-end path quality between servers
maintained by the mapping system
– Move traffic onto the best performing path
according to measurements or use multiple paths.
• 30-50% performance improvements in Asia
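A minimal sketch of the path selection idea, assuming the overlay only considers the direct path and one-relay detours and scores them by measured latency (node names and numbers are hypothetical; the slides do not describe the actual selection algorithm):

```python
def best_path(src, dst, latency_ms, relays):
    """Return (path, total latency) for the fastest direct or one-relay path."""
    candidates = [([src, dst], latency_ms[(src, dst)])]
    for relay in relays:
        if (src, relay) in latency_ms and (relay, dst) in latency_ms:
            total = latency_ms[(src, relay)] + latency_ms[(relay, dst)]
            candidates.append(([src, relay, dst], total))
    return min(candidates, key=lambda candidate: candidate[1])


# Hypothetical measurements between overlay servers, in milliseconds.
latency_ms = {
    ("edge", "origin"): 180,                               # default BGP path
    ("edge", "relay-1"): 40, ("relay-1", "origin"): 90,
    ("edge", "relay-2"): 70, ("relay-2", "origin"): 130,
}
path, total = best_path("edge", "origin", latency_ms, ["relay-1", "relay-2"])
print(path, total)   # ['edge', 'relay-1', 'origin'] 130
```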
Akamai Transport Protocol
• Proprietary (modified TCP)
• Use pools of persistent connections to avoid
the TCP three-way handshake (3WHS)
• Play with TCP Window size based on path
conditions
– E.g. increase initial cwnd when path is known to be
good
• Set aggressive timeouts based on known path
information
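To see why a larger initial cwnd helps, the sketch below counts round trips under an idealized slow start: the window doubles every RTT and nothing is lost. This model and the segment counts are assumptions for illustration, not a description of Akamai's protocol.

```python
def slow_start_rounds(segments, initial_cwnd):
    """Round trips needed to send `segments` with a window that doubles each RTT."""
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd      # send one full window this round trip
        cwnd *= 2         # idealized slow start, no loss
        rounds += 1
    return rounds


# Hypothetical 30-segment response (roughly 43 KB with 1460-byte segments).
for icw in (3, 10, 32):
    print(f"initial cwnd {icw:2d}: {slow_start_rounds(30, icw)} round trips")
```

For the 30-segment example this gives 4, 2 and 1 round trips respectively. Reusing a persistent connection additionally saves the round trip spent on the handshake.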
Coral CDN
(slides adapted from Mike Freedman)
A problem…
• Feb 3: Google linked banner to “julia fractals”
• Users clicking directed to Australian University web site
• …University’s network link overloaded, web server taken
down temporarily…
The problem strikes again!
• Feb 4: Slashdot ran the story about Google
• …Site taken down temporarily…again
The response from down under…
• Feb 4, later…Paul Bourke asks:
“They have hundreds (thousands?) of servers
worldwide that distribute their traffic load. If even a
small percentage of that traffic is directed to a single
server … what chance does it have?”
→ Help the little guy ←
Coral’s solution…
[Figure: browsers fetch content through a set of Coral nodes, each running httpprx and dnssrv, which sit in front of the origin server.]
Pool resources to dissipate flash crowds
• Implement an open CDN
• Allow anybody to contribute
• Works with unmodified clients
• CDN only fetches once from origin server
Using CoralCDN
• Rewrite URLs into “Coralized” URLs
www.x.com → www.x.com.nyud.net:8090
– Directs clients to Coral, which absorbs load
• Who might “Coralize” URLs?
– Web server operators Coralize URLs
– Coralized URLs posted to portals, mailing lists
– Users explicitly Coralize URLs
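The rewrite rule above is mechanical enough to sketch in a few lines. The function name is hypothetical, and dropping any port present in the original URL is a simplifying assumption.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"
CORAL_PORT = 8090


def coralize(url):
    """Rewrite a URL so that requests go through CoralCDN."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    if host.endswith(CORAL_SUFFIX):
        return url                       # already Coralized
    netloc = f"{host}{CORAL_SUFFIX}:{CORAL_PORT}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(coralize("http://www.x.com/index.html"))
# http://www.x.com.nyud.net:8090/index.html
```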
CoralCDN components
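[Figure: the browser’s resolver looks up www.x.com.nyud.net and receives 216.165.108.10; the browser then fetches through that proxy, which contacts other proxies or the origin server.]
• dnssrv – DNS redirection: return a proxy, preferably one
near the client
• httpprx – cooperative web caching: fetch data from a
nearby httpprx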
Functionality needed
• DNS: Given network location of resolver, return a
proxy near the client
– put (network info, self)
– get (resolver info) → {proxies}
• HTTP: Given URL, find proxy caching object,
preferably one nearby
– put (URL, self)
– get (URL) → {proxies}
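Both services need the same put/get interface, where one key can map to several proxies. A single-process stand-in for that interface might look as follows; the class name is hypothetical, and Coral itself uses a distributed index rather than a local dictionary.

```python
from collections import defaultdict


class ProxyIndex:
    """In-memory stand-in for the index: one key maps to a set of proxies."""

    def __init__(self):
        self._table = defaultdict(set)

    def put(self, key, proxy):
        self._table[key].add(proxy)

    def get(self, key):
        return set(self._table.get(key, ()))


# HTTP side: proxies announce which URLs they cache.
http_index = ProxyIndex()
http_index.put("http://www.x.com/", "proxy-1")
http_index.put("http://www.x.com/", "proxy-2")
print(sorted(http_index.get("http://www.x.com/")))   # ['proxy-1', 'proxy-2']

# DNS side: proxies announce which networks they are close to.
dns_index = ProxyIndex()
dns_index.put("192.0.2.0/24", "proxy-1")
print(dns_index.get("192.0.2.0/24"))                 # {'proxy-1'}
```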
Key Idea
• Use a distributed hash table – but locality is
poor
– So use multiple DHTs (called clusters)!
– Each peer takes part in 3 clusters based on
network proximity (<20ms, <60ms, all others)
– Insertions are done in all DHTs
– Lookups prefer “nearest” DHT
• A lot more details in the paper.
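The insert-everywhere, look-up-nearest-first rule can be sketched with plain dictionaries standing in for the DHTs. The level names follow the thresholds above; the cluster identifiers and the `membership` mapping are hypothetical, and the real routing is described in the paper.

```python
from collections import defaultdict

LEVELS = ("<20ms", "<60ms", "global")     # nearest cluster first


class ClusteredIndex:
    """One table per (level, cluster); dictionaries stand in for DHTs."""

    def __init__(self):
        self._tables = defaultdict(lambda: defaultdict(set))

    def put(self, membership, key, proxy):
        # Insertions are done in the node's cluster at every level.
        for level in LEVELS:
            self._tables[(level, membership[level])][key].add(proxy)

    def get(self, membership, key):
        # Lookups try the nearest cluster first and widen on a miss.
        for level in LEVELS:
            table = self._tables.get((level, membership[level]))
            if table and key in table:
                return level, set(table[key])
        return None, set()


london = {"<20ms": "uk", "<60ms": "eu", "global": "all"}
bucharest = {"<20ms": "ro", "<60ms": "eu", "global": "all"}
sydney = {"<20ms": "au", "<60ms": "apac", "global": "all"}

index = ClusteredIndex()
index.put(london, "http://www.x.com/", "proxy-london")
print(index.get(london, "http://www.x.com/"))      # found at '<20ms'
print(index.get(bucharest, "http://www.x.com/"))   # found at '<60ms'
print(index.get(sydney, "http://www.x.com/"))      # found at 'global'
```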
Challenges for DNS Redirection
• Coral lacks…
– Central management
– A priori knowledge of network topology
• Anybody can join system
– Any special tools (e.g., BGP feeds)
• Coral has…
– Large # of vantage points to probe topology
– Distributed index in which to store network hints
– Each Coral node maps nearby networks to self
Coral’s DNS Redirection
• Coral DNS server probes resolver
• Once local, stay local
• When serving requests from a nearby DNS resolver
– Respond with nearby Coral proxies
– Respond with nearby Coral DNS servers
→ Ensures future requests remain local
• Else, help resolver find local Coral DNS server
DNS measurement mechanism
Server probes client (2 RTTs)
[Figure: a browser and its resolver, with Coral dnssrv and httpprx nodes.]
• Return servers within appropriate cluster
– e.g., for resolver RTT = 19 ms, return from cluster < 20 ms
• Use network hints to find nearby servers
– i.e., client and server on same subnet
• Otherwise, take random walk within cluster
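The selection rules above can be sketched as follows. The RTT thresholds come from the cluster definition; everything else is an assumption for illustration: the function names, the hint table keyed by subnet, and in particular the use of a uniform random sample in place of Coral's random walk within the cluster.

```python
import random


def cluster_for_rtt(rtt_ms):
    """Map the measured resolver RTT to a cluster level."""
    if rtt_ms < 20:
        return "<20ms"
    if rtt_ms < 60:
        return "<60ms"
    return "global"


def pick_servers(rtt_ms, resolver_subnet, hints, members, count=2, rng=random):
    """Return (cluster, servers) to put in the DNS answer."""
    cluster = cluster_for_rtt(rtt_ms)
    hinted = hints.get(resolver_subnet, [])
    if hinted:                               # network hint: known nearby servers
        return cluster, hinted[:count]
    pool = members[cluster]                  # otherwise: random choice (stand-in
    return cluster, rng.sample(pool, min(count, len(pool)))  # for a random walk)


members = {"<20ms": ["n1", "n2", "n3"], "<60ms": ["n4", "n5"], "global": ["n6"]}
hints = {"192.0.2.0/24": ["n2"]}

print(pick_servers(19, "192.0.2.0/24", hints, members))     # ('<20ms', ['n2'])
print(pick_servers(45, "198.51.100.0/24", hints, members))  # two of n4, n5
```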
References
• Forrester Consulting. eCommerce Web Site
Performance Today: An Updated Look At Consumer
Reaction To A Poor Online Shopping Experience. Aug.
17, 2009.
• The Akamai Network: A Platform for High-Performance
Internet Applications – Erik Nygren et al.
http://www.akamai.com/dl/technical_publications/network_overview_osr.pdf
• Democratizing Content Publication with Coral.
Michael J. Freedman, Eric Freudenthal, and David
Mazières. In Proc. 1st USENIX/ACM Symposium on
Networked Systems Design and Implementation