Ideas for Next Generation Content Delivery


Next Steps in Internet Content Delivery
Peter B. Danzig
[email protected]
Understanding WAN traffic
How much web traffic crosses the Internet?
How much WAN HTTP traffic?
Assumptions:
• 250 million Internet users
• Average 10 kbits/s per user when online
• Average 10% online
Yields:
• Bandwidth = 250M * 10 kb/s * 0.10 = 250 Gbits/s
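The same arithmetic as a quick runnable sketch (the numbers are the slide's assumptions, not measurements):

```python
# Rough estimate of aggregate WAN HTTP bandwidth from the
# assumptions above (user count, per-user rate, duty cycle).
users = 250e6           # 250 million Internet users
rate_bps = 10e3         # 10 kbits/s per user while online
online_fraction = 0.10  # 10% online at any moment

bandwidth_bps = users * rate_bps * online_fraction
print(f"{bandwidth_bps / 1e9:.0f} Gbits/s")  # -> 250 Gbits/s
```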
How much WAN HTTP traffic?
Observations:
• Doubleclick.Com is 1% of the web by byte
• Geocities.Com is 1% of the web by byte
• Download.Microsoft.Com is 1% of the web by byte
Learn their aggregate bandwidth purchases… and estimate Internet bandwidth: 250 Gbits/s
Internet Web Sites
[Chart: y-axis: percent of Internet traffic (0-45%); x-axis: number of web sites (20 to 1,000)]
More than 6,000 ISPs
[Chart: y-axis: percent of Internet traffic (0-100%); x-axis: ISPs (1 to 10,000, log scale)]
Internet CDN Market Sizing
• Market Size = $2M per Gb/yr * 250 Gb * 3 / Sqrt(2)
• Revenue Potential per yr = Market Size * Mkt Fraction
• Market size grows by 4.5x every two years
• Today: Mkt = (250 Gb)(Multiplex Factor) + Streaming Mkt
• Streaming Mkt = Yahoo Broadcast + IBeam + Akamai + Real Broadcast Network < 10 Gb/s
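A minimal sketch of the sizing arithmetic above; the 3/√2 factor and the 4.5x-per-two-years growth rate are taken from the slide, and everything here is back-of-envelope assumption rather than market data:

```python
# Back-of-envelope CDN market sizing from the figures above.
# The 3/sqrt(2) factor and the growth rate come from the slide;
# treat both as assumptions, not measured values.
from math import sqrt

PRICE_PER_GB_YR = 2e6       # $2M per Gb per year
INTERNET_HTTP_GB = 250      # estimated WAN HTTP traffic, Gb/s

market_size = PRICE_PER_GB_YR * INTERNET_HTTP_GB * 3 / sqrt(2)
print(f"Market size: ${market_size / 1e6:.0f}M/yr")   # ~ $1,061M/yr

# Growth of 4.5x every two years compounds like this:
years = 4
grown = market_size * 4.5 ** (years / 2)
print(f"In {years} years: ${grown / 1e9:.1f}B/yr")    # ~ $21.5B/yr
```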
Internet CDN Strategy
• Except for the top 1,000 sites, the world's 199,000+ web sites serve minimal bandwidth
• This determines the business strategy:
• Account provisioning needs to be cheap & easy
• Need indirect sales
• Need a bigger, more expensive product bundle
• Customer care needs to be inexpensive
• Make money from streaming media?
Internet CDN Strategy, cont.
• Live streaming media
• Lights, camera, action
• Event connectivity: ISDN or satellite-truck role
• Production and encoding
• Yucky, dirty, icky, labor-intensive, non-cerebral, labor-of-love, crafty stuff
• Work more reminiscent of WebVan than Cisco
Akamai ‘GIF’ Delivery
[Diagram: without Akamai, the entire web page is delivered by CNN; with Akamai, only the HTML is delivered by CNN and the embedded objects are "Akamaized"]
KeyNote System Measurements
[Chart: KeyNote measurements of page download time without Akamai vs. with Akamai]
KeyNote Systems & its Wannabes
• Deploys a "footprint" of monitoring agents, a provisioning interface, global log collection, and reports
• Agents: emulate URL and page download; emulate broadband and dialup access rates
• Wannabe competitors: Mercury Interactive, Service Metrics, StreamCheck, etc.
KeyNote: Operational Issues
• Where's the bottleneck: the agent or the agent's network connection?
• Where's the agent's DNS resolver?
• How do you excise mistaken points from the database?
• How can a CDN beat a Keynote benchmark?
• How does Keynote's TCP stack affect its results?
End-to-End CDN Measurements?
• Contrast methodology between Johnson et al. and Keynote Systems
• Server log analysis (e.g. WebTrends): server logs don't record page arrival times, as the bytes stay queued in TCP or OS buffers
• Client-side reporting (e.g. WebSideStory): place JavaScript on the web page that reports the client's experience to an aggregator; a sketch of the aggregator side follows
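As a rough illustration of the client-side reporting pattern, here is a minimal sketch of the collecting endpoint; the JSON field names and the port are hypothetical, not WebSideStory's actual protocol. The page's JavaScript would POST its observed timings here.

```python
# Minimal sketch of a beacon collector for client-side reporting.
# The JSON fields (url, fetch_ms) are invented for illustration,
# not any vendor's actual API.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class BeaconHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        report = json.loads(self.rfile.read(length))
        # In production this would be queued for aggregation;
        # here we just log the client-observed page-load time.
        print(f"{self.client_address[0]} {report['url']} {report['fetch_ms']}ms")
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), BeaconHandler).serve_forever()
```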
HTML Delivery
• Consider the web traffic breakdown:
• GIFs and JPEGs: 55%
• HTML: 25%
• J. Random Gunk: 20%
• HTML is half of the delivery market, but HTML is 1/3 static and 2/3 dynamic
HTML Delivery
• Delivering static HTML from caches is fast
• How can we make dynamic HTML faster?
• Compress it or delta-encode it (a sketch follows this list)
• Black magic: transfer it over a TCP tunnel or L2TP; Little's Law (items in flight = arrival rate * delay) almost always surprises laymen
• Construct or "assemble" it within the CDN via proprietary language extensions
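To make the first option concrete, here is a minimal sketch of delta-encoding a dynamic page against a cached base copy, using zlib's preset-dictionary feature; the function names are illustrative, not any CDN's actual API.

```python
# Minimal sketch of compress-or-delta-encode for dynamic HTML.
# zlib's dictionary feature lets the cached base page prime the
# compressor, so only the changed bytes cost wire bandwidth.
import zlib

def delta_encode(new_page: bytes, base_page: bytes) -> bytes:
    comp = zlib.compressobj(zdict=base_page)
    return comp.compress(new_page) + comp.flush()

def delta_decode(delta: bytes, base_page: bytes) -> bytes:
    decomp = zlib.decompressobj(zdict=base_page)
    return decomp.decompress(delta) + decomp.flush()

base = b"<html><body>Hello, yesterday's headlines...</body></html>"
new = b"<html><body>Hello, today's headlines...</body></html>"
delta = delta_encode(new, base)
assert delta_decode(delta, base) == new
print(f"{len(new)} bytes -> {len(delta)} byte delta")
```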
Future of HTML Delivery
• Profiling: detect client location and link speed
• Interpret XML style sheets at the edge (see Oracle/Akamai)
• Insert ads
• Compress at the source, decompress in the browser or edge network
Components of a CDN
• Distributed server load balancing, e.g. "Internet Mapping"
• DNS redirection, hashing, and fault tolerance
• Distributed system monitoring
• Distributed software configuration management
Components of a CDN (cont)
• Live stream distribution and entry points
• Log collection, reporting, and performance monitoring
• Client provisioning mechanism
• Content management and replication
Network Mapping
• Network mapping chooses reasonable data centers to satisfy a client request
• We could devote an entire day to mapping. Briefly, what factors help predict good mapping?
• Contracted data center bandwidth
• Path characteristics: RTT, bottleneck bandwidth, "experience", autonomous systems crossed, hop count, observed loss rates, etc.
• How do you measure these factors?
• Mapping is an art.
Black Art of Network Mapping
• Cisco Boomerang
• Radware's DSLB box: linear combination of hop count and RTT (see the sketch below)
• F5's 3DNS: synchronized DNS servers, ICMP ping
• Alteon, Foundry, Resonate, and others…
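A minimal sketch of the linear-combination approach mentioned above; the weights and the candidate data centers are invented for illustration, not Radware's actual coefficients.

```python
# Sketch of DSLB-style server selection: score each candidate data
# center by a linear combination of RTT and hop count, pick the
# minimum. The weights are arbitrary illustrations.
def score(rtt_ms: float, hops: int, w_rtt: float = 1.0, w_hops: float = 5.0) -> float:
    return w_rtt * rtt_ms + w_hops * hops

candidates = {
    "sjc": (12.0, 6),    # (RTT in ms, hop count) as probed per client
    "iad": (70.0, 11),
    "lhr": (145.0, 14),
}
best = min(candidates, key=lambda dc: score(*candidates[dc]))
print(f"Redirect client to {best}")  # -> sjc
```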
Live Stream Distribution
• Ubiquitous IP multicast hasn't emerged
• Alternative: IP multicast plus FEC (a toy FEC sketch follows this slide)
• Yahoo Broadcast's approach:
• Private network link to principal ISPs
• Support multicast where available
• Otherwise, just blast it by unicast and hope
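A toy sketch of the FEC idea: one XOR parity packet per group lets the receiver rebuild a single lost packet without a retransmission round trip. Real streaming FEC uses stronger codes (e.g. Reed-Solomon); this only illustrates the principle.

```python
# Toy forward-error-correction: XOR parity over a group of k
# equal-length packets recovers any one lost packet.
from functools import reduce

def xor_parity(packets: list[bytes]) -> bytes:
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

group = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]
parity = xor_parity(group)

# Suppose packet 2 is lost in transit; XOR of the survivors plus
# the parity packet reconstructs it exactly:
received = [group[0], group[1], group[3]]
recovered = xor_parity(received + [parity])
assert recovered == group[2]
```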
Live Stream Distribution
• Some CDNs attempt to route independent live streams via multiple paths
• Encode with simple error-correction codes; a better code would increase delay
• Makes client provisioning more challenging: need to get the encoded signal to multiple entry points
Live Stream Distribution
• Splitter-combiner network burns bandwidth
• Subscription and teardown are expensive, given the low median subscriber count
• According to Yahoo Broadcast:
• Mean subscribers?
• Average subscribers?
• Splitter/combiner masks failures too successfully, until hell breaks loose
DNS Redirection, Hashing, and Fault Tolerance
• Top-level DNS: uses IP anycast to a dozen DNS servers (or more)
• Second-level DNS servers: redirect the client to a reasonable region (toy sketch below)
• Low-level DNS servers: implement something akin to consistent hashing
• Hot-spare address takeover to mask machine failures
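A toy sketch of the second-level redirection step, assuming a hypothetical prefix-to-region table keyed on the client resolver's address; real mapping systems are far more elaborate.

```python
# Toy second-level redirection: map the client resolver's IP to a
# region via a prefix table. Prefixes and region names are invented
# for illustration.
import ipaddress

REGIONS = {
    "198.51.100.0/24": "us-west",
    "203.0.113.0/24": "eu-west",
}

def region_for(resolver_ip: str, default: str = "us-east") -> str:
    addr = ipaddress.ip_address(resolver_ip)
    for prefix, region in REGIONS.items():
        if addr in ipaddress.ip_network(prefix):
            return region
    return default

print(region_for("203.0.113.7"))  # -> eu-west
```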
Distributed system monitoring
• Problem: export monitoring information across thousands of machines running in hundreds of regions
• Design principles:
• Aggregation
• Scalability
• Extensible data types
• Fault tolerance
• Timely delivery
• Expressible queries
Distributed software configuration management
• Manage software and OS on thousands of remote machines
• Stage system software pushes
• Detect incompatibilities before hell breaks loose
Log collection, reporting, and performance monitoring
• Collect and create a database of 10-100 billion log lines per day
• Allow customers to see their logs and performance
• How would you do this in real time?
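To make the scale concrete, a quick back-of-envelope on the sustained ingest rate the upper bound implies:

```python
# What sustained rate does 100 billion log lines/day imply?
lines_per_day = 100e9
seconds_per_day = 86_400
rate = lines_per_day / seconds_per_day
print(f"{rate / 1e6:.2f}M lines/s sustained")  # ~1.16M lines/s
```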
Content management and replication
• Reliably update replicated hosting
• Mask storage volume boundaries
• Enable billing and reclaiming lost space
Consistent Hashing
• Cute algorithm for splitting load across multiple servers (a minimal sketch follows)
• Create a permutation on hash buckets
• Add servers and subtract servers for a given bucket (i.e. permutation) in the same order
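A minimal sketch of the idea in its textbook hash-ring form, which is not necessarily Akamai's production formulation:

```python
# Minimal consistent-hash ring. Each server owns the arc of the
# ring up to its points; adding or removing a server only remaps
# the keys on its own arcs.
import bisect
import hashlib

class Ring:
    def __init__(self, servers, vnodes=100):
        self.points = sorted(
            (self._h(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self.keys = [p for p, _ in self.points]

    @staticmethod
    def _h(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def lookup(self, key: str) -> str:
        i = bisect.bisect(self.keys, self._h(key)) % len(self.keys)
        return self.points[i][1]

ring = Ring(["srv-a", "srv-b", "srv-c"])
print(ring.lookup("/images/logo.gif"))  # stable unless its arc's owner leaves
```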
Consistent Hashing
• http://a32.g.akamaitech.net/
• Would a less elegant algorithm suffice? Yes: hit rates are 98-99% anyway, so any hash algorithm suffices
• The 2nd level of Akamai DNS servers slightly degrades performance, since DNS TTLs are short
What are the next steps?
• Got to address HTTP and compression/delta encoding
• What about peer-to-peer for GIFs and video?
• How about PVR (e.g. TiVo) and peer-to-peer?
• What about live stream distribution?