Ideas for Next Generation Content Delivery
Next Steps in
Internet Content Delivery
Peter B Danzig
[email protected]
Understanding WAN traffic
How Much Web Traffic Crosses the Internet?
How much WAN HTTP traffic?
Assumptions:
250 million internet users
Average 10 kbits/s per user when online
Average 10% online
Yields:
Bandwidth = 250M * 10kb/s * 0.10 = 250 Gbits/s
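The slide's back-of-envelope arithmetic can be checked directly (all figures are the talk's assumptions):

```python
# Back-of-envelope WAN HTTP bandwidth estimate from the slide's assumptions.
users = 250_000_000       # 250 million internet users
per_user_bps = 10_000     # 10 kbit/s per user while online
fraction_online = 0.10    # 10% of users online at any moment

total_bps = users * per_user_bps * fraction_online
print(f"{total_bps / 1e9:.0f} Gbit/s")  # 250 Gbit/s
```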
How much WAN HTTP traffic?
Observations:
Doubleclick.Com is 1% of web by byte
Geocities.Com is 1% of web by byte
Download.Microsoft.Com is 1% of web by byte
Learn their aggregate bandwidth purchases, and
estimate internet bandwidth: 250 Gbits/s
[Chart: Percent of Internet Traffic vs. Number of Web Sites (20 to 1000 sites, y-axis 0 to 45%)]
More than 6,000 ISPs
[Chart: Percent of traffic vs. number of ISPs (1 to 10,000 ISPs, y-axis 0 to 100%)]
Internet CDN Market Sizing
• Market Size = $2M/(Gb·yr) * 250 Gb * 3 / Sqrt(2)
• Revenue Potential = Market Size * Mkt Fraction
• Market size grows by 4.5x every two years
• Today: Mkt = (250 Gb) (Multiplex Factor) + Streaming Mkt
• Streaming Mkt = Yahoo Broadcast + IBeam + Akamai + Real Broadcast Network < 10 Gb/s
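Plugging the slide's own numbers into its market-sizing formula gives a concrete figure:

```python
import math

# The slide's CDN market-sizing formula; all inputs are the talk's estimates.
price_per_gbps_yr = 2_000_000   # $2M per Gbit/s per year
wan_http_gbps = 250             # WAN HTTP estimate from the earlier slide
multiplier = 3 / math.sqrt(2)   # the slide's 3/Sqrt(2) factor

market_size = price_per_gbps_yr * wan_http_gbps * multiplier
print(f"${market_size / 1e9:.2f}B/yr")  # $1.06B/yr
```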
Internet CDN Strategy
• Except for the top 1000 sites, the world's 199,000+ web sites serve minimal bandwidth
• This determines the business strategy
• Account provisioning must be cheap & easy
• Need indirect sales
• Need a bigger, more expensive product bundle
• Customer care must be inexpensive
• Make money from streaming media?
Internet CDN Strategy, cont.
• Live Streaming Media
• Lights, camera, action
• Event connectivity: ISDN or satellite-truck role
• Production and encoding
• Yucky, dirty, icky, labor-intensive, non-cerebral, labor-of-love, crafty stuff
• Work more reminiscent of WebVan than Cisco
Akamai ‘GIF’ Delivery
[Diagram: Without Akamai, the entire web page is delivered by CNN; with Akamai, only the HTML is delivered by CNN and the embedded objects are “Akamaized”]
KeyNote System Measurements
[Chart: KeyNote measurements without Akamai vs. with Akamai]
KeyNote Systems & its
Wannabes
Deploys “footprint” of monitoring agents,
provisioning interface, global log collection,
reports
Agents: Emulate URL and page download.
Emulate broadband and dialup access rates
Wannabe Competitors: Mercury Interactive,
Service Metrics, StreamCheck, etc.
KeyNote: Operational Issues
Where's the bottleneck: the agent or the agent's network connection?
Where's the agent's DNS resolver?
How to excise mistaken data points from the database?
How can a CDN beat a KeyNote benchmark?
How does KeyNote's TCP stack affect its results?
End-to-End CDN
Measurements?
Contrast the methodology of Johnson et al.
with that of KeyNote Systems
Server log analysis (e.g. WebTrends)
Server logs don't record page arrival times,
as the bytes stay queued in TCP or OS
buffers.
Client-side reporting (e.g. WebSideStory)
Place JavaScript on the web page that reports
the client's experience to an aggregator
HTML Delivery
Consider the Web traffic breakdown:
GIFs and JPEGs 55%
HTML 25%
J. Random Gunk 20%
HTML is half of the delivery market, but
HTML is 1/3 static and 2/3 dynamic.
HTML Delivery
Delivering static HTML from caches is fast
How can we make dynamic HTML faster?
Compress it or delta-encode it
Black magic: transfer it over a TCP tunnel or L2TP
Little's Law almost always surprises laymen.
Construct or "assemble" it within the CDN via
proprietary language extensions
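The compress-or-delta-encode idea can be sketched with zlib's preset-dictionary feature: feeding the previous version of a page in as the dictionary means only the changed portions cost bytes. This is an illustrative sketch of the technique, not the encoding any particular CDN used.

```python
import zlib

# Two versions of a dynamic page that differ only in one story.
old_page = b"<html><body><h1>Headlines</h1><p>Story A</p><p>Story B</p></body></html>"
new_page = b"<html><body><h1>Headlines</h1><p>Story A</p><p>Story C</p></body></html>"

# Plain compression of the new page, with no knowledge of the old one.
plain = zlib.compress(new_page)

# Delta-style compression: the old page serves as a preset dictionary,
# so the compressor can reference it instead of re-sending shared bytes.
comp = zlib.compressobj(zdict=old_page)
delta = comp.compress(new_page) + comp.flush()

# The receiver, which already holds old_page, reconstructs exactly.
decomp = zlib.decompressobj(zdict=old_page)
assert decomp.decompress(delta) == new_page

print(len(new_page), len(plain), len(delta))  # delta is the smallest
```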
Future of HTML Delivery
Profiling: detect client location and link
speed
Interpret XML style sheets at edge (see
Oracle/Akamai)
Insert ads
Compress at source/decompress in
browser or edge network
Components of a CDN
Distributed server load balancing, e.g.
“Internet Mapping”
DNS redirection, hashing, and fault
tolerance
Distributed system monitoring
Distributed software configuration
management
Components of a CDN (cont)
Live stream distribution and entry
points
Log collection, reporting, and
performance monitoring
Client provisioning mechanism
Content management and replication
Network Mapping
Network mapping chooses reasonable data
centers to satisfy a client request.
We could devote an entire day to mapping.
Briefly, what factors help predict good
mapping?
Contracted data center bandwidth
Path characteristics: RTT, Bottleneck Bandwidth,
“Experience”, Autonomous Systems Crossed, Hop
Count, Observed loss rates, etc.
How do you measure these factors?
Mapping is an art.
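One way to picture the "art": fold the measured factors above into a single rank per data center. The weights and factors below are purely hypothetical, chosen to illustrate the shape of such a scoring function, not any CDN's actual formula.

```python
# Hypothetical network-mapping score: combine path measurements into one
# number per candidate data center; lower is better.
def map_score(rtt_ms, loss_rate, hops, spare_capacity_frac):
    score = 0.0
    score += rtt_ms                           # lower RTT is better
    score += 1000 * loss_rate                 # loss hurts TCP throughput badly
    score += 5 * hops                         # mild penalty per hop / AS crossed
    score += 100 * (1 - spare_capacity_frac)  # avoid saturated data centers
    return score

candidates = {
    "iad": map_score(rtt_ms=20, loss_rate=0.001, hops=8,  spare_capacity_frac=0.6),
    "sjc": map_score(rtt_ms=70, loss_rate=0.000, hops=12, spare_capacity_frac=0.9),
    "lhr": map_score(rtt_ms=90, loss_rate=0.020, hops=15, spare_capacity_frac=0.5),
}
best = min(candidates, key=candidates.get)
print(best)  # iad: close, lightly loaded, low loss
```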
Black Art of Network Mapping
Cisco Boomerang
Radware’s DSLB box
Linear combination of hop count and RTT
F5’s 3DNS
Synchronized DNS servers
ICMP ping
Alteon, Foundry, Resonate, and
others….
Live Stream Distribution
Ubiquitous IP Multicast hasn’t emerged
Alternative: IP Multicast plus FEC
Yahoo Broadcast’s Approach:
Private network link to principal ISPs
Support multicast where available
Otherwise, just blast it by unicast and hope
Live Stream Distribution
Some CDNs attempt to route
independent live streams via multiple
paths
Encode with simple error-correction
codes; a better code would increase delay
Makes client provisioning more
challenging—need to get encoded
signal to multiple entry points
Live Stream Distribution
Splitter/combiner network burns bandwidth
Subscription and teardown are expensive, given
the low median subscriber count
According to Yahoo Broadcast
Median subscribers? Mean subscribers?
Splitter/combiner masks failures too
successfully, until all hell breaks loose
DNS Redirection, Hashing, and
Fault Tolerance
Top-level DNS: Uses IP Anycast to a
dozen DNS servers (or more)
Second-level DNS servers: Redirect
client to a reasonable region
Low-level DNS servers: Implement
something akin to consistent hashing
Hot spare address takeover to mask
machine failures
Distributed system monitoring
Problem: export monitoring information
across thousands of machines running in
hundreds of regions
Design principles:
Aggregation
Scalable
Extensible data types
Fault-tolerant
Timely delivery
Expressive queries
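The aggregation principle above can be sketched in a few lines: each region rolls its machines' metrics up into one summary record, so the top level sees thousands of machines as a few hundred regional rows. The data model here is hypothetical.

```python
from collections import defaultdict

# Per-machine samples arriving from the field: (region, machine, requests/sec).
machine_metrics = [
    ("us-east", "m1", 1200), ("us-east", "m2", 900),
    ("eu-west", "m7", 400),  ("eu-west", "m8", 650),
]

# Roll machines up into one record per region before shipping upstream.
regional = defaultdict(lambda: {"machines": 0, "rps": 0})
for region, _machine, rps in machine_metrics:
    regional[region]["machines"] += 1
    regional[region]["rps"] += rps

for region, summary in sorted(regional.items()):
    print(region, summary)
```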
Distributed software
configuration management
Manage software and OS on thousands
of remote machines
Stage system software pushes
Detect incompatibilities before hell
breaks loose
Log collection, reporting, and
performance monitoring
Collect and create database of 10-100
billion log lines per day
Allow customer to see their logs and
performance
How would you do this in real time?
Content management and
replication
Reliably update replicated hosting
Mask storage volume boundaries
Enable billing and reclaiming lost space
Consistent Hashing
A cute algorithm for splitting load across
multiple servers
Create a permutation of servers per hash bucket
Add and remove servers for a given bucket
(i.e. permutation) in the same order
Consistent Hashing
http://a32.g.akamaitech.net/
Would a less elegant algorithm suffice?
Yes: hit rates are 98-99% anyway, so almost
any hash algorithm suffices.
The 2nd level of Akamai DNS servers
slightly degrades performance, since
DNS TTLs are short
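A minimal consistent-hashing ring shows the property the slides allude to: when a server joins, only the keys falling on its arc of the circle move, and they all move to the new server. This is a sketch of the idea, not Akamai's implementation.

```python
import bisect
import hashlib

def h(s):
    # Hash a string onto the ring (a large integer circle).
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers):
        # Each server occupies one point on the circle.
        self.points = sorted((h(s), s) for s in servers)

    def lookup(self, key):
        # A key maps to the next server clockwise from its hash.
        keys = [p for p, _ in self.points]
        i = bisect.bisect(keys, h(key)) % len(self.points)
        return self.points[i][1]

    def add(self, server):
        bisect.insort(self.points, (h(server), server))

ring = Ring(["a1", "a2", "a3"])
urls = [f"/img/{i}.gif" for i in range(1000)]
before = {u: ring.lookup(u) for u in urls}

ring.add("a4")
moved = sum(1 for u in urls if ring.lookup(u) != before[u])
print(f"{moved} of 1000 URLs remapped")  # roughly 1/4 on average
```

Every remapped URL lands on the new server; the rest keep their old assignment, which is why cache hit rates barely dip when capacity changes.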
What are the next steps?
Got to address HTTP and
compression/delta encoding
What about peer-to-peer for GIFs and
video?
How about PVR (e.g. TiVo) and peer-to-peer?
What about live stream distribution?