Overlay, P2P and CDN
• Recap Internet Architecture
– original design goals vs. today’s requirements
• how to meet the challenges?
– contrast & convergence: telephone & cellular voice/data
networks (see separate slides)
• Overlay Networks
– as a general paradigm for (incrementally?) introducing
new services
– infrastructure overlays, e.g., DNS
– (third-party) application overlays vs. end-system overlays
• Peer-to-Peer (P2P) Networks/Applications
– Key design challenges?
– Unstructured vs. structured P2P networks
• CDNs & Large-scale content distribution
– key concepts & design challenges?
– Case studies: YouTube & Netflix (see separate slides)
CSci5221: Overlay, P2P and CDN
Recap: Internet Architecture
• Packet-switched datagram network
• Layering: hourglass architecture
• End-to-end argument
• IP is the “compatibility layer”
– all hosts and routers run IP
• Stateless
– No per-flow state inside network
[Figure: hourglass with TCP and UDP above IP, and Satellite, Ethernet, ATM below IP]
Some Design/Implementation Principles
• virtualization
• indirection
• soft state vs. hard state
• fate sharing
• randomization
• expose faults, don’t suppress/ignore
• caching
• ……
Original Internet Design Goals
[Clark’88]
In order of importance:
0. Connect existing networks
– initially ARPANET and ARPA packet radio network
1. Survivability
– ensure communication service even with network and router failures
2. Support multiple types of services
3. Must accommodate a variety of networks
4. Allow distributed management
5. Allow host attachment with a low level of effort
6. Be cost effective
7. Allow resource accountability
Priorities
• The effects of the order of items in that list are
still felt today
– E.g., resource accounting is a hard, current research
topic
• Different ordering of priorities would make a
different architecture!
• How well has today’s Internet satisfied these
goals?
• Let’s look at them in detail
Questions
• What priority order would a commercial
design have?
• What would a commercially invented
Internet look like?
• What goals are missing from this list?
• Which goals led to the success of the
Internet?
Requirements for Today’s Internet
Some key requirements (“-ities”)
• Availability and reliability
– “Always on”, fault-tolerant, fast recovery from failures, …
• Quality-of-service (QoS) for applications
– fast response time, adequate quality for VoIP, IPTV, etc.
• Scalability
– millions or more of users, devices, …
• Mobility
– untethered access, mobile users, devices, …
• Security (and Privacy?)
– protect against malicious attacks, accountability of user actions?
• Manageability
– configure, operate and manage networks
– trouble-shooting network problems
• Flexibility, Extensibility, Evolvability, ……?
– ease of new service creation and deployment?
– evolvable to meet future needs?
Overlay Networks
Overlay Networks
Can be done at network layer
or application layer
Overlay Networks
• A “logical” network built on top of a physical
network
– Overlay links are tunnels through the underlying network
• Many logical networks may coexist at once
– Over the same underlying network
– Each providing its own particular service
• Nodes are often end hosts
– Acting as intermediate nodes that forward traffic
– Providing a service, such as access to files
• Who controls the nodes providing service?
– The party providing the service (e.g., Akamai)
– Distributed collection of end users (e.g., peer-to-peer)
Overlay and IP/Internet
• IP Network/Internet started as an Overlay
– over various physical networks, in particular
telephone networks
– There are now many “overlays” over today’s
Internet physical infrastructure
• Useful tool for incremental enhancements to IP
– IPv6
– Security, e.g., VPNs
– Mobility
– Multicast
• 2.5G/3G Cellular Data Network as an Overlay
• CDNs and P2P Networks, …
• Question: where does a function belong?
Taxonomy of (IP) Overlays
[Figure: taxonomy of IP overlays across hosts, middleboxes, and infrastructure. Three classes are shown, each with a control plane and a data plane: Internet & infrastructure overlays; (third-party) application overlay services; p2p & end-host overlays]
IP Tunneling
• IP tunnel is a virtual point-to-point link
– Illusion of a direct link between two separated nodes
[Figure: logical view vs. physical view of nodes A, B, E, F, with a tunnel between B and E]
• Encapsulation of the packet inside an IP
datagram
– Node B sends a packet to node E
– … containing another packet as the payload
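The encapsulation step can be sketched in a few lines. This is a toy model, not a real IP stack: the `Packet` class and the `encapsulate`/`decapsulate` helpers are hypothetical names introduced for illustration, and the node labels follow the slides (A sends to F, and the tunnel runs from B to E).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Packet:
    """Toy packet (hypothetical model): source, destination, payload."""
    src: str
    dst: str
    payload: Union[bytes, "Packet"]


def encapsulate(inner: Packet, tunnel_src: str, tunnel_dst: str) -> Packet:
    """Carry the whole inner packet as the payload of an outer packet."""
    return Packet(src=tunnel_src, dst=tunnel_dst, payload=inner)


def decapsulate(outer: Packet) -> Packet:
    """At the tunnel exit, drop the outer header and recover the inner packet."""
    if not isinstance(outer.payload, Packet):
        raise ValueError("payload is not an encapsulated packet")
    return outer.payload


# A sends to F; B (tunnel entry) wraps the packet and addresses it to E (tunnel exit).
original = Packet(src="A", dst="F", payload=b"data")
in_tunnel = encapsulate(original, tunnel_src="B", tunnel_dst="E")
recovered = decapsulate(in_tunnel)
```

Routers between B and E only look at the outer header, which is why the inner packet can use a protocol they do not understand.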
6Bone: Deploying IPv6 over IPv4
[Figure: logical view: IPv6 nodes A, B, E, F, with a tunnel between B and E; physical view: A and B (IPv6), C and D (IPv4), E and F (IPv6)]
• A-to-B: IPv6 (Flow: X, Src: A, Dest: F, data)
• B-to-C: IPv6 inside IPv4 (outer Src: B, Dest: E; inner Flow: X, Src: A, Dest: F, data)
• E-to-F: IPv6 (Flow: X, Src: A, Dest: F, data)
IP Multicast & MBone
• Multicast
– Delivering the same data to many receivers
– Avoiding sending the same data many times
[Figure: unicast vs. multicast delivery]
• IP multicast
– Special addressing, forwarding, and routing schemes
– Not widely deployed, so MBone tunneled between nodes
IP Multicast
[Figure: routers with multicast support connecting CMU, Stanford, UMN, and Berkeley]
• No duplicate packets
• Highly efficient bandwidth usage
Key Architectural Decision: Add support for multicast in IP layer
Key Concerns with IP Multicast
• Scalability with number of groups
– Routers maintain per-group state
– Analogous to per-flow state for QoS guarantees
– Aggregation of multicast addresses is complicated
• Supporting higher level functionality is difficult
– IP Multicast: best-effort multi-point delivery service
– End systems responsible for handling higher level functionality
– Reliability and congestion control for IP Multicast complicated
• Deployment is difficult and slow
– ISPs are reluctant to turn on IP Multicast
Application-level Overlays
[Figure: overlay nodes (N) at Sites 1-4, connected across ISP1, ISP2, and ISP3]
• One per application
• Nodes are decentralized
• Network operations/management may be centralized
Example: End-System Multicast
• IP multicast still is not widely deployed
– Technical and business challenges
– Should multicast be a network-layer service?
• Multicast tree of end hosts
– Allow end hosts to form their own multicast tree
– Hosts receiving the data help forward to others
End System Multicast
[Figure: end hosts at UMN, CMU, Stanford (Stan1, Stan2), and Berkeley (Berk1, Berk2), and the overlay tree formed among them]
DNS as an Application Overlay
DNS: Domain Name System:
• distributed database implemented in hierarchy of many name servers
• application-layer protocol: hosts, routers, and name servers communicate to resolve names (address/name translation)
– note: core Internet function implemented as application-layer protocol
– complexity at network’s “edge”
• hierarchy of redundant servers with time-limited cache
• 13 root servers, each knowing the global top-level domains (e.g., edu, gov, com) and referring queries to them
• each server knows the 13 root servers
• each domain has at least 2 servers (often widely distributed) for fault tolerance
• DNS has info about other resources, e.g., mail servers
Refresher: Naming
• What do names do?
– identify objects
– define membership in a group
– specify a role
– convey knowledge of a secret
– ……
help locate objects: name-to-address binding & resolution
• Name vs. Address
According to Shoch (IEEE COMPCON’78)
– name: identifies what you want
– address: identifies where it is
– route: identifies how to get there
• Some recent alternative terminologies in literature:
– identifier, locator, ……
Properties of Names
• Name space
– defines set of possible names
– consists of a set of name to value (attribute) bindings
– see, e.g., XML namespace [www.w3.org documentation]
• Location transparent versus location-dependent
• Flat versus hierarchical
• Global versus local
• Absolute versus relative
• By architecture versus by convention
• Unique versus ambiguous
URI Syntax and Examples
• URI generic syntax:
foo://example.com:8042/over/there?name=ferret#nose
(scheme: foo; authority: example.com:8042; path: /over/there; query: name=ferret; fragment: nose)
urn:example:animal:ferret:nose
(URN namespace)
• URL examples:
http://cs.umn.edu
https://mybank.com:443/login_cgi?userid=me
mailto://[email protected]
sip:[email protected]:5060
ftp://ftp.foo.biz/public.txt
rtsp://unite.umn.edu:8080/csci5211_lecture_090606
URN examples:
urn:isbn:1-55860-832-X
urn:issn:0167-6423
urn:ietf:rfc:2141
urn:mpeg:mpeg7:schema:2001
urn:oid:2.16.840
urn:www.agxml.org:schemas:all:2:0
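As a quick check of the generic syntax, the two sample URIs above can be split into their components with Python's standard `urllib.parse.urlsplit`. This is only an illustration of the component names; the variable names are arbitrary.

```python
from urllib.parse import urlsplit

# Hierarchical URI with all five generic components.
parts = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")
components = {
    "scheme": parts.scheme,
    "authority": parts.netloc,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

# A URN has a scheme but no authority; everything after "urn:" is the path.
urn = urlsplit("urn:example:animal:ferret:nose")

for name, value in components.items():
    print(f"{name:9} = {value}")
print("urn path  =", urn.path)
```

The authority itself splits further into host and port (`parts.hostname`, `parts.port`), which is what a resolver and a transport connection need.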
Resolving Name, Locating Service/Object
URL: http://www-users.cselabs.umn.edu/classes/Spring-2013/csci5211/index.php?page=syllabus
[Figure: the UMN DNS server resolves the host name to 128.101.38.208; the service is the web server at 128.101.38.208, tcp port 80; the object is /web/classes/Spring-2013/csci5211/syllabus.html on the network file system server]
Domain Naming System
• Hierarchical Name Space
[Figure: name tree. Top-level domains: edu, com, gov, mil, org, net, uk, fr. Under edu: umn, …, mit; under umn: cs, ee, physics; under cs: afer, www. Under com: cisco, …, yahoo. Under gov: nasa, …, nsf. Under mil: arpa, …, navy. Under org: acm, …, ieee]
• Host Names
– afer.cs.umn.edu: 128.101.35.34
– www.yahoo.com (non-authoritative answer): www.yahoo.akadns.net, 209.73.177.155
Name Servers
• Partition hierarchy into zones
[Figure: the domain name tree (edu, com, gov, mil, org, net, uk, fr and their subdomains) partitioned into zones]
• Each zone implemented by two or more name servers
[Figure: root name server; below it the UMN name server, …, Yahoo name server; below UMN the CS name server, …, EE name server]
Domain Name Resolution and DNS
DNS Name Servers
Why not centralize DNS?
• single point of failure
• traffic volume
• distant centralized database
• maintenance
doesn’t scale!
• no server has all name-to-IP address mappings
local name servers:
– each ISP, company has local (default) name server
– host DNS query first goes to local name server
authoritative name server:
– for a host: stores that host’s IP address, name
– can perform name/address translation for that host’s name
DNS: Iterated Queries
recursive query:
• puts burden of name
resolution on
contacted name
server
• heavy load?
iterated query:
• contacted server
replies with name of
server to contact
• “I don’t know this
name, but ask this
server”
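An iterated query can be sketched as a loop in which the querier follows referrals itself. The tables below are toy data, not real DNS records: the server names and the address 128.101.35.34 come from the slides, while `REFERRALS`, `ANSWERS`, `query`, and `resolve_iteratively` are hypothetical names used only for this sketch.

```python
# Toy referral and answer tables (illustrative data only).
REFERRALS = {
    "root": {"umn.edu": "dns.umn.edu"},
    "dns.umn.edu": {"cs.umn.edu": "dns.cs.umn.edu"},
}
ANSWERS = {
    "dns.cs.umn.edu": {"afer.cs.umn.edu": "128.101.35.34"},
}


def query(server, name):
    """Ask one server; it either answers or names the next server to contact."""
    ip = ANSWERS.get(server, {}).get(name)
    if ip is not None:
        return ("answer", ip)
    for suffix, next_server in REFERRALS.get(server, {}).items():
        if name == suffix or name.endswith("." + suffix):
            return ("referral", next_server)  # "I don't know, but ask this server"
    raise LookupError(f"{server} cannot help with {name}")


def resolve_iteratively(name, start="root"):
    """Follow referrals from the querier's side until some server answers."""
    server, contacted = start, []
    while True:
        contacted.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, contacted
        server = value


address, contacted = resolve_iteratively("afer.cs.umn.edu")
```

In a recursive query the loop would instead run inside the contacted server, which is where the "heavy load?" concern comes from.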
[Figure: query path in steps 1-8 among requesting host homeboy.aol.com, local name server dns.aol.com, root name server, intermediate name server dns.umn.edu, and authoritative name server dns.cs.umn.edu, resolving afer.cs.umn.edu; the exchange between the local and root name servers is labeled “iterated query”]
Peer-to-Peer Networks:
Unstructured and Structured
• What is a peer-to-peer network/application?
• Unstructured Peer-to-Peer Networks
– Napster
– Gnutella
– KaZaA
– BitTorrent
– Skype, pplive, …
• Distributed Hash Tables (DHT) and Structured Networks
– Chord
– Kademlia
– CAN, …
• What are the Key Challenges in P2P?
• Pros and Cons?
Peer-to-Peer Applications:
How Did it Start?
• A killer application: Napster
– “free” music over the Internet
• Key idea: share the content, storage and bandwidth of
individual (home) users
P2P (Application) Model
• Each user stores a subset of files (content)
• Each user can access (download) files from all users in the system
Key Challenges in “pure” peer-to-peer model
• How to locate your peer & find what you want?
• Need some kind of “directory” or “look-up” service
– centralized
– distributed, using a hierarchical structure
– distributed, using a flat structure
– distributed, with no structure (“flooding” based)
– distributed, using a “hybrid” structured/unstructured approach
[Figure: peers A-F; one peer asks “E?”]
Other Challenges
Technical:
• Scale: up to hundreds of thousands or millions of machines
• Dynamicity: machines can come and go any time
• …
Social, economic & legal:
• Incentive Issues: free-loader problem
– Vast majority of users are free-riders
• most share no files and answer no queries
– A few individuals contributing to the “public good”
• Copyrighted content and piracy
• Trust & security issues
• …
Unstructured P2P Applications
• Napster
– a centralized directory service
– peers directly download from other peers
• Gnutella
– fully distributed directory service
– discover & maintain neighbors, ad hoc topology
– flood & forward queries to neighbors (with bounded hops)
• KaZaA
– exploit heterogeneity, certain peers as “super nodes”
– two-tier hierarchy: when join, contact a super-node
– smart query flooding
– peer may fetch data from multiple peers at once
– used by Skype (for directory service)
• Pros and Cons of each approach?
Napster Architecture:
An Illustration
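[Figure: machines m1-m6 store files A-F respectively; the (centralized) directory service maps each file to its machine (A: m1, B: m2, C: m3, D: m4, E: m5, F: m6); a peer asks the directory “E?”, is told m5, and then fetches E directly from m5]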
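Napster's split between a centralized directory and direct peer-to-peer transfer can be sketched as follows. The `Directory` class and its method names are hypothetical, chosen for this illustration; only the index lives on the server, and the file transfer itself (not shown) would go peer to peer.

```python
from collections import defaultdict


class Directory:
    """Toy centralized index: file name -> set of machines holding it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, machine, files):
        """A peer announces the files it is willing to share."""
        for name in files:
            self._holders[name].add(machine)

    def unregister(self, machine):
        """A peer leaves; forget everything it announced."""
        for holders in self._holders.values():
            holders.discard(machine)

    def lookup(self, name):
        """Return the machines that currently hold the file (possibly none)."""
        return sorted(self._holders.get(name, ()))


directory = Directory()
for machine, name in zip(["m1", "m2", "m3", "m4", "m5", "m6"], "ABCDEF"):
    directory.register(machine, [name])

where_is_e = directory.lookup("E")   # the peer then downloads E from this machine
directory.unregister("m5")
after_leave = directory.lookup("E")
```

The sketch also makes the main weakness visible: every lookup depends on one `Directory` instance, a single point of failure and a scaling bottleneck.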
Gnutella
• Ad-hoc topology
• Queries are flooded for bounded number of hops
• No guarantees on recall
[Figure: the query “xyz” is flooded from peer to peer across the overlay toward peers holding xyz]
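Bounded flooding, and why it gives no guarantee on recall, can be sketched with a breadth-first search that stops after a fixed number of hops. The topology, the file placement, and the function name `flood_query` are assumptions made for this example; real Gnutella messages also carry identifiers for duplicate suppression, which the `seen` set stands in for here.

```python
from collections import deque


def flood_query(neighbors, files, start, wanted, ttl):
    """Flood a query at most `ttl` hops from `start`; return peers that hold `wanted`."""
    hits = set()
    seen = {start}                      # peers that already handled this query
    frontier = deque([(start, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        if wanted in files.get(peer, ()):
            hits.add(peer)
        if hops == ttl:                 # hop budget used up: do not forward further
            continue
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return hits


# Hypothetical ad hoc overlay: a chain A - B - C - D, with "xyz" stored only at D.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
files = {"D": {"xyz"}}

missed = flood_query(neighbors, files, start="A", wanted="xyz", ttl=2)
found = flood_query(neighbors, files, start="A", wanted="xyz", ttl=3)
```

With a budget of two hops the query never reaches D even though the file exists, which is exactly the recall problem structured networks set out to fix.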
KaZaA: Exploiting Heterogeneity
• Each peer is either a group leader (supernode) or assigned to a group leader
– TCP connection between peer and its group leader
– TCP connections between some pairs of group leaders
• Group leader tracks the content in all its children
• Q: how to select supernodes?
[Figure: ordinary peers, group-leader peers (supernodes), and neighboring relationships in the overlay network]
BitTorrent & Video Distribution
• Designed for large file (e.g., video) downloads
– esp. for popular content, e.g. flash crowds
• Focused on efficient fetching, not search
– Distribute same file to many peers
– Single publisher, many downloaders
• Divide large file into many pieces
– Replicate different pieces on different peers
– A peer with a complete piece can trade with other peers
– Peer can (hopefully) assemble the entire file
• Allows simultaneous downloading
– Retrieving different parts of the file from different
peers at the same time
• Also includes mechanisms for preventing “free loading”
BitTorrent Components
• Seed
– Peer with entire file
– Fragmented in pieces
• Leecher
– Peer with an incomplete copy of the file
• Torrent file
– Passive component
– Stores summaries of the pieces to allow peers
to verify their integrity
• Tracker
– Allows peers to find each other
– Returns a list of random peers
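The role of the per-piece summaries in the torrent file can be sketched with SHA-1 digests, which is the hash the original BitTorrent protocol uses for pieces. Everything else here is simplified: the 4-byte piece size, the sample content, and the helper names are assumptions for illustration, and a real torrent file stores the digests in a bencoded metadata structure.

```python
import hashlib

PIECE_SIZE = 4  # bytes; deliberately tiny for the example


def split_pieces(data, piece_size=PIECE_SIZE):
    """Cut the file into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_digests(data, piece_size=PIECE_SIZE):
    """What the publisher puts in the torrent file: one digest per piece."""
    return [hashlib.sha1(piece).digest() for piece in split_pieces(data, piece_size)]


def verify_piece(index, piece, digests):
    """What a downloader does before accepting a piece from another peer."""
    return hashlib.sha1(piece).digest() == digests[index]


content = b"hello bittorrent"
digests = piece_digests(content)        # published once, by the seed
pieces = split_pieces(content)          # fetched from many peers, in any order

good = verify_piece(0, pieces[0], digests)
bad = verify_piece(0, b"evil", digests)
```

Because each piece is checked independently, a peer can start re-sharing verified pieces long before it has the whole file.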
BitTorrent: Overall Architecture
[Figure: a web server, a tracker, and peers A, B, and C; one peer is the downloader, one is a seed, and the others are leeches]
P2P Application Examples
• File Sharing Applications (esp. MP3 music)
– Napster, Gnutella, KaZaA, eDonkey, eMule, KAD, …
• Video downloads, video-on-demand (VoD), large-scale file distribution (e.g., software updates)
– BitTorrent & variants, …
• Skype (P2P VoIP & video conferencing)
• (real-time/near-real time) video broadcasting, video streaming, …
– pplive, qqlive, …
Please read the textbooks, and search on the Internet for more details if you are interested in learning more about them!
Structured P2P Networks
• Introduce a structured logical topology
• Abstraction: a distributed hash table data structure
– put(key, object); get (key)
– key: identifier of an object
– object can be anything: a data item, a node (host), a
document/file, pointer to a file, …
• Design Goals: guarantee on recall
– i.e., ensure that an item (file) identified is always found
– Also scale to hundreds of thousands of (or more) nodes
– handle rapid join/leave and failure of nodes
• Proposals
– Chord, CAN, Kademlia, Pastry, Tapestry, etc
Key Ideas (Concepts & Features)
• Keys and node ids map to the same “flat” id space
– node ids are thus also (special) keys!
• Management (organization, storage, lookup, etc) of keys
using consistent hashing
– distributed, maintained by all nodes in the network
• (Logical) distance defined on the id space: structured!
– different DHTs use different distances/structures
• Look-up/Routing Tables (“finger table” in Chord)
– each node typically maintains O(log n) routing entries
– organizing using structured id space: more information
about nodes closer-by; less about nodes farther away
• Bootstrap, handling node joins/leaves or failures
– when node joins: needs to know at least one node in the system
• Robustness/resiliency through redundancy
Lookup Service using DHT
DHT: distributed hash table
• object (name) space: often with its own semantic structure, e.g., domain names
• id (identifier) space: flat (“semantic-free”), circular (a ring); m bits (e.g., m=160), M = 2^m ids
• node (name/address) space: with its own physical topological structure
• map to id space via hashing: H(obj_name) = id_k, H(node_addr) = id_n
Why hashing?
Consistent Hashing:
Keys and their “Roots” or “Owner”
• Each key has one or more “root” (or “owner”) nodes
• Each node has one or more “closest” neighboring nodes
Chord Example:
• root: called “successor”
– closest node following key
• when a new node joins:
– split key space and hand over keys
• when a node leaves/fails:
– successor takes over keys
Effect of hashing: with N nodes, K keys (objects), on average, each node owns O(K/N) keys w/ high probability
[Figure: Chord ring with nodes n10, n39, n55, n93, n137, n179, n201, n234 and keys k1, k19, k31, k44, k68, k79, k114, k148, k165, k189, k221, k253; n234 is marked “join” and n179 is marked “leave”]
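The successor rule and its behavior under joins and leaves can be sketched with a sorted list of node ids. The node and key ids are the ones that appear in the slide's ring; treating the ring as an 8-bit id space, and treating n234 as the joining node and n179 as the leaving node, are assumptions read off the figure labels. The function names are hypothetical.

```python
import bisect

M_BITS = 8            # assumed id length, large enough for ids up to 253
RING = 2 ** M_BITS


def successor(sorted_nodes, key):
    """First node id at or after the key, wrapping around the ring."""
    i = bisect.bisect_left(sorted_nodes, key % RING)
    return sorted_nodes[i % len(sorted_nodes)]


def assign(node_ids, keys):
    """Map every node to the list of keys it owns under the successor rule."""
    nodes = sorted(node_ids)
    owners = {n: [] for n in nodes}
    for k in sorted(keys):
        owners[successor(nodes, k)].append(k)
    return owners


nodes = [10, 39, 55, 93, 137, 179, 201]
keys = [1, 19, 31, 44, 68, 79, 114, 148, 165, 189, 221, 253]

before = assign(nodes, keys)
after_join = assign(nodes + [234], keys)                      # n234 joins
after_leave = assign([n for n in nodes if n != 179], keys)    # n179 leaves
```

Only the joining node's successor (n10) gives up keys, and only the leaving node's successor (n201) gains them; every other node's key set is untouched, which is the point of consistent hashing.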
Kademlia
• Logical distance: XOR(x,y); a smaller distance corresponds to a longer common prefix
• Structure: Kademlia binary tree
– each node has subtrees of differing distances
[Figure: subtrees of a node 0011……]
Kademlia Routing Tables and Look-up
• Each node: maintains a “k-bucket” per subtree
– each k-bucket contains (routing info or entries about)
up to k nodes within its subtree (if the subtree is not
empty);
– empty if subtree is empty
• Each routing entry is a (IP, Port, Node id) triple
• Questions:
– how are the k-buckets and routing entries built?
– which k nodes to maintain?
• Each node is also responsible for maintaining (key,
value) tuples that it “owns”, i.e.,
– those with keys that are closest to its node id
– also answering queries for these key-value tuples
• Routing/look-up operations
Kademlia Look-up Operations
• Look up the k-bucket with the subtree that “best” matches the target id (longest prefix matching!)
• Pick one or more nodes in the k-bucket that are closest to target
– Kademlia performs m parallel look-ups (e.g., m=3)
• Kademlia uses iterative look-up: pick m best responses, continue …
Example: when node 0011… wants to look up key 1110….
Kademlia/KAD Protocol: Some Details
Four Key Functions:
• PING: to test whether a node is online
• STORE: instruct a node to store a key
• FIND_NODE: takes an ID as an argument, a recipient returns k
closest nodes based on its current routing table
• FIND_VALUE: behaves like FIND_NODE, except that if the recipient
has the <key, value> pair stored, it returns the stored value.
(Key, Value) publishing and maintenance:
• To store (key,value) pair, a participant locates the k closest
nodes to the key and sends them STORE request
• Additionally, each node re-publishes (key,value) pairs as
necessary to keep them alive
In all, a critical operation: finding k closest nodes to a given id!
An aside: use of caching to further improve performance!
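The critical operation, finding the k closest nodes to an id under the XOR metric, can be sketched over a node's local view. The 4-bit ids extend the slide's example (node 0011 looking up key 1110); the list of known nodes and the function names are made up for illustration, and a real node would search its k-buckets and then query the returned nodes iteratively rather than sort one flat list.

```python
BITS = 4  # toy id length; the ids below are 4-bit versions of the slide's example


def xor_distance(a, b):
    """Kademlia's logical distance between two ids."""
    return a ^ b


def common_prefix_len(a, b, bits=BITS):
    """Number of leading bits two ids share; grows as XOR distance shrinks."""
    return bits - (a ^ b).bit_length()


def k_closest(known_ids, target, k):
    """What a FIND_NODE recipient returns from its current routing table."""
    return sorted(known_ids, key=lambda node: xor_distance(node, target))[:k]


me = 0b0011
target = 0b1110
known = [0b0001, 0b0110, 0b1000, 0b1100, 0b1111]   # hypothetical routing entries

closest_two = k_closest(known, target, k=2)
for node in closest_two:
    print(f"{node:04b}: distance {xor_distance(node, target)}, "
          f"shared prefix {common_prefix_len(node, target)} bits")
```

Each answer shares a longer prefix with the target than the querier does, so repeating the step against the returned nodes moves the look-up closer to the key on every round.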
Other DHT Schemes
• Chord
• CAN (content addressable network)
• Pastry
• Tapestry
• Viceroy
• Leopard (locality-aware DHT)
• ……
(e.g., look up & read “DHT” in wikipedia)
Web Caching and CDN
CDN (content distribution network)
-- an application overlay (e.g., Akamai)
Design Space
• Caching (data-driven, passive)
– explicit
– transparent (hijacking connections)
• Replication (pro-active)
– server farms
– geographically dispersed (CDN)
Three Main CDN Providers (in North America, Europe):
• Akamai, Limelight, Level 3 CDN
Dealing with (Internet) Scale : Content
Distribution Networks (CDNs)
Recall: one single “mega-server” can’t possibly handle all requests for popular service
• not enough bandwidth: Netflix video streaming at 2 Mbps per connection
– only 5000 connections over fastest possible (10 Gbps) connection to Internet at one server
– 30 million Netflix customers
• too far from some users: halfway around the globe to someone
• reliability: single point of failure
A single server doesn’t “scale”
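The bandwidth figure can be checked with a short calculation. The link speed, per-stream rate, and customer count are the slide's numbers; the second result assumes, as a worst case that the slide does not state, that every customer streams at the same time.

```python
LINK_BPS = 10 * 10**9      # 10 Gbps connection at one server
STREAM_BPS = 2 * 10**6     # 2 Mbps per video stream
CUSTOMERS = 30 * 10**6     # 30 million customers

# How many simultaneous streams fit on one such link.
streams_per_link = LINK_BPS // STREAM_BPS

# Worst-case assumption: all customers stream at once (ceiling division).
links_needed = -(-CUSTOMERS // streams_per_link)

print(streams_per_link, "streams per link;", links_needed, "links in the worst case")
```

Even if only a small fraction of customers watch concurrently, the gap between one link and the demand is several orders of magnitude, before distance and reliability are considered.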
Content Distribution Networks
• challenge: how to stream content (selected
from millions of videos) to hundreds of
thousands of simultaneous users?
• option 1: single, large “mega-server”
– single point of failure
– point of network congestion
– long path to distant clients
– multiple copies of video sent over outgoing link
….quite simply: this solution doesn’t scale
Content Distribution Networks
• challenge: how to stream content (selected from
millions of videos) to hundreds of thousands of
simultaneous users?
• option 2: store/serve multiple copies of videos at
multiple geographically distributed sites (CDN)
– enter deep: push CDN servers deep into many access
networks
• close to users
• used by Akamai, 1700 locations
– bring home: smaller number (10’s) of larger clusters in
POPs near (but not within) access networks
• used by Limelight
Content Distribution Network
[Figure: clients reach redirectors, which direct requests to geographically distributed surrogate servers (caches) in front of the backend servers for aaa.com, bbb.com, and ccc.com]
Redirection Overlay
[Figure: geographically distributed server clusters and distributed request-redirectors (R) attached to the Internet backbone, serving clients]
CDN: “simple” content access scenario
Bob (client) requests video http://netcinema.com/6Y7B23V
– video stored in CDN at http://KingCDN.com/NetC6y&B23V
1. Bob gets URL for video http://netcinema.com/6Y7B23V from netcinema.com web page
2. resolve http://netcinema.com/6Y7B23V via Bob’s local DNS
3. netcinema’s authoritative DNS returns URL http://KingCDN.com/NetC6y&B23V
4&5. Resolve http://KingCDN.com/NetC6y&B23 via KingCDN’s authoritative DNS, which returns IP address of KingCDN server with video
6. request video from KingCDN server, streamed via HTTP
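The two-stage resolution in this scenario can be sketched with two lookup tables. The URLs are the ones on the slide; the table names, the function `locate_video`, and the address 192.0.2.10 (taken from a range reserved for documentation) are assumptions for illustration only.

```python
from urllib.parse import urlsplit

# Step 3: netcinema's authoritative DNS hands back the CDN URL for the video.
NETCINEMA_DNS = {
    "http://netcinema.com/6Y7B23V": "http://KingCDN.com/NetC6y&B23V",
}
# Steps 4&5: KingCDN's authoritative DNS maps its host name to a chosen server.
# Keys are lower case because DNS names are case-insensitive; the IP is made up.
KINGCDN_DNS = {
    "kingcdn.com": "192.0.2.10",
}


def locate_video(video_url):
    """Return the CDN URL and the CDN server address Bob ends up contacting."""
    cdn_url = NETCINEMA_DNS[video_url]
    cdn_host = urlsplit(cdn_url).hostname      # urlsplit lower-cases the host
    server_ip = KINGCDN_DNS[cdn_host]
    return cdn_url, server_ip


cdn_url, server_ip = locate_video("http://netcinema.com/6Y7B23V")
# Step 6 would be an HTTP request for cdn_url sent to server_ip.
```

The content provider controls only the first table; the CDN controls the second, which is where it can pick a different server for each client.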
Redirection Techniques
• URL Rewriting
– embedded links
• DNS
– one name maps onto many addresses
• esp. useful for finding nearby servers & (coarse-grained)
locality-based load balancing
• Question: how to figure out geo-location of users (at DNS
query time)?
– works for both servers and reverse proxies
• HTTP
– requires an extra round trip
• Router, “Layer 4/7” (application) switches
– one address, select a server (reverse proxy)
– content-based routing (near client)
CDN Cluster Selection Strategy
• challenge: how does CDN DNS select
“good” CDN node to stream to client
– pick CDN node geographically closest to client
– pick CDN node with shortest delay (or min # hops) to
client (CDN nodes periodically ping access ISPs,
reporting results to CDN DNS)
• alternative: let client decide - give client a
list of several CDN servers
– client pings servers, picks “best”
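The delay-based strategy can be sketched as picking the node with the lowest median round-trip time. The node names, the sample values, and the choice of the median (to damp a single slow ping) are assumptions for this example; the same function works whether the CDN's DNS or the client collects the measurements.

```python
from statistics import median


def pick_cdn_node(ping_samples_ms):
    """Choose the node with the lowest median RTT; nodes without samples are skipped."""
    medians = {
        node: median(samples)
        for node, samples in ping_samples_ms.items()
        if samples
    }
    if not medians:
        raise ValueError("no measurements available")
    return min(medians, key=medians.get)


# Hypothetical measurements in milliseconds.
samples = {
    "node-a": [40, 42, 300],   # one outlier, but usually fast
    "node-b": [55, 50, 52],
    "node-c": [],              # unreachable, no samples
}
best = pick_cdn_node(samples)
```

A geographic strategy would replace the RTT samples with distances, but the selection step stays the same.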
Akamai CDN: quickie
• pioneered creation of CDNs circa 2000
• now: 61,000 servers in 1,000 networks in 70 countries
• delivers est. 15-20% of all Internet traffic
• runs its own DNS service (alternative to public root, TLD, hierarchy)
• hundreds of billions of Internet interactions daily
• more shortly….
Akamai CDN Overview
• Tens of thousands of servers in thousands of networks globally
• Support a variety of services
– DNS resolution, web content delivery, web search, large software update, media content distribution (music, video, etc), …
• basic operations
– user, local DNS server & Akamai DNS
– user, Akamai servers & content providers
Akamai CDN Architecture
Case Study: Netflix
• 30% downstream US traffic in 2013
• $1B quarterly revenue
• 2B hours viewing streamed video
• owns very little infrastructure, uses 3rd party services:
– own registration, payment servers
– Amazon (3rd party) cloud services:
• Netflix uploads studio master to Amazon cloud
• create multiple versions of movie (different encodings) in cloud, move to CDNs
• Cloud hosts Netflix web pages for user browsing
http://www.statisticbrain.com/netflix-statistics/
Case Study: Netflix
• uses three 3rd party CDNs to store, host, stream Netflix content:
– Akamai, Limelight, Level-3
– can play the CDNs against each other in terms of customer experience
– developing its own CDN (2012)
Case Study: Netflix
[Figure: Amazon cloud; Netflix registration, accounting servers; Akamai CDN, Limelight CDN, Level-3 CDN; copies of multiple versions of video are uploaded to the CDNs]
1. Bob manages Netflix account
2. Bob browses Netflix video
3. Manifest file returned for requested video
4. DASH streaming
Streaming Multimedia: DASH
• DASH: Dynamic, Adaptive Streaming over HTTP
• server:
– divides video file into multiple chunks
– each chunk stored, encoded at different rates
– manifest file: provides URLs for different chunks
• client:
– periodically measures server-to-client bandwidth
– consulting manifest, requests one chunk at a time
• chooses maximum coding rate sustainable given
current bandwidth
• can choose different coding rates at different points
in time (depending on available bandwidth at time)
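The client's rate choice can be sketched as picking the highest encoding rate that fits within the measured bandwidth. The function name, the 0.8 safety margin, and the fallback to the lowest rate are assumptions for this sketch, not part of DASH itself; the 100 and 500 kbit/s rates are the two representations shown in the deck's manifest example.

```python
def choose_rate(available_kbps, measured_kbps, safety=0.8):
    """Pick the highest encoding rate sustainable at the measured bandwidth.

    `safety` leaves headroom for measurement error (an assumed policy).
    Falls back to the lowest rate when nothing fits.
    """
    budget = measured_kbps * safety
    sustainable = [rate for rate in available_kbps if rate <= budget]
    return max(sustainable) if sustainable else min(available_kbps)


rates = [100, 500]                     # kbit/s, one per representation

# The client re-measures before each chunk and may switch rates mid-stream.
measurements = [1000, 400, 50, 700]    # hypothetical bandwidth samples, kbit/s
chosen = [choose_rate(rates, m) for m in measurements]
```

Because the decision is made per chunk, the quality can go up or down during playback without the server keeping any per-client state.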
Streaming Multimedia: DASH
• manifest file: provides URLs for different
chunks
[Figure: DASH manifest hierarchy. A Media Presentation contains Periods (start=0s, start=100s, start=295s). The Period at start=100s has baseURL=http://www.e.com/ and Representations (Representation 1: bandwidth=500kbit/s, width 640, height 480; Representation 2: 100kbit/s). Segment Info for a representation gives duration=10s, Template ./ahs-5-$Index$.3gs, an Initialization Segment http://www.e.com/ahs-5.3gp, and Media Segments 1 to 20 (start=0s, 10s, 20s, …, 190s), ending with http://www.e.com/ahs-5-20.3gs]
Ack & ©: Thomas Stockhammer
Streaming Multimedia: DASH
• DASH: Dynamic, Adaptive Streaming over
HTTP
• “intelligence” at client: client determines
– when to request chunk (so that buffer starvation, or
overflow does not occur)
– what encoding rate to request (higher quality when more
bandwidth available)
– where to request chunk (can request from URL server
that is “close” to client or has high available bandwidth)
Let’s check out Netflix:
• use wireshark to capture and view packets being sent to/from my laptop
– you’re not responsible for knowing how to use it, but it’s a lot of fun
• logging in (authorization), content selection
– main webpage hosted by amazon cloud?
– use whois to see who server is
• select video and begin playing
– use whois to see who is serving video
– how far away is server (use traceroute, no info?
Ip2location.com)
Netflix: user adaptation to bandwidth
changes
• Question: how does Netflix client adapt to changes in bandwidth (due to changing congestion levels on network path)?
• two client-based alternatives:
– get video from new CDN server (over new path)?
– change video streaming rate via DASH?
• experiment:
– connect to netflix
– systematically decrease bandwidth from server to client
– measure how client responds: decrease rate or change servers
Netflix experiment
[Figure: Bob’s home network reaches the Akamai, Limelight, and Level-3 CDNs over the public Internet]
Adjust bandwidth to Bob’s computer, see how Netflix responds
Vijay Kumar Adhikari, Yang Guo, Fang Hao, Matteo Varvello, Volker Hilt, Moritz Steiner, Zhi-Li Zhang, “Unreeling Netflix: Understanding and Improving Multi-CDN Movie Delivery”, INFOCOM, Orlando, FL, USA, March, 2012.
Netflix experiment
Netflix seems to dynamically use alternate CDNs only for fail-over
Netflix experiment
Question: Does any one of Netflix’s three CDN providers systematically outperform the others?
Answer: No one CDN is uniformly better
Measurements taken at different locations: which CDN provides best Netflix service?
Video on Demand:
business perspective
• home video delivery: quickly shifted from video stores (e.g., Blockbuster) to Internet streaming (Netflix, Amazon, Comcast, Apple, Google, AT&T, Hulu, Verizon)
– Blockbuster: Chapter 11 bankruptcy, 2010
– Netflix DVD service: 15M in 2011, 7.5M in 2013
– Netflix VoD: 30M in 2013, adding ~2M customers annually
• two streaming business models, both sending TV using IP packets:
– over the top (OTT)
– in-network (Comcast Xfinity)
Video on Demand: OTT
Over-the-top: VoD provider uses public Internet to deliver content (augmented by CDNs)
• use Internet best effort service (no guarantees)
• ISPs (AT&T, Comcast, Verizon) relegated to role of “bit pipes”: carrying traffic but not offering “services”
• unicast HTTP (e.g., DASH)
• minimal infrastructure costs
• user subscription fee or advertising revenue
Video on Demand: in-network
in-network: access network owner (Comcast, Verizon) provides VoD service to its customers
• high-quality user experience (QoE) because ISP manages network
• servers in same edge network as viewers
• efficient network use: multicast possible (one packet to many receivers)
• ISP pays infrastructure cost
OTT versus in-network: who will win, and why?
How to Meet the Requirements for
Today/Tomorrow’s Internet?
Some key requirements (“-ities”)
• Availability and reliability
– “Always on”, fault-tolerant, fast recovery from failures, …
• Quality-of-service (QoS) for applications
– fast response time, adequate quality for VoIP, IPTV, etc.
• Scalability
– millions or more of users, devices, …
• Mobility
– untethered access, mobile users, devices, …
• Security (and Privacy?)
– protect against malicious attacks, accountability of user actions?
• Manageability
– configure, operate and manage networks
– trouble-shooting network problems
• Flexibility, Extensibility, Evolvability, ……?
– ease of new service creation and deployment?
– evolvable to meet future needs?
Key Issues, Challenges, Solutions …
A More Network-Centric View
• New Naming/Addressing?
– Separating “identifiers” and “locators” to better support
mobility
– “semantic-free” flat id space ?
– Data centric?
– Role of “search” on naming, etc.
• Scalable and Robust Routing
– Better and more adaptive to failures, and other network events
– Also better support for network management, security, …
– how to perform routing on “flat id” space?
– Or shall we decouple routing from “naming” or “addressing”?
• Manageability
– “Centralized” approach
– …?
• Security (and Privacy?)
– More “accountable” networks, e.g., through “naming,” or id
management?
– …?