Content Distribution Networks

1
Replication Issues
• Request distribution: how to transparently
distribute requests for content among
replication servers.
• Server selection: how to select a server
replica for a given request.
• Content placement: how to decide how
many replicas of given content to have
and on which servers to place these
replicas.
2
Request distribution
• Transparent replication: techniques that
require no user involvement or even
awareness of the underlying replication
scheme.
• Clients use a single logical name when
requesting content.
• Basic issue: redirection of logically
identical requests to distinct servers.
3
Request Redirection
• At the client
• At the intermediate proxies
• At the DNS system
• At the primary origin server
• Somewhere in the network
4
Content-blind request distribution
with Full Replication
• Client redirection
• Redirection by a balancing switch
• Redirection through DNS servers
• Anycast
5
Client redirection (1/2)
• Use the Web client’s DNS.
6
Client redirection (2/2)
• Redirection is performed at the client.
• Use Web site DNS to return a list of server
replicas.
• The entire list is passed to the client by the
client DNS.
• Client selects server replica to contact.
7
Advantages of client schemes
• Client can make the server selection.
• Client can compare routing paths,
measure response times, packet loss
rates or other metrics.
• Does not interfere with the DNS caching.
• Problem: server selection is performed by
the client (sub-optimal selection of
servers).
8
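The client-side scheme above can be sketched roughly as follows. This is an illustration only and not part of the original slides: the replica addresses, the measured values and the `measure` callback are hypothetical stand-ins for whatever probe (response time, packet loss rate, path length) a real client would use.

```python
from typing import Callable, Iterable


def select_replica(replicas: Iterable[str],
                   measure: Callable[[str], float]) -> str:
    """Return the replica with the lowest measured cost.

    `measure` is a hypothetical probe supplied by the caller. It may
    return a response time, a packet loss rate, or any other metric
    where a smaller value is better.
    """
    candidates = list(replicas)
    if not candidates:
        raise ValueError("the DNS reply contained no replicas")
    return min(candidates, key=measure)


# Illustrative values only: pretend these response times (in seconds)
# were measured by the client for each replica in the DNS reply.
measured_rtt = {"192.0.2.10": 0.120, "192.0.2.20": 0.035, "192.0.2.30": 0.080}

best = select_replica(measured_rtt, measured_rtt.get)
print(best)
```

Because each client sees only its own measurements, the choice is locally good but ignores the load that other clients place on the same replica, which is the sub-optimality noted in the last bullet.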
Balancing switch redirection
• Hardware-based solution to redirect
requests.
• “Balancer” is placed in front of a server
farm.
• Balancer modifies
addresses.
9
Switch-based redirection
• L3, L4 switch: IP balancing.
• Uses standard clients, DNS and Web servers.
• Low geographical scalability.
• IBM Network Dispatcher.
10
Web Site DNS redirection
• Many DNS implementations allow the Web
site DNS to map a host domain name to a
set of IP addresses and choose one of
them for every query.
11
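A minimal sketch of this mapping is shown below. The slides do not say how the Web site DNS picks an address for each query, so the round-robin policy here is an assumption, as are the class name `RoundRobinDNS`, the host name and the addresses.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy authoritative DNS that maps one host name to a set of
    replica addresses and hands out a different one per query.
    Round-robin is an assumed policy, chosen for simplicity."""

    def __init__(self, records):
        # One endless iterator per host name.
        self._cycles = {name: cycle(addrs) for name, addrs in records.items()}

    def resolve(self, name):
        if name not in self._cycles:
            raise KeyError(f"no records for {name}")
        return next(self._cycles[name])


dns = RoundRobinDNS({
    "www.example.com": ["192.0.2.10", "192.0.2.20", "192.0.2.30"],
})
answers = [dns.resolve("www.example.com") for _ in range(4)]
print(answers)
```

The fourth query wraps around to the first address, so successive queries are spread evenly across the replicas as long as the replies are not cached downstream.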
Web site DNS redirection: pros and
cons
• Scales well geographically.
• No modifications of the client or DNS are
needed.
• DNS response caching complicates the
management of caching replicas and
request distribution.
12
Anycast
• Multiple physical servers use a single IP
addr called anycast addr.
• Each server advertises both the anycast
addr and its regular addr.
• Routers build paths that lead to the
nearest anycast member-server.
13
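The effect of anycast routing can be illustrated with a shortest-path search over a small, made-up topology. This is a simplification under stated assumptions: real routers learn these paths through routing protocols rather than a central computation, and the node names and link costs below are hypothetical.

```python
import heapq


def nearest_anycast_member(graph, source, members):
    """Dijkstra search that stops at the first anycast member reached.

    graph   : dict mapping node -> {neighbor: link_cost}
    members : set of nodes that advertise the anycast address
    Returns (member, path_cost), or None if no member is reachable.
    """
    dist = {source: 0}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node in members:
            return node, d
        for neighbor, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return None


# Hypothetical topology: two servers share one anycast address.
topology = {
    "client_router": {"r1": 1, "r2": 4},
    "r1": {"client_router": 1, "r2": 1, "server_a": 5},
    "r2": {"client_router": 4, "r1": 1, "server_b": 1},
    "server_a": {"r1": 5},
    "server_b": {"r2": 1},
}
result = nearest_anycast_member(topology, "client_router",
                                {"server_a", "server_b"})
print(result)
```

Packets sent to the anycast address from `client_router` end up at `server_b` because its path cost (3) is lower than the cost of reaching `server_a` (6).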
Anycast
14
Content-blind request distribution
with partial replication
• Servers may have the entire replica of the
web site.
• Full replication is often used but imposes
considerable overhead.
• Typically only a small fraction of content is
responsible for most of the requests.
• Partial replication.
15
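One way to act on the observation that a small fraction of content draws most requests is to replicate only the most popular objects. The sketch below is an assumption about how such a set might be chosen, not a method from the slides; the request log, the 80% coverage target and the function name `hot_set` are all illustrative.

```python
from collections import Counter


def hot_set(request_log, coverage=0.8):
    """Return the most popular objects, in order of popularity, until
    they account for at least `coverage` of all logged requests."""
    counts = Counter(request_log)
    total = sum(counts.values())
    chosen, covered = [], 0
    for obj, hits in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.append(obj)
        covered += hits
    return chosen


# Made-up log: 10 requests over 4 objects.
log = ["/index.html"] * 6 + ["/logo.png"] * 2 + ["/a.html", "/b.html"]
to_replicate = hot_set(log)
print(to_replicate)
```

In this toy log, two of the four objects cover 8 of the 10 requests, so only those two would be placed on the replicas.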
Surrogates as server replicas
• Content requests are distributed among
surrogates that are distinct from the origin
servers.
• A surrogate fulfills the request itself if it
holds the requested object.
• Otherwise the object is retrieved from the
origin server.
• DNS redirection is often used for
distributing requests among the
surrogates.
16
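The surrogate behaviour described above can be sketched as a small cache in front of the origin. The class name `Surrogate`, the `fetch_from_origin` callback and the sample content are hypothetical. Keeping a copy of each retrieved object is also an assumption, since the slides only say the object is retrieved from the origin server.

```python
class Surrogate:
    """Toy surrogate: serve from the local store when possible,
    otherwise retrieve the object from the origin server."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: url -> bytes
        self._store = {}                  # objects held locally
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            self.hits += 1
            return self._store[url]
        # Not held locally: go back to the origin server.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = body           # assumed: keep a copy
        return body


# Stand-in for the origin server's content.
origin_content = {"/index.html": b"<html>hello</html>"}

surrogate = Surrogate(origin_content.__getitem__)
first = surrogate.get("/index.html")    # retrieved from the origin
second = surrogate.get("/index.html")   # served by the surrogate
print(surrogate.hits, surrogate.misses)
```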
Partial replication through
surrogates
17
Back-end distributed file systems
• Use a distributed file system to allow each
replication server to access the same
shared file set.
18
Content Delivery Networks
• CDN: agents of content providers
• A CDN signs up individual content
providers for scalable content delivery and
delivers their content to any client that
accesses the respective Web sites.
• CDN customers vs. CDN clients: customers are
the content providers that supply content to the
CDN; clients download that content from the CDN.
19
Benefits to content providers
• Global reach: A CDN serves content from
multiple CDN servers deployed around the
globe. By signing up with a CDN, a content
provider gains instant presence around the
globe at virtually no upfront cost.
• Flash event protection: Sometimes a Web site
experiences a sudden increase in demand
called a flash event. A CDN offers an easy way
to prepare for a predictable flash event or to
protect from an unforeseen one.
20
CDN benefits
• A CDN is an intermediate layer of infrastructure between
origin servers and clients (middleware).
• A CDN can achieve scalable content delivery by
distributing load among its servers, by serving client
requests from servers that are close to requesters, and
by bypassing congested network paths.
• CDN infrastructure is shared among multiple content
provider sites.
• CDN has close relationship with the underlying networks.
• A technical consequence of shared infrastructure is that
CDNs must implement a mechanism for finding the
origin server for a given piece of content.
• Close relationship with underlying networks: CDNs place
their servers within PoPs or backbone nodes of ISPs.
21
Types of CDNs
• Classification: relationship to ISPs, relationship
to customers, mechanisms for delivering
requests to CDN servers.
• Multi ISP, single ISP.
• CDN with hosting service (CDN servers for
relaying content from origin servers and the
origin servers themselves, customer maintains a
staging server).
• Relaying CDNs: origin servers remain external
to the CDN.
22
CDN types
23
Relaying CDN: 1st hit @ origin
24
Relaying CDN: 1st hit @ CDN
25
Request delivery mechanisms
• DNS outsourcing.
• Customer delegates the
DNS service for its
domain to the CDN.
• Customer’s DNS replies
(when queried) with a
pointer to the CDN DNS.
• CDN DNS resolves query
to one of the CDN
servers.
26
Relaying CDNs with 1st hit @ origin
• Embedded URLs that the CDN is supposed
to deliver use host names that are
controlled by the CDN DNS.
• Alternatively, the embedded content uses
domain names that belong to the CDN.
Method adopted by Akamai.
27
Comparing CDN types
• Origin-first CDNs require that embedded
URLs use distinct domains from container
pages.
• Relative URLs are avoided.
• CDN-first CDNs allow embedded objects
to share the same domain name as the
container page.
• Origin-first CDN: two TCP connections are
required.
28
Request distribution in CDNs
• DNS/Balancing switch redirection.
• CDN DNS returns the IP addr of the
balancing switch.
29
Request distribution in CDNs
• Two-level DNS redirection
• This architecture distributes the DNS load
among leaf DNS servers and allows client
DNS servers to use nearby DNS servers for
queries while they cache high-level DNS
responses.
30
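The caching behaviour behind two-level DNS redirection can be sketched as below. This is a simplified model under stated assumptions: the class name `ClientDNS`, the TTL values, the leaf name and the addresses are hypothetical, time is passed in explicitly instead of read from a clock, and leaf answers are never cached.

```python
from itertools import cycle


class ClientDNS:
    """Toy client DNS that caches the top-level answer (which leaf DNS
    to use) but asks the leaf DNS for a server on every query."""

    def __init__(self, top_level, leaves):
        self.top_level = top_level   # callable: name -> (leaf_name, ttl)
        self.leaves = leaves         # leaf_name -> callable: name -> ip
        self.cache = {}              # name -> (leaf_name, expires_at)
        self.top_level_queries = 0
        self.leaf_queries = 0

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry is None or entry[1] <= now:
            # Cache miss or expired: ask the high-level DNS again.
            leaf_name, ttl = self.top_level(name)
            self.top_level_queries += 1
            entry = (leaf_name, now + ttl)
            self.cache[name] = entry
        # The nearby leaf DNS picks a server for this particular query.
        self.leaf_queries += 1
        return self.leaves[entry[0]](name)


# Hypothetical setup: one leaf DNS alternating between two servers.
leaf_servers = cycle(["192.0.2.10", "192.0.2.20"])
client_dns = ClientDNS(
    top_level=lambda name: ("leaf-eu", 3600),           # long TTL
    leaves={"leaf-eu": lambda name: next(leaf_servers)},
)

ips = [client_dns.resolve("www.example.com", now=t) for t in (0, 10, 20)]
print(ips, client_dns.top_level_queries, client_dns.leaf_queries)
```

Three client queries reach the leaf DNS three times but the high-level DNS only once, which is how the load shifts toward the leaf servers.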
Problems in DNS-based request
distribution
• Originator problem: large networks often
have a few DNS servers to handle clients’
requests.
• The client may be far away from the client
DNS.
• Hidden load factor problem.
• Client DNS masking problem (recursive
resolution of requests).
31
Server Selection Metrics
• Proximity metrics
• Server load metrics
• Aggregate metrics
• Passive measurements obtain metrics by simply
observing the normal operation of the system.
• Active measurements involve actions that the
system performs only for the purpose of
obtaining the metric.
• Synchronous/Asynchronous measurements.
32
Proximity metrics
• Geographical distance
• Number of network routers
• Number of AS
http://www.caida.org/home/
33
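Geographical distance, the first proximity metric above, can be estimated with the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km. The server names and coordinates are illustrative and do not come from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two points given
    as latitude/longitude in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical replica locations (approximate coordinates).
servers = {
    "frankfurt": (50.11, 8.68),
    "new_york": (40.71, -74.01),
    "tokyo": (35.68, 139.69),
}
client = (37.98, 23.73)  # a client located roughly in Athens

nearest = min(servers, key=lambda s: great_circle_km(*client, *servers[s]))
print(nearest)
```

Geographical distance is cheap to compute but only approximates network distance, which is why the slide also lists router and AS counts.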
Server Load Metrics
• Number of connections
• Number of requests
• Ready queue length
• Response time
>> uptime
5:12pm up 30 day(s), 7:01, 3 users, load average: 0.11, 0.09, 0.09
34
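The load averages printed by uptime (averaged over the last 1, 5 and 15 minutes) can be extracted for use as a server load metric. The parser below is a sketch that assumes the "load average: a, b, c" layout shown on the slide; the function name `parse_load_average` is hypothetical.

```python
import re


def parse_load_average(uptime_line):
    """Return the (1 min, 5 min, 15 min) load averages found in a
    line of uptime output."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in uptime output")
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen


sample = "5:12pm up 30 day(s), 7:01, 3 users, load average: 0.11, 0.09, 0.09"
loads = parse_load_average(sample)
print(loads)
```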
Aggregate metrics
• TCP ping latency
• ICMP ping latency
• HTTP request latency
• Download latency, download time
35
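A rough way to obtain a TCP ping latency is to time how long it takes to open a TCP connection. The sketch below is an assumption about how such an active measurement might be taken, not a method from the slides. To stay self-contained it measures against a listener created on the loopback interface rather than a real replica.

```python
import socket
import time


def tcp_connect_latency(host, port, timeout=2.0):
    """Return the time in seconds needed to open a TCP connection
    to (host, port). Raises OSError if the connection fails."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return time.perf_counter() - start


# Demo against a local listener so that no external host is needed.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

latency = tcp_connect_latency("127.0.0.1", demo_port)
listener.close()
print(f"TCP connect latency: {latency * 1000:.3f} ms")
```

Against a real server the measured time reflects both the network path and how quickly the server accepts connections, which is why latency of this kind counts as an aggregate metric.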