CS433/533 Computer Networks
Lecture 12: CDN
2/16/2012
Admin
Programming assignment 1 status
Recap: High-Performance Network Servers
Avoid blocking (so that we can reach bottleneck throughput)
  o Introduce threads
Limit unbounded thread overhead
  o Thread pool, async I/O
Coordinate data access
  o Synchronization (lock, synchronized)
Coordinate behavior: avoid busy-wait
  o Wait/notify; FSM (a wait/notify sketch follows below)
Extensibility/robustness
  o Language support/design for interfaces
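To illustrate avoiding busy-wait with wait/notify, here is a minimal sketch using Python's threading.Condition (the slide's wait/notify is the Java equivalent on a monitor); the queue and item are illustrative:

    import threading

    cond, queue = threading.Condition(), []

    def consumer():
        with cond:
            while not queue:        # re-check the condition: no busy-wait
                cond.wait()         # sleep until notified instead of spinning
            item = queue.pop(0)
        print("consumed", item)

    def producer(item):
        with cond:
            queue.append(item)
            cond.notify()           # wake one waiting consumer

    t = threading.Thread(target=consumer)
    t.start()
    producer(42)
    t.join()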
Recap: Operational Laws
Utilization law: U = X S
Forced flow law: Xi = Vi X
Bottleneck device: largest Di = Vi Si
Little's law: Qi = Xi Ri
Bottleneck analysis (worked example below):
  X(N) <= min{ 1/Dmax, N/(D + Z) }
  R(N) >= max{ D, N*Dmax - Z }
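A worked example of these bounds (all numbers are made up for illustration): two devices with demands D1 and D2, think time Z, and N users:

    D = [0.210, 0.140]     # per-device service demands Di = Vi*Si, in seconds
    Z = 4.0                # think time, in seconds
    N = 20                 # number of users
    D_total, D_max = sum(D), max(D)

    X_bound = min(1 / D_max, N / (D_total + Z))   # X(N) <= min{1/Dmax, N/(D+Z)}
    R_bound = max(D_total, N * D_max - Z)         # R(N) >= max{D, N*Dmax - Z}
    print(f"X(N) <= {X_bound:.2f} req/s, R(N) >= {R_bound:.2f} s")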
Recap: Why Multiple Servers?
Scalability
  o Beyond the capability and geographic location of a single server
Redundancy and fault tolerance
  o Administration/maintenance (e.g., incremental upgrade)
  o Redundancy (e.g., to handle failures)
System/software architecture
  o Resources may be naturally distributed at different machines (e.g., running a single copy of a database server due to a single license; access to a resource from a third party)
  o Security (e.g., front end, business logic, and database)
Recap: Load Direction: Basic Architecture
Four components:
Server state monitoring
  • Load (incl. failed or not); what requests it can serve
Path properties between clients and servers
  • E.g., bandwidth, delay, loss, network cost
Server selection algorithm
  • Algorithm to choose site(s) and server(s)
Server direction mechanism
  • Inform/direct a client to chosen server(s)
[Figure: a client on the Internet choosing between Site A and Site B]
Recap: Load Direction
Inputs: server state; the specific request of a client; path properties between servers/clients
  -> server selection algorithm
  -> notify client about selection (direction mechanism)
Basic Direction Mechanisms
Implicit
  o IP anycast
    • The same IP address is shared by multiple servers and announced at different parts of the Internet. The network directs different clients to different servers (e.g., Limelight)
  o Load balancer (smart switch) indirection
  o Reverse proxy
Explicit
  o Mirror/server listing: the client is given a list of candidate DNS names
  o DNS name resolution gives a list of server addresses
    • A single server IP address may be a virtual IP address for a cluster of physical servers (smart switch)
Direction Mechanisms are Often Combined
[Figure: DNS name1 resolves to IP1, DNS name2 to IP2, ..., IPn; each IP fronts a load balancer for a cluster (Cluster1 in US East, Cluster2 in US West, another cluster in Europe), and a proxy may sit in front of a further load balancer and servers]
Example: Netflix
Example: Netflix Manifest File
The client player authenticates and then downloads the manifest file from servers in the Amazon cloud
Example: Netflix Manifest File
Example: Wikipedia Architecture
http://wikitech.wikimedia.org/images/8/81/Bergsma_-_Wikimedia_architecture_-_2007.pdf
DNS Indirection and Rotation
[Figure: the DNS server for cnn.com returns the server addresses (157.166.255.18, 157.166.226.25, 157.166.226.26) in rotated order to successive queries, so different clients are directed to different servers behind the router]
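You can observe this rotation yourself; a small sketch (the hostname is illustrative, and the actual addresses and ordering depend on the DNS server):

    import socket, time

    for _ in range(3):
        infos = socket.getaddrinfo("cnn.com", 80, proto=socket.IPPROTO_TCP)
        print([info[4][0] for info in infos])   # A records, in returned order
        time.sleep(1)                           # successive queries may rotate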
Example: Amazon Elastic Load Balancing
Use the elb-create-lb command to create an Elastic Load Balancer.
Use the elb-register-instances-with-lb command to register the Amazon EC2 instances that you want to load balance with the Elastic Load Balancer.
Elastic Load Balancing automatically checks the health of your load-balanced Amazon EC2 instances. You can optionally customize the health checks by using the elb-configure-healthcheck command.
Traffic to the DNS name provided by the Elastic Load Balancer is automatically distributed across your healthy, load-balanced Amazon EC2 instances.
http://aws.amazon.com/documentation/elasticloadbalancing/
Details: Step 1
1. Call CreateLoadBalancer with the following parameters:
  LoadBalancerName = MyLoadBalancer
  AvailabilityZones = us-east-1a
  Listeners
    • Protocol = HTTP
    • InstancePort = 8080
    • LoadBalancerPort = 80
The operation returns the DNS name of your LoadBalancer. You can then map that to any other domain name (such as www.mywebsite.com) using a CNAME or some other technique.

PROMPT> elb-create-lb MyLoadBalancer --headers --listener "lb-port=80,instance-port=8080,protocol=HTTP" --availability-zones us-east-1a

Result:
DNS-NAME  DNS-NAME
DNS-NAME  MyLoadBalancer-2111276808.us-east-1.elb.amazonaws.com

http://docs.amazonwebservices.com/ElasticLoadBalancing/latest/DeveloperGuide/
Details: Step 2
2. Call ConfigureHealthCheck with the following parameters:
  LoadBalancerName = MyLoadBalancer
  Target = HTTP:8080/ping
    • Note: make sure your instances respond to /ping on port 8080 with an HTTP 200 status code (a minimal sketch follows below)
  Interval = 30
  Timeout = 3
  HealthyThreshold = 2
  UnhealthyThreshold = 2

PROMPT> elb-configure-healthcheck MyLoadBalancer --headers --target "HTTP:8080/ping" --interval 30 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2

Result:
HEALTH-CHECK  TARGET          INTERVAL  TIMEOUT  HEALTHY-THRESHOLD  UNHEALTHY-THRESHOLD
HEALTH-CHECK  HTTP:8080/ping  30        3        2                  2
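On the instance side, the configured target only requires that /ping on port 8080 return HTTP 200; a minimal sketch of such an endpoint (not Amazon's code) could look like:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class PingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/ping":
                self.send_response(200)   # ELB counts this as a healthy check
                self.end_headers()
                self.wfile.write(b"OK")
            else:
                self.send_response(404)
                self.end_headers()

    HTTPServer(("", 8080), PingHandler).serve_forever()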
Details: Step 3
3. Call RegisterInstancesWithLoadBalancer with the following parameters:
  LoadBalancerName = MyLoadBalancer
  Instances = [ i-4f8cf126, i-0bb7ca62 ]

PROMPT> elb-register-instances-with-lb MyLoadBalancer --headers --instances i-4f8cf126,i-0bb7ca62

Result:
INSTANCE  INSTANCE-ID
INSTANCE  i-4f8cf126
INSTANCE  i-0bb7ca62
Discussion
Advantages and disadvantages of using DNS
Clustering with VIP: Basic Idea
Clients get a single service IP address, called the virtual IP address (VIP)
A virtual server (also referred to as a load balancer, vserver, or smart switch) listens at the VIP address and port
A virtual server is bound to a number of physical servers running in a server farm
A client sends a request to the virtual server, which in turn selects a physical server in the server farm and directs the request to the selected physical server
VIP Clustering
Goals:
  o server load balancing
  o failure detection
  o access control filtering
  o priorities/QoS
  o request locality
  o transparent caching
What to switch/filter on?
  o L3: source IP and/or VIP
  o L4: (TCP) ports, etc.
  o L7: URLs and/or cookies
  o L7: SSL session IDs
[Big-picture figure: clients reach virtual IP addresses (VIPs) at a smart switch (operating at L4: TCP; L7: HTTP, SSL; etc.) in front of a server array]
Load Balancer (LB): Basic Structure
[Figure: the client sends packets with S=client, D=VIP to the LB at the VIP, which forwards them to Server1 (RIP1), Server2 (RIP2), or Server3 (RIP3)]
Problem with this basic structure?
Problem
A client-to-server packet (S=client, D=VIP) has the VIP as its destination address, but real servers use RIPs
  o If the LB just forwards the packet from the client to a real server, the real server may drop the packet
  o A reply from a real server to the client has the real server's IP as its source -> the client will drop the packet

Real server TCP socket space:
  state: listening
  address: {*:6789, *:*}
  completed connection queue: C1; C2
  sendbuf: ...  recvbuf: ...

  state: established
  address: {128.36.232.5:6789, 198.69.10.10:1500}
  sendbuf: ...  recvbuf: ...
  ...
Solution 1: Network Address Translation (NAT)
The LB does rewriting/translation
Thus, the LB is similar to a typical NAT gateway with an additional scheduling function
[Figure: load balancer rewriting packet addresses]
Example Virtual Server via NAT
LB/NAT Flow
LB/NAT Flow
LB/NAT Flow: Details
1. When a user accesses a virtual service provided by the server cluster, a request packet destined for the virtual IP address (the IP address that accepts requests for the virtual service) arrives at the load balancer.
2. The load balancer examines the packet's destination address and port number. If they match a virtual service in the virtual server rule table, a real server is selected from the cluster by a scheduling algorithm and the connection is added to a hash table that records connections. Then the destination address and port of the packet are rewritten to those of the selected server, and the packet is forwarded to the server. When an incoming packet belongs to an established connection, the connection can be found in the hash table and the packet is rewritten and forwarded to the right server (a sketch of this rewriting logic follows below).
3. The request is processed by one of the physical servers.
4. When response packets come back, the load balancer rewrites the source address and port of the packets to those of the virtual service. When a connection terminates or times out, the connection record is removed from the hash table.
5. A reply is sent back to the user.
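As a concrete illustration of steps 2 and 4, here is a minimal sketch (not LVS code; the addresses and the random scheduler are illustrative) of the connection hash table and the two rewriting directions:

    import random

    VIP = ("10.0.0.1", 80)
    REAL_SERVERS = [("192.168.0.11", 8080), ("192.168.0.12", 8080)]
    connections = {}      # (client_ip, client_port) -> selected real server

    def inbound(pkt):
        """Client -> VIP: rewrite the destination to the selected real server."""
        key = (pkt["src_ip"], pkt["src_port"])
        if key not in connections:
            connections[key] = random.choice(REAL_SERVERS)   # scheduling algorithm
        pkt["dst_ip"], pkt["dst_port"] = connections[key]
        return pkt    # forward to the real server

    def outbound(pkt):
        """Server -> client: rewrite the source back to the virtual service."""
        pkt["src_ip"], pkt["src_port"] = VIP
        return pkt    # forward to the client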
LB/NAT Advantages and Disadvantages
Advantages:
  o Only one public IP address is needed for the load balancer; real servers can use private IP addresses
  o Real servers need no change and are not aware of the load balancing
Problems:
  o The load balancer must be on the critical path
  o The load balancer may become the bottleneck due to the load of rewriting request and response packets
    • Typically, rewriting responses carries much more load, because there are typically many more response packets
LB with Direct Reply
[Figure: the client sends to the VIP at the LB, which forwards to Server1, Server2, or Server3; each server replies directly to the client]
Each real server uses the VIP as its IP address
LB/DR Architecture
[Figure: the load balancer and real servers, connected by a single switch]
Why the IP Address Matters
Each network interface card listens to an assigned MAC address
A router is configured with the range of IP addresses connected to each interface (NIC)
To send to a device with a given IP, the router needs to translate the IP to a MAC (device) address
The translation is done by the Address Resolution Protocol (ARP)
ARP Protocol
ARP is "plug-and-play":
  o Nodes create their ARP tables without intervention from a network administrator
A broadcast protocol:
  o The router broadcasts a query frame containing the queried IP address
    • All machines on the LAN receive the ARP query
  o The node with the queried IP receives the ARP frame and replies with its MAC address
ARP in Action
[Figure: a packet with S=client, D=VIP arrives at router R, behind which the LB holds the VIP]
- Router R broadcasts an ARP query: who has VIP?
- ARP reply from the LB: I have VIP; my MAC is MAC_LB
- Data packet from R to the LB: destination MAC = MAC_LB
LB/DR Problem
[Figure: router R in front of the LB and real servers, all configured with the VIP]
ARP and a race condition:
  • When router R gets a packet with destination address VIP, it broadcasts an Address Resolution Protocol (ARP) request: who has VIP?
  • One of the real servers may reply before the load balancer does
Solution: configure the real servers not to respond to ARP requests (one way to observe the race is sketched below)
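One way to see the race is to issue the ARP query yourself; a sketch using scapy (requires root; the VIP is illustrative) that broadcasts "who has VIP?" and prints every responder:

    from scapy.all import ARP, Ether, srp

    VIP = "10.0.0.100"   # illustrative
    query = Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=VIP)
    answered, _ = srp(query, timeout=2, verbose=False)
    for _, reply in answered:
        # On a correct LB/DR setup only the load balancer should answer
        print(f"{reply.psrc} is at {reply.hwsrc}")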
LB via Direct Routing
The virtual IP address is shared by the real servers and the load balancer.
Each real server has a non-ARPing loopback alias interface configured with the virtual IP address, and the load balancer has an interface configured with the virtual IP address to accept incoming packets.
The workflow of LB/DR is similar to that of LB/NAT:
  o The load balancer directly routes a packet to the selected server
    • The load balancer simply changes the MAC address of the data frame to that of the server and retransmits it on the LAN (how does it know the real server's MAC?)
  o When the server receives the forwarded packet, it determines that the packet is for the address on its loopback alias interface, processes the request, and finally returns the result directly to the user
LB/DR Advantages and Disadvantages
Advantages:
  o Real servers send response packets to clients directly, avoiding the LB as a bottleneck
Disadvantages:
  o Servers must have a non-ARPing alias interface
  o The load balancer and each server must have one of their interfaces on the same LAN segment
Example Implementation of LB
An example open-source implementation is Linux Virtual Server (linux-vs.org)
  • Used by
    – www.linux.com
    – sourceforge.net
    – wikipedia.org
  • More details on the ARP problem:
    http://www.austintek.com/LVS/LVSHOWTO/HOWTO/LVS-HOWTO.arp_problem.html
Many commercial LB servers are available from F5, Cisco, ...
For more details, please read chapter 2 of Load Balancing Servers, Firewalls, and Caches
Question to Think About
How do you test whether Amazon ELB uses LB/NAT or LB/DR?
Discussion: Problem of the Load Balancer Architecture
[Figure: the client sends S=client, D=VIP to the LB, which fronts Server1, Server2, and Server3]
A major remaining problem is that the LB becomes a single point of failure (SPOF).
Solutions
Redundant load balancers
  o E.g., two load balancers
Fully distributed load balancing
  o E.g., Microsoft Network Load Balancing (NLB)
Microsoft NLB
No dedicated load balancer
All servers in the cluster receive all packets
All servers within the cluster simultaneously run a mapping algorithm to determine which server should handle the packet. Those servers not required to service the packet simply discard it.
Mapping (ranking) algorithm: compute the "winning" server according to host priorities, multicast or unicast mode, port rules, affinity, load percentage distribution, client IP address, client port number, and other internal load information (sketch below)
http://technet.microsoft.com/en-us/library/cc739506%28WS.10%29.aspx
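A sketch of the fully distributed idea (the hash and server list are illustrative; the real NLB ranking also weighs priorities, port rules, affinity, and load percentages): every server applies the same deterministic ranking to a packet's flow identifier, so they agree on a single winner without communicating:

    import hashlib

    SERVERS = ["s1", "s2", "s3", "s4"]

    def winner(client_ip, client_port):
        # Rendezvous-style hash: identical result on every server
        score = lambda s: hashlib.md5(f"{client_ip}:{client_port}:{s}".encode()).hexdigest()
        return max(SERVERS, key=score)

    def handle(me, pkt):
        if winner(pkt["src_ip"], pkt["src_port"]) != me:
            return None    # not the winner: silently discard the packet
        return f"{me} serves {pkt['src_ip']}:{pkt['src_port']}"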
Discussion
Compare the design of using a load balancer vs. Microsoft NLB
Forward Cache/Proxy
Web caches/proxies are placed at the entrance of an ISP
The client sends all HTTP requests to the web cache
  o If the object is at the web cache, the web cache immediately returns the object in an HTTP response
  o Else it requests the object from the origin server, then returns the HTTP response to the client (a toy sketch of this logic follows below)
[Figure: clients connect through a proxy server to origin servers]
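A minimal sketch of the hit/miss logic above as a toy proxy (no expiry or cache-control handling; a real proxy honors HTTP caching headers):

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import urlopen

    cache = {}    # url -> response body

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = self.path               # proxy-style requests carry the full URL
            if url not in cache:          # miss: fetch from the origin server
                cache[url] = urlopen(url).read()
            self.send_response(200)       # hit or filled miss: return the object
            self.end_headers()
            self.wfile.write(cache[url])

    HTTPServer(("", 3128), ProxyHandler).serve_forever()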
Forward Web Proxy/Cache
Web caches give good performance because very often
  o a single client repeatedly accesses the same document
  o a nearby client also accesses the same document
Cache hit ratio increases logarithmically with the number of users
[Figure: clients 1-6 reach an app server through ISP caches]
Benefits of Forward Web Caching
Assume the cache is "close" to the client (e.g., in the same network)
  o Smaller response time: the cache is "closer" to the client
  o Decreased traffic to distant servers
    • The link out of the institutional/local ISP network is often the bottleneck
[Figure: an institutional network with a 10 Mbps LAN and an institutional cache reaches origin servers on the public Internet over a 1.5 Mbps access link]
What Went Wrong with Forward Web Caches?
Web protocols evolved extensively to accommodate caching, e.g., HTTP/1.1
However, web caching was developed from a strong ISP perspective, leaving content providers out of the picture
  o It is the ISP that places a cache and controls it
  o An ISP's only interest in using web caches is to reduce bandwidth
Outline
Load direction/distribution
  o Basic load direction mechanisms
  o Path properties (to be covered later)
  o Case studies: content distribution networks, Akamai, and YouTube
Content Distribution Networks
Content distribution networks (CDNs) provide examples of Internet-scale load distribution for content publishers
CDN design perspective:
  o Performance scalability (high throughput, going beyond single-server throughput)
  o Geographic scalability (low propagation latency, going to close-by servers)
  o Low-cost operation
Akamai
Akamai is the original and largest commercial CDN; it operates around 91,000 servers in over 1,000 networks in 70 countries
Akamai (AH kuh my) is Hawaiian for intelligent, clever, and informally "cool"
Founded April 1999 in Boston, MA by MIT students
Akamai's evolution:
  o Files/streaming (our focus at this moment)
  o Secure pages and whole pages
  o Dynamic page assembly at the edge (ESI)
  o Distributed applications
Akamai Scalability Bottleneck
See Akamai 2009 investor analysts meeting
Basics of Akamai Architecture
Content publishers (e.g., CNN, NYTimes)
  o provide base HTML documents
  o run origin server(s)
Akamai runs
  o edge servers for hosting content
    • Deep deployment into 1,000 networks
  o customized DNS redirection servers to select edge servers based on
    • closeness to the client browser
    • server load
Linking to Akamai
Originally, URL "Akamaization" of embedded content, e.g.,
  <IMG SRC="http://www.provider.com/image.gif">
changed to
  <IMG SRC="http://a661.g.akamai.net/hash/image.gif">
Note that this DNS redirection unit is per customer, not per individual file.
URL Akamaization is becoming obsolete and is supported mostly for legacy reasons
  o Currently most content publishers prefer to use a DNS CNAME to link to Akamai servers (a quick check follows below)
    • A CNAME is an alias
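You can check this CNAME-based linking with a resolver call; gethostbyname_ex returns the canonical name, so a publisher hostname that CNAMEs into Akamai may show an Akamai-style canonical name (the hostname here is illustrative):

    import socket

    canonical, aliases, addrs = socket.gethostbyname_ex("www.nytimes.com")
    print("canonical name:", canonical)   # may be an Akamai-operated name if CNAMEd
    print("aliases:", aliases)
    print("addresses:", addrs)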
Akamai Load Direction Flow
[Figure: (1) the web client asks its LDNS to resolve the site name; (2) the client gets a CNAME entry with a domain name in the Akamai site; (3)-(4) a hierarchy of CDN DNS servers, together with the customer's DNS servers, performs multiple redirections to find nearby edge servers; (5)-(6) the client is given 2 nearby web replica servers (fault tolerance)]
For more details, see the "Global hosting system" patent: F.T. Leighton and D.M. Lewin, US Patent 6,108,703, 2000.
Exercise: Zoo machine
Check any web page of the New York Times and find a page with an image
Find the URL of the image
Use
  %dig +trace +recurse
to see the Akamai load direction
Akamai Load Direction
Akamai Load Redirection Framework
[Figure: clients are redirected among edge servers]
If the directed edge server does not have the requested content, the edge server goes back to the original server (source).
Load Direction Formulation: Input
Potentially related input:
  o p(m, e): path properties (from a client site m to an edge server e)
    • Akamai might use one-hop detour routing (see akamai-detour.pdf)
  o a_km: request arrival rate from client site m to publisher k
  o u_k: service rate for requests for publisher k
  o x_e: load on edge server e
  o caching state of a server e
Load Direction Formulation
Details of Akamai's algorithms are proprietary, so what we discuss is our own formulation and the measurements of some researchers.
[Figure: request client sites (arrival rates a_k,Yale, a_k,ATT1, ...) mapped to edge servers]
Load Direction: Control Parameters
Control interval: T
Mapping from client sites to servers
  o Server pool S_mk(t): the pool of edge servers that can be assigned to client site m for publisher k, at time t
[Figure: client sites (a_k,Yale, a_k,ATT1) and their server pools]
Load Direction: Comments
An algorithm (column 12 of the Akamai patent) at a local DNS server:
  o Compute the load to each publisher k (called a serial number)
  o Sort the publishers by increasing load
  o For each publisher, associate a list of random servers generated by a hash function
  o Assign the publisher to the first server that does not overload
We can formulate more complex versions (a sketch of the basic algorithm follows below)
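The sketch below follows the four steps above (the real algorithm is proprietary; the hash, load model, and overload test here are illustrative):

    import hashlib

    def assign(publishers, load, capacity, servers):
        """Assign each publisher to a server, in increasing order of load."""
        assignment = {}
        used = {s: 0.0 for s in servers}
        for k in sorted(publishers, key=lambda p: load[p]):      # increasing load
            # hash-generated candidate server list for publisher k
            candidates = sorted(
                servers, key=lambda s: hashlib.md5(f"{k}:{s}".encode()).hexdigest())
            for s in candidates:
                if used[s] + load[k] <= capacity[s]:             # first that fits
                    assignment[k] = s
                    used[s] += load[k]
                    break
        return assignment

    print(assign(["cnn", "nyt"], {"cnn": 5.0, "nyt": 2.0},
                 {"e1": 6.0, "e2": 6.0}, ["e1", "e2"]))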
Experimental Study of Akamai Load Balancing
Methodology:
  o 2-month-long measurement
  o 140 PlanetLab nodes (clients)
    • 50 in the US and Canada, 35 in Europe, 18 in Asia, 8 in South America, the rest randomly scattered
  o Every 20 seconds, each client queries an appropriate CNAME for Yahoo, CNN, Fox News, NY Times, etc.
[Figure: a web client queries an Akamai low-level DNS server and is directed to one of several Akamai web replicas]
See http://www.aqualab.cs.northwestern.edu/publications/Ajsu06DBA.pdf
Server Pool: to Yahoo
[Figure: web replica IDs seen over time (day vs. night, starting 06/1/05 16:16) by Client 1 (Berkeley) and Client 2 (Purdue) for target a943.x.a.yimg.com (Yahoo)]
Server Pool (to Yahoo)
Server Pool: Multiple Akamai Hosted Sites
[Figure: number of Akamai web replicas seen, per client]
Load Balancing Dynamics
[Figure: redirection dynamics observed at Berkeley, Brazil, and Korea]
Redirection Effectiveness: Measurement Methodology
[Figure: a PlanetLab node pings the 9 best Akamai replica servers returned by the Akamai low-level DNS server]
Do redirections reveal network conditions?
Rank = r1 + r2 - 1
  o 16 means perfect correlation
  o 0 means poor correlation
MIT and Amsterdam are excellent; Brazil is poor
Server Diversity for Yahoo
The majority of PlanetLab nodes see between 10 and 50 Akamai edge servers
[Figure annotations: nodes far away from Akamai hot-spots; good overlay-to-CDN mapping candidates]
Akamai Streaming Architecture
A content publisher (e.g., a radio or TV station) encodes streams and transfers them to entry points
A set of streams (e.g., some popular, some not) is grouped into a bucket called a portset; a set of reflectors distributes a given portset
When a user watches a stream from an edge server, the server subscribes to a reflector
Compare with the web architecture.
Akamai Streaming: Resource Naming
Each unique stream is encoded as a URL called an Akamai Resource Locator (ARL):
  mms://a1897.l3072828839.c30728.g.lm.akamaistream.net/D/1897/30728/v0001/reflector:28839
[Figure annotations: the mms scheme selects Windows Media Player; the ARL fields encode the stream ID, "live", the customer # (NBA), the media service, and the portset]
Akamai Streaming Load Direction
From ARL to edge server
  o Similar to web direction
From edge server to reflector:
  if (stream is active) then
    forward to client
  else if (VoD) then
    fetch from original server
  else
    use Akamai DNS to query portset + region code
Streaming Redirection Interval
- 40% of clients use a 30-second interval
- 10% do not see any redirection (default edge server clusters in Boston: 72.246.103.0/24 and 72.247.145.0/24)
Overlapping of Servers
Testing Akamai Streaming Load Balancing
(a) Add 7 probing machines to the same edge server
(b) Observe the slowdown
(c) Notice that Akamai removed the edge server from DNS; the probing machines stop
http://video.google.com/videoplay?docid=-6304964351441328559#
YouTube
02/2005: Founded by Chad Hurley, Steve Chen, and Jawed Karim, who were all early employees of PayPal
10/2005: First round of funding ($11.5 M)
03/2006: 30 M video views/day
07/2006: 100 M video views/day
11/2006: Acquired by Google
10/2009: Chad Hurley announced in a blog post that YouTube was serving well over 1 B video views/day (avg = 11,574 video views/sec)
Pre-Google Team Size
2 Sysadmins
2 Scalability software architects
2 feature developers
2 network engineers
1 DBA
0 chefs
YouTube Design Flow
while (true)
{
identify_and_fix_bottlenecks();
drink();
sleep();
notice_new_bottleneck();
}
YouTube Major Components
Web servers
Video servers
Thumbnail servers
Database servers
  o We will cover the social networking/database bottleneck/consistency issues later in the course
YouTube: Web Servers
Components: NetScaler load balancer; Apache; Python app servers; databases
Python
  o Web code (CPU) is not the bottleneck
    • JIT to C to speed up
    • C extensions
    • Pre-generate HTML responses
  o Development speed is more important
[Figure: NetScaler -> Apache -> web servers -> Python app server -> databases]
YouTube: Video Server
See “Statistics and Social Network of YouTube Videos”, 2008.
YouTube: Video Popularity
See “Statistics and Social Network of YouTube Videos”, 2008.
YouTube: Video Popularity
How do you design a system to handle a highly skewed popularity distribution?
See "Statistics and Social Network of YouTube Videos", 2008.
YouTube: Video Server Architecture
Tiered architecture
  o CDN servers (for popular videos)
    • Low delay; mostly in-memory operation
  o YouTube colo servers (for less popular videos, 1-20 views per day)
[Figure: the most popular requests go to the CDN; the others go to YouTube Colo 1 through Colo N]
YouTube Redirection Architecture
[Figure: YouTube servers]
YouTube Video Servers
Each video is hosted by a mini-cluster consisting of multiple machines
Video servers use the lighttpd web server for video transmission:
  o Apache had too much overhead (used in the first few months and then dropped)
  o Async I/O: uses epoll to wait on multiple fds
  o Switched from a single-process to a multiple-process configuration to handle more connections
Thumbnail Servers
Thumbnails are served by a few machines
Problems running thumbnail servers:
  o A high number of requests/sec, as web pages can display 60 thumbnails per page
  o Serving a lot of small objects implies
    • lots of disk seeks and problems with file system inode and page caches
    • possibly running into per-directory file limits
  o Solution: storage switched to Google BigTable (we will cover this later)
Thumbnail Server Software Architecture
Design 1: Squid in front of Apache
  o Problems
    • Squid worked for a while, but as load increased, performance eventually decreased: went from 300 requests/second to 20
    • Under high load Apache performed badly; changed to lighttpd
Design 2: lighttpd (by default lighttpd uses a single thread)
  o Problem: often stalled due to I/O
Design 3: switched to multiple processes contending on a shared accept (a sketch follows below)
  o Problems: high contention overhead / individual caches
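A sketch of Design 3 (Unix-only; the worker count and response are illustrative): one listening socket is created, then several workers are forked and all block in accept() on it, contending for each new connection:

    import os, socket

    sock = socket.socket()
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", 8000))
    sock.listen(128)

    for _ in range(4):                       # pre-fork 4 worker processes
        if os.fork() == 0:                   # child: loop on the shared socket
            while True:
                conn, _ = sock.accept()      # kernel hands each connection to one contender
                conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nthumbnail bytes\n")
                conn.close()

    os.wait()                                # parent: wait on the children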
Thumbnail Server: lighttpd/aio
Discussion: Problems of Traditional Content Distribution
[Figure: clients 1 through n all resolve via DNS and fetch from a single app server (C0)]
Objectives of P2P
Share the resources (storage and bandwidth) of individual clients to improve scalability/robustness
Bypass DNS to find clients with resources!
  o Examples: instant messaging, Skype
[Figure: peers connected across the Internet]
P2P
But P2P is not new; the original Internet was a P2P system:
  o The original ARPANET connected UCLA, Stanford Research Institute, UCSB, and Univ. of Utah
  o No DNS or routing infrastructure, just connected by phone lines
  o Computers also served as routers
P2P is simply an iteration of scalable distributed systems
Backup Slides