13- Clustering and Load Balancing


Clustering and Load Balancing
Outline
• Introduction
• Linux Virtual Server
• Microsoft load balancing solution
Introduction
• Explosive Growth of the Internet
– 100% annual growth rate
• Sites receiving unprecedented workload
– Yahoo! 625 million views per day
– AOL Web cache system receiving 5 billion
requests per day
Introduction
• Load balancing is a technique to spread work between many
computers, processes, disks or other resources in order to get optimal
resource utilization and decrease computing time.
• A load balancer can be used to increase the capacity of a server farm
beyond that of a single server.
• It can also allow the service to continue even in the face of server
down time due to server failure or server maintenance.
• A load balancer consists of a virtual server (also referred to as vserver
or VIP) which, in turn, consists of an IP address and port.
• The virtual server is bound to a number of physical services running on the
physical servers in a server farm.
• A client sends a request to the virtual server, which in turn selects a
physical server in the server farm and directs this request to the
selected physical server.
Introduction (cont.)
• Different virtual servers can be configured for different sets of physical
services, such as TCP and UDP services in general.
• Application specific virtual server may exist to support HTTP, FTP,
SSL, DNS, etc.
• The load balancing methods manage the selection of an appropriate
physical server in a server farm.
• Persistence can be configured on a virtual server; once a server is
selected, subsequent requests from the client are directed to the same
server.
• Persistence is sometimes necessary in applications where client state is
maintained on the server, but the use of persistence can cause problems
in failure and other situations.
• A more common method of managing persistence is to store state
information in a shared database, which can be accessed by all real
servers, and to link this information to a client with a small token such
as a cookie, which is sent in every client request.
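
The shared-state approach in the last bullet can be sketched as follows. This is a minimal illustration only: the dict `SHARED_STORE` stands in for the shared database, and `handle_request` and the server names are hypothetical.

```python
import secrets

# Stand-in for the shared database that every real server can reach.
SHARED_STORE = {}

def handle_request(server_name, cookie, item):
    """Serve one request on any real server, using the client's token."""
    if cookie is None or cookie not in SHARED_STORE:
        cookie = secrets.token_hex(8)      # small token returned as a cookie
        SHARED_STORE[cookie] = {"cart": []}
    state = SHARED_STORE[cookie]           # state lives in the store, not on the server
    state["cart"].append(item)
    return cookie, f"{server_name}: cart={state['cart']}"

# Two requests from the same client land on different real servers.
token, reply1 = handle_request("rs1", None, "book")
token, reply2 = handle_request("rs2", token, "pen")
```

Because the state is looked up by token, the second server sees the cart built on the first one, so the virtual server does not need persistence.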
Introduction (cont.)
• Load balancers also perform server monitoring of services
in a web server farm.
• In case of failure of a service, the load balancer continues to
perform load balancing across the remaining services that
are UP.
• In case of failure of all the servers bound to a virtual
server, requests may be sent to a backup virtual server (if
configured) or optionally redirected to a configured URL.
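
The failover rules above can be sketched as a small decision function. The name `pick_target` and the return format are assumptions made for illustration, not any product's API.

```python
def pick_target(services, backup=None, redirect_url=None):
    """Decide where a request goes, given the monitored state of each service.

    services maps a service name to True (UP) or False (DOWN).
    """
    up = sorted(name for name, is_up in services.items() if is_up)
    if up:
        return ("balance", up)             # keep balancing across services that are UP
    if backup is not None:
        return ("backup", backup)          # all services down: use the backup virtual server
    if redirect_url is not None:
        return ("redirect", redirect_url)  # or redirect to the configured URL
    return ("unavailable", None)
```

For example, `pick_target({"rs1": False, "rs2": True})` keeps balancing across `rs2` only, while `pick_target({"rs1": False}, backup="vs-backup")` falls through to the backup virtual server.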
• In Global Server Load Balancing (GSLB) the load
balancer distributes load to a geographically distributed set
of server farms based on health, server load or proximity.
Introduction (cont.)
• Load balancing methods:
– Least connections
– Round robin
– Least response time
– Least bandwidth
– Least packets
– URL hashing
– Domain name hashing
– Source IP address
– Destination IP address
– Source IP - destination
– Static proximity, used for GSLB
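
Three of these methods can be sketched in a few lines. This is an illustrative sketch, not a vendor implementation: the server names are hypothetical, and CRC32 is just one possible hash for the source-IP method.

```python
import itertools
import zlib

servers = ["rs1", "rs2", "rs3"]   # hypothetical physical servers

_cycle = itertools.cycle(servers)

def round_robin():
    """Hand out servers in a fixed rotation."""
    return next(_cycle)

def least_connections(active):
    """Pick the server with the fewest open connections (ties: lowest name)."""
    return min(sorted(active), key=lambda s: active[s])

def source_ip_hash(client_ip):
    """Map a client address to a server, so the same client gets the same server."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]
```

Round robin ignores server state, least connections reacts to current load, and source IP hashing gives a simple form of persistence without a connection table.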
Web Server Load Balancing
• One major issue for large Internet sites is how to handle the load of the
large number of visitors they get.
• This is routinely encountered as a scalability problem as a site grows.
• There are several ways to accomplish load balancing.
• For example, Wikimedia balanced its load as follows:
– Round robin DNS distributed page requests evenly to one of three Squid
Cache servers
– Squid cache servers used response time measurements to distribute page
requests between seven web servers.
– In addition, the Squid servers cached pages and delivered about 75% of all
pages without ever asking a web server for help.
– The PHP scripts which run the web servers distribute load to one of
several database servers depending on the type of request, with updates
going to a master database server and some database queries going to one
or more slave database servers.
Server Load Balancing and
redundancy
• Alternative methods include the use of a Layer 4 router.
• Linux Virtual Server, which is an advanced open
source load balancing solution for network
services.
• Network Load Balancing Services, which is the
Microsoft load balancing solution for Windows
servers.
• Many sites are turning to the multi-homed
scenario; having multiple connections to the
Internet via multiple providers to provide a
reliable and high throughput service.
Linux Virtual Server
• Started in 1998, the Linux Virtual Server (LVS) project
combines multiple physical servers into one virtual server,
eliminating single points of failure (SPOF).
• Built with off-the-shelf components, LVS is already in use
in some of the highest-trafficked sites on the Web.
• Requirements for LVS:
– The service must scale: when the service workload increases, the
system must scale up to meet the requirements.
– The service must always be on and available, despite transient
partial hardware and software failures.
– The system must be cost-effective: the whole system must be
economical to build and expand.
– Although the whole system may be big in physical size, it should
be easy to manage.
LVS
• In LVS, a cluster of Linux servers appear as a single
(virtual) server on a single IP address.
• Client applications interact with the cluster as if it were a
single, high-performance, and highly-available server.
• Inside the virtual server, LVS directs incoming network
connections to the different servers according to
scheduling algorithms.
• Scalability is achieved by transparently adding or
removing nodes in the cluster.
• High availability is provided by detecting node or daemon
failures and reconfiguring the system accordingly, on the fly.
• For transparency, scalability, availability and manageability,
LVS is designed around a three-tier architecture, as
illustrated in the next figure.
LVS architecture
• The load balancer, servers, and shared storage are usually
connected by a high-speed network, such as 100 Mbps Ethernet
or Gigabit Ethernet, so that the intranetwork does not become
a bottleneck of the system as the cluster grows.
IPVS
• IPVS modifies the TCP/IP stack inside the
Linux kernel to support IP load balancing
technologies
Three ways to balance load
• IPVS supports the following three ways to
balance load:
– Virtual Server via NAT (VS/NAT)
– Virtual Server via Tunneling (VS/TUN)
– Virtual Server via Direct Routing (VS/DR)
Virtual Server via NAT (VS/NAT)
VS/NAT Workflow
1. When a user accesses a virtual service provided by the server
cluster, a request packet destined for the virtual IP address (the IP
address to accept requests for virtual service) arrives at the load
balancer.
2. The load balancer examines the packet's destination address and
port number. If they match a virtual service in the virtual server rule
table, a real server is selected from the cluster by a scheduling
algorithm and the connection is added to the hash table that records
connections. Then, the destination address and the port of the packet
are rewritten to those of the selected server, and the packet is
forwarded to the server. When an incoming packet belongs to an
established connection, the connection can be found in the hash
table and the packet is rewritten and forwarded to the right server.
3. The request is processed by one of the physical servers.
4. When response packets come back, the load balancer rewrites the
source address and port of the packets to those of the virtual service.
When a connection terminates or times out, the connection record is
removed from the hash table.
5. A reply is sent back to the user.
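
The workflow above can be modelled in user space as follows. This is a sketch only: IPVS does this inside the kernel, the class and field names are hypothetical, the addresses are documentation examples, and round robin stands in for whichever scheduling algorithm is configured.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

class NatBalancer:
    def __init__(self, vip, vport, real_servers):
        self.vip, self.vport = vip, vport
        self.real_servers = real_servers   # list of (ip, port) pairs
        self.next_index = 0                # round-robin position (assumed scheduler)
        self.connections = {}              # (client ip, client port) -> (real ip, real port)

    def inbound(self, pkt):
        """Step 2: match the virtual service, pick or look up a server, rewrite destination."""
        if (pkt.dst_ip, pkt.dst_port) != (self.vip, self.vport):
            return None                    # not addressed to the virtual service
        key = (pkt.src_ip, pkt.src_port)
        if key not in self.connections:    # new connection: schedule and record it
            index = self.next_index % len(self.real_servers)
            self.connections[key] = self.real_servers[index]
            self.next_index += 1
        real_ip, real_port = self.connections[key]
        return replace(pkt, dst_ip=real_ip, dst_port=real_port)

    def outbound(self, pkt):
        """Step 4: rewrite the reply's source to the virtual service address and port."""
        return replace(pkt, src_ip=self.vip, src_port=self.vport)

    def close(self, client_ip, client_port):
        """Drop the connection record when the connection terminates or times out."""
        self.connections.pop((client_ip, client_port), None)

lb = NatBalancer("192.0.2.10", 80, [("10.0.0.2", 80), ("10.0.0.3", 8000)])
request = Packet("198.51.100.7", 3456, "192.0.2.10", 80)
forwarded = lb.inbound(request)
reply = lb.outbound(Packet(forwarded.dst_ip, forwarded.dst_port, "198.51.100.7", 3456))
```

The client only ever sees the virtual address: `forwarded` is addressed to a private real server, while `reply` carries the virtual IP and port as its source.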
An example of Virtual Server via NAT
Packet rewriting flow
• The incoming packet for web service:
• The load balancer will choose a real server, rewrite the
packet and forward it to that server:
• Replies get back to the load balancer:
• The packet is rewritten and forwarded back to the
client
VS-NAT advantages and
disadvantages
• Advantages:
– Real servers can run any OS that supports TCP/IP
– Only one IP address is needed, for the load balancer; real
servers can use private IP addresses
• Disadvantages:
– The maximum number of server nodes is limited,
because both request and response packets are rewritten
by the load balancer. When the number of server nodes
increases to about 20, the load balancer will probably
become a new bottleneck
Virtual Server via IP Tunneling
(VS/TUN)
• IP tunneling (also called IP encapsulation) is a
technique to encapsulate IP datagrams within IP
datagrams, which allows datagrams destined for
one IP address to be wrapped and redirected to
another IP address.
• This technique can also be used to build a virtual
server: the load balancer tunnels the request
packets to the different servers, the servers process
the requests, and return the results to the clients
directly. Thus, the service appears as a virtual
service on a single IP address.
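
The encapsulation idea can be sketched with a toy datagram type. This models only the addressing, not a real IP-in-IP implementation; the `Datagram` class, helper names and addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Datagram:
    src: str
    dst: str
    payload: Union[bytes, "Datagram"]

def encapsulate(inner, balancer_ip, real_server_ip):
    """Wrap the client's datagram in an outer datagram addressed to the real server."""
    return Datagram(src=balancer_ip, dst=real_server_ip, payload=inner)

def decapsulate(outer):
    """On the real server: unwrap the original datagram, still addressed to the VIP."""
    if not isinstance(outer.payload, Datagram):
        raise ValueError("not a tunneled datagram")
    return outer.payload

def reply_directly(request, body):
    """The real server answers the client itself, using the VIP as source address."""
    return Datagram(src=request.dst, dst=request.src, payload=body)

CLIENT, VIP = "198.51.100.7", "192.0.2.10"
BALANCER, REAL_SERVER = "192.0.2.1", "203.0.113.20"

request = Datagram(CLIENT, VIP, b"GET /")
tunneled = encapsulate(request, BALANCER, REAL_SERVER)
reply = reply_directly(decapsulate(tunneled), b"200 OK")
```

The inner datagram is never modified, so the reply comes from the virtual IP and goes straight to the client without passing back through the load balancer.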
VS/TUN architecture
VS-TUN workflow
VS-TUN advantages and
disadvantages
• Advantages:
– Real servers send response packets to clients directly,
which can follow different network routes
– Real servers can be in different networks, LAN/WAN
– Greatly increases the scalability of the virtual server
• Disadvantages:
– Real servers must support the IP tunneling protocol
Virtual Server via Direct Routing
(VS/DR)
• The load balancer and the real servers must have one of their interfaces
physically linked by an uninterrupted segment of LAN such as an
Ethernet switch.
• The virtual IP address is shared by real servers and the load balancer.
• Each real server has a non-ARPing, loopback alias interface
configured with the virtual IP address, and the load balancer has an
interface configured with the virtual IP address to accept incoming
packets.
• The workflow of VS/DR is similar to that of VS/NAT or VS/TUN. In
VS/DR, the load balancer directly routes a packet to the selected server
(the load balancer simply changes the MAC address of the data frame
to that of the server and retransmits it on the LAN).
• When the server receives the forwarded packet, the server determines
that the packet is for the address on its loopback alias interface,
processes the request, and finally returns the result directly to the user.
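
The direct routing step can be sketched as a change to the frame's MAC address only. This is a simplified model under assumed names: `Frame`, `direct_route` and `accepts` are hypothetical, and the addresses are examples.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    dst_mac: str   # link-layer destination
    src_ip: str    # IP header fields, left untouched by VS/DR
    dst_ip: str

VIP = "192.0.2.10"

def direct_route(frame, real_server_mac):
    """Load balancer: change only the destination MAC, then retransmit on the LAN."""
    return replace(frame, dst_mac=real_server_mac)

def accepts(frame, local_ips):
    """Real server: accept the packet if its destination IP is a local address."""
    return frame.dst_ip in local_ips

incoming = Frame(dst_mac="02:00:00:00:00:01", src_ip="198.51.100.7", dst_ip=VIP)
forwarded = direct_route(incoming, "02:00:00:00:00:02")
# The real server holds the VIP on its non-ARPing loopback alias.
accepted = accepts(forwarded, {"10.0.0.2", VIP})
```

Since the IP header still names the VIP as destination, the real server accepts the packet through its loopback alias and can reply to `src_ip` directly.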
VS/DR architecture
VS-DR workflow
VS-DR advantages and
disadvantages
• Advantages:
– Real servers send response packets to clients
directly, which can follow different network
routes
– No tunneling overhead
• Disadvantages:
– Servers must have a non-ARP alias interface
– The load balancer and servers must have one of
their interfaces in the same LAN segment
Comparison
                 VS/NAT          VS/TUN       VS/DR
Server           any             Tunneling    Non-ARP device
Server network   private         LAN/WAN      LAN
Server number    low (10~20)     high (100)   high (100)
Server gateway   load balancer   own router   own router
Note: these numbers are estimated based on the
assumption that load balancer and backend servers
have the same hardware configuration.
Scheduling algorithms
• Round-Robin
• Weighted Round-Robin
• Least-Connection
• Weighted Least-Connection
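
The two weighted variants can be sketched as below. The round-robin generator follows the commonly described scheme of stepping a threshold down by the greatest common divisor of the weights. The function names, server names and weights are hypothetical, and this is not the kernel code.

```python
from functools import reduce
from itertools import islice
from math import gcd

def weighted_round_robin(weights):
    """Yield server names in proportion to their positive integer weights."""
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be a non-empty map of positive integers")
    names = list(weights)
    step = reduce(gcd, weights.values())
    top = max(weights.values())
    i, threshold = -1, 0
    while True:
        i = (i + 1) % len(names)
        if i == 0:                 # start of a pass: lower the threshold
            threshold -= step
            if threshold <= 0:
                threshold = top
        if weights[names[i]] >= threshold:
            yield names[i]

def weighted_least_connection(active, weights):
    """Pick the server with the lowest connections-to-weight ratio."""
    return min(sorted(weights), key=lambda s: active.get(s, 0) / weights[s])

first_cycle = list(islice(weighted_round_robin({"A": 4, "B": 3, "C": 2}), 9))
# first_cycle == ['A', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'C']
```

Over one full cycle each server is chosen as many times as its weight, so a server with weight 4 receives twice the connections of one with weight 2.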
The LocalNode feature
• In a virtual server of only a few nodes (two, three
or so), it is a waste of resources if the load
balancer is only used to direct packets.
• The LocalNode feature enables the load
balancer not only to redirect packets, but
also to process some packets locally.
LVS cluster management software
• RedHat Cluster Server / Piranha
• LVS+Piranha Cluster Management tools.
• Ultra Monkey: Open-Source Server Farm
• LVS+lvs-gui+heartbeat+ldirectord
• heartbeat+ldirectord
• heartbeat+mon
• ...
Some sites using LVS
• UK National JANET Cache (wwwcache.ja.net)
• www.linux.com
• sourceforge.net
• One of the largest PC manufacturers
• www.netwalk.com
• …
References
• Wikipedia
• http://www.linux-vs.org