Harvard Array of Clustered Computers (HACC)

Deployment of cluster system and load balancing technique
Junehwa Song
9/6 2001
Network Computing Lab EECS
KAIST
Contents (1)

Part I : Cluster system
  Issues on Web Server
  Why a cluster is needed
  Method of performance increase
  Overview of cluster
  Type of cluster
  Example of cluster

Part II : Load balancing technique
  Mirror
  Client based approach
  DNS based approach
  Dispatcher based approach
    Single packet rewriting
    Double packet rewriting
    Network dispatcher
    LVS connection scheduling
  Server based approach
    HTTP redirection
    Packet redirection
  Reference
Issues on Web Server
Connection explosion
  Due to the rapid growth of WWW applications on the Internet, a web server may face a huge number of connection requests in a very short time
Research trends on web servers
  Cluster system
  Load balancing
  Distributed scalable web server
Why a cluster is needed
Meet the demand for scalability and availability
It is not always possible for a Web site to accurately predict peak load and prepare enough computing resources
  Because client request rates tend to be bursty and fluctuate dramatically
Traffic increases 100% per year
But server performance does not increase at a comparable rate
[Figure: network traffic versus server performance]
Method of performance increase
Vertical performance increase
  Upgrade CPU, memory, HDD, etc.
Horizontal performance increase
  Using a cluster -> node addition
Cost problem
  A high-performance server is very expensive
  Performance/Cost < 1
Availability problem caused by faults
Requirements
  Low cost, high availability, high performance, and extensibility
So a cluster is needed
Overview of Cluster
Single point of presence
  Many small machines behave as one large machine
  Share a virtual IP address
Availability
  The service as a whole must be available despite transient partial hardware and software failures
Failover
  The cluster automatically relocates an application from a failed node to a healthy one
  When a failover occurs, clients may see a brief interruption of service, but they are not aware that the application has been rehosted on a different physical cluster node
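
The relocation step can be sketched as follows. This is only an illustration, not how any particular cluster manager works: the function name `fail_over`, the placement dictionary, and the policy of choosing the healthy node that hosts the fewest applications are all assumptions.

```python
def fail_over(placement, healthy_nodes):
    """Return a new app -> node placement with no app left on a failed node."""
    if not healthy_nodes:
        raise RuntimeError("no healthy node available")
    # Count the applications already hosted on each healthy node.
    load = {node: 0 for node in healthy_nodes}
    for node in placement.values():
        if node in load:
            load[node] += 1
    new_placement = {}
    for app, node in placement.items():
        if node not in load:
            # Hypothetical policy: move the app to the least-loaded healthy node.
            node = min(sorted(load), key=load.get)
            load[node] += 1
        new_placement[app] = node
    return new_placement


placement = {"web": "node1", "mail": "node2"}
after = fail_over(placement, healthy_nodes={"node2", "node3"})
```

Clients keep addressing the service by the same name; only the mapping from application to physical node changes.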

Scalability

When the load offered to the service
increases, system can be scaled to
meet the requirement
Availability grade by downtime

Availability   Accumulated downtime per year   Grade
90%            Under 1 month                   1
99%            Under 4 days                    2
99.9%          Under 9 hours                   3
99.99%         Within 1 hour                   4
99.999%        Within 5 minutes                5
99.9999%       Within 30 seconds               6
99.99999%      3 seconds                       7
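
The downtime figures in the table are rounded. As a rough check, the yearly downtime for a given availability can be computed directly. The function name `downtime_per_year` and the 365-day year are assumptions made for this sketch.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def downtime_per_year(availability_percent: float) -> float:
    """Return the accumulated downtime per year, in seconds."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for pct in (90, 99, 99.9, 99.99, 99.999):
    hours = downtime_per_year(pct) / 3600
    print(f"{pct}% -> {hours:.2f} hours of downtime per year")
```

Each additional "nine" divides the allowed downtime by ten.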
Type of cluster
Cluster for scaling and availability
  Loosely coupled
  Horizontally scaling cluster
  Systems are not aware of other systems
Cluster for performance
  For high performance computing (HPC), focus on performance and scalability
  Tightly coupled - no availability
  Scientific cluster - biology, physics, engineering
Load balancer
  Front end to the service as seen by the outside world
  Directs network connections from clients, who know only a single IP address
Server pool
  Cluster of servers that implement the actual services
Backend storage (optional)
  Provides shared storage for the servers
Example of Cluster
LVS (Linux Virtual Server)
  Open source, since 1998
  Connection scheduling – Part II
Cluster management
  Piranha by Red Hat
  lvs-gui + Heartbeat + ldirectord
  mon + Heartbeat
Deployment
  Linux.com, sourceforge.net, www.zope.org, wwwcache.ja.net
LVS - Architecture
[Figure: LVS architecture. Labels: Client, Internet, Load Balancer (two Linux Directors, Heartbeat), Server Cluster (Real server 1, Real server 2, ..., Real server N), File Server (fault-tolerant file system)]
Load Balancing Technique
Mirror
Client based approach
DNS-based approach
Dispatcher based approach
Server based approach
Mirror
Replicates information across a mirrored-server architecture
Users manually select an alternative URL
Not user transparent
Does not allow the Web-server system to control request distribution
Client Based Approach
Web client
  The Web client selects a node of the cluster and submits the request to the selected node
  The Netscape home page (http://www.netscape.com) uses this technique
    When a user accesses this site, Navigator selects a random number i between 1 and the number of servers and directs the request to the node wwwi.netscape.com
  Limited practical applicability and not scalable
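
The client-side selection described for Navigator can be sketched in a few lines. The function name `pick_server` and its parameters are hypothetical; only the host-name pattern comes from the slide.

```python
import random


def pick_server(num_servers: int, rng=random) -> str:
    """Pick a random node name of the form www<i>.netscape.com."""
    i = rng.randint(1, num_servers)  # randint is inclusive on both ends
    return f"www{i}.netscape.com"


host = pick_server(4, random.Random(0))
```

The client must know the number of servers in advance, which is one reason the approach scales poorly.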
Smart Client
  Migrates server functionality to the client through a Java applet
  Increases network traffic and network delay
Client-side proxies
  From the Web cluster's standpoint, proxy servers are similar to clients
DNS Based Approach
First prototype: NCSA scalable Web server
The DNS server maps the domain name to multiple IP addresses
  Returns more than one IP address for the hostname, or returns a different IP address for each DNS request it receives (round robin)
User transparent
Simple and easy to implement
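
A minimal sketch of the round-robin variant, in which the name server returns a different address for each request. The class name `RoundRobinDNS`, the host name, and the addresses (taken from a documentation range) are assumptions for illustration, not a real DNS implementation.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy name server that rotates through the addresses of each hostname."""

    def __init__(self, records):
        # records: dict mapping hostname -> list of IP addresses
        self._records = {name: cycle(addrs) for name, addrs in records.items()}

    def resolve(self, hostname):
        return next(self._records[hostname])


dns = RoundRobinDNS({"www.example.com": ["192.0.2.1", "192.0.2.2", "192.0.2.3"]})
answers = [dns.resolve("www.example.com") for _ in range(4)]
```

After the last address the rotation wraps around to the first one, regardless of how loaded each server is.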
Drawbacks
  Unable to know the state of the whole system
  Not really fair, because DNS uses simple round robin
  DNS may encounter a TTL problem with IP-address caching
    Between the client and the Web server's DNS, many intermediate name servers can cache the logical-name-to-IP-address mapping to reduce network traffic, and every Web browser typically caches some address resolutions


Because of address caching, each address mapping can cause a burst of future requests to the selected server and quickly make the current load information obsolete
Many DNS-based solutions to this problem
  System-stateless algorithms
  Server-state-based algorithms
  Client-state-based algorithms
  Adaptive TTL algorithms
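
The TTL problem that motivates these algorithms can be illustrated with a toy caching resolver. The class `CachingResolver`, the 60-second TTL, and the addresses are assumptions made for this sketch; real name servers are more involved.

```python
from itertools import cycle


class CachingResolver:
    """Intermediate name server that reuses one answer until its TTL expires."""

    def __init__(self, authoritative, ttl):
        self.authoritative = authoritative  # callable returning an IP address
        self.ttl = ttl
        self._cached = None
        self._expires = -1.0

    def resolve(self, now):
        if self._cached is None or now >= self._expires:
            self._cached = self.authoritative()
            self._expires = now + self.ttl
        return self._cached


servers = cycle(["192.0.2.1", "192.0.2.2"])  # round-robin cluster DNS
resolver = CachingResolver(lambda: next(servers), ttl=60)
hits = [resolver.resolve(now=t) for t in (0, 10, 59, 60, 61)]
```

Every client behind the resolver is sent to the same server until the cached entry expires, so the cluster DNS controls only a fraction of the requests.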
Dispatcher based approach
Centralizes request scheduling and completely controls client-request routing
Request routing among servers is transparent, unlike the DNS-based approach
  Whereas DNS deals with addresses at the URL level, the dispatcher has a single virtual IP address (IP-SVA)
The dispatcher uniquely identifies each server in the system through a private address
The dispatcher typically uses simple algorithms to select the Web server
Single packet rewriting
Double packet rewriting
Network Dispatcher
Packet Single Rewriting
A TCP router acts as an IP address dispatcher
  The router tracks the source IP address of every established TCP connection, so that packets belonging to the same connection are routed to the same Web server node
High system availability
  When one of the servers fails, its address can be removed from the router's table
Can be combined with a DNS-based solution
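
The router's table can be sketched as a dictionary keyed by client source address. The names `Packet` and `TcpRouter`, the round-robin choice of server for new clients, and the addresses are assumptions; only the inbound rewrite is modeled, not the reply path.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str


class TcpRouter:
    def __init__(self, virtual_ip, server_ips):
        self.virtual_ip = virtual_ip
        self.servers = list(server_ips)
        self.table = {}      # client source IP -> chosen server IP
        self._counter = 0    # assumption: new clients are assigned round robin

    def route(self, packet):
        server = self.table.get(packet.src_ip)
        if server is None:
            server = self.servers[self._counter % len(self.servers)]
            self._counter += 1
            self.table[packet.src_ip] = server
        # Rewrite the destination from the virtual IP to the chosen server.
        return replace(packet, dst_ip=server)

    def remove_server(self, server_ip):
        """Drop a failed server and forget the connections bound to it."""
        self.servers.remove(server_ip)
        self.table = {c: s for c, s in self.table.items() if s != server_ip}


router = TcpRouter("203.0.113.10", ["10.0.0.1", "10.0.0.2"])
first = router.route(Packet("198.51.100.7", "203.0.113.10"))
again = router.route(Packet("198.51.100.7", "203.0.113.10"))
other = router.route(Packet("198.51.100.8", "203.0.113.10"))
```

Removing a failed server from the table is what lets later packets from its clients be reassigned to a healthy node.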
Packet Double Rewriting
Two solutions use this approach
  Magicrouter
  Cisco Systems' LocalDirector
The dispatcher rewrites both incoming and outgoing packets; because outgoing packets typically outnumber incoming request packets, the dispatcher becomes a bottleneck
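
A minimal sketch of the two rewrites, assuming a hypothetical `Packet` type and made-up addresses. It does not reflect how Magicrouter or LocalDirector are actually implemented.

```python
from dataclasses import dataclass, replace

VIRTUAL_IP = "203.0.113.10"  # hypothetical cluster address


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str


def rewrite_inbound(packet, server_ip):
    """Client -> cluster: replace the virtual IP with the chosen server."""
    return replace(packet, dst_ip=server_ip)


def rewrite_outbound(packet):
    """Server -> client: replace the server address with the virtual IP."""
    return replace(packet, src_ip=VIRTUAL_IP)


request = rewrite_inbound(Packet("198.51.100.7", VIRTUAL_IP), "10.0.0.1")
reply = rewrite_outbound(Packet("10.0.0.1", "198.51.100.7"))
```

Both functions run on the dispatcher, so every response packet also passes through it.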
Network Dispatcher
The dispatcher forwards packets to the selected server using its physical (MAC) address, without modifying the IP addresses
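
As a rough sketch, forwarding by physical address changes only the link-layer destination of the frame. The `Frame` type and the addresses below are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_ip: str
    dst_ip: str


def forward(frame, server_mac):
    """Re-address the frame to the chosen server; IP fields stay untouched."""
    return replace(frame, dst_mac=server_mac)


incoming = Frame("02:00:00:00:00:01", "198.51.100.7", "203.0.113.10")
forwarded = forward(incoming, "02:00:00:00:00:0a")
```

Because the destination IP is still the virtual address, the selected server has to accept packets addressed to it.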
LVS connection scheduling
Round-robin scheduling
  Treats all real servers as equals, regardless of number of connections or response time
Weighted round-robin scheduling
  Handles real servers with different processing capacities
  Each server can be assigned a weight
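
Both policies can be sketched with a repeating iterator. This is a simplified illustration, not the LVS code: the function names are hypothetical, and the weighted variant simply repeats each server according to its integer weight instead of interleaving the picks.

```python
from itertools import cycle


def round_robin(servers):
    """Yield servers in turn, ignoring load and response time."""
    return cycle(servers)


def weighted_round_robin(weights):
    """weights: dict mapping server -> positive integer weight."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return cycle(expanded)


rr = round_robin(["a", "b"])
rr_picks = [next(rr) for _ in range(3)]

wrr = weighted_round_robin({"a": 2, "b": 1})
wrr_picks = [next(wrr) for _ in range(6)]
```

With weights 2 and 1, server `a` receives two out of every three new connections.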
Least-connection scheduling
  Directs network connections to the server with the least number of active connections
Weighted least-connection scheduling
  Superset of least-connection scheduling
  A performance weight can be assigned to each server
  A server with a higher weight value will receive a larger percentage of active connections
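
A minimal sketch of both selections, assuming the scheduler keeps a count of active connections per server. The function names and the dictionaries are hypothetical; the weighted variant picks the server with the smallest ratio of connections to weight.

```python
def least_connection(active):
    """active: dict mapping server -> number of active connections."""
    return min(active, key=active.get)


def weighted_least_connection(active, weights):
    """Pick the server with the fewest active connections per unit of weight."""
    return min(active, key=lambda server: active[server] / weights[server])


active = {"a": 5, "b": 3}
plain = least_connection(active)
weighted = weighted_least_connection(active, {"a": 4, "b": 1})
```

With equal weights the weighted version reduces to plain least-connection scheduling.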

Server based approach
Uses a two-level dispatching mechanism
  Integrates the DNS-based approach with redirection techniques executed by the Web servers
  Solves most DNS scheduling problems
Two solutions
  HTTP redirection
  Packet redirection
HTTP Redirection
In the figure above, server 1 redirects the request to server 2
Not client transparent!
Overhead of intra-cluster communication
  Every server must periodically transmit status information to the cluster DNS
Increases response time on the client side because of the redirection
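
The server-side decision can be sketched as a function that answers with an HTTP 302 response when the local node is overloaded. The function name, the load threshold, and the peer host name are assumptions for illustration.

```python
def handle_request(path, local_load, threshold, peer_host):
    """Return (status, headers) for a request that reached this server."""
    if local_load > threshold:
        # Overloaded: tell the client to repeat the request at another server.
        return 302, {"Location": f"http://{peer_host}{path}"}
    return 200, {}


redirected = handle_request("/index.html", 0.9, 0.8, "server2.example.com")
served = handle_request("/index.html", 0.5, 0.8, "server2.example.com")
```

The client sees the new host name in the `Location` header and has to issue a second request, which is why this technique is not transparent and adds response time.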
Packet Redirection
Uses a round-robin DNS mechanism to schedule requests among the Web servers
The server reached by a request reroutes the connection to another server through packet rewriting
  Transparent to the client!
  TCP handoff
  Packet rewriting overhead
Reference
[1] Cardellini, V.; Colajanni, M.; Yu, P.S. “Dynamic load balancing on Web-server systems”, IEEE Internet Computing, Volume 3, Issue 3, May-June 1999, Page(s): 28-39
[2] Wow Linux. “A plan for building a Linux-based high-availability load-balancing Web service” (in Korean)
[3] Wensong Zhang. “Linux Virtual Server for Scalable Network Service”, www.linuxVirtualServer.org
[4] Sun Microsystems. “Sun Cluster 3 architecture”, www.sun.com
[5] Alistair A. Croll. “Optimizing web server access for E-business”, Intel Devcon
[6] Hong, H.C.; Chen, Y.C. “Design and practice of a dispatch server architecture”, Proceedings of the 7th IEEE Workshop on Future Trends of Distributed Computing Systems, 1999, Page(s): 246-251
[7] Mourad, A.; Huiqun Liu. “Scalable Web server architectures”, Proceedings of the Second IEEE Symposium on Computers and Communications, 1997, Page(s): 12-16