Transcript: Introduction to Server Clusters
Module 2: Concepts of Server Clusters
Introduction to Server Clusters
Clustering Techniques:
Availability and Scalability:
Introduction to Microsoft Windows 2000 Cluster Service
Key Concepts of a Server Cluster
Cluster Disks.
Quorum Resource.
Cluster Communications.
Groups and Resources.
Resource Dependency Trees.
Virtual Servers.
Virtual Server Name Resolution.
Failover and Failback.
Cluster Concepts
Choosing a Server Cluster Configuration
Active/Passive Configuration.
Active/Active Configuration.
Hybrid Configuration.
Single Node Virtual Server.
Applications and Services on Server Clusters
Applications.
Services.
File and Print Shares.
Identifying Performance Limitations.
Overview
Introduction to Server Clusters
Key Concepts of a Server Cluster
Choosing a Server Cluster Configuration
Applications and Services on Server Clusters
This module provides an explanation of
server cluster terms and key concepts. Topics
include considerations for choosing cluster
configuration options and determining which
applications and services will be included in
the server cluster. Information that is unique
to the installation of Microsoft® Cluster
service is covered, such as naming and
addressing conventions and how resources
and groups function within a server cluster.
After completing this module, you will be able to:
Explain the features of clustering technologies.
Define the key terms and concepts of a server
cluster.
Choose a server cluster configuration.
Describe how Cluster service supports
applications and services.
Introduction to Server Clusters
A server cluster is a group of computers and
storage devices that work together and can be
accessed by clients as a single system.
There are two types of network
communications in a server cluster. The
nodes communicate with each other over a
high performance, reliable network, and share
one or more common storage devices. Clients
communicate to logical servers, referred to as
virtual servers, to gain access to grouped
resources, such as file or print shares,
services such as Windows Internet Name
Service (WINS), and applications like
Microsoft Exchange Server.
When a client connects to the virtual server, the server
routes the request to the node controlling the requested
resource, service, or application. If the controlling node
fails, any clustered services or applications running on
the failed node will restart on a surviving designated
node.
There are three types of clustering techniques
commonly used: shared everything, mirrored
servers, and shared nothing. Microsoft
Cluster Service uses the shared nothing
model.
You can configure server clusters to address both
availability and scalability issues. The failover capability
of Microsoft Cluster Service makes resources more
available than in a non-clustered environment. It is also
an economical way to scale up when you need greater
performance.
Clustering Techniques
Shared Everything Model
Mirrored Servers
Shared Nothing Model
There are a variety of cluster implementation models
that are used widely in the computer industry. Common
models are shared everything, mirrored servers, and
shared nothing. It is possible for a cluster to support
both the shared everything model and the shared
nothing model. Typically, applications that require only
limited shared access to data work best in the shared
everything model. Applications that require maximum
scalability will benefit from the shared nothing cluster
model.
Shared Everything Model
In the shared everything, or shared device model,
software running on any computer in the cluster
can gain access to any hardware resource
connected to any computer in the cluster (for
example, a hard drive, random access memory
(RAM), and CPU).
The shared everything server clusters permit
every server to access every disk. Allowing
access to all of the disks originally required
expensive cabling and switches, plus specialized
software and applications. If two applications
require access to the same data, much like a
symmetric multiprocessor (SMP) computer, the
cluster must synchronize access to the data. In
most shared device cluster implementations, a
component called a Distributed Lock Manager
(DLM) is used to handle this synchronization.
The Distributed Lock Manager (DLM)
The Distributed Lock Manager (DLM) is a
service running on the cluster that keeps
track of resources within the cluster. If multiple
systems or applications attempt to reference
a single resource, the DLM recognizes and
resolves the conflict. However, using a DLM
introduces a certain amount of overhead into
the system in the form of additional message
traffic between nodes of the cluster in
addition to the performance loss due to
serialized access to hardware resources.
Shared everything clustering also has
inherent limits on scalability, because DLM
contention grows geometrically as you add
servers to the cluster.
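To make the DLM overhead concrete, the following is a minimal Python sketch of the idea: every node must acquire a lock before it touches a shared resource, so access is serialized, and each acquisition would cost an extra round of messaging in a real cluster. The node and disk names are illustrative only; a real DLM is a distributed network service, not an in-process object.

import threading

class DistributedLockManager:
    """Conceptual sketch of a DLM: every node must acquire a lock
    before touching a shared resource, which serializes access."""

    def __init__(self):
        self._locks = {}             # resource name -> lock
        self._guard = threading.Lock()

    def _lock_for(self, resource):
        with self._guard:
            return self._locks.setdefault(resource, threading.Lock())

    def access(self, node, resource, work):
        # In a real cluster this request/grant exchange travels over the
        # network, which is where the extra message traffic comes from.
        lock = self._lock_for(resource)
        with lock:                   # access to the resource is serialized
            return work()

dlm = DistributedLockManager()
dlm.access("NodeA", "Disk1", lambda: print("NodeA writes Disk1"))
dlm.access("NodeB", "Disk1", lambda: print("NodeB waits, then writes Disk1"))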
Mirrored Servers
An alternative to the shared everything and shared
nothing models is to run software that copies the
operating system and the data to a backup server. This
technique mirrors every change from one server to a
copy of the data on at least one other server. This
technique is commonly used when the locations of the
servers are too far apart for the other cluster solutions.
The data is kept on a backup server at a disaster
recovery site and is synchronized with a primary
server.
However, a mirrored server solution cannot deliver the
scalability benefits of clusters. Mirrored servers may
never deliver as high a level of availability and
manageability as shared-disk clustering, because there
is always a finite amount of time during the mirroring
operation in which the data at both servers is not
identical.
Shared Nothing Model
The shared nothing model, also known as the
partitioned data model, is designed to avoid the
overhead of the DLM in the shared everything
model. In this model, each node of the cluster
owns a subset of the hardware resources that
make up the cluster. As a result, only one node
can own and access a hardware resource at a
time. A shared-nothing cluster has software that
can transfer ownership to another node in the
event of a failure. The other node takes ownership
of the hardware resource so that the cluster can
still access it.
The shared nothing model is asymmetric. The
cluster workload is broken down into functionally
separate units of work that different systems
perform in an independent manner. For
example, Microsoft SQL Server™ may run on one
node at the same time as Exchange is running on
the other.
Shared Nothing Model (continued)
In this model, requests from client applications are
automatically routed to the system that owns the
resource. This routing extends to server applications
that are running on a cluster. For example, if a cluster
application such as Internet Information Services (IIS)
needs to access a SQL Server database on another
node, the node it is running on passes the request for
the data to the other node. Remote procedure call (RPC)
provides the connectivity between processes that are
running on different nodes.
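The following minimal Python sketch illustrates the shared nothing idea described above: each resource has exactly one owning node, and a request that arrives at the other node is forwarded to the owner rather than serviced locally. The resource and node names are invented for the example.

# Conceptual sketch of the shared nothing model: each resource is owned by
# exactly one node, and a request arriving at the "wrong" node is forwarded
# to the owner (in Cluster service this forwarding happens over RPC).
owners = {"SQL-Database": "NodeB", "Web-Site": "NodeA"}

def handle_request(local_node, resource, operation):
    owner = owners[resource]
    if owner == local_node:
        return f"{local_node} services {operation} on {resource} locally"
    # No shared access: pass the request to the owning node instead.
    return f"{local_node} forwards {operation} on {resource} to {owner}"

print(handle_request("NodeA", "Web-Site", "GET /default.htm"))
print(handle_request("NodeA", "SQL-Database", "SELECT"))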
Shared Nothing Model (continued)
A shared nothing cluster provides the same high level
of availability as a shared everything cluster and
potentially higher scalability, because it does not have
the inherent bottleneck of a DLM. An added advantage
is that it works with standard applications because
there are no special disk access requirements.
Examples of shared nothing clustering solutions
include Tandem NonStop, Informix Online/XPS, and
Microsoft Windows 2000 Cluster service.
Note: Cluster service uses the shared nothing model.
By default, Cluster service does not allow simultaneous
access from both nodes to the shared disks or any
resource. Cluster service can support the shared
device model as long as the application supplies a
DLM.
Availability and Scalability
Availability
Cluster Service Improves Availability of Applications and
Services
Scalability
Cluster Service Improves Scalability by Adding More
Computers to the Cluster
Microsoft Cluster service makes
resources, such as services and
applications, more available by providing
for restart and failover of the resource.
Another benefit of Cluster service is that
it provides greater scalability of the
resource because you can separate
applications and services to run on
different servers.
Availability
When a system or component in the cluster fails, the
cluster software responds by dispersing the work from
the failed system to the remaining systems in the
cluster.
Cluster service improves the availability of client/server
applications by increasing the availability of server
resources. Using Cluster service, you can set up
applications on multiple nodes in a cluster. If one node
fails, the applications on the failed node are available
on the other node. Throughout this process, client
communications with applications usually continue
with little or no interruption. In most cases, the
interruption in service is detected in seconds, and
services can be available again in less than a minute
(depending on how long it takes to restart the
application).
Availability (continued)
Clustering provides high availability with
static load balancing, but it is not a fault
tolerant solution. Fault tolerant solutions
offer error-free, nonstop availability, usually
by keeping a backup of the primary system.
This backup system remains idle and unused
until a failure occurs, which makes this an
expensive solution.
Scalability
When the overall load exceeds the
capabilities of the systems in the cluster,
instead of replacing an existing computer
with a new one with greater capacity, you
can add additional hardware components
to increase the node’s performance, while
maintaining availability of applications
that are running on the cluster. Using
Microsoft clustering technology, it is
possible to incrementally add smaller,
standard systems to the cluster as needed
to meet overall processing power
requirements.
Scalability (continued)
Clusters are highly scalable; you can add
CPU, input/output (I/O), storage, and
application resources incrementally to
efficiently expand capacity. A highly
scalable solution creates reliable access
to system resources and data, and
protects your investment in both hardware
and software resources. Server clusters
are affordable because they can be built
with commodity hardware (high-volume
components that are relatively inexpensive).
Multimedia: Microsoft Windows 2000 Cluster Service
Key Concepts of a Server Cluster
[Slide: a two-node server cluster. Node A and Node B communicate over a private network and attach to the quorum disk and Disk 1. A group of resources, a virtual server with a file share and a print share on Disk 1, runs in the cluster, and a client reaches it over the public network.]
Server cluster architecture consists of physical cluster
components and logical cluster resources. Microsoft
Cluster service is the software that manages all of the
cluster-specific activity.
Physical components
provide data storage and processing for the logical
cluster resources. Physical components are nodes,
cluster disks, and communication networks. Logical
cluster resources are groups of resources, such as
Internet Protocol (IP) addresses and virtual server
names, and services such as WINS. Clients interact with
the logical cluster resources.
Nodes
Nodes are the units of management for the
server cluster. They are also referred to as
systems and the terms are used
interchangeably. A node can be online or
offline, depending on whether it is currently in
communication with the other cluster nodes.
Note: Windows 2000 Advanced Server
supports two-node server clusters. Windows
2000 Datacenter Server supports four-node
server clusters.
Cluster Disks
Cluster disks are shared hard drives to which
both server cluster nodes attach by means of
a shared bus. You store data for file and print
shares, applications, resources, and services
on the shared disks.
Quorum Resource
The quorum resource plays a vital role in
allowing a node to form a cluster and in
maintaining consistency of the cluster
configuration for all nodes. The quorum
resource holds the cluster management data
and recovery log, and arbitrates between
nodes to determine which node controls the
cluster. The quorum resource resides on a
shared disk. It is best to use a dedicated
cluster disk for the quorum resource, so that
it will not be affected by the failover policies
of other resources, or by the space that other
applications require. It is recommended that
the quorum be on a disk partition of at least
500 MB.
Cluster Communications
A server cluster communicates on a public,
private, or mixed network. The public network
is used for client access to the cluster. The
private network is used for intracluster
communications, also referred to as node-to-node communications. The mixed network
can be used for either type of cluster
communications.
One of the types of communications on the
private network monitors the health of each
node in the cluster. Each node periodically
exchanges IP packets with the other node in
the cluster to determine if both nodes are
operational. This process is referred to as
sending heartbeats.
Resources
Resources are the basic unit that Cluster
service manages.
Examples of resources are physical hardware
devices, such as disk drives, or logical items,
such as IP addresses, network names,
applications, and services.
A cluster resource can run only on a single
node at any time, and is identified as online
when it is available for a client to use.
Groups
Groups are a collection of resources that
Cluster service manages as a single unit for
configuration purposes. Operations that are
performed on groups, such as taking groups
offline or moving them to another node, affect
all of the resources that are contained within
that group. Ideally, a group will contain all of
the elements that are needed to run a specific
application, and for client systems to connect
to the application.
Virtual Servers
Virtual servers have server names that appear
as physical servers to clients.
Cluster service uses a physical server to host
one or more virtual servers. Each virtual
server has an IP address and a network name
that are published to clients on the network.
Users access applications or services on
virtual servers in the same way that they
would if the application or service were on a
physical server.
Failover and Failback
Failover is the process of moving a group of
resources from one node to another in case of
a failure of a node, or one of the resources in
the group.
Failback is the process of returning a group of
resources to the node on which it was running
before the failover occurred.
Cluster Disks
[Slide: Node A and Node B attach to a shared bus holding the quorum disk and the cluster disks Disk 1 through Disk 4.]
Each node must have a connection to a shared
storage area where shared cluster data, such as
configuration data, is stored. This shared storage
area is referred to as the cluster disk. The cluster
can gain access to a cluster disk through a Small
Computer System Interface (SCSI) bus or a Fibre
Channel bus. In addition, services and applications
that the cluster provides should keep shared data,
such as Web pages, on the cluster disk on the
shared bus.
Cluster service is based on the shared nothing
model of clustering. The shared nothing model
allows the Windows 2000 cluster file system model
to support the native NTFS file system, rather than
requiring a dedicated cluster file system.
Note: The cluster disks must be NTFS and basic
disks.
A single cluster member controls each file system
partition at any instant in time. However, because a
node places a SCSI reserve on a cluster disk rather
than a partition, the same node must own all of the
partitions on the same physical disk at any given
time. Each node can reserve a separate disk on the
same shared bus, so you can divide the cluster
disks on the bus between the nodes in the cluster.
For high-end configurations, you can achieve
additional I/O scaling through distributed striping
technology such as RAID 5. Using distributed
striping technology means that below a file system
partition on a single node, that partition can actually
be a stripe set that spans multiple physical
disks. Such striping must be hardware RAID. Cluster
service does not support any software fault tolerant
RAID arrays.
Quorum Resource
Data Storage
Arbitration
Quorum Ownership
Updates for Nodes Coming Online
Each cluster has a special resource known as the quorum
resource. You specify an initial location for the quorum
resource when you install the first node of a cluster. You can
use the cluster administration tools to change the quorum
location to a different storage resource.
The quorum resource contains cluster configuration files and
provides two vital functions: data storage and arbitration.
Only one node at a time controls the quorum. Upon startup
of the cluster, Cluster service uses the quorum resource
recovery logs for node updates.
For example: If Node B is offline and Node A makes a
change to the cluster, the change is saved in the registry of
Node A and also to the cluster configuration files on the
quorum. If Node A goes offline and Node B starts, Node B
will be updated from the cluster configuration files on the
quorum.
Data Storage
The quorum resource is vital to the
successful operation of a cluster because it
stores cluster management data, such as the
configuration database and recovery logs for
changes that are made to cluster data. It must
be available when you form the cluster, and
whenever you change the configuration
database. All of the nodes of the cluster have
access to the quorum resource by means of
the owning node.
Note: To ensure the availability of the cluster,
it is recommended that the quorum be on a
Redundant Array of Independent Disks (RAID)
5 array.
Arbitration
The Cluster service uses the quorum resource to
decide which node owns the cluster. Arbitration refers
to the decision-making function of the quorum
resource if both cluster nodes independently try to take
control of the cluster.
Consider the following situation in a two-node cluster.
The networks that are providing communication
between Nodes A and B fail. Each node assumes that
the other node has failed, and attempts to operate the
cluster as the remaining node. Arbitration determines
which node owns the quorum. The node that does not
own the quorum must take its resources offline. The
node that controls the quorum resource then brings all
of the cluster resources online.
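The following Python sketch illustrates the arbitration idea in the scenario above. The real mechanism is a race to reserve the quorum disk on the shared bus; the random choice here simply stands in for whichever node wins that race.

# Conceptual sketch of quorum arbitration after the nodes lose contact:
# whichever node wins the race for the quorum disk keeps the cluster
# resources online; the other node must take its resources offline.
import random

def arbitrate(nodes):
    # Stand-in for the reserve race on the quorum disk; the real mechanism
    # is hardware arbitration on the shared bus, not a random choice.
    winner = random.choice(nodes)
    for node in nodes:
        if node == winner:
            print(f"{node}: owns the quorum, brings cluster resources online")
        else:
            print(f"{node}: cannot reserve the quorum, takes its resources offline")
    return winner

arbitrate(["NodeA", "NodeB"])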
Quorum Ownership
Only one node can control the quorum. When a node
restarts, Cluster service determines whether the owner of
the quorum is online. If there is no owner of the quorum,
Cluster service assigns ownership to the starting node. If
Cluster service finds that another node is online and owns
the quorum resource, it will join the starting node to the
cluster, and will not assign the ownership of the quorum to
this node.
Updates for Nodes Coming Online
When a node that was offline starts and joins
the cluster, Cluster service uses the
configuration files and recovery logs on the
quorum resource to update that node with any
cluster configuration changes that were made
while it was offline. This ensures that both
nodes work from a consistent view of the
cluster configuration.
Caution: Do not modify the access permissions on the disk
that contains the quorum resource. Cluster service must
have full access to the quorum log. Cluster service uses the
quorum log file to record all of the cluster state and
configuration changes that cannot be committed to the other
node while it is offline. For this reason, you should never restrict
either node’s access to the folder \MSCS on the quorum
disk which contains the quorum log.
Cluster Communications
Private Network
Public Network
Mixed Network
It is strongly recommended that a cluster have more than
one network connection. A single network connection
threatens the cluster with a single point of failure. There are
three options for network configurations, private, public, and
mixed. Each network configuration requires its own
dedicated network card.
Private Network
Cluster nodes need to be consistently in communication
over a network to ensure that both nodes are online. Cluster
service can utilize a private network that is separate from
client communications. Once a connection is configured as
a private network it can only be used for internal cluster
communications, and is known as a private network or
interconnect. The private network will be the default route for
node-to-node communication. The cluster cannot use a
private network for client-to-node communication.
Private Network (continued)
Heartbeats
Each node in a cluster periodically exchanges sequenced
User Datagram Protocol (UDP) datagrams with the other
node in the cluster to determine if it is up and running
correctly, and to monitor the health of a network link. This
process is referred to as sending heartbeats.
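As a rough illustration of the heartbeat idea, the sketch below sends and watches for sequenced UDP datagrams. The private-network address, port number, and intervals are assumptions chosen for the example; they are not the values Cluster service uses internally.

# Minimal Python sketch of a heartbeat exchange over UDP. The peer address
# 10.10.10.2 and port 3343 are purely illustrative.
import socket
import time

PEER = ("10.10.10.2", 3343)
INTERVAL = 1.2            # seconds between heartbeats
TIMEOUT = 2 * INTERVAL    # how long to wait before assuming failure

def send_heartbeats(count=5):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(count):
        sock.sendto(f"heartbeat {seq}".encode(), PEER)  # sequenced datagram
        time.sleep(INTERVAL)

def monitor(port=3343):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    sock.settimeout(TIMEOUT)
    try:
        data, addr = sock.recvfrom(64)
        print(f"peer {addr[0]} is alive: {data.decode()}")
    except socket.timeout:
        print("missed heartbeat: assume the peer node has failed")

# Run monitor() on one machine and send_heartbeats() on the other to try it.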
Public Network
The public network connection is used as a dedicated client-to-node
communication network. The cluster cannot use the public network for
node-to-node communication.
Mixed Network
Another configuration option is to create a network that is
used for both private and public communication. This is
called a mixed network. Using a mixed network does not
change the recommendation for two networks.
Important: The recommended configuration for server
clusters is a dedicated private network for node-to-node
communication and a mixed network. The mixed network
acts as a backup connection for node-to-node
communication should the private network fail. This
configuration avoids having any single point of network
failure.
Groups and Resources
[Slide: a server cluster (\\Cluster1, 10.0.0.3) with Node A and Node B. Resources such as logical disks on Disk 1, Disk 2, and Disk 3, a file share, a printer share, and an application are organized into Group 1, Group 2, and Group 3 behind the virtual servers \\Server1 (10.0.0.4) and \\Server2 (10.0.0.6).]
A Microsoft clustered solution can contain many resources.
For administrative purposes, you can logically assign
resources to groups. Some examples of resources are
applications, services, disks, file shares, print shares,
Transmission Control Protocol/Internet Protocol (TCP/IP)
addresses, and network names. You may create multiple
groups within the cluster so that you can distribute
resources among nodes in the cluster. The ability to
distribute groups independently allows more than one
cluster node to handle the workload.
Groups
A group can contain many resources, but each resource, including a
physical disk, can belong to only one group at a time.
Any node in the cluster can own and manage groups of
resources.
A group can be online on only one node at any time. All
resources in a group will therefore move between nodes as
a unit. Groups are the basic units of failover and failback.
The node that is hosting a group must have sufficient
capacity to run all of the resources in that group.
Groups (continued)
If you wish to set up several server applications, for example
SQL Server, Exchange, and IIS, to run on the same cluster,
you should consider having one group for each application,
complete with their own virtual server. Otherwise, if all of the
applications are in the same group they have to run on the
same node at the same time, so no load distribution across
the cluster is possible.
In the event of a failure within a group, the cluster software
transfers the entire group of resources to a remaining node
in the cluster. The network name, address, and other
resources for the moved group remain within the group after
the transfer. Therefore, clients on the network may still
access the same resources by the same network name and
IP address.
Resources
A resource represents certain functionality that is offered on
the cluster. It may be physical, for example a hard disk, or
logical, for example an IP address. Resources are the basic
management and failure units of Cluster service. Resources
may, under control of Cluster service, migrate to another
node as part of a group failover. If Cluster service detects
that a single resource has failed on a node, it may then
move the whole group to the other node.
Cluster service uses resource monitors to track the status of
the resources. Cluster service will attempt to restart or
migrate resources when they fail or when one of the
resources that they depend on fails.
Resource States
Cluster service uses five resource states to manage the health of the
cluster resources.
The resource states are as follows:
Offline – A resource is unavailable for use by a client or another
resource.
Online – A resource is available for use by a client or another
resource.
Online Pending – The resource is in the process of being
brought online.
Offline Pending – The resource is in the process of being
brought offline.
Failed – The service has tried to bring the resource online but it
will not start.
Resource States (continued)
Resource state changes can occur either manually (when
you use the administration tools to make a state transition)
or automatically (during the failover process). When a group
is failed over, Cluster service alters the states of each
resource according to their dependencies on the other
resources in the group.
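The following Python sketch models the five resource states and the pending transitions that Cluster service walks through when it brings a resource online or takes it offline; the resource name and the start and stop callables are placeholders.

# Sketch of the five resource states and the transitions between them.
from enum import Enum

class ResourceState(Enum):
    OFFLINE = "Offline"
    ONLINE = "Online"
    ONLINE_PENDING = "Online Pending"
    OFFLINE_PENDING = "Offline Pending"
    FAILED = "Failed"

def bring_online(resource_name, start):
    state = ResourceState.ONLINE_PENDING
    print(f"{resource_name}: {state.value}")
    state = ResourceState.ONLINE if start() else ResourceState.FAILED
    print(f"{resource_name}: {state.value}")
    return state

def take_offline(resource_name, stop):
    state = ResourceState.OFFLINE_PENDING
    print(f"{resource_name}: {state.value}")
    stop()
    state = ResourceState.OFFLINE
    print(f"{resource_name}: {state.value}")
    return state

bring_online("File Share FS-1", start=lambda: True)
take_offline("File Share FS-1", stop=lambda: None)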
Resource Dependencies
[Slide: two dependency trees built from the same resources, a file share (FS-1), a network name (NN-1), an IP address (IP-1), and a physical disk (PD-1). The recommended layout chains them in a single vertical dependency; the forking layout, which is not recommended, branches so that more than one path must be checked when a resource does not come online.]
A dependency is a relationship between two resources in which one
resource depends upon the other to be online before it can be brought
online. For example, a network name cannot be brought online before an
IP address. This relationship requires that dependent resources reside in
the same group on the same node.
The administrator establishes resource dependencies within a group to
ensure availability of specific resources before other resources attempt
to go online. For troubleshooting purposes, it is recommended that you
create vertical dependencies for all of the cluster resources. Forking
dependencies provide multiple paths to troubleshoot when a resource
does not come online. A vertical dependency requires that all of the
dependent resources come online in sequence, starting with the
resource that is at the bottom of the dependency tree.
Resource Dependency Tree
The dependency tree is a useful diagram for visualizing the
dependency relationships between resources and
determining how they will interact. For example, if a
resource must be taken offline for an administrative task, its
location in the dependency tree will show what other
resources will be affected. The dependency tree indicates
the relative order in which resources will be taken offline and
brought online. Note that dependency tree diagrams are a
useful tool for designing and documenting a cluster
configuration, but you are not required to create these
diagrams to manage the server cluster. This is an optional
planning activity.
Resource Dependencies in Groups
The resources belonging to a dependency tree must all be contained in
the same Cluster service group. All of the resources in a group move
between nodes as a unit. Dependent or related resources never span a
group boundary because they cannot be dependent on resources
in other groups. If this were possible, then all of the groups contained in
a dependency tree would have to fail over to the other node as a unit.
Because groups are the basic unit of failover in Cluster service,
resources in a dependency tree will always be online on the same node
and will fail over together.
Note: It is recommended that you do not use the cluster network name
and IP address resources, which are created automatically during
installation, as part of a user-defined dependency tree or group. These
two resources should be left in the default cluster group that is created
on installation of Cluster service.
Dependency Rules
The only dependency relationships that Cluster service recognizes are
relationships between resources. The following rules govern dependency
relationships:
The resources of a dependency tree are wholly contained in
one, and only one, cluster group.
A resource can depend on any number of other resources in
the same group.
Resources in the same dependency tree must all be online on
the same node of a cluster.
A resource can be active or online on only one node in the
cluster at a given time.
A resource is brought online only after all of the resources on
which it depends are brought online.
A resource is taken offline before any resource on which it
depends is taken offline.
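The sketch below illustrates how these rules translate into an ordering: walking a dependency tree bottom-up yields the bring-online sequence, and the reverse of that sequence is the take-offline order. It assumes the vertical chain from the earlier slide, with the file share FS-1 depending on the network name NN-1, which depends on the IP address IP-1, which depends on the physical disk PD-1.

# Sketch of deriving the bring-online order from a dependency tree:
# a resource comes online only after everything it depends on is online,
# and resources go offline in the reverse order.
depends_on = {
    "FS-1 File Share": ["NN-1 Network Name"],
    "NN-1 Network Name": ["IP-1 IP Address"],
    "IP-1 IP Address": ["PD-1 Physical Disk"],
    "PD-1 Physical Disk": [],
}

def online_order(deps):
    order, seen = [], set()
    def visit(resource):
        if resource in seen:
            return
        seen.add(resource)
        for prerequisite in deps[resource]:
            visit(prerequisite)      # dependencies come online first
        order.append(resource)
    for resource in deps:
        visit(resource)
    return order

print("bring online:", online_order(depends_on))
print("take offline:", list(reversed(online_order(depends_on))))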
Virtual Servers
Client Access to Virtual Servers
Virtual Server Environment
Virtual Server Naming
Named Pipe Remapping
Registry Replication
Applications run as a resource within a virtual server
environment. A virtual server environment masks the
physical server name to the application that is running on a
virtual server. Masking the name of the physical server
provides the application with virtual services, a virtual
registry, and a virtual name space. When an application
migrates to another node, it appears to the application that it
restarted on the same virtual server.
The virtual server environment provides applications,
administrators, and clients with the illusion of a single, stable
environment, even if the resource group migrates to the
other node of the cluster.
One benefit of virtual servers is that many instances of an
application can be executed on a single node, each within
its own virtual server environment. The ability to execute
many instances of an application allows two SQL Servers or
two SAP environments to execute as two virtual servers on
one physical node.
Client Access to Virtual Servers
A virtual server resource group requires a network name
resource (NetBIOS and Domain Name System (DNS)), and
an IP address resource. Together, these provide clients with
a consistent name for accessing the virtual server. The
virtual server name and virtual IP address migrate among
several physical nodes. The client connects to the virtual
server by using the virtual server name, without regard to
the physical location of the server.
Virtual Server Environment
Each virtual server environment provides a namespace and
configuration space that is separated from other virtual
servers running on the same node: registry access, service
control, named pipe communication, and RPC endpoints.
Having a separate namespace and configuration space
prevents a conflict over access to configuration data or
internal communication patterns between two instances of
an application service that are running on the same node
but in separate virtual service environments.
Three features provide virtual server transparency:
Virtual server naming
Named pipe remapping
Registry replication
Virtual server naming. System services (such as
GetComputerName) return the network name that is
associated with the virtual server instead of the host node
name.
Named pipe remapping. When an application service
consists of several components that use interprocess
communication to access each other’s services, the
communication endpoints must be named relative to the
virtual server. To achieve named pipe remapping, named
pipe names are translated by Cluster service from
\\virtual_node\service to
\\host\$virtual_node\service.
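The small Python sketch below illustrates the translation pattern just described, turning a pipe name that is relative to the virtual server into one that is unique on the hosting node. The virtual server, node, and pipe names are invented for the example, and the exact prefix Cluster service uses internally may differ.

# Sketch of the named pipe remapping idea: pipe names that an application
# opens relative to its virtual server are translated so that they stay
# unique on whichever physical node currently hosts the virtual server.
def remap_pipe(pipe_name, host_node):
    # e.g. \\ACCOUNTING\sqlquery  ->  \\NodeA\$ACCOUNTING\sqlquery
    assert pipe_name.startswith("\\\\")
    virtual_node, service = pipe_name[2:].split("\\", 1)
    return f"\\\\{host_node}\\${virtual_node}\\{service}"

print(remap_pipe(r"\\ACCOUNTING\sqlquery", "NodeA"))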
Registry replication. The Windows 2000 registry stores
most application configuration data. To allow applications
that run in separate virtual servers to run on the same host
node, you must map registry trees to separate virtual server
registry trees and store them in the cluster registry on the
node. Each unique tree represents a single virtual server
and is internally dependent on the name associated with the
virtual server. The trees are also stored in registry file format
on the quorum device. When the virtual server migrates, the
local tree is rebuilt from the registry files on the quorum
device.
When encapsulating applications on virtual servers, it is
difficult to know all of the dependencies on node-specific
resources, as applications often make use of dynamic-link
libraries (DLLs) in ways that introduce new naming
dependencies. Cluster service masks the complexity of
resource dependencies and allows for seamless failovers of
virtual servers.
Virtual Server Name Resolution
[Slide: Node A and Node B host Group 1 on Disk 1, with the quorum on the shared bus. The virtual server can be reached as \\VirtualServer, \\VirtualServer.nwtraders.msft, or \\10.0.0.25. WINS resolves \\VirtualServer to 10.0.0.25, DNS resolves \\VirtualServer.nwtraders.msft to 10.0.0.25, and Active Directory publishes the share as \\VirtualServer\Share or \\VirtualServer.nwtraders.msft\Share.]
To support proper failover, it is critical that clients connect by
using the virtual server names only, rather than directly to
the cluster nodes. You must devise naming conventions to
differentiate the different types of server names.
Network names associated with virtual servers are
registered with WINS and the browser service in the same
way as physical servers. For most applications, it is
impossible to distinguish between servers that are virtual
and servers that are physical. Clients can access a virtual
server by using a NetBIOS name, a DNS name, or an IP
address. Clients can also access the virtual server by
querying Active Directory™ directory service.
Important: You need to publish the virtual server file share
in Active Directory in the same manner as a file share from a
physical server.
WINS
In a WINS environment, Microsoft Cluster service registers
the virtual server names and IP addresses with a WINS
server. Clients that are using the virtual server’s NetBIOS
name will query a WINS server. The WINS server will
answer the query with the IP address of the virtual server, just as
it would for a physical server.
DNS
In a Windows 2000 environment with the DNS dynamic
update protocol, Cluster service registers the virtual server
names of the cluster in the same zone as the server cluster
nodes. Clients querying a common DNS server will resolve
the virtual server’s IP address.
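A minimal sketch of what this looks like from the client side: the client resolves the virtual name, whichever service answers the query, and connects to the returned address without knowing which node currently owns the group. The name and port below are examples only.

# Sketch of a client connecting by virtual server name; the owning node
# is irrelevant to the client. "virtualserver.nwtraders.msft" is the
# example name from the slide; substitute a name that resolves on your
# network to try it.
import socket

def connect_to_virtual_server(name="virtualserver.nwtraders.msft", port=445):
    address = socket.gethostbyname(name)   # WINS or DNS answers the query
    print(f"{name} resolved to {address}")
    with socket.create_connection((address, port), timeout=5) as conn:
        print("connected; the hosting node is transparent to the client")

# connect_to_virtual_server()   # requires the name to resolve on your network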
Active Directory
You can publish clustered resources, such as shared files, in Active
Directory. Publish shared folders in an organizational unit by either the
NetBIOS name of the virtual server (\\VirtualServer\share) or by the fully
qualified domain name (\\VirtualServer.nwtraders.msft\share). Clients
can browse or query Active Directory to gain access to file share
resources. When Active Directory responds, the client will then need to
perform the name resolution to find the IP address of the virtual server
where Cluster service has stored the requested resource.
Note: When clients access the virtual server directly by an IP address,
and this address changes, you must notify all of the clients who want
access to the virtual server. When clients access a virtual server by
name, if the server’s IP address or subnet changes, the client would still
be able to resolve the name to the IP address by using WINS or DNS.
Failover and Failback
[Slide: during failover, Group 1 and Disk 1 move from Node A to Node B while the quorum stays on the shared bus; during failback, Group 1 returns to Node A.]
Microsoft Cluster service provides your system with the
ability to reassign control of groups of logical resources in
the event of a failure. If a resource fails, Cluster service will
attempt to restart that service. You configure the failover and
failback policies to determine when groups should transfer
ownership from one node to another.
Failover
Failover occurs when a resource or a node fails. All
resources are configured for automatic failover by default.
In the case of a node failure, the resources and groups that
this node controls fail over to the other node in the cluster.
For example, in a cluster where file and print resources are
running on Node A, and Node A fails, these services will fail
over to Node B of the cluster.
Failover (continued)
If a resource fails, Cluster service will attempt to restart the
resource. If the resource does not start, it will fail over (with
its group) to the other node. If the resource will not start after
failover, it will fail over again to the original node and try to
start. By default, this process will be repeated up to ten times
within six hours. If the resource still does not start, Cluster
service fails over the resource and all of the resources that
depend on the failed resource.
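The following sketch captures the retry bookkeeping described above: failovers are counted inside a sliding time window, and once the threshold is reached the group is left in the failed state. The threshold and period values are illustrative parameters, not a statement of the product defaults.

# Sketch of a failover policy with a retry threshold and period.
import time

class FailoverPolicy:
    def __init__(self, threshold, period_hours):
        self.threshold = threshold
        self.period = period_hours * 3600
        self.failover_times = []

    def allow_failover(self, now=None):
        now = now if now is not None else time.time()
        # Only failovers inside the sliding window count toward the limit.
        self.failover_times = [t for t in self.failover_times
                               if now - t < self.period]
        if len(self.failover_times) >= self.threshold:
            return False          # give up; leave the group failed
        self.failover_times.append(now)
        return True

policy = FailoverPolicy(threshold=10, period_hours=6)
print(policy.allow_failover())    # True until the threshold is reached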
Failback
Failback occurs when a node has failed and its resources
have failed over to the other node. Failback is the process of
returning a group of resources to the node on which it was
running before a failover occurred. Failback is not
configured by default. You must set a preferred owner for
the group. Using the preceding example, when Node A
comes back online, the file and print services can fail back
to Node A if Node A is set as the preferred owner. This
process can be performed automatically or manually.
Failback (continued)
The administrator determines when and if a group should
fail back to the preferred owner. You might not want the
application to fail back during peak load times. For example,
if Node A fails, the resources in a group could take five
minutes or more to restart on Node B. To avoid additional
delays in responding to client requests, the administrator
can choose to fail back this group to Node A during off-peak
hours, or leave the ownership of the group with Node B.
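The sketch below expresses that failback decision as a small policy check: the group moves back only if a preferred owner is configured, the preferred owner is online, and the current time falls inside an allowed (for example, off-peak) failback window. The window hours are an assumption for the example.

# Sketch of a failback decision that honors a preferred owner and an
# off-peak failback window.
def should_fail_back(group_owner, preferred_owner, preferred_online, hour,
                     failback_window=(1, 5)):
    """Return True if the group should move back to its preferred owner."""
    if preferred_owner is None:            # failback is not configured
        return False
    if group_owner == preferred_owner or not preferred_online:
        return False
    start, end = failback_window           # e.g. 01:00-05:00, off-peak hours
    return start <= hour < end

# Node A is back online at 03:00; the group currently runs on Node B.
print(should_fail_back("NodeB", "NodeA", preferred_online=True, hour=3))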
Demonstration: Cluster Concepts
In this demonstration, you will view different name resolution capabilities
for clients accessing resources from the cluster. The steps of the
demonstration are:
View the Cluster Group Owner.
Create a public folder share from a Terminal Services session.
Create a file share resource.
Test WINS name resolution for the public share.
Test DNS name resolution.
Publish a shared folder in Active Directory.
Demonstrate a failover of the public share.
Test WINS name resolution after failover.
Test DNS name resolution after failover.
Test Active Directory shared folders after failover.
Choosing a Server Cluster Configuration
[Slide: a decision chart that weighs virtual server needs, failover capability, and performance considerations against the cluster configurations: no cluster needed, single node virtual server, active/passive, and active/active.]
You can configure server clusters to meet specific
requirements. The configuration that you choose depends
on the scalability features of your application and your
availability objective for the resources. Each configuration
also has a failover policy that dictates when the resources
should return to their preferred owner after the failed node
has been restored.
The most common configurations are:
Active/Passive configuration
Active/Active configuration
Hybrid configuration
Single node virtual server
Note: In this section we will not be discussing active/active
or active/passive software configurations.
Active/Passive Configuration
[Slide: Node A manages the virtual server \\ACCOUNTING in Group 1 on Disk 1. Node B is configured as a hot spare and will take ownership of \\ACCOUNTING if Node A goes offline. The quorum and Disk 1 sit on the shared bus.]
The active/passive configuration contains one node that is
actively providing resources to clients. The other node is
passive and standing by in case of a failover. This
configuration can provide high performance and failover
capacity. One node of the cluster makes all of the resources
available. To achieve optimal performance for any failed
over group, it is recommended that the passive node have
the same specifications, such as CPU speed, as the active
node that controlled the resource.
The disadvantage of this configuration is that it is an
expensive allocation of hardware. One of the two servers
will not be servicing any clients at any time.
The advantage of this configuration is that after failover, the
applications running on the group that fails over do not
interfere with any other applications running on the node,
and therefore the application can run at maximum capacity.
In the slide, Node A has control of Group 1. The
administrator has configured Node B as the hot spare with
the capability to control Group 1. If Node A goes offline,
Node B will control Group 1. When Node A returns to an
online state, Node A becomes the passive system and
Group 1 remains with Node B. Because failback does not
occur, this configuration provides maximum availability by
reducing the time that the service or application is
unavailable.
If Node B does not have equal capacity to Node A, you may
need to configure a failback for Group 1 during nonpeak
load times.
Considerations for Choosing this Configuration
Choose this configuration when you need to provide critical
applications and resources with high availability. For
example, an organization that is selling products on the
World Wide Web could justify the expense of having an idle
server by guaranteeing continuous high performance access
to customers.
Considerations for Choosing this Configuration (continued)
Choose the active/passive configuration if your needs meet the following:
You only require one group for all of the applications and
services.
You want failover capability for the applications and services.
To avoid additional downtime after failover, the applications and
services do not need to fail back to the other node.
The applications and services support a cluster environment.
Note: If you want the applications and services to run at maximum
capacity on either node, both nodes will need to provide the same
capacity.
Availability
This configuration provides very high availability by not
failing back when the nodes are of equal capacity, with the
added benefit of no performance degradation during failover.
Failover Policy
If the passive system provides identical performance to the
failed node, you do not need to configure the failback policy.
If the passive system does not provide identical
performance, you can configure the group to fail back when
the preferred node is online.
Active/Active Configuration
[Slide: Node A owns Group 1 (\\ACCOUNTING) and Node B owns Group 2 (\\ENGINEERING). Disk 1, Disk 2, and the quorum sit on the shared bus, and each node has the capacity to fail over the other node's group.]
The active/active configuration contains two nodes that are
providing resources to cluster clients. This configuration
provides optimum performance because it balances
resources across both nodes in the cluster. Each node
controls a different resource group. The active/active
configuration can also provide static load balancing, which
refers to a failback policy. If one node fails, the other node
will temporarily take on all of the groups. When the failed
node comes back online, the group fails back to the original
node, allowing performance to return to normal. In general,
this configuration is the one most often used in server
clusters.
Depending on the resources and the capacity of the nodes,
performance may suffer when a failover occurs and a single
node must run all of the resources.
In this slide, Node A is the primary owner of Group 1, and
Node B is the primary owner of Group 2. If Node B goes
offline, Group 2 will fail over to Node A. When Node B goes
back online, Group 2 will fail back to Node B. Performance
is restored when the failed node comes back online and the
group fails back to its original node.
Considerations for Choosing this Configuration
Choose this configuration when you need to provide multiple resources
simultaneously from a single cluster, provided that you can accept
reduced performance during a failover.
Choose the active/active configuration if your needs meet the following:
You require multiple groups for applications and services.
You want failover capability for the applications and services.
You want all the groups to fail back to their preferred owners
when the failed node returns online to redistribute the load.
The applications and services support a cluster environment.
Availability
This configuration provides high performance until failover.
When a single node runs all of the resources, performance
will degrade.
Failover Policy
Configure all of the groups to fail over, and then to fail back
when the original node owner is back online.
Hybrid Configuration
[Slide: Node A owns Group 1, the virtual server \\ENGINEERING on Disk 1, and Node B has the capacity to fail it over. Outside the cluster, Node A also provides DNS and Node B also provides file and print services. The quorum and Disk 1 sit on the shared bus.]
The hybrid configuration allows either node of a server
cluster to perform server duties that are independent of the
cluster.
In this slide, Nodes A and B are both performing services
outside of the cluster. Node A is running Microsoft Domain
Name System (DNS) service and Node B is configured as a
file/print server. These services will not fail over if their
respective nodes fail. But those services within the cluster
would fail over if their respective node fails. In the slide
example, Group 1 will fail over to Node B if Node A fails.
Note: The hybrid configuration could run as an
active/passive or active/active configuration.
Considerations for Choosing this Configuration
Choose this configuration when you must install applications
or services that Cluster service does not support on one or
more nodes of the cluster.
Choose the hybrid configuration if your needs meet the
following:
You need to run applications on a node independent
of the cluster.
You want failover capability for the clustered
applications and services.
You can configure a failback policy if it meets your
requirements.
Availability
The server cluster resources that you configure for failover
have high availability. The applications or services that are
running independently of Cluster service will not fail over,
and therefore do not have high availability.
Failover Policy
The failover policy for a hybrid configuration depends on
whether it is an active/passive or active/active configuration.
Note: Although you can run some services and applications
outside of the cluster, it is recommended that you specialize
the cluster servers to the applications that run within the
cluster. Consider a hybrid solution when you need to
perform domain controller functionality from both nodes of a
cluster.
Single Node Virtual Server
[Slide: a single node, Node A, hosts two virtual servers: \\ENGINEERING in Group 1 on Disk 1 and \\ACCOUNTING in Group 2 on Disk 2. Clients can access any virtual server in the cluster. The quorum, Disk 1, and Disk 2 are attached to Node A and are not shared with another node.]
The single node configuration allows clients to access
resources through a virtual server. Because this
configuration uses only one node, resources cannot be
failed over to another node. Administrators use this
configuration to group resources for organizational or
administrative purposes. It is not intended to provide the
availability that other server configurations have.
The advantage of this configuration is that services grouped
together are easier to manage and easier for clients to
access. If the administrator or the clients want higher levels
of availability, the administrator can add another node to
create a two-node cluster. Because the administrator has
already created the groups of resources, they will need to
configure only the failover policies.
A common use of this configuration is for server
consolidation. For example, if one server has the capacity to
replace four existing servers, you can migrate the services
and applications running on the old servers to their
respective virtual servers. Clients will access the virtual
servers without any apparent changes.
In this slide, \\engineering is a virtual server in
Group 1, and \\accounting is a virtual server in Group
2. Even though users in the Engineering and Accounting
departments think that they are accessing different servers,
both resources are housed on the same physical server.
This configuration provides the flexibility for adding another
node to the cluster, which would provide fault tolerance for
these applications and services.
Considerations for Choosing this Configuration
Consider choosing this configuration when you need to
manage resources by grouping them together with virtual
servers. You might also configure a single server as a virtual
server if you anticipate adding another server to create a
server cluster.
Considerations for Choosing this Configuration (continued)
Choose the single node virtual server configuration if your
needs meet the following:
You need one or more virtual servers.
You do not require failover capability for the
applications and services.
The applications and services support a cluster
environment.
Availability
This configuration does not provide high availability.
Failover Policy
There is no failover policy because the cluster has only one
node.
Applications and Services on Server Clusters
Applications
Services
File and Print Shares
Identifying Performance Limitations
Microsoft Cluster service can provide failover capabilities for
file and print services, applications, such as Microsoft
Exchange, and services, such as WINS. An administrator in
an enterprise environment will need to decide when server
clusters are an appropriate solution. For example, if an
organization considers Microsoft Exchange to be mission-critical, Microsoft Cluster service’s failover capabilities can
provide a high degree of availability. A service, such as
Active Directory, has built-in redundancy, and therefore
would not benefit from failover capability.
An administrator also needs to consider how the
applications and services will impact the node during a
failover condition. Resources can change ownership
dynamically, so you need to consider performance capacity
when looking for performance limits.
Applications
Cluster-Aware Applications
Cluster-Unaware Applications
Applications are either cluster-aware or cluster-unaware.
Cluster service can more efficiently manage applications
that are cluster-aware. Cluster-unaware applications can run
on a cluster if they are configured as generic resource types.
Cluster service can manage generic applications, but not to
the same level of detail as with the cluster-aware
applications.
Important: For an application to run on a server cluster, the
application must use TCP/IP as a network protocol.
Cluster-Aware Applications
You select cluster-aware applications to obtain the best
performance and reliability from your system. Common uses
of cluster-aware applications are database applications,
transaction processing applications, and file and print server
applications. You can configure other groupware
applications to be cluster-aware. Cluster-aware applications
can take advantage of features that Cluster service offers
through the cluster application programming interface (API).
Cluster-Aware Applications (continued)
An application is capable of being cluster-aware if it:
Maintains data in a configurable location.
Supports transaction processing.
Supports the Cluster service API.
Reports health status upon request to the Resource
Monitor.
Responds to requests from Cluster service to be
brought online or be taken offline without data loss.
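As a conceptual illustration of these requirements, the sketch below models a cluster-aware application as an object that keeps its data in a configurable location on a cluster disk, comes online and offline on request without losing data, and reports health when asked. This mirrors the responsibilities listed above in Python; the real interface is the C-based cluster API and a resource DLL, and the names and drive letter here are invented.

# Conceptual sketch (not the real cluster API) of a cluster-aware resource.
class ClusterAwareResource:
    def __init__(self, name, data_path):
        self.name = name
        self.data_path = data_path   # configurable data location on a cluster disk
        self.running = False

    def online(self):
        # Recover state from the shared disk, then start servicing clients.
        print(f"{self.name}: recovering state from {self.data_path}")
        self.running = True

    def offline(self):
        # Flush and stop cleanly so the other node can restart from disk.
        print(f"{self.name}: flushing state to {self.data_path}")
        self.running = False

    def is_alive(self):
        # Health status reported to the Resource Monitor on request.
        return self.running

svc = ClusterAwareResource("AccountingDB", r"S:\AccountingData")
svc.online()
print("healthy:", svc.is_alive())
svc.offline()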
Cluster-Aware Applications (continued)
A cluster-aware application runs on a node in the cluster and
can take advantage of administration and availability
features of Cluster service. Typically, a cluster-aware
application communicates with Cluster service by using the
cluster API or a cluster application-specific resource DLL.
Cluster-aware applications must use the cluster API registry
functions and control codes, not the standard Microsoft
Win32® API functions.
To take advantage of the registry replication of the cluster
hive between the nodes of the cluster, cluster-aware
applications benefit by placing their registry settings in the
cluster hive of the registry instead of the System Registry.
Cluster-Aware Applications (continued)
When a cluster-aware application restarts on another server
following a failure, it does not restart from a completely
separate copy of the application. The new server starts the
application from the same physical disks as the original
server. Ownership of the application's disks on the shared
SCSI bus was moved from the failed server to the new
owner as one of the first steps in the failover process. This
approach assures that the application always restarts from
its last known state as recorded on its installation drive, and,
optionally, its registry keys.
Cluster-Unaware Applications
Applications that do not use the cluster or resource APIs and
cluster control code functions are unaware of clustering and
have no knowledge that Cluster service is running. Their
relationship with Cluster service is solely through the
resource DLL.
You must configure cluster-unaware applications as generic
resource types if Cluster service is to manage them. Cluster
service can poll these generic applications and services to
determine whether they are running or have failed, and can
fail over the application or resource to the other node if it
detects a failure.
Cluster-Unaware Applications (continued)
Cluster-unaware applications use the System Registry
instead of the local cluster registry. If an application is
configured as a generic resource type, Cluster service can
replicate its keys in the System Registry of the other node in
the cluster.
Cluster-Unaware Applications (continued)
Using the generic application resource type for cluster-unaware
applications has some limitations:
When the resource goes offline, Cluster service terminates the
process without performing any clean shutdown operations.
The application is not configurable via the Cluster
Administrator tools.
The Cluster service writes a registry checkpoint of changes
made to the cluster. Any changes made via a separate
administration tool when the resource is not online will not be
propagated.
Cluster service can monitor only up to 30 registry subtrees.
Generic applications and services can report only a limited
amount of information to Cluster service because Resource
Monitor can report only whether the application is running, not
whether it is running properly.
Services
DFS
DHCP
WINS
Services, like applications, are either cluster-aware or
cluster-unaware. You can also configure cluster-unaware
services as generic resource types. You need to install a
cluster-unaware service on both nodes and set it as active in
the registry before you can configure it to run in the cluster.
Cluster-aware and cluster-unaware services have the same
advantages and disadvantages as cluster-aware and
cluster-unaware applications.
The following services included in Windows 2000 Advanced
Server are cluster-aware.
Distributed File System (DFS)
When you install DFS in a server cluster, the DFS root is fault tolerant.
Having a DFS root that is fault tolerant will allow clients to access data
that is stored on multiple systems through a \\VirtualServer\Share
mapping that either of the nodes in the Windows 2000 server cluster can
host. If the node that is currently hosting the DFS root fails, the other
node will host the DFS root. Failover is a significant advantage when
many enterprise clients need to continuously access data that DFS
hosts.
Windows 2000 provides a domain DFS root that can provide fault
tolerance by replicating the data to other computers in the domain. A
nonclustered server running Windows 2000 can provide a stand-alone
DFS root. However, if that server becomes inactive, the stand-alone DFS
root cannot be accessed by clients. A Windows 2000 clustered DFS root
provides a stand-alone DFS root with fault tolerance, making the DFS
root available from a virtual server with failover capability.
Dynamic Host Configuration Protocol (DHCP)
You can use the Windows 2000 (Advanced Server only)
Cluster service for DHCP servers to provide higher
availability, easier manageability, and greater scalability.
Windows Clustering allows you to install a DHCP server as
a virtual server so that if one of the clustered nodes fails, the
DHCP service is transferred to the second node. Failing
over the DHCP service means clients can still receive and
renew TCP/IP addresses from the virtual server.
Dynamic Host Configuration Protocol (DHCP) (continued)
Clustering uses IP addresses efficiently by removing the
need to split scopes. A database stored on a remote disk
tracks address assignment and other activity so that if the
active cluster node goes down, the second node becomes
the DHCP server, using the same database as the original
node. Only one node at a time runs as a DHCP server, with
the Windows 2000 clustering database providing
transparent transition when needed.
Windows Internet Name Service (WINS)
By maintaining and assigning secondary WINS servers for
clients, you can reduce, if not fully eliminate, the effects of a
single WINS server being offline. In addition, clustering can
provide further fault tolerance. In an enterprise WINS
environment, you can reduce the number of redundant
WINS servers and rely on Microsoft Cluster service for fault
tolerance. WINS running on a Microsoft Cluster service
cluster will also eliminate any WINS replication traffic.
Note: You should not create static network name to IP
address mappings for any cluster names in a WINS
database. WINS is the only name resolution method that will
cause problems when using static mappings, because
WINS static mappings use the MAC address of the network
card as part of the static mapping.
File and Print Shares
If mission-critical file and print shares are on a physical
server, the environment has a potential single point of
failure. Using Microsoft Cluster service, you can locate file
and print queues on one of the nodes in the cluster and
access them by means of a virtual server. The failover feature of
Cluster service will provide for continuous file and print
service to clients.
The following considerations apply when file and print
services are on a virtual server:
Both nodes of the cluster must be members of the
same domain for the permissions to be available
when either node has the resource online.
The user account that Cluster service uses must
have at least read access to the directory. If the user
account does not have at least read access, the
Cluster service will be unable to bring the file share
online.
The administrator must set share permissions so
that they fail over with the resource.
Identifying Performance Limitations
[Slide: Node A runs Exchange 2000 in Group 1 and Node B runs file and print services in Group 2, with the quorum and Disk 1 on the shared bus.]
Windows 2000 Advanced Server uses an adaptable
architecture and is largely self-tuning when it comes to
performance. Additionally, Advanced Server is able to
allocate resources dynamically as needed to meet changing
usage requirements.
The goal in identifying performance limits is to look at the
applications that are running on the cluster to determine
which hardware resource will experience the greatest
demand, and then adjust the configuration to handle that
demand and maximize total throughput.
You need to consider all of the nodes in a cluster when
looking for performance. You must consider how a node will
run when resources with different performance requirements
fail over from the other node. For example, you would fail
over Group 2 from Node B to Node A and run a performance
benchmark on Node A. You would check RAM, CPU, disk
utilization, and network utilization to see if they are beyond
capacity limitations. You repeat these steps to check the
performance limitations of Node B.
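The following rough sketch expresses that capacity check: add up the demands of both groups as if they were running on one node and compare the totals against that node's limits. All of the capacity and demand numbers are made up for illustration.

# Rough sketch of a post-failover capacity check; values are illustrative.
node_capacity = {"cpu_pct": 100, "ram_mb": 4096, "disk_iops": 5000, "net_mbps": 100}
group_demand = {
    "Group 1 (Exchange)":   {"cpu_pct": 45, "ram_mb": 2048, "disk_iops": 1800, "net_mbps": 30},
    "Group 2 (File/Print)": {"cpu_pct": 15, "ram_mb": 512,  "disk_iops": 2500, "net_mbps": 60},
}

def check_failover_capacity(capacity, demands):
    for metric, limit in capacity.items():
        total = sum(group[metric] for group in demands.values())
        status = "OK" if total <= limit else "OVER CAPACITY"
        print(f"{metric}: {total}/{limit} {status}")

check_failover_capacity(node_capacity, group_demand)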
File and Print Services
If the primary role of a cluster is to provide high availability of
file and print services, high disk use will be incurred due to
the large number of files being accessed. File and print
services also cause a heavy load on network adapters
because of the large amount of data that is being
transferred. It is important to make sure that your network
adapter and cluster subnet can handle the load. In this
example, RAM typically does not carry a heavy load,
although memory usage can be heavy if a large amount of
RAM is allocated to the file system cache. Processor
utilization is also typically low in this environment. In such
cases, memory and processor utilization usually do not need
the optimizing that other components need. You can often
use memory effectively to reduce the disk utilization,
especially for disk read operations.
Applications
A server-application environment (such as Microsoft
Exchange) is much more processor-intensive and RAM-intensive than a typical file or print server environment,
because much more processing is taking place at the
server. In these cases, it is best to use high-end
multiprocessor servers. Server cluster solutions use little of
the system resources, either for host-to-host
communications, or for the operation of the cluster itself.
Review
Introduction to Server Clusters
Key Concepts of a Server Cluster
Choosing a Server Cluster Configuration
Applications and Services on Server Clusters