Clustering Technology in Windows NT Server, Enterprise Edition
Clustering Technology
In Windows NT Server,
Enterprise Edition
Jim Gray
Microsoft Research
[email protected]
Today’s Agenda
Windows NT® clustering
MSCS (Microsoft Cluster Server) Demo
MSCS background
Design goals
Terminology
Architectural details
Setting up a MSCS cluster
Hardware considerations
Cluster application issues
Q&A
Extra Credit
Included in your presentation
materials but not covered
in this session
Reference materials
SCSI primer
Speaker's notes included
Hardware Certification
MSCS In Action
High Availability Versus
Fault Tolerance
High Availability: mask outages
through service restoration
Fault-Tolerance: mask local faults
RAID disks
Uninterruptible Power Supplies
Cluster Failover
Disaster Tolerance: masks
site failures
Protects against fire, flood, sabotage, and the like
Redundant system and service at
remote site
Windows NT Clusters
What is clustering to Microsoft?
Group of independent systems that
appear as a single system
Managed as a single system
Common namespace
Services are “cluster-wide”
Ability to tolerate component failures
Components can be added
transparently to users
Existing client connectivity is not affected by clustered applications
Microsoft Cluster Server
2-node available 97Q3
Commoditize fault-tolerance
(high availability)
Commodity hardware
(no special hardware)
Easy to set up and manage
Lots of applications work out of the box.
Multi-node Scalability in NT5 timeframe
MSCS Initial Goals
Manageability
Availability
Manage nodes as a single system
Perform server maintenance without affecting users
Mask faults, so repair is non-disruptive
Restart failed applications and servers
Unavailability ~ MTTR / MTBF, so repair must be quick (worked example below)
Detect/warn administrators of failures
Reliability
Accommodate hardware and software failures
Redundant system without mandating a dedicated
“stand by” solution
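To make the repair-time point concrete (illustrative numbers, not from the talk): with an MTBF of 1,000 hours and an MTTR of 6 minutes (0.1 hour), unavailability ~ 0.1 / 1,000 = 0.0001, i.e., 99.99% availability; cutting MTTR to 36 seconds (0.01 hour) yields 99.999%.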
MSCS Cluster
[Diagram: client PCs connect to Server A and Server B over the public network; the servers exchange a heartbeat over a private interconnect, share disk cabinets A and B, and present a single cluster-management view]
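A toy C sketch of the heartbeat idea from the diagram above; this is not the MSCS wire protocol, and the peer address, port, and interval are made up (link with ws2_32.lib):

    #include <winsock2.h>
    #include <string.h>

    int main(void)
    {
        WSADATA wsa;
        SOCKET s;
        struct sockaddr_in peer;
        const char beat = '!';

        WSAStartup(MAKEWORD(2, 2), &wsa);
        s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port = htons(3343);                  /* made-up port for this sketch */
        peer.sin_addr.s_addr = inet_addr("10.0.0.2"); /* peer on the private interconnect */

        /* The peer declares this node dead after a few missed beats. */
        for (;;) {
            sendto(s, &beat, 1, 0, (struct sockaddr *)&peer, sizeof(peer));
            Sleep(1000);                              /* one beat per second */
        }
    }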
Failover Example
[Diagram: a browser reaches a Web site on Server 1 and a database on Server 2; the Web site files and database files live on shared disks, so either service can restart on the surviving server after a failure]
Basic MSCS Terms
Resource - basic unit of failover
Group - collection of resources
Node - Windows NT® Server
running cluster software
Cluster - one or more closely-coupled
nodes, managed as a single entity
MSCS Namespace
Cluster view
[Diagram: the cluster name spans both node names, with four virtual server names beneath them that can be hosted by either node]
MSCS Namespace
Outside world view

Name                           IP address   Runs
WHECCLUS (cluster)             1.1.1.1      -
WHECNode1 (node 1)             1.1.1.2      -
WHECNode2 (node 2)             1.1.1.3      -
WHEC-VS1 (virtual server 1)    1.1.1.4      Internet Information Server, SQL
WHEC-VS2 (virtual server 2)    1.1.1.5      MTS, “Falcon”
WHEC-VS3 (virtual server 3)    1.1.1.6      Microsoft Exchange
Windows NT Clusters
Target applications
Application & Database servers
E-mail, groupware,
productivity applications server
Transaction processing servers
Internet Web servers
File and print servers
MSCS Design Philosophy
Shared nothing
Remoteable tools
Windows NT manageability enhancements
Simplified hardware configuration
Never take a “cluster” down: “shell game” rolling upgrades
Microsoft® BackOffice™ product support
Provide clustering solutions for all levels
of customer requirements
Eliminate cost and complexity barriers
MSCS Design Philosophy
Availability is core for all releases
Single server image for administration,
client interaction
Failover provided for unmodified server
applications, unmodified clients
(cluster-aware server applications
get richer features)
Failover for file and print are default
Scalability is phase 2 focus
Non-Features Of MSCS
Not lock-step/fault-tolerant
Not able to “move” running applications
MSCS restarts applications that are failed over to
other cluster members
Not able to recover shared state between
client and server (e.g., file position)
All client/server transactions should
be atomic
Standard client/server development
rules still apply
ACID always wins
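Because MSCS restarts rather than migrates, a failover drops open TCP connections; clients reconnect to the same virtual server address and resubmit the interrupted transaction. A minimal Winsock sketch (the address comes from the namespace example earlier; the port, retry count, and delay are made up; link with ws2_32.lib):

    #include <winsock2.h>
    #include <string.h>

    /* The virtual IP keeps working after failover because it moves with the
       group, but the TCP session does not - so retry, then resubmit. */
    static SOCKET connect_with_retry(void)
    {
        struct sockaddr_in vs;
        int attempt;

        memset(&vs, 0, sizeof(vs));
        vs.sin_family = AF_INET;
        vs.sin_port = htons(1433);                  /* e.g., SQL Server */
        vs.sin_addr.s_addr = inet_addr("1.1.1.4");  /* WHEC-VS1 from the example */

        for (attempt = 0; attempt < 10; attempt++) {
            SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
            if (connect(s, (struct sockaddr *)&vs, sizeof(vs)) == 0)
                return s;                           /* reconnected to the survivor */
            closesocket(s);
            Sleep(3000);                            /* wait out the failover */
        }
        return INVALID_SOCKET;
    }

    int main(void)
    {
        WSADATA wsa;
        WSAStartup(MAKEWORD(2, 2), &wsa);
        return connect_with_retry() == INVALID_SOCKET;
    }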
Setting Up MSCS Applications
Attributes Of Cluster-Aware Applications
A persistence model that supports
orderly state transition
Client application support
Database example
ACID transactions
Database log recovery
IP clients only
How are retries supported?
No name service location dependencies
Custom resource DLL is a good thing
MSCS Services For
Application Support
Name service mapper
GetComputerName resolves
to virtual server name
Registry replication
Key and underlying keys and values
are replicated to the other node
Atomic
Logged to ensure partitions
in time are handled
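A minimal C sketch of the name-mapper effect; GetComputerName is the standard Win32 call, and under MSCS a process running as a cluster resource with the mapper enabled can see the virtual server name instead of the node name:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        char name[MAX_COMPUTERNAME_LENGTH + 1];
        DWORD size = sizeof(name);

        /* On a cluster node this may return, e.g., the virtual server's
           network name rather than the physical machine's name. */
        if (GetComputerNameA(name, &size))
            printf("Serving as: %s\n", name);
        return 0;
    }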
Application Deployment
Planning
System configuration is crucial
Adequate hardware configuration
You can’t run Microsoft BackOffice
on a 32-MB, 75-MHz Pentium
Planning of preferred group owners
Good understanding of single-server
performance is critical
See Windows NT Resource Kit
performance planning section
Understand working set size
What is acceptable performance to the
business units?
Evolution Of Cluster-Aware Applications
Active/passive - general out-of-the-box applications
Active/active - applications that can run simultaneously on multiple nodes
Highly scalable - extending active/active through I/O shipping, process groups, and other techniques
Application Evolution
[Table: Microsoft SQL Server, Microsoft Transaction Server (MTS), Internet Information Server (IIS), and Microsoft Exchange Server charted across a two-node cluster (Node 1, Node 2)]
Evolution Of Cluster-Aware Applications
[Table: the same applications (IIS, Microsoft Exchange Server, Microsoft SQL Server, MTS) charted across a four-node cluster (Node 1 through Node 4)]
Resources
What are they?
Resources are basic system
components such as physical disks,
processes, databases, IP addresses,
etc., that provide a service to clients
in a client/server environment
They are online in only one place
in the cluster at a time
They can fail over from one system
in the cluster to another system
in the cluster
Resources
MSCS includes resource DLL support for:
Physical and logical disk
IP address and network name
Generic service or application
File share
Print queue
Internet Information Server virtual roots
Distributed Transaction Coordinator (DTC)
Microsoft Message Queue (MSMQ)
Supports resource dependencies
Controlled via well-defined interface
Group: offers a “virtual server”
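These resources are scriptable through the Win32 Cluster API (clusapi.h); a hedged sketch that opens the local cluster and brings one resource online (the resource name is hypothetical; link with clusapi.lib):

    #include <windows.h>
    #include <clusapi.h>
    #include <stdio.h>

    int main(void)
    {
        HCLUSTER hCluster = OpenCluster(NULL);   /* NULL = the local cluster */
        HRESOURCE hRes;
        DWORD status;

        if (hCluster == NULL)
            return 1;

        hRes = OpenClusterResource(hCluster, L"Payroll IP Address");  /* hypothetical name */
        if (hRes != NULL) {
            status = OnlineClusterResource(hRes);  /* ERROR_IO_PENDING = still coming up */
            printf("OnlineClusterResource: %lu\n", status);
            CloseClusterResource(hRes);
        }
        CloseCluster(hCluster);
        return 0;
    }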
Cluster Service To Resource
[Diagram: the Windows NT cluster service initiates changes through a resource monitor and receives resource events back; the monitor calls the physical disk, IP address, generic application, and database resource DLLs, which in turn control the disk, network, application, and database]
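For a sense of what the resource DLLs at the bottom of that picture look like, here is a hedged sketch of the two health-check callbacks a custom resource DLL exports through the Resource API; the RESID typedef and CheckServiceHealthy helper are stand-ins, not the real resapi.h contents:

    #include <windows.h>

    typedef PVOID RESID;   /* opaque per-resource ID (stand-in for resapi.h's) */

    static BOOL CheckServiceHealthy(RESID id) { (void)id; return TRUE; }  /* stand-in */

    /* Cheap, frequent check - the resource monitor polls this often. */
    BOOL WINAPI LooksAlive(RESID ResourceId)
    {
        return CheckServiceHealthy(ResourceId);
    }

    /* Thorough, less frequent check; FALSE triggers restart or failover. */
    BOOL WINAPI IsAlive(RESID ResourceId)
    {
        /* A real DLL would verify the service actually answers requests. */
        return CheckServiceHealthy(ResourceId);
    }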
Cluster Abstractions
[Diagram: a cluster contains resource groups; each group contains resources]
Resource: program or device managed by a cluster
e.g., file service, print service, database server
can depend on other resources (startup ordering)
can be online, offline, paused, failed
Resource Group: a collection of related resources
hosts resources; belongs to a cluster
unit of co-location; involved in naming resources
Cluster: a collection of nodes, resources, and groups
cooperation for authentication, administration, naming
Resources
Resources have...
Type: what it does (file, DB, print, Web…)
An operational state (online/offline/failed)
Current and possible nodes
Containing Resource Group
Dependencies on other resources
Restart parameters (in case
of resource failure)
Resource
Fails over (moves) from one
machine to another
Logical disk
IP address
Server application
Database
May depend on another resource
Well-defined properties
controlling its behavior
Resource Dependencies
A resource may depend
on other resources
A resource is brought online after
any resources it depends on
A resource is taken offline before
any resources it depends on
All dependent resources must
fail over together
Dependency Example
[Diagram: a dependency tree over five resources - a database resource DLL, an IP address resource DLL, drive E: and drive F: resource DLLs, and a generic application resource DLL]
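A dependency like the one above can also be declared programmatically; a sketch using the Cluster API, assuming the cluster handle is already open and using hypothetical resource names:

    #include <windows.h>
    #include <clusapi.h>

    /* Make the database depend on drive E:, so the disk comes online first
       and the two always fail over together. */
    static DWORD MakeDbDependOnDisk(HCLUSTER hCluster)
    {
        HRESOURCE hDb   = OpenClusterResource(hCluster, L"Payroll Database");  /* hypothetical */
        HRESOURCE hDisk = OpenClusterResource(hCluster, L"Drive E:");          /* hypothetical */
        DWORD status = ERROR_INVALID_HANDLE;

        if (hDb != NULL && hDisk != NULL)
            status = AddClusterResourceDependency(hDb, hDisk);

        if (hDisk) CloseClusterResource(hDisk);
        if (hDb)   CloseClusterResource(hDb);
        return status;
    }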
Group Example
[Diagram: the “Payroll group” contains the same five resources - database, IP address, drive E:, drive F:, and generic application - which move between nodes as a unit]
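Since the group is the unit of failover, moving it is one call; a sketch that drains the Payroll group off its current node, e.g., before maintenance (a NULL destination lets MSCS pick the best node; link with clusapi.lib):

    #include <windows.h>
    #include <clusapi.h>

    int main(void)
    {
        HCLUSTER hCluster = OpenCluster(NULL);
        HGROUP   hGroup;
        DWORD    status = ERROR_INVALID_HANDLE;

        hGroup = OpenClusterGroup(hCluster, L"Payroll group");
        if (hGroup != NULL) {
            status = MoveClusterGroup(hGroup, NULL);  /* ERROR_IO_PENDING = move under way */
            CloseClusterGroup(hGroup);
        }
        CloseCluster(hCluster);
        return (int)status;
    }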
MSCS Architecture
[Diagram: Cluster Administrator and Cluster.Exe call the Cluster API DLL, which reaches the cluster service through the Cluster API stub; inside the service are the Database Manager, Log Manager, Checkpoint Manager, Global Update Manager, Object Manager, Event Processor, Membership Manager, Node Manager, Failover Manager, and Resource Manager; the Resource Manager drives resource monitors, which call physical, logical, and application resource DLLs through the Resource API; nodes talk over the network via a reliable cluster transport with heartbeat]
MSCS Architecture
Cluster service is comprised of the
following objects
Failover Manager (FM)
Resource Manager (RM)
Node Manager (NM)
Membership Manager (MM)
Event Processor (EP)
Database Manager (DM)
Object Manager (OM)
Global Update Manager (GUM)
Log Manager (LM)
Checkpoint Manager (CM)
More about these in the next session
Setting Up An
MSCS Cluster
MSCS Key Components
Two servers
Shared SCSI bus
SCSI HBAs, SCSI RAID HBAs, HW RAID boxes
Interconnect
Multi versus uniprocessor
Heterogeneous servers
Many types can be supported
Remember, two NICs per node
PCI for cluster interconnect
Complete MSCS HCL configuration
MSCS Setup
Most common problems
Duplicate SCSI IDs on adapters
Incorrect SCSI cabling
SCSI card order on the PCI bus
Configuration of SCSI firmware
Let’s walk through getting
a cluster operational
Test Before You Build
Bring each system up independently
Network adapters
Cluster interconnect
Organization interconnect
SCSI and disk function
NTFS volume(s)
Top Ten Setup “Concerns”
10. SCSI is not widely understood. Use the MSCS and IHV setup documentation, and consider the SCSI book referenced for this session
9. Build a support model that covers clustering requirements. For example, clustered components must be paired exactly (e.g., SCSI BIOS revision levels); include this in your plans
8. Build extra time into your deployment planning to accommodate cluster setup, both hardware and software. Hardware examples include SCSI setup; software issues include installation across cluster nodes
7. Know the certification process and its support implications
Top Ten Setup “Concerns”
6. Applications will become more cluster-aware over time, bringing better setup, diagnostics, and documentation. In the meantime, plan and test accordingly
5. Clustering will impact your server maintenance
and upgrade methodologies. Plan accordingly
4. Use multiple network adapters and hubs to eliminate
single points of failure (everywhere possible)
3. Today’s clustering solutions are more complex
to install and configure than single servers. Plan
your deployments accordingly
2. Make sure that your cabinet solutions and peripherals both
fit and function well. Consider the serviceability
implications
1. Cabling is a nightmare. Color-coded, heavily documented products that include Y cables and are designed for maintenance are highly desirable
Cluster Management Tools
Cluster administrator
Cluster CLI/COM
Monitor and manage cluster
Command line and COM interface
Minor modifications to existing tools
Performance monitor
Add ability to watch entire cluster
Disk administrator
Add understanding of shared disks
Event logger
Broadcast events to all nodes
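The command-line and COM interfaces sit on the same Cluster API the graphical administrator uses; a sketch that enumerates every group in the local cluster (link with clusapi.lib):

    #include <windows.h>
    #include <clusapi.h>
    #include <stdio.h>

    int main(void)
    {
        HCLUSTER hCluster = OpenCluster(NULL);
        HCLUSENUM hEnum = ClusterOpenEnum(hCluster, CLUSTER_ENUM_GROUP);
        WCHAR name[256];
        DWORD i, type, cch;

        for (i = 0; ; i++) {
            cch = 256;
            if (ClusterEnum(hEnum, i, &type, name, &cch) != ERROR_SUCCESS)
                break;                    /* ERROR_NO_MORE_ITEMS ends the walk */
            wprintf(L"Group: %s\n", name);
        }
        ClusterCloseEnum(hEnum);
        CloseCluster(hCluster);
        return 0;
    }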
MSCS
Reference Materials
In Search of Clusters: The Coming Battle in Lowly Parallel Computing
Gregory F. Pfister
ISBN 0-13-437625-0
The Book of SCSI
Peter M. Ridge
ISBN 1-886411-02-6
The Basics Of SCSI
Why SCSI?
Types of interfaces?
Caching and performance…
RAID
The future…
Why SCSI?
Faster than IDE - intelligent card/drive
Uses less processor time
Can transfer data at up to 100 MB/sec
More devices on a single chain (up to 15)
Wider variety of devices
DASD
Scanners
CD-ROM writers and optical drives
Tape drives
Types Of Interfaces
SCSI and SCSI-2
50-pin, 8-bit, max transfer = 10 MB/s (early 1.5 to 5 MB/s)
Internal transfer rate = 4 to 8 MB/s
Wide SCSI
68-pin, 16-bit, max transfer = 20 MB/s
Internal transfer rate = 7 to 15.5 MB/s
Ultra SCSI
50-pin, 8-bit, higher transfer rate, max transfer = 20 MB/s
Internal transfer rate = 7 to 15.5 MB/s
Ultra Wide SCSI
68-pin, 16-bit, max transfer = 40 MB/s
Internal transfer rate = 7 to 30 MB/s
Performance Factors
Cache on the drive or controller
Caching in the OS
Different variables
Seek time
Transfer rates
Redundant Array Of
Inexpensive Disks (RAID)
Developed from a paper published in 1987 at the University of California, Berkeley
The idea is to combine multiple inexpensive drives (eliminating the SLED - single large expensive drive)
Provides redundancy by storing parity information
RAID Types
RAID 0 (a.k.a. striping) - the fastest RAID; data is “striped” across multiple volumes; no redundancy
RAID 1 (a.k.a. mirroring) - a simple pair of drives with data replicated on both; writes are slower
RAID 2 - sector-stripes data across drives, with some storing ECC info; done in HW now
RAID 3 - sector striping, but one drive dedicated to storing parity information for the set
RAID 4 - identical to RAID 3 but with large stripes
RAID 5 - best for multi-user environments; parity is spread across 3 or more drives
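A toy C sketch of the parity idea behind RAID 3-5: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the survivors:

    #include <stdio.h>

    int main(void)
    {
        unsigned char d0 = 0x5A, d1 = 0xC3, d2 = 0x0F;   /* three data blocks */
        unsigned char parity = d0 ^ d1 ^ d2;             /* written to the parity drive */
        unsigned char rebuilt = d0 ^ d2 ^ parity;        /* recover d1 after a drive loss */

        printf("d1=0x%02X rebuilt=0x%02X\n", d1, rebuilt);
        return 0;
    }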
The Future For SCSI
Faster interfaces - why?
Fibre Channel
Optical standard
Proposed as part of SCSI III (not final)
Up to 100 MB/s transfer
Still using ultra-wide SCSI
inside enclosures
Drives with optical interfaces are not yet available in quantity and cost more than SCSI
The Future Of SCSI
Fibre Channel-arbitrated loop
Ring instead of bus architecture
Can support up to 126 devices/hosts
Hot pluggable through the use
of a port bypass circuit
No disruption of the loop as devices
are added/removed
Generally implemented using
a backplane design
HCL List For MSCS
Servers on normal Windows NT HCL
MSCS SCSI component HCL
Self-test of MP machines soon
Tested by WHQL
Must pass Windows NT HCT as well
MSCS interconnect HCL
Tested by WHQL
Not required to pass 100% of HCT
E.g., point-to-point adapters
MSCS System Certification Process
[Diagram: a complete MSCS configuration is ready for self-test once its components appear on the Windows NT 4.0+ Server HCL, SCSI HCL, and Network HCL, plus the Windows NT 4.0+ MSCS SCSI HCL]
Testing Phases
HW compatibility (24 hours)
One-node testing (24 hours)
Eight clients
Two-node with failover (72 hours)
SCSI and interconnect testing
Eight-client with asynchronous failovers
Stress testing (24 hours)
Dual initiator I/O, split-brain problems
Simultaneous reboots
Final MSCS HCL
Only complete configurations
are supported
Self-test results sent to Microsoft
Logs checked and configuration reviewed
HCL updated on Web and for
next major Windows NT release
For more details see the MSCS
Certification document