Transcript Servers

1
2
High Availability Software for Windows NT
NeoCLUSTER
WHY : Demand for Availability
WHAT : Technology and Product
HOW : Configuration
3
Demand for Availability
• Information is a capital asset of an
organization.
• The server systems for archiving, processing,
and conveying information must be constantly
monitored and carefully managed to provide
reliable, timely, and continuous services.
• Down time is inevitable
– Scheduled and Unscheduled
4
Trends
• Distributed processing and multi-tier
client/server applications
• Multiple servers collaborate to improve
– Load sharing
– Performance
– Availability
• Windows NT is becoming a major server
platform for mission-critical applications.
5
Factors of System Availability
• CPU, memory, I/O cards
• Disk
• Application software
• Common hardware &
system software
• Human error
Shares shown on the chart: 24%, 27%, 22%, 21%, 6%
Source : Strategic Research Division of Find SPV
6
System Availability Hierarchy
• Layers : Applications, Hosts, I/O Paths,
Storage Subsystem, Disk & Tape
• Diagram labels : Cost, Technology
7
Ask Your Customers
• Do you use Windows NT as the platform of
your mission critical applications?
• Does system downtime mean losses to you?
• Do you need a technically and economically
affordable solution to make your NT servers
fault resilient?
• Do you need a guardian angel to watch over
your NT servers around the clock so that you
can sleep better at night?
8
YES!!
9
Level of System Availability
• Non-stop systems: Stratus, Tandem, NetWare
SFT III
– Tightly coupled, fully duplicated configuration
– Proprietary OS
• Non-redundant systems
– Hot-plug and self-diagnostic hardware components
– Auto-retry and pro-active software
10
Level of System Availability
• High Availability systems
– A cluster of loosely coupled servers
– Software based implementation
– Provide better availability/price ratio than non-stop
systems
11
Cluster
• Server farm : Single Network Identity
– Database Cluster : Cluster Manager, Distributed
Lock Manager
– Computing Cluster : Parallel Computing
12
Realtime Data Replication
13
NeoCLUSTER
• A pure software solution for building highly
available server clusters
• Runs on Microsoft Windows NT Server 4.0,
Standard Edition
• i386 and Alpha platforms
• Functions
– Cluster configuration and administration
– Failure detection, logging, notification, isolation,
and recovery
14
Features
• Technically and economically affordable
• Fully compatible with Windows NT
• Requires no software modification or
proprietary hardware
• No single point of failure
• Reliable and efficient mechanism for error
detection and fault recovery
15
Features
• Intuitive and user-friendly Windows GUI
• Fully user configurable
• Supports automatic and manual switch back
• Negligible impact on resource consumption
and server performance
• Minimum human intervention
• No intrusion into routine workflow
16
Operation Scenario: Hardware Perspective
17
Servers
• Active Server is a pre-designated computer
responsible for providing critical services that
will be guarded by NeoCLUSTER.
• Backup Server is a pre-designated computer
that will take over for the active server under the
administration of NeoCLUSTER.
• Neither identically configured servers nor
a dedicated backup server is required
18
Private Network
• Dedicated interconnect for inter-server
communication.
• Three types of interconnect for redundancy
– TCP/IP : back to back or LAN connection of two
network interface cards
– RS-232 : serial cable with null modem support to
connect two COM ports
– Disk volume : two dedicated partitions on the shared
disks
19
Private Network
• If all private network interconnects are unavailable
– A server can still rely on the public net to detect the
availability of the peer server.
– If the peer server is still available, no takeover
action is triggered.
– If the peer server is unavailable, a takeover
action is triggered immediately.
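The decision rule on this slide can be sketched in a few lines. This is only an illustration of the logic as described, not NeoCLUSTER's implementation; the function name and both parameters are hypothetical.

```python
def should_take_over(private_links_alive, peer_seen_on_public_net):
    """Decide whether the backup server should start a takeover.

    private_links_alive: one boolean per private interconnect
        (for example TCP/IP, RS-232, disk volume); True means
        heartbeats are still arriving over that interconnect.
    peer_seen_on_public_net: True if the peer still answers on
        the public network.
    """
    if any(private_links_alive):
        # At least one private interconnect still carries heartbeats,
        # so the peer is known to be available.
        return False
    # Every private interconnect is down: fall back to the public net.
    return not peer_seen_on_public_net
```

A takeover is triggered only when every private interconnect is silent and the peer is also unreachable on the public net.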
20
Public Network
• Dedicated network for clients to access servers.
• TCP/IP and NetBEUI protocols
• Each active server carries a switchable
network ID (i.e., an IP address or computer name)
– The original network IDs of both servers can
remain intact.
– Clients connect to the switchable network ID.
– If the active server becomes unavailable, the backup
server takes over the switchable network ID.
21
Public Network
• NeoCLUSTER provides a built-in mechanism to
identify network failures.
– Self-diagnosis of network availability
– Supported NICs : Intel EtherExpress PRO/100B,
3Com 3C905B, DEC 21x4x.
– Supported NIC add-on software : NIC Express
from IPMetrics (load balancing and fault tolerance).
22
Private Drives and Public Drives
• Private drives are disk volumes for storing the OS
and data that need not be
accessible by the backup server.
• Public drives are disk volumes on the shared
disks for storing the application software and
related data that must be accessible by the
backup server.
– Shared SCSI bus or independent host channels
– Mirroring or RAID subsystems.
23
Clients
• Computer systems that access the active
servers via TCP/IP or NetBEUI protocols.
24
Operation Scenario: Software Perspective
• Block diagram components : Administration Tool,
Cluster Monitor Service, Resource Object, Agent,
Script, Cluster Service, Windows NT
25
Operation Scenario: Software Perspective
• Module interaction of NeoCLUSTER (diagram labels)
– Components : Active Server, Backup Server,
Resource Object, Agent, Cluster Service (on both
servers), Cluster Monitor Service
– Flows : Resource Monitoring, Agent Heartbeat,
Server Heartbeat
26
Cluster Service and Cluster Monitor Service
• The core processes of NeoCLUSTER
• Two mutually guarding NT services
– User-transparent auto-restart
• Functions
– Resource objects management
– Event logging and notification
– Fault isolation and recovery
27
Server Heartbeat
• Periodic messages
• Servers exchange heartbeats with each other
over the private net
• Inform the receiving server of the availability of
the sending server
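A minimal sketch of the receiving side of a heartbeat exchange follows. The class name, the timeout value, and the injectable clock are assumptions made for illustration; the slides do not state NeoCLUSTER's actual interval or timeout.

```python
import time


class HeartbeatMonitor:
    """Remember when the peer was last heard from and flag a timeout."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s      # hypothetical timeout, in seconds
        self.clock = clock              # injectable so it can be tested
        self.last_seen = clock()

    def record_heartbeat(self):
        # Call this whenever a heartbeat message arrives from the peer.
        self.last_seen = self.clock()

    def peer_available(self):
        # The peer counts as available while heartbeats keep arriving
        # within the timeout window.
        return (self.clock() - self.last_seen) <= self.timeout_s
```

Each server would keep one such monitor per peer (or per interconnect) and consult `peer_available()` before deciding on any recovery action.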
28
Resource Object
• Components of mission critical services
– Repository of service related files : Volume
– Switchable network identity for clients to access
the services : IP Address or Computer Alias Name
– The service itself : File Share, NT Services, or User
Defined
29
Resource Object
• Volume
– Disk partitions on the public drives.
– The drive letter mapping and partition information
of a volume must be identical when viewed from
both servers. This ensures that no matter which
server is the active server, the volume can be
accessed with the same drive letter.
– NeoCLUSTER provides “volume locking” to
ensure exclusive volume access.
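The requirement that both servers see the same drive letter mapping can be checked mechanically. The sketch below assumes each server's view is available as a plain dictionary from drive letter to a partition identifier; the function name and the identifier strings are hypothetical.

```python
def mapping_mismatches(active_view, backup_view):
    """Return the drive letters whose mapping differs between servers.

    Each argument maps a drive letter to a partition identifier as
    seen from one server. A letter missing on one side is reported
    as a mismatch as well.
    """
    letters = set(active_view) | set(backup_view)
    return sorted(
        letter
        for letter in letters
        if active_view.get(letter) != backup_view.get(letter)
    )


# Example with made-up partition identifiers:
active = {"E:": "disk1/part1", "F:": "disk1/part2"}
backup = {"E:": "disk1/part1", "F:": "disk2/part1"}
print(mapping_mismatches(active, backup))
```

An empty result means the volume can be reached under the same drive letter no matter which server is active.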
30
Resource Object
• IP Address
– A switchable network identity for TCP/IP.
• Computer Alias Name
– A switchable network identity for NetBEUI.
• File Share
– Shared directories that are accessible by clients.
– Both servers must use the same share name.
31
Resource Object
• NT Services
– Most application software for Windows NT is
implemented as NT services.
• User Defined
– For configuring the application software that is not
implemented as NT services.
– For grouping related resource objects into a resource
hierarchy.
32
Resource Hierarchy
• Each mission critical service is formulated and
manipulated as a resource hierarchy
33
Resource Hierarchy
• A resource hierarchy is an integrated entity.
• A resource hierarchy identifies the required
resource objects and the proper sequence to
activate those resource objects.
• A single resource object is a generic resource
hierarchy.
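One way to picture "the proper sequence to activate those resource objects" is as a dependency ordering. The sketch below is an assumption about how such an order could be derived, not a description of NeoCLUSTER internals; the resource names are examples, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def activation_order(depends_on):
    """Order resource objects so that dependencies come first.

    depends_on maps each resource object to the set of objects
    that must already be active before it is started.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical file-sharing service: the share needs its volume
# and its switchable IP address to be active first.
hierarchy = {
    "file_share": {"volume", "ip_address"},
    "volume": set(),
    "ip_address": set(),
}
order = activation_order(hierarchy)
deactivation_order = list(reversed(order))
```

Deactivating in the reverse order stops the service before the resources it relies on are released.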
34
Agents
• Windows NT executable files
• Availability monitoring and error detection
– Intelligent and lightweight
» Least system resource consumption
» Minimum impact on system performance
– Efficient and reliable
» No critical failure will be neglected
» Real-time response to failures to reduce downtime
» No false alarms
35
Agents
• Built-in agents
– Server, public net, public drives
– Resource objects
• Agent API and template
– Custom agent development
– An open interface to communicate and interact
with other programmable third party hardware and
software management tools
36
Agent Heartbeat
• Periodic messages
• Agents send heartbeats to the Cluster Service to
inform it of the availability of
the resource object monitored by the agent
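The sending side of an agent heartbeat might look like the loop below. It assumes that an agent stops sending heartbeats once its resource check fails, so that silence signals unavailability; that behavior, the function name, and all parameters are illustrative assumptions and not the actual Agent API.

```python
import time


def run_agent(check_resource, send_heartbeat, interval_s=1.0,
              max_rounds=None, sleep=time.sleep):
    """Probe a resource periodically and report while it is healthy.

    check_resource: callable returning True if the resource is OK.
    send_heartbeat: callable that delivers one heartbeat message.
    Returns the number of heartbeats sent. The loop ends at the
    first failed probe, or after max_rounds successful probes.
    """
    sent = 0
    while max_rounds is None or sent < max_rounds:
        if not check_resource():
            break               # stop reporting; silence signals failure
        send_heartbeat()
        sent += 1
        sleep(interval_s)
    return sent
```

Keeping the probe and the delivery as plain callables keeps the loop small, in line with the lightweight agents described earlier.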
37
Scripts
• Windows NT executable files
• Auto-initiated
– Start a series of programs
– Terminate a series of programs
– Monitor a series of programs
– Trigger event notification programs
38
Administration Tool
• Intuitive and user friendly
• Interactive point-and-click Windows GUI
• Menu-driven and form-based interface
• Icon-based real-time status monitoring
• Supports dynamic configuration and real-time
synchronization
• Remote administration using a Web browser is
freely available from third parties
39
Administration Tool
40
Availability Recovery
• Critical factors of failover/takeover : Volume,
NT Service, User Defined
• Mechanisms
– Failover is initiated by the active server
– Takeover is initiated by the backup server
• Failover/Takeover
– The active server deactivates the corresponding
resource hierarchy
– The backup server reactivates the resource hierarchy
41
Availability Recovery
• Switch back/Fail back
– Switch a resource hierarchy back to the original
active server from the backup server
» The original active server has recovered
» The backup server detects that the active server has
recovered
– Retain the original load distribution
» Asymmetric configuration : active/backup servers with
different capacity
» Symmetric configuration : two active, mutual takeover
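The failover and switch-back behavior can be modeled as a small ownership tracker for one resource hierarchy. This is a sketch under the assumption that automatic switch back is a per-hierarchy setting; the class and method names are hypothetical.

```python
class HierarchyOwner:
    """Track which server currently runs one resource hierarchy."""

    def __init__(self, original, backup, auto_switch_back=True):
        self.original = original            # original active server
        self.backup = backup
        self.auto_switch_back = auto_switch_back
        self.owner = original

    def on_original_failed(self):
        # Failover/takeover: the hierarchy is reactivated on the backup.
        self.owner = self.backup

    def on_original_recovered(self):
        # Automatic switch back only when configured; otherwise the
        # hierarchy stays put until an administrator moves it.
        if self.auto_switch_back:
            self.owner = self.original

    def manual_switch_back(self):
        self.owner = self.original
```

In a symmetric (active/active) setup, each server would be the original owner of its own hierarchies and the backup for the other server's, so switching back restores the original load distribution.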
42
Clients
• Client-end applications will connect to
switchable network IDs
• No need to reconfigure or modify the client-end
applications
• Reconnection after a failover operation is
application dependent
43
Clients
• Stateless applications
– NFS service or UDP-based applications
– User transparent
• Stateful applications
– Client/server RDBMS applications or TCP-based
applications
– The client applications will lose their connection
to the server
– Manual reconnection to the server is required
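Where a client application can be changed, the reconnection step could be automated with a retry loop like the one below. This is a generic sketch, not part of NeoCLUSTER; the function name, the attempt count, and the delay are arbitrary assumptions.

```python
import time


def connect_with_retry(connect, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect: callable that opens a connection to the switchable
        network ID and raises OSError on failure.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay_s)      # give the takeover time to finish
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

A client might pass something like `lambda: socket.create_connection((switchable_ip, port), timeout=3)`, where `switchable_ip` and `port` are placeholders for the cluster's switchable address and the service port. Any session state held by the old connection still has to be rebuilt by the application.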
44
Supported Application
• File Sharing
• Printer Spooler
• Internet Servers (FTP, WWW, etc.)
• RDBMS (Microsoft, Oracle, Sybase, Informix)
• Microsoft Exchange Server, Lotus Notes
Server
• NT Service-based application software
• TCP/IP or NetBEUI-based client/server
applications
45
Future Improvements
• Multiple error notification facilities
– Server side visual and audio alarm
– Message broadcasting
– E-mail
– Pager
– SNMP agent
• Simplified GUI
• N-to-1 cluster configuration
46
Supported Configurations
• Active/Backup
47
Supported Configuration
• Active/Active
48
Supported Configuration