Transcript Data

MSG389
Achieving High Availability
with Windows Server and
Exchange Server
Anthony Quigney,
Application Solution Centre Manager, Dell EMEA
Brian Hayden,
Senior Systems Consultant, Application Solution
Centre, Dell EMEA
Agenda
Availability – Why is it important?
Availability – Defined
Availability - Business / IT Challenges
Availability Solutions
Windows Server 2003
Exchange Server 2003
Clustering
Dell High Availability Solutions
Effects Of Downtime
Today, computing resources are the axis on which business revolves.
When these resources are unavailable to an organization, it is at risk
of losing its competitive edge.
Lost systems lead to lost:
Revenue
Customers
Productivity
Data
Decision capability
The Cost of Downtime
Industry Sector | Revenue per Hour | Revenue per Employee-Hour
Energy | $2,817,846 | $569.20
Telecommunications | $2,066,245 | $186.98
Manufacturing | $1,610,654 | $134.24
Financial institutions | $1,495,134 | $1,079.89
Insurance | $1,202,444 | $370.92
Retail | $1,107,274 | $244.37
Pharmaceuticals | $1,082,252 | $167.53
Banking | $996,802 | $130.52
Food/beverage processing | $804,192 | $153.10
Consumer products | $785,719 | $127.98
Transportation | $668,586 | $107.78
Utilities | $643,250 | $380.94
Health care | $636,030 | $142.58
Professional services | $532,510 | $99.59
Construction and engineering | $389,601 | $216.18
Media | $340,432 | $119.74
Hospitality and travel | $330,654 | $38.62
Average | $1,010,536 | $205.55
Source: IT Performance Engineering & Measurement Strategies: Quantifying Performance Loss, Meta Group, October 2000
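To turn the table into a per-incident figure, multiply the hourly rate by the outage length. The sketch below does that arithmetic for a few rows of the Meta Group table, using `Decimal` so the cents stay exact. The function names, the 4-hour outage and the 100-employee headcount are hypothetical, and revenue per hour is only a rough proxy for loss, since some revenue may be deferred rather than lost.

```python
from decimal import Decimal

# A few rows copied from the Meta Group table (October 2000 figures).
REVENUE_PER_HOUR = {
    "Energy": Decimal("2817846"),
    "Banking": Decimal("996802"),
    "Average": Decimal("1010536"),
}
REVENUE_PER_EMPLOYEE_HOUR = {
    "Energy": Decimal("569.20"),
    "Banking": Decimal("130.52"),
    "Average": Decimal("205.55"),
}


def revenue_at_risk(sector: str, outage_hours: Decimal) -> Decimal:
    """Hypothetical helper: revenue per hour multiplied by outage length."""
    return REVENUE_PER_HOUR[sector] * outage_hours


def idle_staff_revenue(sector: str, employees: int, outage_hours: Decimal) -> Decimal:
    """Hypothetical helper: revenue per employee-hour for the staff left idle."""
    return REVENUE_PER_EMPLOYEE_HOUR[sector] * employees * outage_hours


if __name__ == "__main__":
    hours = Decimal(4)  # assumed outage length
    for sector in REVENUE_PER_HOUR:
        print(sector, revenue_at_risk(sector, hours), idle_staff_revenue(sector, 100, hours))
```

For the energy sector, a 4-hour outage puts about $11.3 million of revenue at risk by this measure.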
Four Levels of Continuity
High Availability: maintaining the availability of systems critical to ongoing operations during a failure or service outage
Disaster Recovery: recovering from unplanned, catastrophic events or disasters in a predetermined manner based on the importance of the system
Diagram: the four levels of continuity, with cost, functionality and complexity increasing from the platform level up to the site level:
Site (beyond the building): Site Recovery (remote or commercial recovery facilities); Site/Datacenter Failover (re-route users and data to replicated sites)
Application (system interaction): Application Failover/Load Balancing (continuous application access via clustering); Redundant Systems (continuous server, storage, network access)
Data (beyond the box): SAN, NAS & DAS (continuous data access); Backup and Restore (real-time tape backup, off-site storage)
Platform (in the box): Rapid Equipment Replacement (vendor services and financing programs); High Availability System Features (hot-swappable, redundant components with mission-critical support)
The Causes of Downtime
When a failure occurs, it makes an impact. Avoiding downtime requires properly planning, designing and implementing multiple levels of protection.
Cause of Failure | Examples | Impacts
Component failure | Bad memory chip, fan, power, HDD, data path, controller | Platform, data
Software defects/failures | Driver hangs, OS hangs/reboots, virus, file corruption | Platform, data, applications
Planned administrative downtime | Upgrade components, firmware, drivers, O/S, software | Platform, data, applications
Operator error and malicious users | Accidental or intentional file deletion, unskilled operation, experimentation | Platform, data, applications
System outage/maintenance | Software/systems requiring reboot, system board failure | Applications
Building/site disaster | Fire, storms, collapse, explosion, and other localized disasters | Site
Metropolitan disaster | Earthquakes, hurricanes, floods, other regional natural catastrophes | Site
Availability
To measure availability, we need to know:
How often a failure is expected, or the Mean Time to Failure (MTTF)
How long it takes to recover from a failure, or the Mean Time to Recover (MTTR)
The calculation for availability is:
Availability = MTTF / (MTTF + MTTR)
To achieve high availability:
MTTF must be as high as possible
MTTR must be as low as possible
In addition, you must consider business impact when calculating availability
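The formula can be checked with a few lines of code. This sketch computes availability from MTTF and MTTR and converts it into expected downtime per year (8,760 hours, ignoring leap years). The MTTF and MTTR values in the example are hypothetical; they only illustrate why shrinking MTTR (for example through clustered failover) matters as much as raising MTTF.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical: same MTTF, MTTR cut from 4 hours to 6 minutes.
    for mttr in (4.0, 0.1):
        a = availability(1000.0, mttr)
        print(f"MTTR={mttr} h  availability={a:.5%}  downtime/yr={annual_downtime_hours(a):.1f} h")
```

With an MTTF of 1,000 hours, cutting MTTR from 4 hours to 6 minutes reduces expected downtime from roughly 35 hours a year to under 1 hour.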
Levels of Availability
What businesses are saying
“having my email work is more important to
me than having a dial tone” – Fortune 50
CIO
“In the next 24 hours, 8 million e-mail
messages will be exchanged among
employees in the Boeing network” [1]
[1] http://www.boeing.com/companyoffices/aboutus/quickfacts.html
Email is business critical
What analysts are saying
Email is mission critical and must be
efficient: in 2003 businesses will send 3.5
trillion emails, over 13 billion emails per day
Gartner predicts that the volume of daily
emails sent worldwide will reach 36 billion
by 2005, more than three times the
number of emails sent in 2001 [2]
[2] Gartner Dataquest Perspective, Market Analysis, “From Content to Knowledge: The Growing Gap,” March 4, 2003
Top Concerns of Today’s
Messaging Environment
Reliability
Quick Recovery
Security
Privacy
Business Integrity
Windows Server 2003
Exchange Server 2003
Advanced Features
New Features of Windows
Server 2003
8-node failover clusters
Shutdown Event Tracker (logs reasons for shutdown and restart)
DiskPart: extend basic volumes
Volume Shadow Copy
Mount points in clusters
/USERVA=3030 boot.ini switch (used together with /3GB)
Improved AD performance
Better memory management
New Features of Exchange
Server 2003
Improved Outlook Web Access (OWA), closer to the full Outlook client
Improved Virus Scanning API (VSAPI)
Exchange Management Pack for MOM included
New migration tools
Increased network performance
Decreased network and processing costs
Replication
IPSec support between front-end and back-end
clusters
With Exchange Server 2003 on
Windows Server 2003…
Enable Server and Site
Consolidation
Improve Management and
Administration
Enhance User Experience and
Information Management
Improve Client and Server
Communications (sync)
Increase user productivity
AD & OS Compatibility Matrix
Exchange version | Compatible OS: Windows 2000 Server SP3+ | Compatible OS: Windows Server 2003 | Supported AD: Windows 2000 Server SP3+ | Supported AD: Windows Server 2003
Exchange 2003 | Yes | Yes | Yes | Yes
Exchange 2000 + SP3 | Yes | No | Yes | Yes
Exchange 2000 + SP2 | Yes | No | Yes | Yes
Exchange 5.5 + SP3 | Yes | No | Not required | Not required
Exchange Server 2003
Clustering
High Availability Cluster: Goals
Availability: data, application, service
Scalability: CPU, storage, number of nodes
Application recovery: failover, restart
Manageability: single point of administration
Eliminate single points of failure: redundancy throughout
MSCS: Virtual Servers
Clients connect to Virtual Servers (VS). If a cluster node running a VS fails, the other node runs the VS.
Diagram: a two-node MSCS cluster.
Cluster Node A: name CLUSTER_A, IP 192.168.1.1, running MSCS
Cluster Node B: name CLUSTER_B, IP 192.168.1.2, running MSCS
Virtual Server #1: name CLUSTERIP, IP 192.168.1.11, application Quorum
Virtual Server #2: name EXG1, IP 192.168.1.12, application Exchange
Virtual Server #X: name EXG2, IP 192.168.1.13, application Exchange
Virtual servers typically include the following resources: a disk, an IP address, a network name, and application service(s).
Clients do not connect to physical nodes; administrators connect to them for administration.
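The virtual server idea can be modeled in a few lines: clients resolve a virtual server's name to its IP address, and failover only changes which physical node owns the group. This is a toy model, not the MSCS API; the class names, the `fail_node` method and the rule of picking the first surviving node alphabetically are hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualServer:
    name: str   # network name resource clients connect to, e.g. "EXG1"
    ip: str     # IP address resource that moves with the group
    app: str    # application service(s) in the group
    owner: str  # physical node currently hosting the group


@dataclass
class Cluster:
    nodes: set
    virtual_servers: dict = field(default_factory=dict)

    def add(self, vs: VirtualServer) -> None:
        self.virtual_servers[vs.name] = vs

    def fail_node(self, failed: str) -> None:
        """Move every virtual server owned by the failed node to a survivor."""
        self.nodes.discard(failed)
        if not self.nodes:
            raise RuntimeError("no surviving node to host the virtual servers")
        target = sorted(self.nodes)[0]  # simplistic choice of new owner
        for vs in self.virtual_servers.values():
            if vs.owner == failed:
                vs.owner = target

    def resolve(self, name: str) -> str:
        """Clients look up the virtual server, never the physical node."""
        return self.virtual_servers[name].ip


cluster = Cluster({"CLUSTER_A", "CLUSTER_B"})
cluster.add(VirtualServer("EXG1", "192.168.1.12", "Exchange", "CLUSTER_A"))
cluster.add(VirtualServer("EXG2", "192.168.1.13", "Exchange", "CLUSTER_B"))
cluster.fail_node("CLUSTER_A")
```

After `CLUSTER_A` fails, EXG1 is owned by `CLUSTER_B`, but clients still resolve EXG1 to 192.168.1.12, which is why they never need to know which node is running it.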
Cluster Services – Active N+I
Active (N) + Passive (I) combinations
Clusters of smaller servers will continue to
overtake larger proprietary systems
Less $$ for hardware
Scale better
Faster Failover
Exchange Server 2003 Clusters
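Server version | Active (N+I) | Active N
Windows 2000 Advanced Server | 2 node | 2 node
Windows 2000 Datacenter Server | 3+1 node | 3 nodes
Windows Server 2003 Enterprise Edition | 7+1 node | 7 nodes
Windows Server 2003 Datacenter Edition | 7+1 node | 7 nodes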
Exchange 2003 Installation on MSCS
Easier to create a cluster or add nodes using
Cluster Administrator in Windows Server 2003
Microsoft Exchange Server 2003 automatically
detects the presence of an MSCS cluster and installs
the necessary components.
Microsoft® Exchange Failover
1. Cluster node fails
2. Failure detected by the cluster heartbeat
3. Surviving node acquires disk reservations
4. Check and mount the file systems
5. Restart Exchange resources
6. Virtual server restores communications
7. Client-side retry
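The first steps of that sequence hinge on heartbeat detection: a node is declared failed when no heartbeat has arrived within a timeout, and the surviving node then runs the recovery steps in order. The sketch below illustrates that logic with explicit timestamps so it is deterministic. The `HeartbeatMonitor` class, the 5-second timeout and the step strings are hypothetical; real MSCS heartbeat intervals and thresholds differ, and client-side retry happens outside the cluster, so it is not modeled.

```python
# Recovery steps run by the surviving node, in slide order.
RECOVERY_STEPS = (
    "surviving node acquires disk reservations",
    "check and mount the file systems",
    "restart Exchange resources",
    "virtual server restores communications",
)


class HeartbeatMonitor:
    """Hypothetical monitor: a node fails if its last heartbeat is too old."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: dict = {}

    def beat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> list:
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


def fail_over(monitor: HeartbeatMonitor, now: float) -> list:
    """Return the ordered actions taken for every node detected as failed."""
    log = []
    for node in monitor.failed_nodes(now):
        log.append(f"failure of {node} detected by heartbeat")
        log.extend(RECOVERY_STEPS)
    return log


monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.beat("A", 0.0)
monitor.beat("B", 0.0)
monitor.beat("B", 8.0)           # node A stops sending heartbeats
actions = fail_over(monitor, 9.0)
```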
Exchange 2000 Dependency Tree
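Diagram: resource dependency tree for an Exchange 2000 virtual server. Resources shown: SMTP, HTTP, IMAP4, POP3, MSSearch, Exchange Store, Message Transfer Agent, System Attendant, Network Name, IP Address, Physical Disk, Routing.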
Exchange 2003 Dependency Tree
Flattened dependency hierarchy of
Exchange services
Faster recovery times after failover
Diagram: resource dependency tree for an Exchange 2003 virtual server. Resources shown: SMTP, Message Transfer Agent, HTTP, IMAP4, Exchange Store, System Attendant, Network Name, IP Address, Physical Disk, MSSearch, Routing.
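Why does flattening the tree speed up recovery? Resources come online in dependency order, so a resource's earliest "startup stage" is one more than the latest stage among its dependencies. The sketch below computes those stages with Python's `graphlib.TopologicalSorter`. The dependency edges are assumptions inferred from the two diagrams (in the 2003 tree, protocol resources are assumed to depend on System Attendant rather than Exchange Store); POP3 does not appear in the extracted 2003 slide, so it is left out there.

```python
from graphlib import TopologicalSorter

# Assumed edges: resource -> resources it depends on.
EXCHANGE_2000 = {
    "Network Name": {"IP Address"},
    "System Attendant": {"Network Name", "Physical Disk"},
    "Exchange Store": {"System Attendant"},
    "Message Transfer Agent": {"System Attendant"},
    "Routing": {"System Attendant"},
    "SMTP": {"Exchange Store"},
    "HTTP": {"Exchange Store"},
    "IMAP4": {"Exchange Store"},
    "POP3": {"Exchange Store"},
    "MSSearch": {"Exchange Store"},
}
EXCHANGE_2003 = {
    "Network Name": {"IP Address"},
    "System Attendant": {"Network Name", "Physical Disk"},
    "Exchange Store": {"System Attendant"},
    "Message Transfer Agent": {"System Attendant"},
    "Routing": {"System Attendant"},
    "SMTP": {"System Attendant"},
    "HTTP": {"System Attendant"},
    "IMAP4": {"System Attendant"},
    "MSSearch": {"Exchange Store"},
}


def startup_stages(deps: dict) -> dict:
    """Earliest stage at which each resource can be brought online."""
    stage = {}
    # static_order() yields every resource after all of its dependencies.
    for res in TopologicalSorter(deps).static_order():
        parents = deps.get(res, set())
        stage[res] = 1 + max((stage[p] for p in parents), default=-1)
    return stage


s2000 = startup_stages(EXCHANGE_2000)
s2003 = startup_stages(EXCHANGE_2003)
```

Under these assumed edges, SMTP moves from stage 4 to stage 3: it no longer waits for the Exchange Store to mount, so mail protocols can come back while the store is still starting.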
Dell | EMC Storage
Advanced Features
Typical Storage Environment
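Diagram: Exchange 2000, SQL Server 2000, File & Print and other servers on the LAN, each with its own direct-attached disk volumes (ranging from 15 GB to 80 GB) and separate backup devices, including a DLT7000 tape library and a DDS-4 drive.
What are the IT challenges with this environment?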
Consolidated Storage
Environment
Diagram: the same Exchange 2000, SQL Server 2000, File & Print and other servers on the LAN, consolidating their storage onto shared storage backed up by a tape library.
Storage = Achilles’ Heel
Diagram: the layers that each need an appropriate high availability level: application, operating system, server, host bus adapter, storage controller, RAID level and disk port.
Consolidated Storage
Environment
Diagram: the consolidated environment (Exchange 2000, SQL Server 2000, File & Print and other servers, plus a tape library) with a redundant Storage Area Network (SAN) connecting the servers to the consolidated storage.
Redundant Storage System
Multi-path I/O with failover (PowerPath)
Redundant storage processors (RAID controllers)
Protected write cache: mirroring, SPS (standby power supply), vaulting
RAID 1, 3, 5, 1+0
Dual Fibre Channel loops on the storage system back end
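Parity RAID levels (3 and 5) survive a single disk failure because the parity block is the bytewise XOR of the data blocks in a stripe, so any one missing block equals the XOR of all the others. The sketch shows only that parity arithmetic; it does not model how RAID 5 rotates parity across disks or how RAID 3 dedicates a parity disk, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity(data_blocks: list) -> bytes:
    """Parity block for one stripe."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks: list) -> bytes:
    """Recover the single missing block from all survivors, parity included."""
    return xor_blocks(surviving_blocks)


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity(stripe)
recovered = rebuild([stripe[0], stripe[2], p])  # disk holding stripe[1] lost
```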
PowerPath
Load balance I/O across multiple paths to the same RAID controller
I/O path failover for redundant paths
Diagram: I/Os are divided across both paths to storage processor B (SP B)
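The two PowerPath ideas, spreading I/O across paths and skipping failed paths, can be illustrated with a round-robin selector. This is a hypothetical sketch: round-robin is an assumed policy, PowerPath has its own load-balancing policies that this does not reproduce, and the path names are made up.

```python
from itertools import cycle


class MultiPathDevice:
    """Hypothetical multipath LUN: round-robin over healthy paths."""

    def __init__(self, paths: list) -> None:
        self.paths = list(paths)
        self.failed = set()
        self._rr = cycle(self.paths)

    def fail_path(self, path: str) -> None:
        self.failed.add(path)

    def restore_path(self, path: str) -> None:
        self.failed.discard(path)

    def next_path(self) -> str:
        # Try each path at most once per request, skipping failed ones.
        for _ in range(len(self.paths)):
            path = next(self._rr)
            if path not in self.failed:
                return path
        raise IOError("all paths to the LUN have failed")


dev = MultiPathDevice(["hba0->SPB", "hba1->SPB"])
balanced = [dev.next_path() for _ in range(4)]
dev.fail_path("hba0->SPB")
after_failure = [dev.next_path() for _ in range(3)]
```

While both paths are healthy, I/O alternates between them; once `hba0` fails, every request goes down `hba1` without the application seeing an error.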
SnapView - Snapshot
SnapView creates logical point-in-time views of production information
Takes only seconds to create a complete snapshot
Copy on first write
Snapshot allows access for test, backup, etc., without compromising the production data
Diagram: a production host with 100 GB of production data and a snapshot that uses 10 GB of space.
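Copy on first write explains why a snapshot of a 100 GB volume can need only 10 GB: nothing is copied when the snapshot is created, and a block's original contents are saved only the first time that block is overwritten. Unchanged blocks are read straight from the production volume. The classes below are a hypothetical in-memory sketch of the technique, not SnapView's implementation.

```python
class Volume:
    """Hypothetical production volume made of fixed-size blocks."""

    def __init__(self, blocks: list) -> None:
        self.blocks = list(blocks)
        self.snapshots = []

    def create_snapshot(self) -> "Snapshot":
        snap = Snapshot(self)  # instant: no data is copied yet
        self.snapshots.append(snap)
        return snap

    def write(self, index: int, data: str) -> None:
        # Copy on first write: save the original block for each snapshot
        # that has not already preserved this block.
        for snap in self.snapshots:
            if index not in snap.saved:
                snap.saved[index] = self.blocks[index]
        self.blocks[index] = data


class Snapshot:
    def __init__(self, source: Volume) -> None:
        self.source = source
        self.saved = {}  # block index -> original data (the snapshot's space)

    def read(self, index: int) -> str:
        # Changed blocks come from saved copies, unchanged ones from production.
        return self.saved.get(index, self.source.blocks[index])


vol = Volume(["a", "b", "c"])
snap = vol.create_snapshot()
vol.write(1, "B")
vol.write(1, "BB")  # second write to the same block: no further copy
```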
SnapView - Clone
SnapClone creates a full point-in-time copy of another volume
Diagram: a production host with 100 GB of production data and a 100 GB clone that a backup server or testing host can use.
Snapshots & SnapClone
Array based product – no burden on host
Read / write mountable by a secondary host
for increased productivity
Minimizes time that production data is
unavailable to users
Can eliminate scheduled downtime for
backup
Requires less disk space than a full mirror
MirrorView
Maintains synchronous remote mirroring
between two Dell | EMC arrays
Transparent to server, operating
system, and applications
Protects from unavailability and data loss
Primary and secondary site can be
remote storage for each other
Failover production environment to
remote site
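Synchronous mirroring means the host's write is acknowledged only after both arrays hold the data, so the remote copy never lags the primary. The sketch illustrates that ordering. The class names are hypothetical, and the behavior when the secondary is unreachable (mark the mirror fractured and record out-of-sync blocks for a later resync, rather than failing the host write) is an assumed policy, not a description of MirrorView's exact behavior.

```python
class Array:
    """Hypothetical storage array holding blocks by logical block address."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.blocks = {}

    def write(self, lba: int, data: str) -> None:
        if not self.online:
            raise IOError(f"{self.name} unreachable")
        self.blocks[lba] = data


class SyncMirror:
    def __init__(self, primary: Array, secondary: Array) -> None:
        self.primary = primary
        self.secondary = secondary
        self.fractured = False
        self.out_of_sync = set()  # blocks to copy when the link returns

    def write(self, lba: int, data: str) -> str:
        self.primary.write(lba, data)
        if not self.fractured:
            try:
                self.secondary.write(lba, data)  # completes before the ack
            except IOError:
                self.fractured = True  # assumed policy: keep serving the host
        if self.fractured:
            self.out_of_sync.add(lba)
        return "ack"


primary, secondary = Array("CX600-A"), Array("CX600-B")
mirror = SyncMirror(primary, secondary)
first = mirror.write(0, "x")
secondary.online = False
second = mirror.write(1, "y")
```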
Backup
Dell | Quantum
Storage
Data Protection: Value Tradeoffs with Different Solutions
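Chart: data protection solutions ordered from mirroring and replication on primary disk, through snapshots on secondary disk and backup, to tape archiving. Along that order, restore time goes from short to long, availability from high to low, safety from low to high, and retention from one copy kept for a short time to many copies kept for a long time.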
Prioritizing Data Based on Value
Chart axes: impact of data loss, high availability, and value (low to high).
Lifeblood: business stops if data is unavailable or lost
Essential: business slowed if data is unavailable; stopped if data is lost
Important: business can operate with limited data availability; significant disruption if data is lost
Lower priority: business can operate with minimal data availability; some disruption if data is lost
Aligning Data Protection Needs with
Technologies
Chart axes: impact of data loss, high availability, and value (low to high).
Lifeblood: synchronous mirrored RAID
Essential: asynchronous mirrored or replicated RAID
Important: snapshots, disk-based backup, local/remote tape archive
Non-essential: tape autoloader, general-purpose NAS
Also shown: local tape backup and remote tape archive
Disk and Tape:
Both Have a Role to Play
Diagram: a backup server running backup software sends data both to disk-based hardware optimized for data protection and to a PV 136T tape library.
Quantum and Dell Solutions:
Meeting the Challenge Together
Chart: tape products aligned with the data tiers, from lifeblood and essential data down to lower-priority data:
SDLT in large automation
DLT/SDLT midrange automation libraries
PowerVault DLT autoloaders (important)
Dell DLT/SDLT drives (lower priority)
DLTtape and Super DLTtape media
Exchange DR Demo
On view at the Dell stand.
Diagram: a production site and a disaster recovery site, each with a domain controller, an Exchange 2000 server and a Fibre Channel switch, with a CX600 and an FC4700 array. MirrorView mirrors the OS boot disk, the Exchange logs and the Exchange store (storage groups) to remote storage at the disaster recovery site. When the production site fails, recovery proceeds through the steps labeled Promote, Boot and Update.
Dell EMC Nortel BT
Business Continuity Solution
Diagram: the DELL/EMC/Nortel/ESAT BT DWDM installation linking the Dell Application Solution Centre (Dell Limerick) and the EMC Solutions Operation Centre (EMC Cork), 100 miles apart. An ESAT BT DWDM managed service with Nortel Optera equipment provides fibre connectivity between the existing SANs at each site and an extended VLAN. Each site has a domain controller, a management host and a CX600 array (CX600-A and CX600-B); MirrorView replicates the Exchange data volumes and host boot volumes between the arrays over port 3. Clustered hosts A, B, C and D form the Exchange high availability solution.
More Information
Dell HA Clustering website
www.dell.com/clusters
Dell Solutions website
www.dell.com/solutions
Dell Power Solutions Magazine (online)
www.dell.com/powersolutions
Dell ROI Online Calculators
www.dell.com/roi
Ask The Experts
Get Your Questions Answered
Ask the Experts area Wednesday 9-11
Dell Stand (All Week)
Thank You
Community Resources
http://www.microsoft.com/communities/default.mspx
Most Valuable Professional (MVP)
http://www.mvp.support.microsoft.com/
Newsgroups
Converse online with Microsoft Newsgroups, including Worldwide
http://www.microsoft.com/communities/newsgroups/default.mspx
User Groups
Meet and learn with your peers
http://www.microsoft.com/communities/usergroups/default.mspx
© 2003 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.
Backup Slides
SAN Copy
No host CPU cycles involved
Copies data from LUN to LUN; the target LUN must be greater than or equal to the source LUN in size
Source can be a snapshot, clone or fractured mirror
Can copy to/from CLARiiON (Dell | EMC) and Symmetrix
Usage:
Upgrade: one-time migration to another storage system
Test: routine copy to secondary storage for testing
Content distribution: copy to multiple targets
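The sizing rule is easy to encode: every target must be at least as large as the source, and the copy then overwrites the first part of each target. This sketch checks all targets before copying anything, so a single undersized target does not leave a partial distribution. The `san_copy` function and the use of `bytearray` to stand in for LUNs are hypothetical; the real copy runs inside the array with no host involvement.

```python
def san_copy(source: bytearray, targets: list) -> None:
    """Hypothetical LUN-to-LUN copy to one or more targets."""
    # Validate every target first: target LUN must be >= source LUN.
    for target in targets:
        if len(target) < len(source):
            raise ValueError("target LUN must be at least as large as the source LUN")
    for target in targets:
        target[: len(source)] = source  # same-length slice keeps target size


src = bytearray(b"data")
t1, t2 = bytearray(8), bytearray(4)
san_copy(src, [t1, t2])  # content distribution to multiple targets
```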
Primary Causes of Data Loss
A data protection solution should protect you against all causes of data loss.
Chart: human error 38%, hardware failure 20%, power failure/surges 12%, viruses 10%, theft/sabotage 7%, software failure 5%, other 4%, natural disasters 3%.
Source: Quantum analysis
Protection from Mirrored Disk
Purely disk-based backup systems do not offer adequate protection against human error, viruses, hackers or natural disasters
Removable media such as tape provides full protection
Chart: the same breakdown of causes of data loss (human error 38%, hardware failure 20%, power failure/surges 12%, viruses 10%, theft/sabotage 7%, software failure 5%, other 4%, natural disasters 3%), marking each cause as either protected by mirrored disk or not fully protected without removable tape media.
Source: Quantum analysis
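Adding up the chart shows how much data loss falls outside what disk-only protection covers. This sketch sums the Quantum percentages for the causes the slide names (human error, viruses, hackers, natural disasters). Treating "hackers" as the chart's theft/sabotage slice is an assumption. The chart's percentages total 99%, presumably because of rounding.

```python
# Percentages from the "Primary Causes of Data Loss" chart (Quantum analysis).
CAUSES = {
    "Human error": 38,
    "Hardware failure": 20,
    "Power failure/surges": 12,
    "Viruses": 10,
    "Theft/sabotage": 7,
    "Software failure": 5,
    "Other": 4,
    "Natural disasters": 3,
}

# Causes the slide says disk-only backup does not adequately cover.
# Assumption: "hackers" is counted as the theft/sabotage slice.
NOT_COVERED_BY_DISK_ONLY = {"Human error", "Viruses", "Theft/sabotage", "Natural disasters"}


def share(causes: set) -> int:
    """Total percentage of data loss attributed to the given causes."""
    return sum(CAUSES[c] for c in causes)


uncovered = share(NOT_COVERED_BY_DISK_ONLY)
total = sum(CAUSES.values())
```

Under that assumption, disk-only protection leaves about 58 percentage points of data loss causes without adequate cover, which is the case the slide makes for removable tape.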