Transcript Slides

Scott Schnoll
Principal Technical Writer
Microsoft Corporation
Session Code: UNC307
Agenda
Exchange 2010 High Availability Vision/Goals
Exchange 2010 High Availability Features
Exchange 2010 High Availability Deep Dive
Deploying Exchange 2010 High Availability Features
Transitioning to Exchange 2010 High Availability
High Availability Design Examples
Exchange 2010 High Availability
Vision and Goals
Vision: Deliver a fast, easy-to-deploy and operate,
economical solution that can provide messaging
service continuity for all customers
Goals
Deliver a native solution for high availability/site resilience
Enable less expensive and less complex storage
Simplify administration and reduce support costs
Increase end-to-end availability
Support Exchange Server 2010 Online
Support large mailboxes at low cost
Exchange Server 2003
[Diagram: Outlook connects directly; OWA, ActiveSync, and Outlook Anywhere connect via a Front End Server. In San Jose, NodeA (active) and NodeB (passive) form a cluster hosting DB1–DB6; Dallas holds a standby cluster with copies of DB1–DB3]
Clustered Mailbox Server had to be created manually
Clustering knowledge required
Third-party data replication needed for site resilience
Complex site resilience and recovery
Failover at Mailbox server level
Exchange Server 2007
[Diagram: Outlook connects directly; OWA, ActiveSync, and Outlook Anywhere connect via a Client Access Server. In San Jose, NodeA (active) and NodeB (passive) form a CCR cluster, each holding copies of DB1–DB6; SCR replicates DB1–DB3 to a standby cluster in Dallas]
No GUI to manage SCR
Clustered Mailbox Server can't co-exist with other roles
Clustering knowledge required
Complex activation for remote server / datacenter
Failover at Mailbox server level
Exchange Server 2010
[Diagram: a DAG spans San Jose (Mailbox Servers 1–5) and Dallas (Mailbox Server 6); each server hosts a mix of active and passive copies of DB1–DB5, and all clients connect via Client Access servers]
All clients connect via CAS servers
Easy to extend across sites
Failover managed by/with Exchange
Database-level failover
Exchange 2010 High Availability
Terminology
High Availability – Solution must provide data
availability, service availability, and automatic recovery
from failures
Disaster Recovery – Process used to manually recover
from a failure
Site Resilience – Disaster recovery solution used for
recovery from site failure
*over – Short for switchover/failover; a switchover is a
manual activation of one or more databases; a failover
is an automatic activation of one or more databases
after a failure
Exchange 2010 High Availability
Feature Names
Mailbox Resiliency – Name of Unified High
Availability and Site Resilience Solution
Database Mobility – The ability of a single mailbox
database to be replicated to and mounted on other
mailbox servers
Incremental Deployment – The ability to deploy high
availability /site resilience after Exchange is installed
Exchange Third Party Replication API – An Exchange-provided API that enables use of third-party replication for a DAG in lieu of continuous replication
Exchange 2010 High Availability
Feature Names
Database Availability Group – A group of up to 16
Mailbox servers that host a set of replicated databases
Mailbox Database Copy – A mailbox database (.edb file
and logs) that is either active or passive
RPC Client Access service – A Client Access server
feature that provides a MAPI endpoint for Outlook
clients
Shadow Redundancy – A transport feature that
provides redundancy for messages for the entire time
they are in transit
Exchange 2010 *overs
Within a datacenter
Database or server *overs
Datacenter level: switchover
Between datacenters
Database or server *overs
Assumptions:
Each datacenter is a separate Active Directory site
Each datacenter has live, active messaging services
Standby datacenter must be active to support single
database *over
Exchange 2007 Concepts Brought
Forward
Extensible Storage Engine (ESE)
Databases and log files
Continuous Replication
Log shipping and replay
Database seeding
Store service/Replication service
Database health and status monitoring
Divergence
Automatic database mount behavior
Concepts of quorum and witness
Concepts of *overs
Exchange 2010 Cut Concepts
Storage Groups
Databases identified by the server on which they live
Server names as part of database names
Clustered Mailbox Servers
Pre-installing a Windows Failover Cluster
Running Setup in Clustered Mode
Moving a CMS network identity between servers
Shared Storage
Two HA Copy Limits
Requirement of Two Networks
Concepts of public, private and mixed networks
Fast Recovery
HA/Backup Strategy Changes
HW/SW Failures, Data Center Failures -> Mailbox Resiliency (fast recovery, data redundancy)
Data Retention scenarios:
Accidentally Deleted Items -> Single Item Recovery (guaranteed item retention)
Administrator Error, Mailbox Corruption -> Lagged Copy (past point-in-time DB copy)
Long Term Data Retention -> Personal Archive + Retention Policies (alternate mailbox for older data)
Exchange 2010 HA Fundamentals
Database Availability Group
Server
Database
Database Copy
Active Manager
RPC Client Access
DAG
Database Availability Group (DAG)
Base component of high availability and site
resilience
A group of up to 16 servers that host a set of
replicated databases
“Wraps” a Windows Failover Cluster
Manages membership (DAG member = node)
Provides heartbeat of DAG member servers
Active Manager stores data in cluster database
Defines a boundary for:
Mailbox database replication
Database and server *overs
Active Manager
DAG Requirements
Windows Server 2008 SP2 Enterprise Edition or
Windows Server 2008 R2 Enterprise Edition
Exchange Server 2010 Standard Edition or
Exchange Server 2010 Enterprise Edition
Standard supports up to 5 databases per server
Enterprise supports up to 100 databases per server
At least one network card per DAG member
Active Manager
Exchange component that manages *overs
Runs on every server in the DAG
Selects best available copy on failovers
Is the definitive source of information on where a
database is active
Stores this information in cluster database
Provides this information to other Exchange components
(e.g., RPC Client Access and Hub Transport)
Two Active Manager roles: PAM and SAM
Active Manager client runs on CAS and Hub
Active Manager
Primary Active Manager (PAM)
Runs on the node that owns the cluster group
Gets topology change notifications
Reacts to server failures
Selects the best database copy on *overs
Standby Active Manager (SAM)
Runs on every other node in the DAG
Responds to queries about which server hosts the active
copy of the mailbox database
Both roles are necessary for automatic recovery
If Replication service is stopped, automatic recovery will
not happen
Active Manager
Selection of Active Database Copy
Active Manager selects the “best” copy to become
active when existing active fails
1. Ignores servers that are unreachable, or on which activation is blocked (temporarily or permanently)
2. Sorts copies by currency to minimize data loss
3. Breaks ties during sort based on Activation Preference
4. Selects from the sorted list based on the copy status of each copy
Active Manager
Selection of Active Database Copy (continued)
Criteria evaluated for each copy in the sorted list:
Copy status: Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource
CopyQueueLength < 10
ReplayQueueLength < 50
Catalog (content index) status: Healthy or Crawling
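The selection steps above can be sketched in Python. This is an illustrative model with simplified copy attributes, not the actual Active Manager implementation (which in practice makes several passes with progressively relaxed criteria); the class and field names are assumptions for the example.

```python
# Illustrative sketch of Active Manager's best-copy selection (not product code).
from dataclasses import dataclass

HEALTHY_STATUSES = {"Healthy", "DisconnectedAndHealthy",
                    "DisconnectedAndResynchronizing", "SeedingSource"}

@dataclass
class DatabaseCopy:
    server: str
    status: str                 # e.g. "Healthy", "Failed"
    copy_queue_length: int      # log files not yet copied from the source
    replay_queue_length: int    # log files copied but not yet replayed
    activation_preference: int  # lower number = preferred
    activation_blocked: bool = False

def select_best_copy(copies):
    # 1. Ignore copies on unreachable or activation-blocked servers
    candidates = [c for c in copies if not c.activation_blocked]
    # 2. Sort by currency (smallest copy queue = least potential data loss),
    # 3. breaking ties with activation preference
    candidates.sort(key=lambda c: (c.copy_queue_length, c.activation_preference))
    # 4. Pick the first copy meeting the status and queue-length criteria
    for c in candidates:
        if (c.status in HEALTHY_STATUSES
                and c.copy_queue_length < 10
                and c.replay_queue_length < 50):
            return c
    return None  # no copy meets these criteria; relaxed passes would follow
```

For example, given two healthy copies with equal copy queues, the one with the lower activation preference number wins; an unhealthy copy is skipped even if it is the most current.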
Automatic Recovery Process
When a failure occurs that affects a database:
Active Manager determines the best copy to activate
The Replication service on the target server attempts to copy missing
log files from the source (ACLL)
If successful, then the database will mount with zero data loss
If unsuccessful (lossy failure), then the database will mount based on the
AutoDatabaseMountDial setting
The mounted database will generate new log files (using the same log
generation sequence)
Transport Dumpster requests will be initiated for the mounted database
to recover lost messages
When original server or database recovers, it will run through
divergence detection and either perform an incremental resync or
require a full reseed
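The mount decision after ACLL can be sketched as follows. The dial thresholds are the documented AutoDatabaseMountDial values expressed in missing log files; the function itself is an illustrative model, not product code.

```python
# Illustrative model of the post-ACLL mount decision (not product code).
# AutoDatabaseMountDial thresholds, in missing log files:
MOUNT_DIAL = {"Lossless": 0, "GoodAvailability": 6, "BestAvailability": 12}

def should_mount(missing_logs: int, dial: str) -> bool:
    """After attempt-copy-last-logs (ACLL), mount the database if the number
    of log files still missing is within the configured dial threshold."""
    return missing_logs <= MOUNT_DIAL[dial]
```

With the default BestAvailability setting, a copy missing up to 12 log files still mounts (a lossy failover), after which Transport Dumpster redelivery recovers lost messages.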
Example: Database Failover
Database failure occurs
Failure item is raised
Active Manager moves active database
Database copy is restored
Similar flow within and across datacenters
DAG
[Diagram: five-member DAG; Mailbox Servers 1–5 each host a mix of active and passive copies of DB1–DB5; the failed database's active copy moves to another server]
Example: Server Failover
Server failure occurs
Cluster notification of node down
Active Manager moves active databases
Server is restored
Cluster notification of node up
Database copies resynchronize with active databases
Similar flow within and across datacenters
DAG
[Diagram: five-member DAG; Mailbox Servers 1–5 each host a mix of active and passive copies of DB1–DB5; the failed server's active databases move to the surviving servers]
Example: RCA service and AM
[Diagram: Outlook clients connect through a CAS array; each RPC Client Access server uses its Active Manager client to ask Active Manager on the DAG members "Where's the DB mounted?"]
CAS asks Active Manager where the database is mounted; AM returns Mailbox Server1, and the CAS connects Outlook via MAPI RPC to the Store
Disk fails on Mailbox Server1; the MAPI RPC connection to the Store fails
Outlook tries to reconnect; AM still returns the old server, and the connect fails
Outlook tries to reconnect again; if failover is in progress, the reconnect request fails
When failover is complete, AM returns the new server, and the CAS connects Outlook to the new active copy
DAG Lifecycle
DAG is created initially as empty object in Active
Directory
Replication is either continuous replication or third-party replication (using Third Party Replication mode)
DAG is given a name and one or more IP addresses (or configured to use DHCP)
When first Mailbox server is added to a DAG
A Windows failover cluster is formed with a Node Majority quorum
using the name of the DAG
The server is added to the DAG object in Active Directory
A cluster name object (CNO) for the DAG is created in the built-in Computers container
The name and IP address of the DAG are registered in DNS
The cluster database for the DAG is updated with info on configured
databases, including if they are locally active (which they should be)
DAG Lifecycle
When second and subsequent Mailbox server is
added to a DAG
The server is joined to cluster for the DAG
The quorum model is automatically adjusted
Node Majority - DAGs with odd number of members
Node and File Share Majority - DAGs with even number of members
File share witness cluster resource, directory, and share are
automatically created by Exchange when needed
The server is added to the DAG object in Active Directory
The cluster database for the DAG is updated with info on
configured databases, including if they are locally active
(which they should be)
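The quorum adjustment above follows a simple rule, sketched below in Python; the vote arithmetic is the standard cluster majority rule, used here as an illustration rather than Exchange code.

```python
# Illustrative model of DAG quorum behavior (standard majority rule, not Exchange code).
def quorum_model(member_count: int) -> str:
    # Exchange adjusts the cluster quorum model as members are added/removed:
    # odd membership uses Node Majority; even membership adds the file share witness.
    return "Node Majority" if member_count % 2 == 1 else "Node and File Share Majority"

def has_quorum(total_votes: int, votes_up: int) -> bool:
    # The cluster (and therefore the DAG) stays up only while a majority of votes remain.
    return votes_up > total_votes // 2
```

For example, a 3-member DAG loses quorum when 2 members fail (1 vote of 3 remaining), which is why the double-failure design example later requires manual activation of the surviving server.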
DAG Lifecycle
After servers have been added to a DAG
Configure the DAG
Network Encryption
Network Compression
Configure DAG networks
Network subnets
Enable/disable MAPI traffic/replication
Create mailbox database copies
Seeding is performed automatically
Monitor health and status of database copies
Perform switchovers as needed
DAG Lifecycle
Before you can remove a server from a DAG,
you must first remove all replicated databases
from the server
When a server is removed from a DAG:
The server is evicted from the cluster
The cluster quorum is adjusted as needed
The server is removed from the DAG object in
Active Directory
Before you can remove a DAG, you must first
remove all servers from the DAG
Deploying Exchange 2010 HA
Features
Legacy Deployment Steps (CCR/SCC)
1. Prepare hardware, install proper OS, and update (extra for SCC: configure storage)
2. Build Windows Failover Cluster (extra for SCC: configure storage)
3. Configure cluster quorum, file share witness, and public and private networks
4. Run Setup in Custom mode and install clustered mailbox server
5. Configure clustered mailbox server (extra for SCC: configure disk resource dependencies)
6. Test *overs
Exchange 2010 Incremental Deployment
1. Prepare hardware, install proper OS, and update
2. Run Setup and install Mailbox role
3. Create a DAG and replicate databases
4. Test *overs
Exchange 2010 Incremental
Deployment
Create a DAG
New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer EXHUB1 -WitnessDirectory C:\DAG1FSW -DatabaseAvailabilityGroupIpAddresses 10.0.0.8
New-DatabaseAvailabilityGroup -Name DAG2 -DatabaseAvailabilityGroupIpAddresses 10.0.0.8,192.168.0.8
Add first Mailbox Server to DAG
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX1
Add second and subsequent Mailbox Server
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer
EXMBX2
Add a Mailbox Database Copy
Add-MailboxDatabaseCopy -Identity MBXDB1 -MailboxServer EXMBX3
Extend as needed
Transition Steps
Verify that you meet requirements for Exchange 2010
Deploy Exchange 2010
Use Exchange 2010 mailbox move features to migrate
Unsupported Transitions
In-place upgrade to Exchange 2010 from any previous
version of Exchange
Using database portability between Exchange 2010 and
non-Exchange 2010 databases
Backup and restore of earlier versions of Exchange
databases on Exchange 2010
Using continuous replication between Exchange 2010 and
Exchange 2007
High Availability Design Example
Branch/Small Office Design
[Diagram: two servers, each hosting the Client Access, Hub Transport, and Mailbox roles, in a two-member DAG with replicated database copies]
8 processor cores recommended, with a maximum of 64GB RAM
UM role not recommended for co-location
Member servers of a DAG can host other server roles
2-server DAGs should use RAID
High Availability Design Example
Double Resilience – Maintenance + DB Failure
[Diagram: three-member DAG (Mailbox Servers 1–3) with one server failed]
Single site, 3 nodes, 3 HA copies
JBOD -> 3 physical copies
2 servers out -> manual activation of server 3 (in a 3-server DAG, quorum is lost)
DAGs with more servers sustain more failures – greater resiliency
High Availability Design Example
Double Node/Disk Failure Resilience
[Diagram: four-member DAG (Mailbox Servers 1–4); the design tolerates a double node or disk failure]
High Availability on JBOD
6 Servers, 3 Racks, 3 Copy DAG
24,000 Mailboxes
Heavy Profile: 100
Messages/day
.1 IOPS/Mailbox
2GB Mailbox Size
[Diagram: Database Availability Group (DAG) of six Mailbox servers (8 cores, 48 GB RAM each) connected by separate MAPI and replication networks; each server hosts a mix of active copies, passive copies, and spare disks drawn from DB1–DB90]
4,000 active mailboxes/server
6 servers, 3 copies = double server failure resiliency
1st failure: ~5,000 active mailboxes/server
2nd failure: 6,000 active mailboxes/server
Soft active limit: 24
1TB 7.2K SATA disks
JBOD: 48 disks/node
Online spares (3)
288 disks total
30 TB of DB space
Battery-backed caching array controller
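As a sanity check, the design numbers above are mutually consistent; the back-of-the-envelope arithmetic below is illustrative only, not official sizing guidance.

```python
# Back-of-the-envelope check of the JBOD design numbers above (illustrative only).
mailboxes = 24_000
servers = 6
databases = 90
ha_copies = 3
spares_per_server = 3

active_per_server = mailboxes // servers            # 4,000 active mailboxes/server
db_copies_total = databases * ha_copies             # 270 database copies in the DAG
copies_per_server = db_copies_total // servers      # 45 copies/server, one disk each
disks_per_server = copies_per_server + spares_per_server  # 48 disks/node
total_disks = disks_per_server * servers            # 288 disks total
mbx_per_db = mailboxes // databases                 # ~266 mailboxes/database
db_size_gb = mbx_per_db * 2                         # ~532 GB at 2GB/mailbox, fits a 1TB disk
iops_total = mailboxes * 0.1                        # 2,400 IOPS across the DAG
```

Losing one server spreads its 4,000 active mailboxes over the five survivors (~5,000 each); losing two spreads them over four (6,000 each), matching the failure figures above.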
Key Takeaways
Greater end-to-end availability with Mailbox
Resiliency
Unified framework for high availability and site
resilience
Faster and easier to deploy with Incremental
Deployment
Reduced TCO with core ESE architecture
changes and more storage options
Supports large mailboxes for less money
Resources
www.microsoft.com/teched
www.microsoft.com/learning
Sessions On-Demand & Community
Microsoft Certification & Training Resources
http://microsoft.com/technet
http://microsoft.com/msdn
Resources for IT Professionals
Resources for Developers
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should
not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.