Module 04 - Exchange 2010 High Availability


Transcript Module 04 - Exchange 2010 High Availability

Exchange 2010 High Availability
Exchange Deployment Planning Services
Exchange 2010 High Availability
Ideal audience for this workshop
• Messaging SME
• Network SME
• Security SME

Exchange 2010 High Availability
During this session, focus on the following:
• How will we leverage this functionality in our organization?
• What availability and service level requirements do we have around our messaging solution?
Agenda
• Review of Exchange Server 2007 Availability Solutions
• Overview of Exchange 2010 High Availability
• Exchange 2010 High Availability Fundamentals
• Exchange 2010 High Availability Deep Dive
• Exchange 2010 Site Resilience
Exchange Server 2007 Single Copy Clustering
• SCC out-of-box provides little high availability value
  − On Store failure, SCC restarts the store on the same machine; no CMS failover
  − SCC does not automatically recover from storage failures
  − SCC does not protect your data, your most valuable asset
  − SCC does not protect against site failures
  − SCC redundant network is not leveraged by CMS
• Conclusion
  − SCC only provides protection from server hardware failures and bluescreens, the relatively easy components to recover
  − Supports rolling upgrades without losing redundancy

Exchange Server 2007 Continuous Replication
[Diagram: log shipping pipeline – 1. Copy logs (E00.log, E0000000011.log, E0000000012.log), 2. Inspect logs, 3. Replay logs into the database copy]
• Log shipping to a local disk
• Log shipping within a cluster
• Log shipping to a standby server or cluster (via file share)
Exchange Server 2007 HA Solution (CCR + SCR)
[Diagram: Outlook (MAPI) clients and OWA, ActiveSync, or Outlook Anywhere clients connect through Client Access Servers in two AD sites (San Jose and Dallas). Each site hosts a two-node CCR Windows cluster (CCR #1 Node A/B, CCR #2 Node A/B) replicating DB1–DB6, with SCR copies on a standby server in the other site.]
• Manual “activation” of remote mailbox server
• Mailbox server can’t co-exist with other roles
• SCR managed separately; no GUI
• Clustering knowledge required
• Database failure requires server failover

Exchange 2010 High Availability Goals
• Reduce complexity
• Reduce cost
• Native solution – no single point of failure
• Improve recovery times
• Support larger mailboxes
• Support large scale deployments
• Make High Availability Exchange deployments mainstream!
Exchange 2010 High Availability Architecture
[Diagram: clients in the Dallas and San Jose AD sites connect through Client Access Servers (CAS); Mailbox Servers 1–6 host copies of DB1–DB5 spread across both sites.]
• All clients connect via CAS servers
• Database (DB) centric failover
• Failover managed within Exchange
• Easy to extend across sites
Exchange 2010 High Availability Fundamentals
• Database Availability Group (DAG)
• Server
• Database
• Database Copy
• Active Manager
• RPC Client Access service

Exchange 2010 High Availability Fundamentals
Database Availability Group
• A group of up to 16 servers hosting a set of replicated databases
• Wraps a Windows Failover Cluster
  − Manages servers’ membership in the group
  − Heartbeats servers, quorum, cluster database
• Defines the boundary of database replication
• Defines the boundary of failover/switchover (*over)
• Defines the boundary for the DAG’s Active Manager
[Diagram: Mailbox Server 1 through Mailbox Server 16 grouped in a single DAG]
Exchange 2010 High Availability Fundamentals
Server
• Unit of membership for a DAG
• Hosts the active and passive copies of multiple mailbox databases
• Executes Information Store, CI, Assistants, etc., services on active mailbox database copies
• Executes replication services on passive mailbox database copies
[Diagram: Mailbox Servers 1–3 each hosting copies of DB1–DB4]

Exchange 2010 High Availability Fundamentals
Server (Continued)
• Provides the connection point between the Information Store and RPC Client Access
• Very few server-level properties relevant to HA
  − Server’s Database Availability Group
  − Server’s Activation Policy
[Diagram: RPC Client Access (RCA) connecting to Mailbox Servers 1–3, each hosting copies of DB1–DB4]

Exchange 2010 High Availability Fundamentals
Mailbox Database
• Unit of *over
• A database has one active copy – the active copy can be mounted or dismounted
• Maximum # of passive copies = # of servers in DAG – 1
[Diagram: Mailbox Servers 1–3 each hosting copies of DB1–DB4]
Exchange 2010 High Availability Fundamentals
Mailbox Database (Continued)
• ~30 second database *overs
• Server failover/switchover involves moving all active databases to one or more other servers
• Database names are unique across a forest
• Defines properties relevant at the database level
  − GUID: a database’s unique ID
  − EdbFilePath: path at which copies are located
  − Servers: list of servers hosting copies

Exchange 2010 High Availability Fundamentals
Active/Passive vs. Source/Target
• Availability terms
  − Active: Selected to provide email services to clients
  − Passive: Available to provide email services to clients if the active fails
• Replication terms
  − Source: Provides data for copying to a separate location
  − Target: Receives data from the source
Exchange 2010 High Availability Fundamentals
Mailbox Database Copy
• Scope of replication
• A copy is either source or target of replication at any given time
• A copy is either active or passive at any given time
• Only one copy of each database in a DAG is active at a time
• A server may not host more than one copy of any database
[Diagram: Mailbox Server 1 and Mailbox Server 2 hosting copies of DB1–DB3; a second copy of DB1 on the same server is not allowed]
Exchange 2010 High Availability Fundamentals
Mailbox Database Copy
Defines properties applicable to an individual database copy:
• Copy status: Healthy, Initializing, Failed, Mounted, Dismounted, Disconnected, Suspended, FailedAndSuspended, Resynchronizing, Seeding
• ActiveCopy
• ActivationSuspended
• CopyQueueLength
• ReplayQueueLength
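These copy properties can be checked from the Exchange Management Shell. A minimal sketch using Get-MailboxDatabaseCopyStatus; the database and server names (DB1, EXMBX2) are hypothetical.

```powershell
# Check the health of every copy of a database
Get-MailboxDatabaseCopyStatus -Identity DB1 |
    Format-Table Name,Status,CopyQueueLength,ReplayQueueLength,ContentIndexState -AutoSize

# Inspect a single copy on a specific server in detail
Get-MailboxDatabaseCopyStatus -Identity DB1\EXMBX2 | Format-List
```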
Exchange 2010 High Availability Fundamentals
Active Manager
• Exchange-aware resource manager (high availability’s brain)
  − Runs on every server in the DAG
  − Manages which copies should be active and which should be passive
  − Definitive source of information on where a database is active or mounted
    − Provides this information to other Exchange components (e.g., RPC Client Access and Hub Transport)
    − Information stored in the cluster database

Exchange 2010 High Availability Fundamentals
Active Manager
• Active Directory is still the primary source for configuration info
• Active Manager is the primary source for changeable state information (such as active and mounted)
• The Replication service monitors the health of all mounted databases and monitors ESE for I/O errors or failures

Exchange 2010 High Availability Fundamentals
Continuous Replication
• Continuous replication has the following basic steps:
  − Database copy seeding of target
  − Log copying from source to target
  − Log inspection at target
  − Log replay into database copy
Exchange 2010 High Availability Fundamentals
Database Seeding
• There are several ways to seed the target instance:
  − Automatic seeding
  − Update-MailboxDatabaseCopy cmdlet (can be performed from active or passive copies)
  − Manually copy the database
  − Backup and restore (VSS)
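A minimal reseeding sketch using Update-MailboxDatabaseCopy; the copy name DB1\EXMBX2 and the source server EXMBX3 are hypothetical.

```powershell
# Suspend the copy, then reseed it and discard any existing database/log files
Suspend-MailboxDatabaseCopy -Identity DB1\EXMBX2 -SuspendComment "Reseed" -Confirm:$false
Update-MailboxDatabaseCopy -Identity DB1\EXMBX2 -DeleteExistingFiles

# Alternatively, seed from another passive copy instead of the active copy
Update-MailboxDatabaseCopy -Identity DB1\EXMBX2 -SourceServer EXMBX3 -DeleteExistingFiles
```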
Exchange 2010 High Availability Fundamentals
Log Shipping
• Log shipping in Exchange 2010 leverages TCP sockets
  − Supports encryption and compression
  − Administrator can set the TCP port to be used
• The Replication service on the target notifies the active instance of the next log file it expects
  − Based on the last log file it inspected
• The Replication service on the source responds by sending the required log file(s)
• Copied log files are placed in the target’s Inspector directory

Exchange 2010 High Availability Fundamentals
Log Inspection
• The following actions are performed to verify the log file before replay:
  − Physical integrity inspection
  − Header inspection
  − Move any Exx.log files that exist on the target to the ExxOutofDate folder if the target was previously a source
• If inspection fails, the file is recopied and inspected (up to 3 times)
• If the log file passes inspection, it is moved into the database copy’s log directory

Exchange 2010 High Availability Fundamentals
Log Replay
• Log replay has moved to the Information Store
• The following validation tests are performed prior to log replay:
  − Recalculate the required log generations by inspecting the database header
  − Determine the highest generation present in the log directory to ensure that a log file exists
  − Compare the highest log generation present in the directory to the highest log file that is required
  − Make sure the logs form the correct sequence
  − Query the checkpoint file, if one exists
• Replay the log file using a special recovery mode (undo phase is skipped)
Exchange 2010 High Availability Fundamentals
Lossy Failure Process
• In the event of failure, the following steps will occur for the failed database:
  − Active Manager will determine the best copy to activate
  − The Replication service on the target server will attempt to copy missing log files from the source (ACLL)
    − If successful, then the database will mount with zero data loss
    − If unsuccessful (lossy failure), then the database will mount based on the AutoDatabaseMountDial setting
  − The mounted database will generate new log files (using the same log generation sequence)
  − Transport Dumpster requests will be initiated for the mounted database to recover lost messages
  − When the original server or database recovers, it will run through divergence detection and perform an incremental reseed or require a full reseed
Exchange 2010 High Availability Fundamentals
Backups
• Streaming backup APIs for public use have been cut; you must use VSS for backups
  − Backup from any copy of the database/logs
  − Always choose the passive (or active) copy
  − Backup an entire server
  − Designate a dedicated backup server for a given database
• Restore from any of these backup scenarios
[Diagram: a Database Availability Group with Mailbox Servers 1–3, each hosting copies of DB1–DB3; a VSS requestor can back up any copy]
Multiple Database Copies Enable New Scenarios
• Site/server/disk failure → Exchange 2010 HA
• Archiving/compliance → E-mail archive
• Recover deleted items → Extended/protected dumpster retention
[Diagram: a Database Availability Group with Mailbox Servers 1–3 hosting copies of DB1–DB3, including a 7–14 day lag copy]
Mailbox Database Copies
• Create up to 16 copies of each mailbox database
• Each mailbox database must have a unique name within the organization
  − Mailbox database objects are global configuration objects
  − All mailbox database copies use the same GUID
  − No longer connected to specific Mailbox servers
Mailbox Database Copies
• Each DAG member can host only one copy of a given mailbox database
  − Database path and log folder path for the copy must be identical on all members
• Copies have settable properties (see the sketch below)
  − Activation Preference
    − RTM: Used as the second sort key during best copy selection
    − SP1: Used for distributing active databases; used as the primary sorting key when using the Lossless mount dial
  − Replay Lag and Truncation Lag
    − Using these features affects your storage design
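A minimal sketch of adjusting the Activation Preference property on a copy; the names DB1 and EXMBX2 are hypothetical.

```powershell
# Make the copy on EXMBX2 the most preferred copy of DB1
Set-MailboxDatabaseCopy -Identity DB1\EXMBX2 -ActivationPreference 1

# Review the preference order across all copies of the database
Get-MailboxDatabase DB1 | Select-Object Name,ActivationPreference
```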
Lagged Database Copies
• A lagged copy is a passive database copy with a replay lag time greater than 0
• Lagged copies provide point-in-time protection only; they are not a replacement for point-in-time backups
  − Logical corruption and/or mailbox deletion prevention scenarios
  − Provide a maximum of 14 days protection
• When should you deploy a lagged copy?
  − Useful only to mitigate a risk
  − May not be needed if deploying a backup solution (e.g., DPM 2010)
• Lagged copies are not HA database copies
  − Lagged copies should never be automatically activated by the system
  − Steps for manual activation documented at http://technet.microsoft.com/en-us/library/dd979786.aspx
• Lagged copies affect your storage design (see the example after this list)
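A sketch of creating a lagged copy; the names, preference, and lag values are illustrative only (replay lag is capped at 14 days in Exchange 2010).

```powershell
# Add a lagged copy of DB1 on EXMBX3 with a 7-day replay lag and 14-day truncation lag
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EXMBX3 `
    -ReplayLagTime 7.00:00:00 -TruncationLagTime 14.00:00:00 -ActivationPreference 3

# Keep the lagged copy from being activated automatically
Suspend-MailboxDatabaseCopy -Identity DB1\EXMBX3 -ActivationOnly -Confirm:$false
```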
DAG Design
Two Failure Models
• Design for all database copies activated
  − Design for the worst case – the server architecture handles 100 percent of all hosted database copies becoming active
• Design for targeted failure scenarios
  − Design the server architecture to handle the active mailbox load during the worst failure case you plan to handle
    − A 1-member failure requires 2 or more HA copies and 2 or more servers
    − A 2-member failure requires 3 or more HA copies and 4 or more servers
  − Requires Set-MailboxServer <Server> -MaximumActiveDatabases <Number> (see the example below)
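A minimal sketch of the cmdlet named above; the server name and limit are hypothetical.

```powershell
# Cap a DAG member at 20 active databases so a multi-server failure cannot
# push more active load onto it than the hardware was sized for
Set-MailboxServer -Identity EXMBX1 -MaximumActiveDatabases 20
```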
DAG Design
It’s all in the layout
• Consider this scenario
  − 8 servers, 40 databases with 2 copies
[Diagram: Servers 1–8 each host five active databases (Server 1: DB1–DB5, Server 2: DB6–DB10, … Server 8: DB36–DB40) plus five passive copies laid out in pairs, so Server 1 holds the copies of Server 8’s databases (DB36’–DB40’), Server 2 holds Server 7’s (DB31’–DB35’), and so on.]

DAG Design
It’s all in the layout
• If I have a single server failure
  − Life is good
[Diagram: the same layout with one server down; each of its five databases activates on the partner server that holds its copy.]

DAG Design
It’s all in the layout
• If I have a double server failure
  − Life could be good…
[Diagram: the same layout with two non-partnered servers down; every failed database still has a surviving copy to activate.]

DAG Design
It’s all in the layout
• If I have a double server failure
  − Life could be bad…
[Diagram: the same layout with two partnered servers down; both copies of those ten databases are lost and the databases go offline.]
DAG Design
It’s all in the layout
• Now let’s consider this scenario
  − 4 servers, 12 databases with 3 copies
[Diagram: Servers 1–4 each host three active databases (Server 1: DB1–DB3, Server 2: DB4–DB6, Server 3: DB7–DB9, Server 4: DB10–DB12) plus six passive copies, with the second and third copies of each database spread across the other servers.]
• With a single server failure:
[Diagram: Server 4 fails; DB10, DB11, and DB12 each activate on a different surviving server, spreading the extra load evenly.]
• With a double server failure:
[Diagram: with a second server down, the surviving servers activate the remaining copies; because each database has three copies, every database can still be mounted.]
Deep Dive on Exchange 2010 High Availability
Basics
• Quorum
• Witness
• DAG Lifecycle
• DAG Networks

Quorum
• Used to ensure that only one subset of members is functioning at one time
• A majority of members must be active and have communications with each other
• Represents a shared view of members (voters and some resources)
• Dual usage
  − Data shared between the voters representing configuration, etc.
  − Number of voters required for the solution to stay running (majority); quorum is a consensus of voters
    − When a majority of voters can communicate with each other, the cluster has quorum
    − When a majority of voters cannot communicate with each other, the cluster does not have quorum
Quorum
• Quorum is not only necessary for cluster functions; it is also necessary for DAG functions
  − In order for a DAG member to mount and activate databases, it must participate in quorum
• Exchange 2010 uses only two of the four available cluster quorum models
  − Node Majority (DAGs with an odd number of members)
  − Node and File Share Majority (DAGs with an even number of members)
• Quorum = (N/2) + 1 (whole numbers only)
  − 6 members: (6/2) + 1 = 4 votes for quorum (can lose 3 voters)
  − 9 members: (9/2) + 1 = 5 votes for quorum (can lose 4 voters)
  − 13 members: (13/2) + 1 = 7 votes for quorum (can lose 6 voters)
  − 15 members: (15/2) + 1 = 8 votes for quorum (can lose 7 voters)
Witness and Witness Server

Witness
• A witness is a share on a server external to the DAG that participates in quorum by providing a weighted vote for the DAG member that has a lock on the witness.log file
  − Used only by DAGs that have an even number of members
• The witness server does not maintain a full copy of quorum data and is not a member of the DAG or cluster

Witness
• Represented by the File Share Witness resource
  − The file share witness cluster resource, directory, and share are automatically created and removed as needed
  − Uses the Cluster IsAlive check for availability
  − If the witness is not available, cluster core resources are failed and moved to another DAG member
  − If the other DAG member does not bring the witness resource online, the resource remains in a Failed state, with restart attempts every 60 minutes
    − See http://support.microsoft.com/kb/978790 for details on this behavior

Witness
• If in a Failed state and needed for quorum, the cluster will try to bring the File Share Witness resource online once
  − If the witness cannot be restarted, it is considered failed and quorum is lost
  − If the witness can be restarted, it is considered successful and quorum is maintained
    − An SMB lock is placed on witness.log
    − Node PAXOS information is incremented and the updated PAXOS tag is written to witness.log
• If in an Offline state and needed for quorum, the cluster will not try to restart it – quorum is lost

Witness
• When the witness is no longer needed to maintain quorum, the lock on witness.log is released
• Any member that locks the witness retains the weighted vote (the “locking node”)
  − Members in contact with the locking node are in the majority and maintain quorum
  − Members not in contact with the locking node are in the minority and lose quorum
Witness Server
• No pre-configuration typically necessary
  − Exchange Trusted Subsystem must be a member of the local Administrators group on the witness server if the witness server is not running Exchange 2010
• Cannot be a member of the DAG (present or future)
• Must be in the same Active Directory forest as the DAG
Witness Server
• Can be Windows Server 2003 or later
  − File and Printer Sharing for Microsoft Networks must be enabled
• Replicating the witness directory/share with DFS is not supported
• Not necessary to cluster the witness server
  − If you do cluster the witness server, you must use Windows Server 2008
• A single witness server can be used for multiple DAGs
  − Each DAG requires its own unique witness directory/share
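A hedged sketch of pointing a DAG at a witness share (and pre-staging an alternate witness for the standby datacenter); all names and paths are hypothetical.

```powershell
# Configure the witness and alternate witness for an existing DAG
Set-DatabaseAvailabilityGroup -Identity DAG1 `
    -WitnessServer EXHUB1 -WitnessDirectory C:\DAG1FSW `
    -AlternateWitnessServer EXHUB2 -AlternateWitnessDirectory C:\DAG1AFSW
```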
Database Availability Group Lifecycle
• Create a DAG
  New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer EXHUB1 -WitnessDirectory C:\DAG1FSW -DatabaseAvailabilityGroupIpAddresses 10.0.0.8
  New-DatabaseAvailabilityGroup -Name DAG2 -DatabaseAvailabilityGroupIpAddresses 10.0.0.8,192.168.0.8
• Add Mailbox servers to the DAG
  Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX1
  Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX2
• Add a mailbox database copy
  Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EXMBX2

Database Availability Group Lifecycle
• A DAG is created initially as an empty object in Active Directory
  − Continuous replication or third-party replication using Third Party Replication mode
    − Once changed to Third Party Replication mode, the DAG cannot be changed back
  − The DAG is given a unique name and configured for IP addresses (or configured to use DHCP)

Database Availability Group Lifecycle
• When the first Mailbox server is added to a DAG:
  − A failover cluster is formed with the name of the DAG, using Node Majority quorum
  − The server is added to the DAG object in Active Directory
  − A cluster name object (CNO) for the DAG is created in the default Computers container using the security context of the Replication service
  − The name and IP address of the DAG are registered in DNS
  − The cluster database for the DAG is updated with info about local databases

Database Availability Group Lifecycle
• When the second and subsequent Mailbox servers are added to a DAG:
  − The server is joined to the cluster for the DAG
  − The quorum model is automatically adjusted
  − The server is added to the DAG object in Active Directory
  − The cluster database for the DAG is updated with info about local databases
Database Availability Group Lifecycle
• After servers have been added to a DAG
  − Configure the DAG (see the sketch below)
    − Network encryption
    − Network compression
    − Replication port
  − Configure DAG networks
    − Network subnets
    − Collapse DAG networks into a single network with multiple subnets
    − Enable/disable MAPI traffic/replication
    − Block network heartbeat cross-talk (Server1\MAPI !<-> Server2\Repl)
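A minimal sketch of the DAG-level settings listed above; DAG1 and the port number are hypothetical.

```powershell
# Enable compression/encryption for inter-subnet replication traffic and
# move replication to a custom TCP port
Set-DatabaseAvailabilityGroup -Identity DAG1 `
    -NetworkCompression InterSubnetOnly `
    -NetworkEncryption InterSubnetOnly `
    -ReplicationPort 60000
# If the port changes and Windows Firewall is in use, update the firewall rules on every member
```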
Database Availability Group Lifecycle
• After servers have been added to a DAG
  − Configure DAG member properties (see the sketch below)
    − Automatic database mount dial: BestAvailability, GoodAvailability, Lossless, or a custom value
    − Database copy automatic activation policy: Blocked, IntrasiteOnly, Unrestricted
    − Maximum active databases
  − Create mailbox database copies
    − Seeding is performed automatically, but you have options
  − Monitor health and status of database copies and perform switchovers as needed
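A minimal sketch of the per-member settings listed above; the server name and chosen values are illustrative.

```powershell
# Tune a member's mount dial and restrict automatic activation to its own AD site
Set-MailboxServer -Identity EXMBX1 `
    -AutoDatabaseMountDial GoodAvailability `
    -DatabaseCopyAutoActivationPolicy IntrasiteOnly
```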
Database Availability Group Lifecycle
• Before you can remove a server from a DAG, you must first remove all replicated databases from the server
• When a server is removed from a DAG:
  − The server is evicted from the cluster
  − The cluster quorum is adjusted
  − The server is removed from the DAG object in Active Directory
• Before you can remove a DAG, you must first remove all servers from the DAG

DAG Networks
• A DAG network is a collection of subnets
• All DAGs must have:
  − Exactly one MAPI network
    − The MAPI network connects DAG members to network resources (Active Directory, other Exchange servers, etc.)
  − Zero or more Replication networks
    − Separate network on separate subnet(s)
    − Used for/by continuous replication only
    − LRU determines which replication network to use when multiple replication networks are configured

DAG Networks
• Initially created DAG networks are based on enumeration of cluster networks
  − Cluster enumeration is based on subnet
  − One cluster network is created for each subnet
DAG Networks

Server / Network   | IP Address / Subnet Bits | Default Gateway
EX1 – MAPI         | 192.168.0.15/24          | 192.168.0.1
EX1 – REPLICATION  | 10.0.0.15/24             | N/A
EX2 – MAPI         | 192.168.0.16/24          | 192.168.0.1
EX2 – REPLICATION  | 10.0.0.16/24             | N/A

Name          | Subnet(s)      | Interface(s)                           | MAPI Access Enabled | Replication Enabled
DAGNetwork01  | 192.168.0.0/24 | EX1 (192.168.0.15), EX2 (192.168.0.16) | True                | True
DAGNetwork02  | 10.0.0.0/24    | EX1 (10.0.0.15), EX2 (10.0.0.16)       | False               | True

DAG Networks

Server / Network   | IP Address / Subnet Bits | Default Gateway
EX1 – MAPI         | 192.168.0.15/24          | 192.168.0.1
EX1 – REPLICATION  | 10.0.0.15/24             | N/A
EX2 – MAPI         | 192.168.1.15/24          | 192.168.1.1
EX2 – REPLICATION  | 10.0.1.15/24             | N/A

Name          | Subnet(s)      | Interface(s)       | MAPI Access Enabled | Replication Enabled
DAGNetwork01  | 192.168.0.0/24 | EX1 (192.168.0.15) | True                | True
DAGNetwork02  | 10.0.0.0/24    | EX1 (10.0.0.15)    | False               | True
DAGNetwork03  | 192.168.1.0/24 | EX2 (192.168.1.15) | True                | True
DAGNetwork04  | 10.0.1.0/24    | EX2 (10.0.1.15)    | False               | True
DAG Networks
• To collapse subnets into two DAG networks and disable replication for the MAPI network:
  Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
  Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
  Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
  Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04

Before:

Name          | Subnet(s)      | Interface(s)       | MAPI Access Enabled | Replication Enabled
DAGNetwork01  | 192.168.0.0/24 | EX1 (192.168.0.15) | True                | True
DAGNetwork02  | 10.0.0.0/24    | EX1 (10.0.0.15)    | False               | True
DAGNetwork03  | 192.168.1.0/24 | EX2 (192.168.1.15) | True                | True
DAGNetwork04  | 10.0.1.0/24    | EX2 (10.0.1.15)    | False               | True

After:

Name          | Subnet(s)                      | Interface(s)                           | MAPI Access Enabled | Replication Enabled
DAGNetwork01  | 192.168.0.0/24, 192.168.1.0/24 | EX1 (192.168.0.15), EX2 (192.168.1.15) | True                | False
DAGNetwork02  | 10.0.0.0/24, 10.0.1.0/24       | EX1 (10.0.0.15), EX2 (10.0.1.15)       | False               | True
DAG Networks
• Automatic network detection occurs only when members are added to the DAG
  − If networks are added after the member is added, you must perform discovery:
    Set-DatabaseAvailabilityGroup -DiscoverNetworks
• DAG network configuration is persisted in the cluster registry
  − HKLM\Cluster\Exchange\DAG Network
• DAG networks include built-in encryption and compression
  − Encryption: Kerberos SSP EncryptMessage/DecryptMessage APIs
  − Compression: Microsoft XPRESS, based on the LZ77 algorithm
• DAGs use a single TCP port for replication and seeding
  − Default is TCP port 64327
  − If you change the port and you use Windows Firewall, you must manually change the firewall rules

Deeper Dive on Exchange 2010 High Availability
Advanced Features
• Active Manager
• Best Copy Selection
• Datacenter Activation Coordination Mode
Active Manager
• Exchange component that manages *overs
  − Runs on every server in the DAG
  − Selects the best available copy on failovers
  − Is the definitive source of information on where a database is active
    − Stores this information in the cluster database
    − Provides this information to other Exchange components (e.g., RPC Client Access and Hub Transport)

Active Manager
• Active Manager roles
  − Standalone Active Manager
  − Primary Active Manager (PAM)
  − Standby Active Manager (SAM)
• The Active Manager client runs on CAS and Hub Transport servers

Active Manager
• Transitions of role state are logged in the Microsoft-ExchangeHighAvailability/Operational event log (Crimson Channel)

Active Manager
• Primary Active Manager (PAM)
  − Runs on the node that owns the cluster core resources (cluster group)
  − Gets topology change notifications
  − Reacts to server failures
  − Selects the best database copy on *overs
  − Detects failures of the local Information Store and local databases

Active Manager
• Standby Active Manager (SAM)
  − Runs on every other node in the DAG
  − Detects failures of the local Information Store and local databases
  − Reacts to failures by asking the PAM to initiate a failover
  − Responds to queries from CAS/Hub about which server hosts the active copy
• Both roles are necessary for automatic recovery
  − If the Replication service is stopped, automatic recovery will not happen
Best Copy Selection
• The process of finding the best copy to activate for an individual database, given a list of status results of potential copies for activation
• Active Manager selects the “best” copy to become the new active copy when the existing active copy fails

Best Copy Selection – RTM
• Sorts copies by copy queue length to minimize data loss, using activation preference as a secondary sorting key if necessary
• Selects from the sorted list based on which set of criteria is met by each copy
• Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy

Best Copy Selection – SP1
• Sorts copies by activation preference when the auto database mount dial is set to Lossless
  − Otherwise, sorts copies based on copy queue length, with activation preference used as a secondary sorting key if necessary
• Selects from the sorted list based on which set of criteria is met by each copy
• Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy

Best Copy Selection
• Is the database mountable? Is the copy queue length <= AutoDatabaseMountDial?
  − If yes, the database is marked as the current active and a mount request is issued
  − If not, the next best database is tried (if one is available)
• During best copy selection, any servers that are unreachable or “activation blocked” are ignored
Best Copy Selection

Criteria | Copy Queue Length | Replay Queue Length | Content Index Status
1        | < 10 logs         | < 50 logs           | Healthy
2        | < 10 logs         | < 50 logs           | Crawling
3        | N/A               | < 50 logs           | Healthy
4        | N/A               | < 50 logs           | Crawling
5        | N/A               | < 50 logs           | N/A
6        | < 10 logs         | N/A                 | Healthy
7        | < 10 logs         | N/A                 | Crawling
8        | N/A               | N/A                 | Healthy
9        | N/A               | N/A                 | Crawling
10       | Any database copy with a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource
Best Copy Selection – RTM
• Four copies of DB1
• DB1 currently active on Server1, which has failed

Database Copy | Activation Preference | Copy Queue Length | Replay Queue Length | CI State | Database State
Server2\DB1   | 2                     | 4                 | 0                   | Healthy  | Healthy
Server3\DB1   | 3                     | 2                 | 2                   | Healthy  | DiscAndHealthy
Server4\DB1   | 4                     | 10                | 0                   | Crawling | Healthy

Best Copy Selection – RTM
• Sort the list of available copies by copy queue length (using activation preference as a secondary sort key if necessary):
  − Server3\DB1
  − Server2\DB1
  − Server4\DB1

Best Copy Selection – RTM
• Only two copies meet the first set of criteria for activation (CQL < 10; RQL < 50; CI = Healthy):
  − Server3\DB1 (lowest copy queue length – tried first)
  − Server2\DB1
  − Server4\DB1 does not meet the criteria
Best Copy Selection – SP1
• Four copies of DB1
• DB1 currently active on Server1, which has failed
• Auto database mount dial set to Lossless

Database Copy | Activation Preference | Copy Queue Length | Replay Queue Length | CI State | Database State
Server2\DB1   | 2                     | 4                 | 0                   | Healthy  | Healthy
Server3\DB1   | 3                     | 2                 | 2                   | Healthy  | DiscAndHealthy
Server4\DB1   | 4                     | 10                | 0                   | Crawling | Healthy

Best Copy Selection – SP1
• Sort the list of available copies by activation preference:
  − Server2\DB1 (lowest preference value – tried first)
  − Server3\DB1
  − Server4\DB1
Best Copy Selection
• After Active Manager determines the best copy to activate:
  − The Replication service on the target server attempts to copy missing log files from the source (ACLL)
    − If successful, then the database will mount with zero data loss
    − If unsuccessful (lossy failure), then the database will mount based on the AutoDatabaseMountDial setting
    − If data loss is outside of the dial setting, the next copy will be tried

Best Copy Selection
• After Active Manager determines the best copy to activate:
  − The mounted database will generate new log files (using the same log generation sequence)
  − Transport Dumpster requests will be initiated for the mounted database to recover lost messages
  − When the original server or database recovers, it will run through divergence detection and either perform an incremental resync or require a full reseed

Datacenter Activation Coordination Mode
• DAC mode is a property of a DAG
• Acts as an application-level form of quorum
  − Designed to prevent multiple copies of the same database mounting on different members due to loss of network
Datacenter Activation Coordination Mode
• RTM: DAC mode is only for DAGs with three or more members that are extended to two Active Directory sites
  − Don’t enable it for two-member DAGs where each member is in a different AD site, or for DAGs where all members are in the same AD site
  − DAC mode also enables use of the site resilience tasks
    − Stop-DatabaseAvailabilityGroup
    − Restore-DatabaseAvailabilityGroup
    − Start-DatabaseAvailabilityGroup
• SP1: DAC mode can be enabled for all DAGs (see the sketch below)
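A minimal sketch of enabling DAC mode on an existing DAG; the DAG name is hypothetical.

```powershell
# Turn on Datacenter Activation Coordination mode
Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly
```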
Datacenter Activation Coordination Mode
• Uses the Datacenter Activation Coordination Protocol (DACP), which is a bit in memory set to either:
  − 0 = can’t mount
  − 1 = can mount

Datacenter Activation Coordination Mode
• Active Manager startup sequence
  − DACP is set to 0
  − The DAG member communicates with the other DAG members it can reach to determine the current value of their DACP bits
    − If the starting DAG member can communicate with all other members, its DACP bit switches to 1
    − If the other DACP bits are set to 0, the starting DAG member’s DACP bit remains at 0
    − If another DACP bit is set to 1, the starting DAG member’s DACP bit switches to 1

Improvements in Service Pack 1
Replication and copy management enhancements in SP1
• Continuous replication changes
  − Enhanced to reduce data loss
  − Eliminates the log drive as a single point of failure
• Automatically switches between modes:
  − File mode (original, log file shipping)
  − Block mode (enhanced log block shipping)
• Switching process:
  − Initial mode is file mode
  − Block mode is triggered when the target needs the Exx.log file (e.g., copy queue length = 0)
    − All healthy passives are processed in parallel
  − File mode is triggered when block mode falls too far behind (e.g., copy queue length > 0)
Improvements in Service Pack 1
[Diagram: continuous replication in file mode vs. block mode. In file mode, the source ships complete, closed log files and the target asks “Send me the latest log files … I have log 2.” In block mode, each log fragment written to the ESE log buffer is also shipped to the target’s replication log buffer, where it is detected, inspected, and built up into a complete, up-to-date log file.]
Improvements in Service Pack 1
• SP1 introduces the RedistributeActiveDatabases.ps1 script (keeps database copies balanced across DAG members)
  − Moves databases to the most preferred copy
  − If cross-site, tries to balance between sites
• Targetless admin switchover altered for stronger activation preference affinity
  − The first pass of best copy selection is sorted by activation preference, not copy queue length
  − This trades off a longer activation time for a more even distribution of active databases: you might pick a copy with more logs to play, but you get better distribution of databases

Improvements in Service Pack 1
• *over performance improvements
  − In RTM, a *over immediately terminated replay on the copy that was becoming active, and the mount operation did the necessary log recovery
  − In SP1, a *over drives the database to a clean shutdown by playing all logs on the passive copy, so no recovery is required on the new active

Improvements in Service Pack 1
• DAG maintenance scripts
  − StartDAGServerMaintenance.ps1
    − Runs Suspend-MailboxDatabaseCopy for each database copy hosted on the DAG member
    − Pauses the node in the cluster, which prevents it from being and becoming the PAM
    − Sets the DatabaseCopyAutoActivationPolicy parameter on the DAG member to Blocked
    − Moves all active databases currently hosted on the DAG member to other DAG members
    − If the DAG member currently owns the default cluster group, it moves the default cluster group (and therefore the PAM role) to another DAG member
Improvements in Service Pack 1
• DAG maintenance scripts (continued; see the usage sketch below)
  − StopDAGServerMaintenance.ps1
    − Runs Resume-MailboxDatabaseCopy for each database copy hosted on the DAG member
    − Resumes the node in the cluster, which enables full cluster functionality for the DAG member
    − Sets the DatabaseCopyAutoActivationPolicy parameter on the DAG member to Unrestricted
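A hedged usage sketch for the maintenance scripts, assuming they are run from the Exchange scripts directory ($exscripts) and that the -serverName parameter is used; the server name EXMBX2 is hypothetical.

```powershell
# Put a DAG member into maintenance before patching
cd $exscripts
.\StartDAGServerMaintenance.ps1 -serverName EXMBX2

# ...apply updates and reboot the server, then return it to service...
.\StopDAGServerMaintenance.ps1 -serverName EXMBX2
```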
Improvements in Service Pack 1
• CollectOverMetrics.ps1 and CollectReplicationMetrics.ps1 have been rewritten

Improvements in Service Pack 1
• Exchange Management Console enhancements in SP1
  − Manage DAG IP addresses
  − Manage the witness server/directory and alternate witness server/directory

Switchovers and Failovers (*overs)
Exchange 2010 *Overs
• Within a datacenter
  − Database *over
  − Server *over
• Between datacenters
  − Single database *over
  − Server *over
• Datacenter switchover

Single Database Cross-Datacenter *Over
• Database mounted in another datacenter and another Active Directory site
• Serviced by “new” Hub Transport servers
  − “Different OwningServer” – for routing
  − Transport dumpster re-delivery now from both Active Directory sites
• Serviced by “new” CAS
  − “Different CAS URL” – for protocol access
  − Outlook Web App now redirects the connection to the second CAS farm
  − Other protocols proxy or redirect (varies)
Datacenter Switchover
• Customers can evolve to site resilience
  − Standalone → local redundancy → site resilience
• Consider namespace design at first deployment
• Keep extending the DAG!
• Monitoring and many other concepts/skills are simply reapplied
  − Normal administration remains unchanged
• Disaster recovery is not an HA event

Site Resilience
Agenda
• Understand the steps required to build and activate a standby site for Exchange 2010
  − Site Resilience Overview
  − Site Resilience Models
  − Planning and Design
  − Site Activation Steps
  − Client Behavior
Site Resilience Drivers
• Business requirements drive site resilience
  − When a risk assessment reveals a high-impact threat to meeting SLAs for data loss and loss of availability
  − Site resilience is required to mitigate the risk
  − Business requirements dictate a low recovery point objective (RPO) and recovery time objective (RTO)

Site Resilience Overview
• Ensuring business continuity brings expense and complexity
  − A site switchover is a coordinated effort with many stakeholders that requires practice to ensure the real event is handled well
• Exchange 2010 reduces cost and complexity
  − Low-impact testing can be performed with a cross-site single database switchover

Exchange 2007
Site resilience choices
• CCR+SCR and /recoverCMS
• SCC+SCR and /recoverCMS
• CCR stretched across datacenters
• SCR and database portability
• SCR and /m:RecoverServer
• SCC stretched across datacenters with synchronous replication

Exchange 2010 makes it simpler
• Database Availability Group (DAG) with members in different datacenters/sites
  − Supports automatic and manual cross-site database switchovers and failovers (*overs)
  − No stretched Active Directory site
  − No special networking needed
  − No /recoverCMS
Suitability of site resilience solutions

Solution                       | RTO goal | RPO goal | Deployment complexity
Ship backups and restore       | High     | High     | Low
Standby Exchange 2003 clusters | Moderate | Low      | High
CCR+SCR in separate AD sites   | Moderate | Low      | Moderate
CCR in a stretched AD site     | Low      | Low      | High
Exchange 2010 DAGs             | Low      | Low      | Low
Site Resilience Models
Voter Placement and Infrastructure Design

Infrastructure Design
• There are two key models you have to take into account when designing site resilient solutions
  − Datacenter / Namespace Model
  − User Distribution Model
• When planning for site resilience, each datacenter is considered active
  − Exchange Server 2010 site resilience requires active CAS, HUB, and UM in the standby datacenter
  − These services are used by databases mounted in the standby datacenter after a single database *over

Infrastructure Design
User Distribution Models
• The locality of the users will ultimately determine your site resilience architecture
  − Are users primarily located in one datacenter?
  − Are users located in multiple datacenters?
  − Is there a requirement to maintain a user population in a particular datacenter?
• Active/Passive user distribution model
  − Database copies deployed in the secondary datacenter, but no active mailboxes are hosted there
• Active/Active user distribution model
  − User population dispersed across both datacenters, with each datacenter being the primary datacenter for its specific user population
Infrastructure Design
Client Access Arrays
• 1 CAS array per AD site
  − Multiple DAGs within an AD site can use the same CAS array
• The FQDN of the CAS array needs to resolve to a load-balanced virtual IP address in DNS, but only in internal DNS
  − You need a load balancer for the CAS array as well
• Set the databases in the AD site to use the CAS array via the Set-MailboxDatabase -RpcClientAccessServer property (see the sketch after this list)
• By default, new databases have the RpcClientAccessServer value set on creation
  − If the database was created prior to creating the CAS array, it is set to a random CAS FQDN (or the local machine if roles are co-located)
  − If the database is created after creating the CAS array, it is set to the CAS array FQDN
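A minimal sketch of creating a CAS array and binding a database's RPC endpoint to it; the FQDN, AD site, and database names are hypothetical.

```powershell
# Create the CAS array object for the site, then point an existing database at it
New-ClientAccessArray -Name "outlook.contoso.com" -Fqdn "outlook.contoso.com" -Site "Dallas"
Set-MailboxDatabase -Identity DB1 -RpcClientAccessServer "outlook.contoso.com"
```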
Voter Placement
• The majority of voters should be deployed in the primary datacenter
  − Primary = the datacenter with the majority of the user population
• If the user population is spread across datacenters, deploy multiple DAGs to prevent a WAN outage from taking one datacenter offline

Voter Placement
[Diagram: two datacenters, Seattle and Portland, each hosting members of two DAGs (MBX01–MBX08) plus CAS01–CAS04 and HUB01–HUB04. Each DAG places its witness in one datacenter and its alternate witness in the other, so each DAG keeps a voter majority in its primary datacenter.]
Site Resilience
Namespace, Network and Certificate Planning

Planning for site resilience
Namespaces
• Each datacenter is considered active and needs its own namespaces
• Each datacenter needs the following namespaces
  − OWA/OA/EWS/EAS namespace
  − POP/IMAP namespace
  − RPC Client Access namespace
  − SMTP namespace
• In addition, one of the datacenters will maintain the Autodiscover namespace

Planning for site resilience
Namespaces
• Best practice: Use split DNS for Exchange hostnames used by clients
• Goal: minimize the number of hostnames
  − mail.contoso.com for Exchange connectivity on the intranet and Internet
  − mail.contoso.com has different IP addresses in intranet/Internet DNS
• Important – before moving down this path, be sure to map out all hostnames (outside of Exchange) that you want to create in the internal zone
Planning for site resilience
Namespaces
[Diagram: two datacenters, each with AD, CAS, Hub Transport, and Mailbox servers.
Datacenter 1 – External DNS: Mail.contoso.com, Pop.contoso.com, Imap.contoso.com, Autodiscover.contoso.com, Smtp.contoso.com. Internal DNS: the same names plus Outlook.contoso.com. Exchange config: ExternalURL = mail.contoso.com; CAS array = outlook.contoso.com; OA endpoint = mail.contoso.com.
Datacenter 2 – External DNS: Mail.region.contoso.com, Pop.region.contoso.com, Imap.region.contoso.com, Smtp.region.contoso.com. Internal DNS: the same names plus Outlook.region.contoso.com. Exchange config: ExternalURL = mail.region.contoso.com; CAS array = outlook.region.contoso.com; OA endpoint = mail.region.contoso.com.]
Planning for site resilience
Network
• Design high availability for dependencies
  − Active Directory
  − Network services (DNS, TCP/IP, etc.)
  − Telephony services (Unified Messaging)
  − Backup services
  − Infrastructure (power, cooling, etc.)

Planning for site resilience
Network
• Latency
  − Must have less than 250 ms round trip
• Network cross-talk must be blocked
  − Router ACLs should be used to block traffic between MAPI and replication networks
  − If DHCP is used for the replication network, DHCP can be used to deploy static routes
• Lower the TTL for all Exchange DNS records to 5 minutes
  − OWA/EAS/EWS/OA, IMAP/POP, SMTP, RPC Client Access
  − Both internal and external DNS zones
Planning for site resilience
Certificates

Certificate Type: Wildcard Certs
  Pros: One cert for both sides; flexible if names change
  Cons: Wildcard certs can be expensive, or impossible to obtain; WM 5 clients don’t work with wildcard certs; setting the Cert Principal Name to *.company.com is global to all CAS in the forest

Certificate Type: Intelligent Firewall
  Pros: Traffic is forwarded to the ‘correct’ CAS
  Cons: Requires ISA or another firewall which can forward based on properties; additional hardware required; AD replication delays affect publishing rules

Certificate Type: Load Balancer
  Pros: Load balancer can listen for both external names and forward to the ‘correct’ CAS
  Cons: Requires multiple certificates; requires multiple IPs; requires a load balancer

Certificate Type: Same Config in Both Sites
  Pros: Just an A record change required after site failover
  Cons: No way to run the DR site as active during normal operation

Certificate Type: Manipulate Cert Principal Name
  Pros: Minimal configuration changes required after failover; works with all clients
  Cons: Setting the Cert Principal Name to mail.company.com is global to all CAS in the forest
Planning for site resilience
Certificates
• Best practice: minimize the number of certificates
  − 1 certificate for all CAS servers + reverse proxy + Edge/Hub
  − Use a Subject Alternative Name (SAN) certificate, which can cover multiple hostnames
  − 1 additional certificate if using OCS
    − OCS requires certificates with <=1024-bit keys and the server name in the certificate principal name
• If leveraging a certificate per datacenter, ensure the Certificate Principal Name is the same on all certificates
  − Outlook Anywhere won’t connect if the Principal Name on the certificate does not match the value configured in msstd: (default matches the OA RPC endpoint)
    Set-OutlookProvider EXPR -CertPrincipalName msstd:mail.contoso.com

Datacenter Switchover
Switchover Tasks

Datacenter Switchover Process
• Failure occurs
• Activation decision
• Terminate the partially running primary datacenter
• Activate the secondary datacenter
  − Validate prerequisites
  − Activate mailbox servers
  − Activate other roles (in parallel with the previous step)
• Service is restored
Datacenter Switchovers

Primary to Standby
1. Primary site fails
2. Stop-DatabaseAvailabilityGroup <DAGName> -ActiveDirectorySite <PSiteName> -ConfigurationOnly (run this in both datacenters)
3. Stop-Service clussvc
4. Restore-DatabaseAvailabilityGroup <DAGName> -ActiveDirectorySite <SSiteName>
5. Databases mount (assuming no activation blocks)
6. Adjust DNS records for SMTP and HTTPS

Standby to Primary
1. Verify all services are working
2. Start-DatabaseAvailabilityGroup <DAGName> -ActiveDirectorySite <PSiteName>
3. Set-DatabaseAvailabilityGroup <DAGName> -WitnessDirectory <Directory> -WitnessServer <ServerName>
4. Reseed data
5. Schedule downtime for dismount
6. Change DNS records back
7. Move-ActiveMailboxDatabase <DBName> -ActivateOnServer <ServerName>
8. Mount databases in the primary datacenter
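A hedged sketch of the primary-to-standby sequence above with concrete values filled in; the DAG name and the site names (Seattle = failed primary, Portland = standby) are hypothetical.

```powershell
# Mark the failed primary-site members as stopped (configuration only)
Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Seattle -ConfigurationOnly

# Stop the Cluster service, then force the DAG to restart with only the standby-site members
Stop-Service clussvc
Restore-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Portland

# Later, when the primary site is repaired, rejoin its members and switch databases back
Start-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Seattle
Move-ActiveMailboxDatabase DB1 -ActivateOnServer EXMBX1
```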
Datacenter Switchover Tasks
• Stop-DatabaseAvailabilityGroup
  − Adds failed servers to the stopped list
  − Removes servers from the started list
• Restore-DatabaseAvailabilityGroup
  − Forces quorum
  − Evicts stopped nodes
  − Starts using the alternate file share witness if necessary
• Start-DatabaseAvailabilityGroup
  − Removes servers from the stopped list
  − Joins servers to the cluster
  − Adds joined servers to the started list

Client Experiences
Typical Outlook Behavior
• All Outlook versions behave consistently in a single datacenter scenario
  − Profile points to the RPC Client Access Server array
  − Profile is unchanged by failovers or loss of CAS
• All Outlook versions should behave consistently in a datacenter switchover scenario
  − The primary datacenter Client Access Server DNS name is bound to the IP address of the standby datacenter’s Client Access Server
  − Autodiscover continues to hand out the primary datacenter CAS name as the Outlook RPC endpoint
  − Profile remains unchanged

Client Experiences
Outlook – Cross-Site DB Failover Experience
• Behavior is a direct connection from the CAS array in the first datacenter to the mailbox server hosting the active copy in the second datacenter
• You can only get a redirect to occur by changing the RpcClientAccessServer property on the database

Client Experiences
Other Clients
• Other client behavior varies based on protocol and scenario

Client            | In-Site *Over Scenario | Out-of-Site *Over Scenario | Datacenter Switchover
OWA               | Reconnect              | Manual redirect            | Reconnect
OA                | Reconnect              | Reconnect / Autodiscover   | Reconnect
EAS               | Reconnect              | Redirect or proxy          | Reconnect
POP/IMAP          | Reconnect              | Proxy                      | Reconnect
EWS               | Reconnect              | Autodiscover               | Reconnect
Autodiscover      | N/A                    | Seamless                   | Reconnect
SMTP / PowerShell | N/A                    | N/A                        | Reconnect
End of Exchange 2010 High Availability Module

For More Information
• Exchange Server Tech Center: http://technet.microsoft.com/en-us/exchange/default.aspx
• Planning services: http://technet.microsoft.com/en-us/library/cc261834.aspx
• Microsoft IT Showcase webcasts: http://www.microsoft.com/howmicrosoftdoesitwebcasts
• Microsoft TechNet: http://www.microsoft.com/technet/itshowcase
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.