Exchange server 2010 high availability deep dive Scott Schnoll

SESSION CODE: EXL407
Scott Schnoll
Principal Technical Writer
Microsoft Corporation
EXCHANGE SERVER 2010
HIGH AVAILABILITY DEEP DIVE
(c) 2011 Microsoft. All rights reserved.
Agenda
► Exchange Server 2010 High Availability Deep Dive
– Database Availability Group Networks
– Active Manager
– Best Copy Selection
– Datacenter Activation Coordination Mode
Exchange Server 2010 High Availability
Deep Dive: Database Availability Group Networks
DAG Networks
► A DAG network is a collection of one or more subnets
► There are two types of DAG networks
– MAPI Network - connects DAG members to network resources
(Active Directory, other Exchange servers, DNS, etc.)
• Registered in DNS / DNS configured
• Uses default gateway
• Client for Microsoft Networks/File and Print Sharing enabled
– Replication Network - used for/by continuous replication (log
shipping and seeding)
• Not registered in DNS / DNS not configured
• Typically no default gateway
• Client for Microsoft Networks/File and Print Sharing disabled
DAG Networks
► All DAGs must have:
– Exactly one MAPI network
– Zero or more Replication networks
• Separate network(s) on separate subnet(s)
• With multiple replication networks, a least recently used (LRU)
algorithm determines which replication network is used
► DAG networks automatically created when Mailbox
server is added to DAG
– Based on cluster’s enumeration of networks
• Cluster enumeration based on subnet
• One cluster network is created for each subnet
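To make the subnet-based enumeration concrete, here is a minimal Python sketch (not Exchange or cluster code) that groups interface addresses by subnet with the standard ipaddress module. The function name enumerate_networks and the tuple layout are illustrative assumptions; the addresses are the ones used in the example tables in this deck.

```python
import ipaddress


def enumerate_networks(interfaces):
    """Group (server, "address/prefix") pairs by subnet: one network per subnet."""
    networks = {}
    for server, cidr in interfaces:
        iface = ipaddress.ip_interface(cidr)
        # iface.network is the subnet the address belongs to, e.g. 192.168.0.0/24
        networks.setdefault(str(iface.network), []).append(f"{server} ({iface.ip})")
    return networks


# Both members share the same two subnets, so two networks result.
same_subnets = [
    ("EX1", "192.168.0.15/24"), ("EX1", "10.0.0.15/24"),
    ("EX2", "192.168.0.16/24"), ("EX2", "10.0.0.16/24"),
]
# Each member sits on its own pair of subnets, so four networks result.
different_subnets = [
    ("EX1", "192.168.0.15/24"), ("EX1", "10.0.0.15/24"),
    ("EX2", "192.168.1.15/24"), ("EX2", "10.0.1.15/24"),
]

for label, ifaces in (("same", same_subnets), ("different", different_subnets)):
    print(label, enumerate_networks(ifaces))
```

The second case is why a multi-subnet DAG starts out with more DAG networks than intended and usually needs them collapsed by hand.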
DAG Networks
► Maximum round-trip latency between all
DAG members must be 500 ms or less
– Regardless of the latency of the solution, customers
should validate that the network between all DAG
members is capable of satisfying the data protection
and availability goals of the deployment
– May need to investigate increasing the number of
databases or decreasing the number of mailboxes per
database to achieve desired goals
DAG Networks
Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.0.16/24            192.168.0.1
EX2 – REPLICATION    10.0.0.16/24               N/A

Name           Subnet(s)        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15), EX2 (192.168.0.16)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15), EX2 (10.0.0.16)         False                 True
DAG Networks
Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.1.15/24            192.168.1.1
EX2 – REPLICATION    10.0.1.15/24               N/A

Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True
DAG Networks
► Collapse subnets into two DAG networks and
disable replication for the MAPI network:
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04
Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24, 10.0.1.0/24         EX1 (10.0.0.15), EX2 (10.0.1.15)         False                 True
DAG Networks
► Automatic detection occurs only when members are added
to the DAG
– If networks are added after a member is added, you must perform
discovery
Set-DatabaseAvailabilityGroup -Identity <DAGName> -DiscoverNetworks
► DAG network configuration persisted in cluster registry
– HKLM\Cluster\Exchange\DAG Network
► DAG networks include built-in encryption and
compression
– Encryption: Kerberos SSP EncryptMessage/DecryptMessage APIs
– Compression: Microsoft XPRESS, based on LZ77 algorithm
DAG Networks
► Block cross-network communication to minimize
heartbeat traffic
[Diagram: Subnet 1, Subnet 2, Subnet 3, and Subnet 4, with communication paths labeled Allowed or Blocked]
DAG Networks
► If using iSCSI storage, configure DAG and cluster
to ignore iSCSI networks
1. Set-DatabaseAvailabilityGroupNetwork -Identity <DAGNetworkName> -ReplicationEnabled:$false -IgnoreNetwork:$true
2. Cluster network <ClusterNetworkName> /prop Role=0
DAG Networks
► When a DAG spans multiple subnets you need an IP
address on the MAPI network for each subnet
► Use DHCP in site resilience configurations to assign
IP addresses to Replication network
– Enables delivery of the typically required static routes
– If using static IP addresses, use netsh to configure static
routes
► Configure a DNS TTL on service access connection
records that is consistent with your SLA, e.g. ~5
minutes for a one hour RTO SLA
Exchange Server 2010 High Availability
Deep Dive: Active Manager
Active Manager
► What are the three Active Manager roles?
– Standalone
– PAM (Primary Active Manager)
– SAM (Standby Active Manager)
► Role state transitions are logged to the
Microsoft-Exchange-HighAvailability/Operational event log
(Crimson Channel)
Active Manager Functionality
► Mount and Dismount Databases
► Provide Database Availability Information
► Provide Interface for Administrative Tasks
► Monitor for Failures
► Maintain Database and Server State Information
AutoMount on DAG Members
► In a DAG, all AutoMount operations are
coordinated through the PAM
► AutoMount operations occur:
– When the first server in the DAG is initialized
– When the ownership of the PAM role is changed
AutoMount on DAG Members
► Checks msExchMasterServerOrAvailabilityGroup
to determine all databases hosted on the DAG
► Checks if database can be mounted on startup
– If msExchEDBOffline is TRUE, stop processing
– If msExchEDBOffline is FALSE, proceed with processing
AutoMount on DAG Members
► Checks persistent database information stored in
cluster registry
► Determines if database is mounted on another
DAG member
– If the database is mounted on another server, take no
action
– If the database is not mounted on another server,
proceed
AutoMount on DAG Members
► Checks AdminDismount in cluster registry:
– If AdminDismount is TRUE, take no action
– If AdminDismount is FALSE, proceed
► Checks persistent database state information in
cluster registry for server on which database was last
mounted
– If server available, issue mount request to Information
Store on that server
– If server not available or property not set, issue mount
request to next server in sorted list
AutoMount on DAG Members
► If AutoMount operation succeeds:
– Update persistent database state information stored
in cluster database
– Propagate information to all other DAG members
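The AutoMount checks above can be read as a short decision chain. The following Python sketch is only a model of that chain, not Active Manager code: the DatabaseInfo fields and the function name automount_target are assumptions made for illustration, with each field standing in for the Active Directory attribute or cluster registry value named on the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatabaseInfo:
    name: str
    edb_offline: bool                   # stands in for msExchEDBOffline
    mounted_on: Optional[str]           # server where the database is mounted, if any
    admin_dismount: bool                # stands in for AdminDismount in the cluster registry
    last_mounted_server: Optional[str]  # server on which the database was last mounted


def automount_target(db, available_servers, sorted_servers):
    """Return the server that should receive the mount request, or None for no action."""
    if db.edb_offline:
        return None  # database must not be mounted on startup
    if db.mounted_on is not None:
        return None  # already mounted on another DAG member
    if db.admin_dismount:
        return None  # an administrator dismounted it on purpose
    if db.last_mounted_server in available_servers:
        return db.last_mounted_server
    # Last server unavailable or not recorded: walk the sorted list.
    for server in sorted_servers:
        if server in available_servers:
            return server
    return None


db1 = DatabaseInfo("DB1", False, None, False, "EX1")
print(automount_target(db1, {"EX2", "EX3"}, ["EX1", "EX2", "EX3"]))
```

In this run EX1 (the last mount location) is unavailable, so the request goes to EX2, the next available server in the sorted list.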
Mount / Dismount Database Copy
► Mount Database
– An administrator action invoked through a task
– The last part of a move operation
► Dismount Database
– An administrator action invoked through a task
– The first part of a move operation
Mount Database – DAG Member
► Initiate RPC to member of the DAG
– If the server contacted is not the PAM, the task is
referred to the PAM
– If the server is the PAM, continue with no referral
► Checks the
msExchMasterServerOrAvailabilityGroup to
ensure database is hosted in the DAG
– If database is hosted in DAG, proceed
– If database is not hosted in DAG, error out
Mount Database – DAG Member
► Checks if the database is already mounted
– If already mounted, task fails
– If not already mounted, task continues
► PAM invokes callback
– This invokes a pre-check for the database mount
operation
– Persistent database state updated to show mount
Initiated
Mount Database – DAG Member
► PAM invokes RPC call to Information Store to
mount database
– If mount fails, task fails
– If mount succeeds, task completes successfully
► Persistent database state updated to record
results of operation and propagated to other
members
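As a rough illustration of the mount sequence on a DAG member, the Python sketch below follows the same order: referral to the PAM, the DAG membership check, the already-mounted check, the state update, and the call to the Information Store. Every name here (mount_database, MountError, the state strings, the store_mount callable) is hypothetical and chosen only for this example.

```python
class MountError(Exception):
    """Raised when the mount task cannot complete."""


def mount_database(db, contacted_server, pam, dag_databases, state, store_mount):
    """Model of the mount task. Returns (handling server, whether a referral occurred).

    state maps database name -> persistent state string (hypothetical values).
    store_mount is a callable standing in for the RPC to the Information Store.
    """
    # A member that is not the PAM refers the task to the PAM.
    referred = contacted_server != pam
    if db not in dag_databases:
        raise MountError(f"{db} is not hosted in the DAG")
    if state.get(db) == "Mounted":
        raise MountError(f"{db} is already mounted")
    state[db] = "MountInitiated"
    mounted = store_mount(db)
    # Record the result so it can be propagated to the other members.
    state[db] = "Mounted" if mounted else "Dismounted"
    if not mounted:
        raise MountError(f"Information Store could not mount {db}")
    return pam, referred


persistent_state = {}
print(mount_database("DB1", "EX2", "EX1", {"DB1"}, persistent_state, lambda db: True))
print(persistent_state)
```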
Dismount Database – DAG Member
► Task initiates call to PAM or is referred to PAM
► PAM checks that
msExchMasterServerOrAvailabilityGroup value
matches the DAG
► PAM verifies that database is mounted in the
DAG by checking persistent database state
information stored in registry
– If database is mounted, task proceeds
– If database is dismounted, task fails
Dismount Database – DAG Member
► PAM updates persistent state information in
cluster database to show state Initiated
► PAM makes RPC call to Information Store on
DAG member and invokes dismount
– If dismount operation succeeds, persistent database
state information stored in cluster database is
updated
– If dismount operation fails, task fails
Auto Dismount – DAG Member
► Occurs when a DAG loses quorum
► All DAG members are running (but may not be
participating in the cluster)
► Databases dismounted as quickly as possible to
avoid split-brain
– Information Store service is terminated
Auto Dismount – DAG Member
► Dismount operation should attempt to update
database state information in cluster database
► This is the only case where a database operation
occurs on a server other than the PAM
Active Manager – Move Database
► Move Database
– An administrator action invoked by a task
– Automatic operation initiated by the PAM (failover)
► Begins with a Dismount operation and ends with
a Mount operation
Exchange Server 2010 High Availability
Deep Dive: Best Copy Selection
Best Copy Selection
► Process of finding the best copy of an
individual database to activate, given a list
of potential copies for activation and their
status
► Active Manager selects the “best” copy to
become the new active copy when the
existing active copy fails or when an
administrator performs a targetless
switchover
Best Copy Selection – RTM
► Sorts copies by copy queue length to minimize
data loss, using activation preference as a
secondary sorting key if necessary
► Selects from the sorted list based on which set of
criteria each copy meets
► Attempt Copy Last Logs (ACLL) runs and
attempts to copy missing log files from previous
active copy
Best Copy Selection – SP1
► Sorts copies by activation preference when auto
database mount dial is set to Lossless
– Otherwise, sorts copies based on copy queue length, with
activation preference used as a secondary sorting key if
necessary
► Selects from the sorted list based on which set of
criteria each copy meets
► Attempt Copy Last Logs (ACLL) runs and attempts to
copy missing log files from previous active copy
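The difference between the RTM and SP1 sort order is easy to see in a few lines of Python. This is a sketch of the sorting step only, under the assumption that it can be modeled as a plain key sort; the Copy class and sort_copies function are illustrative names, and the values are the ones used in the worked example in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    activation_preference: int
    copy_queue_length: int


def sort_copies(copies, version, mount_dial="BestAvailability"):
    """Order candidate copies the way the RTM and SP1 slides describe."""
    if version == "SP1" and mount_dial == "Lossless":
        # SP1 with a Lossless dial: activation preference alone decides.
        return sorted(copies, key=lambda c: c.activation_preference)
    # RTM, or SP1 with any other dial: copy queue length first,
    # activation preference as the tie-breaker.
    return sorted(copies, key=lambda c: (c.copy_queue_length, c.activation_preference))


copies = [
    Copy(r"Server2\DB1", 2, 4),
    Copy(r"Server3\DB1", 3, 2),
    Copy(r"Server4\DB1", 4, 10),
]
print([c.name for c in sort_copies(copies, "RTM")])
print([c.name for c in sort_copies(copies, "SP1", "Lossless")])
```

With these values the RTM order starts with Server3\DB1 (shortest copy queue), while the SP1 Lossless order starts with Server2\DB1 (lowest activation preference).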
Best Copy Selection
► Is database mountable?
– Is copy queue length <= AutoDatabaseMountDial?
• If Yes, database is marked as current active and mount
request is issued
• If not, next best database tried (if one is available)
► During best copy selection, any servers that are
unreachable or “activation blocked” are ignored
Best Copy Selection
Criteria   Copy Queue Length   Replay Queue Length   Content Index Status
1          < 10 logs           < 50 logs             Healthy
2          < 10 logs           < 50 logs             Crawling
3          N/A                 < 50 logs             Healthy
4          N/A                 < 50 logs             Crawling
5          N/A                 < 50 logs             N/A
6          < 10 logs           N/A                   Healthy
7          < 10 logs           N/A                   Crawling
8          N/A                 N/A                   Healthy
9          N/A                 N/A                   Crawling
10         Any database copy with a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource
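One way to read the criteria table is as a series of passes over the already sorted list: every copy is tested against set 1, then the remaining copies against set 2, and so on. The Python sketch below encodes that reading. The pass-by-pass interpretation, the dictionary keys, and the function names are assumptions made for illustration; the thresholds come from the table, and the sample data comes from the worked example in this deck.

```python
# Criteria sets 1-9: (copy queue limit, replay queue limit, content index state).
# None means the column is N/A for that set.
CRITERIA_SETS = [
    (10, 50, "Healthy"), (10, 50, "Crawling"),
    (None, 50, "Healthy"), (None, 50, "Crawling"), (None, 50, None),
    (10, None, "Healthy"), (10, None, "Crawling"),
    (None, None, "Healthy"), (None, None, "Crawling"),
]
# Set 10 accepts any copy with one of these statuses.
FALLBACK_STATUSES = {
    "Healthy", "DisconnectedAndHealthy",
    "DisconnectedAndResynchronizing", "SeedingSource",
}


def meets(copy, criteria):
    """True if the copy satisfies every column that the criteria set checks."""
    cql_limit, rql_limit, ci_state = criteria
    if cql_limit is not None and copy["cql"] >= cql_limit:
        return False
    if rql_limit is not None and copy["rql"] >= rql_limit:
        return False
    if ci_state is not None and copy["ci"] != ci_state:
        return False
    return True


def attempt_order(sorted_copies):
    """Return (criteria set number, copy name) pairs in the order copies would be tried."""
    order, seen = [], set()
    for number, criteria in enumerate(CRITERIA_SETS, start=1):
        for copy in sorted_copies:
            if copy["name"] not in seen and meets(copy, criteria):
                seen.add(copy["name"])
                order.append((number, copy["name"]))
    for copy in sorted_copies:
        if copy["name"] not in seen and copy["status"] in FALLBACK_STATUSES:
            seen.add(copy["name"])
            order.append((10, copy["name"]))
    return order


# Already sorted by copy queue length (the RTM ordering of the example).
rtm_sorted = [
    {"name": r"Server3\DB1", "cql": 2, "rql": 2, "ci": "Healthy", "status": "DisconnectedAndHealthy"},
    {"name": r"Server2\DB1", "cql": 4, "rql": 0, "ci": "Healthy", "status": "Healthy"},
    {"name": r"Server4\DB1", "cql": 10, "rql": 0, "ci": "Crawling", "status": "Healthy"},
]
print(attempt_order(rtm_sorted))
```

Under this reading, Server3\DB1 and Server2\DB1 qualify in the first pass, while Server4\DB1 (copy queue length of exactly 10, content index Crawling) first qualifies under set 4.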
Best Copy Selection – RTM
► Four copies of DB1
► DB1 currently active on Server1
[Diagram: DB1 copies on Server1 (marked X), Server2, Server3, and Server4]
Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy
Best Copy Selection – RTM
► Sort the list of available copies by Copy
Queue Length (using Activation Preference as a
secondary sort key if necessary):
– Server3\DB1
– Server2\DB1
– Server4\DB1
Best Copy Selection – RTM
► Only two copies meet the first set of criteria for
activation (CQL < 10; RQL < 50; CI = Healthy):
– Server3\DB1 (lowest copy queue length, tried first)
– Server2\DB1
► Server4\DB1 does not meet the first set (CQL is 10; CI is Crawling)
Best Copy Selection – SP1
► Four copies of DB1
► DB1 currently active on Server1
► Auto database mount dial set to Lossless
[Diagram: DB1 copies on Server1 (marked X), Server2, Server3, and Server4]
Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DiscAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy
Best Copy Selection – SP1
► Sort the list of available copies by Activation
Preference:
– Server2\DB1 (lowest preference value, tried first)
– Server3\DB1
– Server4\DB1
Best Copy Selection
► After Active Manager determines the best copy
to activate
– The Replication service on the target server
attempts to copy missing log files from the source
(ACLL)
• If successful, then the database will mount with zero
data loss
• If unsuccessful (lossy failure), then the database will
mount based on the AutoDatabaseMountDial setting
• If data loss is outside of dial setting, next copy will be
tried
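The interaction between ACLL and the mount dial can be sketched as a loop over the candidate list. In the Python sketch below, the log-loss limits for the three AutoDatabaseMountDial values (Lossless = 0, GoodAvailability = 6, BestAvailability = 12) are the documented Exchange 2010 values rather than something stated on these slides. The function name, the callable that reports missing logs, and the sample numbers are hypothetical.

```python
# Maximum number of lost logs tolerated by each AutoDatabaseMountDial value.
MOUNT_DIAL_LIMITS = {"Lossless": 0, "GoodAvailability": 6, "BestAvailability": 12}


def activate_best_copy(candidates, mount_dial, logs_missing_after_acll):
    """Try candidates in order; return (copy name, logs lost) or None if none can mount.

    logs_missing_after_acll(name) stands in for running ACLL against a copy and
    reporting how many log files are still missing afterwards (0 = ACLL succeeded).
    """
    limit = MOUNT_DIAL_LIMITS[mount_dial]
    for name in candidates:
        missing = logs_missing_after_acll(name)
        if missing <= limit:
            return name, missing  # zero loss, or loss within the dial setting
        # Loss is outside the dial setting: move on to the next copy.
    return None


# Hypothetical outcome of ACLL for two candidates after a lossy failure.
missing_logs = {r"Server3\DB1": 8, r"Server2\DB1": 3}
candidates = [r"Server3\DB1", r"Server2\DB1"]
for dial in MOUNT_DIAL_LIMITS:
    print(dial, activate_best_copy(candidates, dial, missing_logs.get))
```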
Best Copy Selection
► If an activated database copy is mounted
– It will generate new log files (using the same log
generation sequence)
– Transport Dumpster requests will be initiated for
the mounted database to recover lost messages
– When original server or database recovers, it
will run through divergence detection and either
perform an incremental resync or require a full
reseed
Exchange Server 2010 High Availability
Deep Dive: Datacenter Activation Coordination Mode
Datacenter Activation Coordination Mode
► DAC mode is a property of a DAG
► Acts as an application-level form of quorum
– Controls whether or not a Mailbox server attempts to
mount its active databases on startup
– Designed to prevent multiple copies of same database
mounting on different members due to loss of network
(split brain)
► Also enables use of Site Resilience tasks
– Stop-DatabaseAvailabilityGroup
– Restore-DatabaseAvailabilityGroup
– Start-DatabaseAvailabilityGroup
Datacenter Activation Coordination Mode
► RTM: DAC Mode for DAGs with three or more
members that are extended to two Active
Directory sites
– Don’t enable for two-member DAGs where each
member is in different AD site or DAGs where all
members are in the same AD site
► SP1: DAC Mode can be enabled for all DAGs
► If using Third Party Replication (TPR) mode,
check with your vendor for guidance on DAC
mode
Datacenter Activation Coordination Mode
► Uses Datacenter Activation Coordination Protocol (DACP)
► A bit in memory (in MSExchangeRepl.exe) set to either:
– 0 = can’t mount
– 1 = can mount
Datacenter Activation Coordination Mode
► Active Manager startup sequence
– DACP is set to 0
– DAG member communicates with other DAG members it
can reach to determine the current value for their DACP
bits
• If the starting DAG member can communicate with all other
members on the StartedMailboxServers list, DACP bit switches to 1
• If the starting DAG member can communicate with another
member, and that other member’s DACP bit is set to 1, starting
DAG member DACP bit switches to 1
• If the starting DAG member can communicate with another
member, and that other member’s DACP bit is set to 0,
starting DAG member DACP bit remains at 0
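The startup rules for the DACP bit reduce to a small function. The Python sketch below is a model of those three rules only, not the protocol itself; the function name and the dictionary of reachable members are assumptions made for illustration, and the server names are borrowed from the diagrams in this deck.

```python
def dacp_bit_after_startup(member, started_servers, reachable_bits):
    """Return the DACP bit (0 or 1) for a member that has just started.

    started_servers: names on the DAG's list of started servers.
    reachable_bits: {server name: DACP bit} for the members that answered.
    """
    others = set(started_servers) - {member}
    if others <= set(reachable_bits):
        return 1  # every other started member could be contacted
    if any(bit == 1 for bit in reachable_bits.values()):
        return 1  # some reachable member is already allowed to mount
    return 0      # bit stays at its startup value, so databases are not mounted


started = ["MBX-A", "MBX-B", "MBX-C", "MBX-D"]
# MBX-A restarts and can only see MBX-B, which is also still at 0.
print(dacp_bit_after_startup("MBX-A", started, {"MBX-B": 0}))
# Once MBX-C (bit 1) is reachable, MBX-A may mount.
print(dacp_bit_after_startup("MBX-A", started, {"MBX-B": 0, "MBX-C": 1}))
```

The first call models two members coming back up in isolation: neither can see a member with the bit set, so neither mounts and split brain is avoided.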
Datacenter Activation Coordination Mode
[Diagram: DAG1 across a Primary Datacenter and a Secondary Datacenter, with Outlook clients, CAS-Pri, CAS-Sec, HT2010 servers, a file share witness (FSW), and Mailbox servers MBX-A, MBX-B, MBX-C, and MBX-D; MBX-A and MBX-B labeled Active]
Datacenter Activation Coordination Mode
[Diagram: the same topology with an alternate witness server (AWS) added]
Datacenter Activation Coordination Mode
[Diagram: the same topology with DACP bit values shown: MBX-A = 0, MBX-B = 0, MBX-C = 1, MBX-D = 1]
Resources
Exchange Team Blog - http://aka.ms/ehlo
Exchange 2010 Documentation - http://aka.ms/ex2010docs
My Blog – http://aka.ms/schnoll
Twitter: @schnoll
Resources
Sessions On-Demand & Community - www.msteched.com/Australia
Microsoft Certification & Training Resources - www.microsoft.com/australia/learning
Resources for IT Professionals - http://technet.microsoft.com/en-au
Resources for Developers - http://msdn.microsoft.com/en-au