Using_Failover_Clusters_for_High_Availabilityx

Contact Information
https://github.com/RowdyVinson
[email protected]
@RowdyVinson
https://www.linkedin.com/in/rowdyvinson-0600024
Using Failover Clusters for High Availability
ROWDY VINSON
Who are you?

Who here is a Developer-turned-DBA? Systems admins/engineers?
Other?

Anyone work in the virtualization/storage stack regularly?

Who knows PowerShell well enough to change a setting with a
script?

We’ll be talking about concepts in all these areas today, so please ask
questions if something doesn’t translate well into your native tongue
Who I am

In my free time I build things, play video games, and argue
the merits of Star Trek over all other “Star”-based sci-fi fandoms

Professionally, I’m a Systems Engineer and DBA who’s been
working with SQL Server in various roles for about a decade

My team is responsible for delivery and support of 300 virtual
servers in a tier 4 datacenter environment

I’m responsible for the health of about 60 SQL Servers and
about 400 user databases supporting 30 different applications
Why is HA important?

All servers go down eventually
  • Planned
      • Upgrades (hardware and software)
      • Patches (OS and applications)
  • Unplanned
      • Failures
      • Accidents
      • Jr. Sys Admins (or Sr. Sys Admins between the hours of 6pm Friday and 9am Monday)
  • Save your skin and start planning for this
How do we get HA? Clusters!

What is a “Cluster”?

Nodes
  • Configured Servers

Roles
  • Services
  • Resources
      • Storage
      • Names/IPs
      • Networks
How do we get HA? Clusters!

How do I keep my cluster happy?
  • Quorum rules
  • Cluster Validation
      • This is required for Microsoft Support
  • Redundant hardware platform
      • VMware host isolation
      • Network multipathing
      • HA SAN
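
To make the quorum idea concrete, here is a minimal Python sketch of simple majority voting. It assumes every node and witness holds exactly one vote and ignores dynamic quorum and any other vote adjustments a real cluster may apply. The function names are hypothetical and are not part of any cluster tooling.

```python
def has_quorum(total_votes: int, votes_online: int) -> bool:
    """Majority rule: strictly more than half of all configured votes must be reachable."""
    return votes_online > total_votes // 2


def tolerated_failures(total_votes: int) -> int:
    """How many votes can be lost while a strict majority still remains."""
    return (total_votes - 1) // 2


if __name__ == "__main__":
    # Illustrative vote counts only, not taken from the demo environment.
    for total in (2, 3, 4, 5):
        print(f"{total} votes: can lose {tolerated_failures(total)} and keep quorum")
```

Under these assumptions an even vote count tolerates no more failures than the odd count just below it, which is one reason a witness is commonly added to a cluster with an even number of nodes.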
How do we get HA? Clusters!

How do they react to X?

Maintenance (Demo 1)

Node-specific Failure (Demo 2)

Role performance issues (Demo 3)

Environment failure (Demo 4)
Demo 0
Scenario: Check-Cluster*
Outcome: This gives us an overview of the state of the cluster. We’ll also use it for populating variables for use later in the demos.
*This is a script of my design and something we’ve found valuable for troubleshooting in our environment. I’m adding some logic to it to make remediation of issues better, but this is a work in progress.
Demo 1
Scenario: The node that the SQL role is running on is gracefully shut down.
Outcome: What we see here is a best case for clustering. The non-active node notices 5 seconds of missed heartbeats (the default threshold is 5 missed heartbeats and the default interval is 1 second) and initiates a role recovery on itself.
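
The heartbeat arithmetic in this demo can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how the cluster service is implemented: it assumes a node is declared down after a run of consecutive missed heartbeats reaches the threshold, using the interval (1 second) and threshold (5) quoted in the slide. The function names are hypothetical.

```python
from typing import List, Optional


def detection_time_seconds(interval_ms: int = 1000, threshold: int = 5) -> float:
    """Time until a silent node is declared down: interval multiplied by threshold."""
    return interval_ms * threshold / 1000


def declared_down_at(heartbeats: List[bool], threshold: int = 5) -> Optional[int]:
    """Return the index of the heartbeat slot at which the node is declared down.

    heartbeats[i] is True if heartbeat i arrived and False if it was missed.
    A received heartbeat resets the count of consecutive misses (an assumption
    of this model). Returns None if the threshold is never reached.
    """
    missed = 0
    for i, arrived in enumerate(heartbeats):
        missed = 0 if arrived else missed + 1
        if missed >= threshold:
            return i
    return None


if __name__ == "__main__":
    print(detection_time_seconds())
    # Two good heartbeats, then the node goes silent.
    print(declared_down_at([True, True] + [False] * 5))
```

In this model a brief blip of four missed heartbeats followed by a good one never triggers recovery, which matches the intent of having a threshold rather than reacting to a single missed heartbeat.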
Demo 2
Scenario: NIC is disabled on node running the SQL role
Outcome: In this case, a real “failure” happens from the perspective of the cluster service. The role is shifted to the other node and service is restored.
Demo 3
Scenario: High node resource use
Outcome: This demonstrates that most typical performance-related issues will not cause failover. HA is not a performance-related solution.
Demo 4
Scenario: Interrupted iSCSI connection (failed SAN connection)
Outcome: Cluster loses access to its quorum drive and the storage resources for the SQL role, so both nodes assume they are wounded and they stop serving the role.
This error extends well beyond the DBA team. The systems team, as well as the network team, would have to coordinate to resolve this. Redundancy in the network and virtualization stacks could help reduce this risk, but it is never zero.
Can my team support a cluster?

What you need:
  • Strong server support team
      • Virtualization expertise may be required here if it is used
      • Windows OS expertise is a must
      • PowerShell is a must
      • Enough staff to handle SLA commitments during turnover/vacations
  • Significant redundancy in the environment
      • High-tier datacenter
      • Redundancy in networks and virtualization
  • Are they right for me?
¯\_(ツ)_/¯
Questions?
More Reading

Edwin Sarmiento: http://www.edwinmsarmiento.com/resources-windows-server-failover-clustering-wsfc-for-sql-server-dbas/

Cluster Quorum info: https://technet.microsoft.com/en-us/library/jj612870(v=ws.11).aspx

More Quorum info: https://blogs.msdn.microsoft.com/sqlalwayson/2012/03/13/quorum-vote-configuration-check-in-alwayson-availability-group-wizards-andy-jing/

DR matrixes: https://www.brentozar.com/archive/2014/05/new-high-availability-planning-worksheet/