SQL Server - WordPress.com

Download Report

Transcript SQL Server - WordPress.com

Overview of high availability in
Microsoft SQL Server
Szymon Wójcik
Agenda
Introduction
What is availability?
What is and why to choose high availability?
Key factors to consider for high availability scenario
High availability techniques in Microsoft SQL Server
Replication
Log shipping
Mirroring
Failover clustering
Discussion
PLSSUG Cracow Partners
Agenda
Introduction
What is availability?
What is and why to choose high availability?
Key factors to consider for high availability scenario
High availability techniques in Microsoft SQL Server
Replication
Log shipping
Mirroring
Failover clustering
Discussion
Introduction
Szymon Wójcik
Experience with MS SQL Server since 2000
(dev/admin)
MCITP: DBA SQL Server 2005
Interests:
Performance tuning
High availability
Blog – sqlphobosq.wordpress.com
Twitter - @phobosq
Agenda
Introduction
What is availability?
What is and why to choose high availability?
Key factors to consider for high availability scenario
High availability techniques in Microsoft SQL Server
Replication
Log shipping
Mirroring
Failover clustering
Discussion
Availability [1/5]
One of the concepts defined within ITIL
Ability to perform its agreed function when required
Determined by:
Reliability – how long (MTBF)
Maintainability – how quickly restored (MTRS)
Serviceability – contract conditions
Performance
Security
Confidentiality
Integrity
Availability
Availability [2/5]
Best practice – measure in %:
𝐴𝑔𝑟𝑒𝑒𝑑 𝑆𝑒𝑟𝑣𝑖𝑐𝑒 𝑇𝑖𝑚𝑒 − 𝐷𝑜𝑤𝑛𝑡𝑖𝑚𝑒
𝐴=
∗ 100
𝐴𝑔𝑟𝑒𝑒𝑑 𝑆𝑒𝑟𝑣𝑖𝑐𝑒 𝑇𝑖𝑚𝑒
Agreed Service Time – defined in SLA (Service Level
Agreement)
Downtime – duration of service unavailability during
Agreed Service Time
Important when planning/deploying a service to
understand availability concept
Availability [3/5] – figures for one week
Allowed downtime duration per week [hh:mm:ss format]
Availability level
8x5 (40 hours/week)
24x7 (168 hours/week)
80%
08:00:00
33:36:00
90%
04:00:00
16:48:00
95%
02:00:00
08:24:00
98%
00:48:00
03:21:36
99%
00:24:00
01:40:48
99,9%
00:02:24
00:10:05
99,99%
00:00:14,4
00:01:01
99,999%
00:00:01,44
00:00:06
Availability [4/5] – figures for one year
Allowed downtime duration per year [DD.hh:mm:ss
format]
Availability level
8x5 (40 hours/week)
24x7 (168 hours/week)
80%
52.00:00:00
73.00:00:00
90%
26.00:00:00
36.12:00:00
95%
13.00:00:00
18.06:00:00
98%
5.04:48:00
7.07:24:00
99%
2.14:24:00
3.15:48:00
99,9%
0.06:14:24
0.08:45:36
99,99%
0.00:37:26
0.00:52:33
99,999%
0.00:03:45
0.00:05:15
Availability [5/5] – important notes
Availability != Uptime (service may be up but
unavailable)
Scheduled downtime does not have to cause
unavailability (up to definition in SLA)
Agenda
Introduction
What is availability?
What is and why to choose high availability?
Key factors to consider for high availability scenario
High availability techniques in Microsoft SQL Server
Replication
Log shipping
Mirroring
Failover clustering
Discussion
High availability - definition
System design approach and service implementation
that ensures a certain level of operational
performance (Wikipedia)
Masks the effects of hardware or software failure
Maintains availability of applications so that
perceived downtime is minimized (Microsoft)
High availability != disaster recovery
High availability is used for ensuring for meeting
Service Level Target for availability
Disaster recovery is ensuring operational continuity
They can be used complementary – HA can minimize
the need of invoking DR, but never replace it
Why to choose high availability
For users:
Minimizes downtime probability
Allows to sustain a failure if properly designed
For administrators:
Simplifies migration effort
Minimizes risk of continuity
Agenda
Introduction
What is availability?
What is and why to choose high availability?
Key factors to consider for high availability scenario
High availability techniques in Microsoft SQL Server
Replication
Log shipping
Mirroring
Failover clustering
Discussion
Single point of failure
A whole system is as strong as the weakest link
Switch
Server
LAN
User
Router
Server on the Web
Hardware redundancy
Introduce additional hardware to minimize risk of
failure
Switch
Switch
Server
LAN
User
Router
Router
Server on the Web
Hardware redundancy
Not only whole machines may be multiplicated to
become fault tolerant
Also components:
Power supplies
CPUs
Hard disks
Network interface cards
Storage controllers
Standby node
A standby node is a machine in a HA system that
takes over in case of primary server failure
Three types:
Cold standby – Unplugged, needs to be prepared before
use
Warm standby – Ready to use, but requires manual switch
Hot standby – Ready to use, takes over automatically
Fail over = switching from primary to standby
Fail back = return to primary
There may be more than one standby in HA scenario!
Load balancing vs failover
Load balancing – distributing of workload between
several peer servers
If one goes down, others take over
Workload distributed by load balancer
Failover – automatic switch to standby
Standby is not active
Switch initiated upon loss of heartbeat
Other points
High availability requires additional costs – multiple
components must be present according to design in
order to meet requirements
It may become complex to maintain – additional CIs
present in environment that need to be kept up-todate
Hardware design must be followed by software to
fully benefit from HA
KISS – Keep It Simple and Stupid
Agenda
Introduction
What is availability?
What is and why to choose high availability?
Key factors to consider for high availability scenario
High availability techniques in Microsoft SQL Server
Replication
Log shipping
Mirroring
Failover clustering
Discussion
High availability in Microsoft SQL Server
SQL Server, as a RDBMS, provides means for failover
scenario
Load balancing is difficult and must be properly
designed in order to work
High availability in SQL Server does not prevent
logical data corruption – periodic DBCC checks are
advised
HA methods overview in SQL Server
Method
What it does
Replication
Transfers completed
transactions to standby
nodes
Log shipping
Performs regular log
backups, copy to
standby and restore
Database
mirroring
Replays transactions as
they are logged
Failover
cluster
Monitors Windows
service status and
transfers execution
Standby
type
Cold/warm
Cold
Warm/hot
Hot
# of standby
nodes
Any
Remark
•
•
•
Standby is accessible
May allow for updates
Conflicts may appear
•
Database unavailable
during restore
Standby may be
accessible
Any
•
1
•
•
Standby unavailable
Requires third server
to allow hot standby
•
Requires shared (or
replicated) storage
Requires identical
hardware
Failover = downtime
Any
•
•
Replication
Three server roles in replication:
Publisher
Distributor
Subscriber
Three types:
Snapshot
Transactional
Merge
Two subscription methods:
Push – Distributor pushes articles to Subscribers
Pull – Subscribers downloads from Distributor
Replication topology
Publisher
Subscriber
Distributor
Subscriber
Publisher
Subscriber
Subscriber
Possible application of replication
Create a second copy of data to be used in case of
emergency (DR)
Create a copy of data to offload the server (load
balancing)
Allow offline users to work with data and upload
their changes later (high availability)
Replication agents
External programs which are used to implement
replication:
Snapshot Agent:
creates snapshots
Log Reader Agent:
Reads transaction log
Marks transactions for replication
Distribution Agent:
Dispatches transactions to Subscriber
Merge Agent:
Downloads remote and uploads local changes
Resolves conflicts in merge replication
Snapshot replication
Publisher makes a copy of a database which is
applied at Subscriber
Good for small, static data:
Whole snapshot is applied every time – the changes which
appear after snapshot will be applied with next snapshot
Requires sufficient bandwidth
Transactional replication
Starts with a snapshot
Transactions are recorded at Publisher and replayed
at Subscriber
May allow for updatable subcriptions
If Subscriber is offline, records are stored at the
Distributor
Merge replication
Starts with a snapshot
Merges changes between Publisher and Subscribers
Allows synchronization via HTTPS (since SQL Server
2008)
Allows the most autonomous design – e.g. mobile
users, multiple branch offices working on the same
data
Replication how-to
Configure Distributor
Configure Publisher:
Select replication type
Select articles to be published
[Optional] Set up article filtering
Set up security
Configure Subscribers:
Connect to Distributor
Select subscription method
Apply snapshot
[Transactional/merge] Synchronize changes
Failover in replication
Stop subscription
Direct all traffic from Publisher to Subscriber:
Change application connection strings
Change DNS aliases, if required, or
Change IP addresses
Failback in replication
After restoring Publisher, restore a copy of database
from Subscriber
Direct all traffic from Subscriber to Publisher
Reestablish the replication
Log shipping
Keeps a standby by automating backup, copy and
restore process
Three server roles in log shipping:
Primary
Secondary
Monitor
How it works? [1/2]
Restore a full backup from Primary to Secondary and
then:
A job runs on Primary which backs up transaction log
Second job copies the log backup to Secondary
Third job on Secondary restores the log after it’s copied
[Optional] Monitor server tracks performance and
incidents
How it works? [2/2]
Primary
Secondary
Monitor
Failover in log shipping
Copy transaction log backups from primary to
secondary
Backup tail of the log on primary
Restore all backups except tail-log with NORECOVERY
Restore tail-log with RECOVERY
Disable log shipping jobs
Redirect client traffic to secondary
Drawbacks of log shipping
You can’t miss a transaction log backup
Network traffic generated has to be considered
You are always behind on Secondary
Secondary is read-only
Database mirroring
Allows to keep your standby up-to-date
Allows automatic failover
Cost-effective alternative to clustering
Available in Standard Edition (2005 – 2008 R2)
Does not require cluster capable hardware
Might be in implemented when Windows
Authentication mode is not possible (using
certificates)
How it works?
Mirror
Principal
Witness
Database mirroring modes
High availability (with witness)
Automatic failover
Synchronous transaction commit (principal commits after
mirror confirms it’s commit)
High protection (without witness)
Manual failover
Synchronous transaction commit
High performance (without witness)
Manual failover
Asynchronous transaction commit
Manual failover in database mirroring
Can be done with
one mouse click in
SSMS
Requires client
traffic redirection:
Possible within
connection string
using Failover
Partner command
Automatic failover in database mirroring
Initiated automatically by witness if there is no
quorum:
If principal is unavailable, fails over to mirror
Does nothing if mirror becomes unavailable
Fails over also if principal is up but unreachable from
network!
Requires client traffic redirection:
Possible within connection string using Failover Partner
command
Failover clustering
Provides protection on a server level:
Automatic failover in case of server failure
Fails over logins, endpoints and jobs
Combines multiple machines (nodes) in a single
virtual server
Requires cluster-capable hardware:
Shared or common storage
Certified server hardware
Clustering
Node A
User
Storage
Node B
Cluster
Failover in a cluster
Node A
User
Storage
Node B
Cluster
Summary
Method
What it does
Replication
Transfers completed
transactions to standby
nodes
Log shipping
Performs regular log
backups, copy to
standby and restore
Database
mirroring
Replays transactions as
they are logged
Failover
cluster
Monitors Windows
service status and
transfers execution
Standby
type
Cold/warm
Cold
Warm/hot
Hot
# of standby
nodes
Any
Remark
•
•
•
Standby is accessible
May allow for updates
Conflicts may appear
•
Database unavailable
during restore
Standby may be
accessible
Any
•
1
•
•
Standby unavailable
Requires third server
to allow hot standby
•
Requires shared (or
replicated) storage
Requires identical
hardware
Failover = downtime
Any
•
•
Discussion
THANK YOU!