Database Mirroring
Download
Report
Transcript Database Mirroring
Meeting the Availability
Challenge
Don Vilen
Program Manager
SQL Server
Microsoft Corporation
Causes of Data Unavailability
What part of your database system causes
the most data unavailability?
Hardware?
SQL Server and Windows?
Application software?
Your people?
People, Planning, and
Procedures
In several studies the major cause of data
unavailability is people
Operators and DBAs
Users
SQL Server 2005 has many
enhancements to improve availability, but
not all problems can be solved by DBMS
technology
Training your people, planning your
operations, and documenting your
procedures can have the greatest impact
on availability
Availability Enhancements
Database Snapshots
Database Mirroring
Partial Database
Availability
Table and Index
Partitioning
Row-level Versioning
Snapshot Isolation
Read Committed Snapshot
Isolation
Failover Clustering
enhancements
Fast Database Recovery
Peer-to-Peer Replication
Scaleable Shared
Database
Checksum on Data Pages
Instant File Initialization
Backup/Restore
Checksum on Backups
Backup Media Mirroring
Restore Read-Only
Filegroups without Log
Page-level Restore
Piecemeal Restore
Data Backups don’t block
Log Backups
Dedicated Admin
Connection (DAC)
Dynamic Configuration
CPU Affinity, AWE memory
Attach/Detach
Areas of Availability
Operator, DBA, and User Error
High Availability and the SQL Server
process
Availability inside the SQL Server process
Operator, DBA, User Error
People are the cause of most unavailability
Protect – Permissions: Restrict as necessary
Prevent / Avoid – Triggers on DDL and DML can
rollback an operation even if they do have
permission
Detect – Triggers and Events can be used to
detect and record who, what, when
Delay – Log Shipping can delay the error from
reaching the standby server
Recover – Database Snapshot can save a copy
of a database at a point in time
Can RESTORE from the Snapshot to recover
Database Snapshot
Read-only Snapshot of an entire database
at a point in time
Created instantly; no data is copied
Snapshot must be created before the error
Base database continues to change
Database Snapshot does not restrict the base
database
Multiple Snapshots are allowed
Database Snapshots can exist forever
Constrained by resources
Database Snapshot
Syntax Examples
To create a database snapshot
CREATE DATABASE mydbSnap0600
ON (<filelist>)
AS SNAPSHOT OF mydb
To drop a database snapshot
DROP DATABASE mydbSnap0600
To restore a database to a snapshot
RESTORE DATABASE mydb
FROM DATABASE_SNAPSHOT
= 'mydbsnap0600'
Database Snapshot
How it really works
CREATE DATABASE mydbSnap AS SNAPSHOT OF mydb
USE mydb
UPDATE (pages 4, 9, 10)
mydb – Database
Page
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
mydbSnap – Read-Only Database Snapshot
USE mydbSnap
SELECT (pages 4, 6, 9, 10, 14)
Areas of Availability
Operator, DBA, and User Error
High Availability and the SQL Server
process
Availability inside the SQL Server process
High Availability and the SQL
Server Process
Instance Maintenance
Instance and Database Startup
Ensuring High Availability for the SQL
Server Process
Instance Maintenance
Install and Upgrade
More reliable Install / Setup process
Faster Upgrade
Minimal downtime during upgrade
System Stored Procedures are in the ‘resource
database’
Read-only, invisible/transparent to users
Allows for fast, side-by-side upgrades
No scripts to run to upgrade system procs
master is upgraded during the Upgrade process
Other databases are upgraded as part of database
startup – once master is complete
Upgrade also occurs:
When a down-level database is attached
When a down-level database is restored
Instance Maintenance
Configuration
More dynamic configuration
No instance restart required to change:
AWE Memory configuration
Hot-add memory
CPU Affinity configuration
Surface Area Configuration Manager
Instance and Database
Startup
Faster Instance Startup
Memory, even AWE memory, is allocated as
needed
Faster Database Startup
Fast Recovery
User databases start up in parallel
Fast Recovery on
Database Restart
SQL Server 2000
Database is available after Undo completes
Redo
Undo
Time
Available
SQL Server 2005, Enterprise Edition
Database is available when Undo begins
Redo
Undo
Available
Ensuring High Availability for
the SQL Server Process
Failover Clustering
Database Mirroring
Peer-to-Peer Transactional Replication
Others
Log Shipping
Backup/Restore and Attach/Detach
3rd Party Solutions
Geographically-Dipersed Clusters
Hot Standby Failover Solutions
Failover Clustering and Database Mirroring
Both Provide
Failover Cluster
Automatic detection
Automatic, fast failover
Manual failover
Transparent client redirect
Zero work loss
Database Mirroring
Failover Cluster
Failover Clustering
* Inst1
Hot Standby – Automatic failover
Built on Microsoft Server Clusters (MSCS)
Multiple nodes provide availability, transparent to client
Maximum of 2, 4, or 8 nodes depending on OS edition
Automatic detection and failover
Requires certified hardware; see Windows Catalog: Clustered
Supports many scenarios: Multiple Active Instances, N+1, N+I
Multiple Active Instances
* Inst1
Inst3 *
Inst2 *
N+1: N Active, 1 Inactive
Instances
* Inst1
Inst2 *
N+I: N Active, I Inactive
Instances
Failover Cluster
Failover Clustering
Hot Standby Instance
* Inst1
Zero work loss, zero impact on throughput
Instance Failover – entire instance works as
a unit
Single copy of instance databases
Available since SQL Server 7.0
Setup is performed at install time
Clients connect to ‘virtual’ server-name / IP
Failover Cluster
Failover Clustering
SQL Server 2005
More nodes
Match operating system limits
Two-node Failover Clustering is available
in SQL Server 2005 Standard Edition
Unattended setup
Support for mounted volumes (Mount Points)
Full support for Majority Node Set quorum
Full-Text Search and Analysis Services are now
cluster aware
Database Engine, Agent have been since 7.0
Analysis Services now has multiple instances
Database Mirroring
New for SQL Server 2005
Hot Standby Database
Database Failover
Very fast failover
Just seconds in most cases
Zero data loss
Automatic or manual failover
Automatic re-sync after failover
Automatic, transparent client redirect
Database Mirroring
Database Mirroring
Database Mirroring
Hardware
Works with standard computers, storage,
and networks
No shared storage components, virtually no distance
limitations
Impact to transaction throughput
Zero to minimal, depending on environment / workload
/ network
Database Mirroring
Fault Tolerant Virtual Database
Clients
Witness
Principal
Mirror
Witness
Witness and Quorum
Sole purpose of the Witness is
to provide automatic failover
To survive the loss of one server you must
have at least three
Prevents “split brain”
Does a lost connection mean the partner is
down or is the network down?
To become the Principal, a server must
talk to at least one other server
The Witness does not direct the Mirror to
become Principal
It simply answers the question “Who do you see?”
Witness
Witness
Witness is an instance of
SQL Server 2005
Perhaps even SQL Server Express
Can be witness for multiple sessions
Consumes very little resources
Not a single point of failure
Partners can form quorum on their own
Database Mirroring
How it works
Application
Mirror is always
redoing – it
remains current
Witness
Commit
Principal
Mirror
1
5
2
SQL Server
2
Log
>2
Data
SQL Server
4
3
Log
>3
Data
Safety / Performance
There is a trade-off between performance
and safety
Database Mirroring has two safety levels
FULL – commit when logged on Mirror
Allows automatic failover
No data loss
OFF – commit when logged on Principal
System does its best to keep up
Prevents failover; to make mirror available
Must ‘force’ service
Or terminate Database Mirroring session
Database Mirroring Modes
High-Availability Mode
Safety Full; Synchronous operation
Database is available whenever a quorum exists
Automatic failover
High-Protection Mode
Safety Full; Synchronous operation
No witness
Principal continues servicing the database even if it loses
connection to mirror
Manual failover only; no automatic failover
A transition mode; should not be in this mode for long
High-Performance Mode
Safety Off; Asynchronous operation
Manual failover only
Supports only one form of role switching: forced service (with
possible data loss)
Transparent Client Redirect
No changes to application code
Client automatically redirected if session is
dropped
Client library is aware of Principal and Mirror
servers
Upon initial connect to Principal, library
caches Mirror name
When client attempts to reconnect
If Principal is available, connects
If not, client library automatically redirects
connection to Mirror
Peer-to-Peer
Transactional
Replication
Peer-to-Peer
Transactional
Replication
New in SQL Server 2005
Transactional replication but ..
All participants are peers
Schema is identical on all sites
Publish the updates made on ‘their’ data
Subscribe to others to pick up their changes
No hierarchy as in ‘normal’ transactional replication
A given set of data can be updated at only one site at a time
Data ‘ownership’ is purely logical; does not prevent conflicts
SQL Server prevents a change from round-tripping
Enables load-balancing and high availability
Warm / hot standby
Small possibility of data loss on failure
Scalable Shared
Database
Scalable Shared
Database
New in SQL Server 2005 EE
Multiple instances sharing the
ReadOnly
same database files
Volume must be Read-only
Database access is Read-only
Enables load-balancing and scalability
Some availability increase but it depends on
shared disk system
Data files can be mirrors periodically split
off from an updateable database
Third-party functionality
Warm Standby Solutions
Replication
Replication and Log Shipping
Both Provide Multiple copies and Manual failover
Replication – since SQL Server 6.0
Primarily used where availability is required in
conjunction with scale out of read activity
Failover possible; a custom solution
Not limited to entire database; Can define subset of
source database or tables
Copy of database is continuously accessible for read
activity
Latency between source and copy can be as low as
seconds
Log Shipping
Basic idea: Backup, Copy, Restore Log will always be
supported
But no more investment in the scripts
Database scope
Database accessible but read-only
Users must exit for next log to be applied
Data backups no longer block log backups
Log Shipping
Cold Standby Solutions
Backup / Restore
Backup / Restore and
Detach / Copy / Attach
Both Provide
Manual detection and failover
Potential for some work loss
Whole-database scope
Standard servers
Limited reporting on standby
Duplicate copy of database
Client must know where to re-connect
Slowest failover – Most downtime
Backup / Restore
Smaller size – only used pages are copied
Log backups allow restore to point in time
Longer restore time
Detach / Copy / Attach
Copies entire files
No possibility of rolling forward subsequent logs
Detach / Copy / Attach
Third-Party Availability
Solutions
Many solutions by many companies with
many different names
Most replicate data and log changes by
one of these methods
Looking into SQL Server’s log files and
translating log records into logical operations
Intercepting IOs and duplicating them, either
to another disk set or simply recording them
Replicating the writes at the storage level
Local copies / mirrors for snapshot backups, etc.
Remote copies / mirrors for disaster recovery and
Geographically-Dispersed Clusters
Geographically-Dispersed
Clusters and
Storage-level Replication
Geographical
Failover Cluster
Solutions from many third party vendors replicate the
local I/Os on a remote system
Hardware and software methods
Synchronous and asynchronous solutions
Solutions must meet Core I/O Requirements
Many solutions use MSCS to provide automatic failover
See Windows Catalog, Geographically-dispersed Clusters
Other solutions are primarily to duplicate the data at a
remote site, often with an independent SQL Server
Similar to Log Shipping or Detach / Attach
Complementary Technologies
Failover Solution + Replication
Example
Publisher
Fault-Tolerant Publisher and Distributor
Distributor
Subscribers
Combining HA Technologies
Principal Server can be a Failover Cluster
Failover to mirror will occur before failover within the
cluster
So Principal will come back up as the Mirror
Mirror can be a Failover Cluster as well
Failover Cluster
Principal
Failover Cluster
Mirror
Comparing HA Technologies
How Do You Compare the Alternatives?
There are many dimensions to consider when
evaluating an HA / business continuance solution
Detection: Automatic or Manual
Failover: Automatic or Manual
Time to Fail Over
Number of Failures it can survive
Data Currency / Loss
Cost of redundant system(s)
Additional hardware
Additional management
Granularity of Data Safety:
Instance, Database, Table, Row
Complexity
Data Consistency
Transparency to Clients
Privileges Required to Setup
Remote DR Site
Impact on Performance
There is no “one size fits all”
Everyone’s requirements are different
Areas of Availability
Operator, DBA, and User Error
High Availability and the SQL Server
process
Availability inside the SQL Server process
Availability Inside the SQL
Server Process
Memory and CPUs
I/O and Media/Disks
Concurrency, Avoiding Deadlocks
Diagnosing an Issue
Tuning and Fragmentation
DBCC Operations
Memory and CPUs
Memory
Dynamic Configuration applies to AWE memory as
well as regular memory
AWE allocation is dynamic
Checkpoint – more well behaved for large memory
systems
Sniffer – Checksums periodically evaluated in cache
CPUs
CPU Affinity is Dynamically Configurable
NUMA-awareness
Scheduling on a CPU is at the batch level rather than
the session level
I/O and Media/Disks
I/O retries, diagnosis/transparency, and
better messages
SQL Server’s basic I/O requirements
RAID and storage is usually reliable, but
what happens if there are issues with:
Data Files
Log Files
Full-text Catalogs
Backup Files
tempdb
Microsoft SQL Server Core I/O
Requirements
I/O- and Storage-Level Replication
See whitepaper SQL Server 2000 I/O Basics on TechNet
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIOb
asics.mspx
To ensure correct operation the underlying I/O and storage system
must meet the Core I/O Requirements
Stable Media – once ‘write’ is acknowledged, must be guaranteed
Write Ordering –writes must preserve ordering across devices
Torn I/O Prevention – writes must be done as a unit; not split
Applies to both local storage and any I/O-level or storage-level
replication scheme, synchronous or asynchronous
Guarantees that what appears at secondary site is an image that
actually appeared at some point in time at the primary
Asynchronous replication allows potential loss of committed
transactions
Data Files
Database Page Checksums
Detect disk I/O errors not reported by the hardware or operating
system
DBCC Check* now use Database Snapshots for improved
scalability
Partial Database Availability
Database is available if the Primary filegroup is available
Fine-Grained Online Repair
Online Restore – Database remains online; Only data being
restored is offline
Piecemeal Restore – Online restore of filegroups by priority
Filegroup / File backups allowed on Simple recovery model
databases
Page-level Restore – Can restore individual pages to repair errors
found by page checksum or torn pages
Instant File Initialization
Skips file zeroing, fast DB create / restore
Restore read-only filegroups without applying logs
Log Files and
Full-text Catalogs
Log Files
Checksum
Full-text Catalogs
Backup / Restore includes Full-text data
Detach / Attach includes Full-text data
Backup Files
RESTORE VERIFYONLY now checks
everything it can short of writing the data
Backup Media Mirroring
Extra copies for archival or disaster recovery
Backup and Database Page Checksums
RESTORE can detect disk I/O errors not
reported by the hardware or operating system
Can continue past errors – repair later
tempdb Scalability
Scalability of tempdb has been enhanced
Caching of initial pages for temporary tables and
table variables
Improved allocation page latching protocol
Reduced logging overhead for tempdb so there
is less I/O bandwidth for tempdb log file
More efficient allocation algorithm for pages from
‘mixed’ extents in tempdb
tempdb Best Practices
We still recommend the following if you see
latch contention on tempdb allocation or
system catalog pages:
Avoid auto grow
Pre-allocate space for tempdb files
Make as many tempdb files as you have CPUs
Account for any affinity mask settings
File sizes of equal amounts
Increased Concurrency
Avoiding Blocking and Deadlocks
Database Snapshots
Standby / Reporting server with
Replication
Log Shipping
Database Mirroring
Scalable Shared Databases – Read-only
Row-level versioning and new Snapshot
isolation levels
Increased Concurrency
Row-level Versioning
Row-level versioning keeps a copy of each row prior to
an update
Rather than wait for an update to complete, you can see
the data that was committed at the time:
Your transaction began:
Your statement began:
Snapshot Isolation
Read-Committed Snapshot Isolation
You only see committed data, and it is time-consistent
and transactionally consistent
But it might not be the most recent, so not appropriate for some
OLTP applications
If you can work with this snapshot data
Readers don’t wait for locks, reducing blocking and deadlocks
You have a view of the data at a single point in time
Triggers now use row-level versions rather than the log
Diagnosing an Issue
Diagnosis, Runaway query
DMVs
Events
DDL Triggers
Deadlock graphs in XML
Dedicated Administration Connection – DAC
Tuning and Fragmentation
Database Tuning Advisor
Understands indexes and partitions
Online Index Operations
Indexes remain online during reorganize and
rebuild
Table and Index Partitioning
Rows can be placed in filegroups depending
on a column value
Partial availability, defragmentation can occur
at the partition/filegroup level
Partitions can be moved among tables for fast
load, data movement
DBCC Operations
Shrink – includes BLOBs
Defrag – new
ALTER INDEX <xxx>
REORGANIZE | REBUILD
Check* uses Database Snapshots for
improved scalability, performance
In all editions!
Progress Reporting for Shrink and Defrag
Areas of Availability
Summary
Operator, DBA, and User Error
High Availability and the SQL Server
process
Availability inside the SQL Server process
For More Information
SQL Server Books Online
Whitepapers
Failover Clustering
Database Mirroring
Peer-to-Peer Replication
MSDN and TechNet webcasts
© 2006 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.