Building Mission Critical Systems with SQL Server 2005
Download
Report
Transcript Building Mission Critical Systems with SQL Server 2005
Building Highly Available
Systems with SQL Server™
2005
Availability
What does it mean to you?
Why not?
Can your customers get done,
what they need to get done,
when they need to do it?
Site is unavailable
System is unavailable
Database is unavailable
Database is partially unavailable
Table is unavailable
Data is unavailable
24x7x365
Take Advantage When?
How much work to leverage the technology?
Partial Database
Availability
Online Piecemeal
Restore
Instant File
Initialization
Online Index
Operations
Snapshot Isolation
Statement-level Snapshot
Failover Clustering
Database Mirroring
Log Shipping
Database Snapshots
Snapshot Isolation
With Update Conflict
Detection
Snapshot Isolation
Transaction-level Snapshot
(RO)
Online Index
Operations
When Criteria NOT Met
(minority)
When Criteria Met
Fast Recovery
Design and
Architect
Minimal Work to
Leverage
Upgrade
Immediate
Replication
Improving Availability from Installation to Design
Availability in Layers to minimize downtime and data loss
Improved Data Availability
without Requiring Standby
Downtime is reduced and/or prevented for
these barriers:
Database is partially unavailable
Table is unavailable
Data is unavailable
Barriers to Availability
Isolated Failures
Continuing to work with isolated failures
Limiting the scope of failure
Partial Database Availability
Online Piecemeal Restore
Supporting Technology
Instant File Initialization
How do they work?
What happens when…
Disks Fail
In SQL Server™ 2000
Database is marked suspect
Users are unable to access the database
In SQL Server™ 2005
Filegroup is marked offline
Users are able to access undamaged data
What happens when…
Recovery begins
In SQL Server™ 2000
Database is in a restoring state
Users are unable to access the database
File needs to be recreated and zero initialized
File Restore can proceed – offline
In SQL Server™ 2005
Filegroup is in a restoring state
Users are able to access undamaged data
File can be recreated with instant file Initialization
Piecemeal Restore can proceed – online
How is This Possible?
Fine grained operations are based on “functional
partitioning”
Partitioning – in this sense – does not require
Partitioned Tables
Partitioned Tables benefit significantly from fine
grained operations
Partitioning for fine grained operations requires
secondary, non-primary data files where data is
strategically placed
Recovery of your damaged devices can be
prioritized and then the database can be brought
online in stages
Functional Partitioning
Strategies to separate Objects/Data
Related Object-groupings
Separate tables – strategically placed on different
filegroups
Time-based data placement/partitioning
Structures designed for sliding window scenario
List-based groupings/partitioning
Range-based partitioning based on complete lists
To fully leverage Partial Database Availability for
partitioned objects – use Partitioned Tables
Partitioned Tables – new feature in SQL
Server™ 2005 to further simplify the process of
building large data warehouses
Benefits of Partitioning
Speed in managing sliding window
Partition manipulation outside of active table
Piecemeal Backup
Backup active components more frequently,
inactive less frequently
Partial Database Availability
If a filegroup becomes unavailable the
undamaged data remains available
Online Piecemeal Restore
During the restore, the undamaged data
remains available
Partial Database Availability
Improving Availability for Isolated Disaster
Undamaged data remains available while
damaged data is inaccessible
File Status shown in sys.database_files
catalog view
Page Errors written to suspect_pages
table in msdb
Agent alerts:
Notification of the damaged file
Can take the database offline, if desired
Can automate the restore, for read-only data
Database Components
Readonly
filegroups
Readwrite
filegroup
TicketSalesD
BPrimary
File1
File2
2004
File3
2003
File4
2002
File5
2001
Database consists of…
Filegroups consist of…
Files consist of…
Extents consist of…
Pages consist of data
File Header 0
4
8
12
16
20
24
28
1
5
9
13
17
21
25
29
2
6
10
14
18
22
26
30
3
7
11
15
19
23
27
31
File6
Log
extent
extent
extent
extent
0
1
2
3
…
Improving Data Availability, Part I
Partial Database Availability
Name
Title
Company
Online Piecemeal Restore
Improving Availability during Recovery
Almost any component (page, file, filegroup) can
be restored – ONLINE
If a page is damaged – restore only that page
from a file, filegroup or database backup
If a file is damaged – restore only that file from a
file, filegroup or database backup
If a filegroup is damaged – restore only that
filegroup from a filegroup or database backup
Readonly filegroups can be restored without
rolling forward log changes
Users can access the database during the
restore
Instant File Initialization
Improving Availability by Reducing Downtime
SQL Server™ 2000
All data and log files must be zero initialized
Downtime during recovery negatively impacted by the
file creation phase of restore
SQL Server™ 2005
Only log files must be zero initialized
Downtime during recovery significantly reduced by
skipping zero initialization during the file creation
phase of restore
Not only a benefit to Restore
Database Creation
All Restores: File, Filegroup and Database Restores
Database File Changes: autogrow, manual resizing
Improving Data Availability, Part I
Online Piecemeal Restore
Name
Title
Company
Summary: Isolated Failures
Technology
Partial
Database
Availability
Instant File
Initialization
Online
Piecemeal
Restore
Improves
When
Data Availability
Undamaged data/partitions remains
Upgrade
available
Immediate
Recovery Time
Recover only that which is damaged
Database Creation Time
Files are not zero-initialized
File, Filegroup, and Database Restore
Upgrade
Missing files are created quickly
Immediate
Autogrow and Manual Growth time
Additional space is quickly added
Recovery Time – Less time to create files
Data Availability
Undamaged data/partitions remains
Upgrade
available during recovery
Immediate
Recovery Time
Recover only that which is damaged – online
Barriers to Availability
Concurrency Requirements
Database is available but the application/user
cannot complete required operations
What about operational impacts?
Maintenance Operations which cause blocking
New Online Index Rebuilds
What about application impact?
Poorly designed and/or long running transactions
Varying data access patterns
New Snapshot Isolation options
What happens when…
Indexes need to be rebuilt
In SQL Server™ 2000
Index rebuilds require an exclusive table-level lock,
resulting in offline rebuilds
Users are unable to access the table
In SQL Server™ 2005
Rebuilds of an index can be performed online if a
few simple criteria are met
Users are able to access the table
Online Index Operations
Improving Concurrency during Index
Maintenance
SQL Server™ 2000
Offline Index Rebuilds;
table data is unavailable
during operation
Rebuild options: DBCC DBREINDEX and CREATE
with DROP_EXISTING
SQL Server™ 2005
Includes all of the above offline operations, plus…
New ALTER INDEX…REBUILD:
ONLINE – allows concurrent user access (queries as well as
modifications) to the index during rebuild
OFFLINE – works using locks (same as SQL Server™ 2000)
If online is not possible by default, consider design
alternatives to fully leverage online index rebuilds
Online Index Rebuilds
Name
Title
Company
What happens when…
Readers and Writers desire the same data
In SQL Server™ 2000
Locking is used to guarantee the intended level of isolation
Users must wait to access locked data
Concurrency and performance compromised
Correctness is compromised when lower isolation levels are
used to avoid locking
In SQL Server™ 2005
Locking OR Versioning can be used to guarantee a variety of
isolation levels
With versioning, Readers won’t block writers and writers won’t
block readers
Performance improved if contention was primary bottleneck
Correctness is not compromised due to use of lower isolation
levels
Snapshot Isolation
Improving Concurrency in Mixed
Workloads
SQL Server™ 2000
Isolation implemented solely through locking
Mixed workloads may experience:
Concurrency problems due to blocking
The Inconsistent Analysis problem
SQL Server™ 2005
Isolation implemented using locking and versioning
Mixed workloads can improve read consistency and
performance using:
Read committed with Statement-level snapshot to improve
statement-level consistency
Snapshot Isolation to improve transaction-level consistency
Snapshot Isolation
Name
Title
Company
Summary: Concurrency Requirements
Technology
Improves
When
Online Index
Operations
Table concurrency
tables being rebuilt remain available
Downtime due to Maintenance
no longer required for majority of indexes
Minimal Work
to Leverage
Design and
Architect
Snapshot
Isolation
Statement-level Snapshot
Snapshot
Isolation
Transaction-level Snapshot
for read-consistency
Transaction-level Snapshot
for Update Conflict
Resolution
Row concurrency
locked rows prior and consistent version
remain accessible
Accuracy
long running aggregates/statements use
consistent version from statement start
Analysis/Query Time
Queries do not wait!
Row concurrency
locked rows prior and consistent version
remain accessible
Accuracy
long running aggregates/statements use
consistent version from transaction start
Analysis/Query Time
Queries do not wait!
Minimal Work
to Leverage
Minimal Work
to Leverage
Design and
Architect
Improved Availability
with Standby Technologies
Downtime is reduced and/or prevented for
these barriers:
Site is unavailable
System is unavailable
Database is unavailable
Barriers to Availability
Catastrophic Failures
Database is completely unavailable
Server is unavailable
Site is unavailable
Standby Technologies
Failover Clustering
Database Mirroring
Replication
Log Shipping
Supporting Technology
Fast Recovery
How do they work?
Failover Clustering
Server-level Redundancy
Established High Availability Technology
Hot Standby: Automatic Detection and
Automatic Failover
No work loss exposure and no direct impact
to workload
Protects against node failures
Geographically Dispersed
Failover Clusters with
approved hardware
Recovery on failover
improved by Fast Recovery
Failover Cluster
Failover Clustering
New for SQL Server™ 2005
Faster Failover through Fast Recovery
Supports up to an 8-node Failover Cluster with
Enterprise Edition
Supports up to a 2-node Failover Cluster with
Standard Edition
Supports mounted volumes for better explicit
disk usage – helps in server consolidation
Supports dynamic AWE for better memory
utilization
Unattended setup
All SQL Server data services participate
Database Engine, SQL Server Agent, Full-Text Search
Analysis Services – Now has multiple instances
Fast Recovery
Improving Availability by Reducing Downtime
Not only beneficial to Failover Clustering
On every server startup, Restart Recovery
runs to guarantee consistency
Restart Recovery has two phases:
REDO: rolls forward committed transactions
UNDO: rolls back any incomplete transactions
In SQL Server™ 2005, users are allowed
access after REDO
SQL Server™ 2005
Redo
ONLINE
SQL Server™ 2000
Undo
ONLINE
Database Mirroring
Database-level Redundancy
Upcoming High Availability Technology
Released for testing and prototyping in
SQL Server™ 2005 RTM
Certified for Production Use in the first half
of 2006
Supports three configurations:
High Availability
High Protection
High Performance
Database Mirroring
Technology Overview
Principal Database handles user activity
Mirror Database receives changes via secure,
dedicated TCP channel
Server does NOT require a license if the server acts
solely for redundancy
Optional Witness Server
Lightweight mechanism to help provide quorum
Can run on any SQL Server Edition
Supports three configurations:
High Availability
High Protection
High Performance
Database Mirroring
Basic Principal of Synchronous Mirroring
Acknowledge
Acknowledge
Commit
Constantly
Redoing on
Mirror
Transmit to Mirror
Write to
Local
Log
DB
Committed
in Log
Log
Write to
Remote Log
Log
DB
Database Mirroring
Configuration Summary
High
Availability
High Protection
High
Performance
Automatic
Detection
No Automatic
Detection
No Automatic
Detection
Automatic Failover
Manual Failover
Manual Failover
Uses synchronous
form of mirroring
Uses synchronous
form of mirroring
Uses asynchronous
form of mirroring
Requires Witness
Does not require
Witness
Does not require
Witness
Principal
performance is
Principal
Principal
affected by
performance is
performance is
speed
affected within
by
affected by
network
Mirror database
is available
secondsNOT
of failover
distance
network speed and
network speed and
and
Mirror
database is available
for read-only analysis
through
distance
distance
the use of Database Snapshots
Database Mirroring
Name
Title
Company
Database Scale Out
Peer to Peer Replication
Identical databases
continuously
synchronize in near
real time
Scale query
workloads beyond
what’s possible with a
single database
Example: Distributed Trading System
Chicago
London
Tokyo
Availability through Scalability
Peer to Peer Replication
Enables load-balancing and improved
availability through scalability
Database failures shouldn’t bring down the
application system
Database upgrades should be done without
outages
Individual databases can be taken
online/offline and maintained without
application downtime
Warm Standby
Small possibility of some data loss on failure
Peer-to-Peer Replication
Based on Established Transaction Replication
Technology
Based on Bi-directional Transactional Replication
All participants are peers
Schema is identical on all sites
Publish the updates made on “their” data
Subscribe to others to pick up their changes
No hierarchy as in “normal” transactional replication
A given set of data can be updated at only one site at a
time
Data “ownership” is purely logical; does not prevent conflicts
SQL Server prevents a change from round-tripping
Peer to Peer Topology
London
Chicago
Logreader
Agent
Logreader
Agent
Peer to Peer
Transactional
Replication
Dist
DB
Dist
DB
Distribution
Agent
Tokyo
Logreader
Agent
Dist
DB
Distribution
Agent
Distribution
Agent
Peer-to-Peer Replication
Name
Title
Company
Log Shipping
Database-level Redundancy
Established High Availability Technology
Supports multiple secondary servers
Secondary for Failover
Secondary for Reporting
Secondary with delay for Human Error Recovery
Can be combined with other technologies such
as Failover Clustering and Database Mirroring
New for SQL Server™ 2005
Integration in SQL Server Management Studio
Log Shipping is not delayed during Database or
Differential Backups
Summary: Standby Technologies
Technology
Standby
Protection
When
Failover Clustering
Hot
Server
Minimal Work to
Leverage
Database Mirroring
Hot
Database
Minimal Work to
Leverage
Log Shipping
Warm
Database
Minimal Work to
Leverage
Database Mirroring
Warm
Database
Minimal Work to
Leverage
Warm
Database
Minimal Work to
Leverage
Replication
Warm
Database
Partial Database Availability
Online
Filegroup/File
Design and
Architect
Upgrade
Immediate
Cold
Database
Filegroup/File
Upgrade
Immediate
High Availability Configuration
High Protection Configuration
Database Mirroring
High Performance Configuration
Backup and Restore
(Publication of objects)
Barriers to Availability
People
I’m going to
modify this
data…right
…
here
!
This job would be great
if it weren’t for…
…the users
…the staff
…us
Barriers to Availability
Many more barriers than discussed
Only some are addressable by database technology
Be sure to consider people, planning, procedures and training
Microsoft SQL Server™ 2005 gives you greatly
improved tools to overcome these barriers to
availability:
Database Server Failure or Disaster
Isolated Disk Failure
Data Access Concurrency Limitations
Database Maintenance and Operations
Availability at Scale
User or Application Error
Summary
SQL Server™ 2005 offers greater
availability – immediately
Many technologies available just by
upgrading!
Some architected/implemented over time
SQL Server™ 2005 is more Available
Partially damaged databases remain available
Databases being recovered remain available
Instant File Initialization, Fast Recovery
New and Improved Replication Alternatives
SQL Server™ 2005 is more Robust
Take Advantage When?
How much work to leverage the technology?
Partial Database
Availability
Online Piecemeal
Restore
Instant File
Initialization
Online Index
Operations
Snapshot Isolation
Statement-level Snapshot
Failover Clustering
Database Mirroring
Log Shipping
Database Snapshots
Snapshot Isolation
With Update Conflict
Detection
Snapshot Isolation
Transaction-level Snapshot
(RO)
Online Index
Operations
When Criteria NOT Met
(minority)
When Criteria Met
Fast Recovery
Design and
Architect
Minimal Work to
Leverage
Upgrade
Immediate
Replication
Improving Availability from Installation to Design
Availability in Layers to minimize downtime and data loss