Building Mission Critical Systems with SQL Server 2005

Download Report

Transcript Building Mission Critical Systems with SQL Server 2005

Building Highly Available
Systems with SQL Server™
2005
Availability
What does it mean to you?
 Why not?
Can your customers get done,
what they need to get done,
when they need to do it?
 Site is unavailable
 System is unavailable
 Database is unavailable
 Database is partially unavailable
 Table is unavailable
 Data is unavailable
24x7x365
Take Advantage When?
How much work to leverage the technology?

Partial Database
Availability

Online Piecemeal
Restore


Instant File
Initialization


Online Index
Operations


Snapshot Isolation
Statement-level Snapshot



Failover Clustering

Database Mirroring

Log Shipping

Database Snapshots
Snapshot Isolation
With Update Conflict
Detection
Snapshot Isolation
Transaction-level Snapshot
(RO)
Online Index
Operations
When Criteria NOT Met
(minority)
When Criteria Met
Fast Recovery

Design and
Architect
Minimal Work to
Leverage
Upgrade
Immediate

Replication
Improving Availability from Installation to Design
Availability in Layers to minimize downtime and data loss
Improved Data Availability
without Requiring Standby
 Downtime is reduced and/or prevented for
these barriers:
 Database is partially unavailable
 Table is unavailable
 Data is unavailable
Barriers to Availability
Isolated Failures
 Continuing to work with isolated failures
 Limiting the scope of failure
 Partial Database Availability
 Online Piecemeal Restore
 Supporting Technology
 Instant File Initialization
 How do they work?
What happens when…
 Disks Fail
 In SQL Server™ 2000
 Database is marked suspect
 Users are unable to access the database
 In SQL Server™ 2005
 Filegroup is marked offline
 Users are able to access undamaged data
What happens when…
 Recovery begins
 In SQL Server™ 2000
 Database is in a restoring state
 Users are unable to access the database
 File needs to be recreated and zero initialized
 File Restore can proceed – offline
 In SQL Server™ 2005
 Filegroup is in a restoring state
 Users are able to access undamaged data
 File can be recreated with instant file Initialization
 Piecemeal Restore can proceed – online
How is This Possible?
 Fine grained operations are based on “functional
partitioning”
 Partitioning – in this sense – does not require
Partitioned Tables
 Partitioned Tables benefit significantly from fine
grained operations
 Partitioning for fine grained operations requires
secondary, non-primary data files where data is
strategically placed
 Recovery of your damaged devices can be
prioritized and then the database can be brought
online in stages
Functional Partitioning
Strategies to separate Objects/Data
 Related Object-groupings
 Separate tables – strategically placed on different
filegroups
 Time-based data placement/partitioning
 Structures designed for sliding window scenario
 List-based groupings/partitioning
 Range-based partitioning based on complete lists
 To fully leverage Partial Database Availability for
partitioned objects – use Partitioned Tables
 Partitioned Tables – new feature in SQL
Server™ 2005 to further simplify the process of
building large data warehouses
Benefits of Partitioning
 Speed in managing sliding window
 Partition manipulation outside of active table
 Piecemeal Backup
 Backup active components more frequently,
inactive less frequently
 Partial Database Availability
 If a filegroup becomes unavailable the
undamaged data remains available
 Online Piecemeal Restore
 During the restore, the undamaged data
remains available
Partial Database Availability
Improving Availability for Isolated Disaster
 Undamaged data remains available while
damaged data is inaccessible
 File Status shown in sys.database_files
catalog view
 Page Errors written to suspect_pages
table in msdb
 Agent alerts:
 Notification of the damaged file
 Can take the database offline, if desired
 Can automate the restore, for read-only data
Database Components
Readonly
filegroups
Readwrite
filegroup
TicketSalesD
BPrimary
File1
File2
2004
File3
2003
File4
2002
File5
2001
 Database consists of…
 Filegroups consist of…
 Files consist of…
 Extents consist of…
Pages consist of data
File Header 0
4
8
12
16
20
24
28
1
5
9
13
17
21
25
29
2
6
10
14
18
22
26
30
3
7
11
15
19
23
27
31
File6
Log
extent
extent
extent
extent
0
1
2
3
…
Improving Data Availability, Part I
Partial Database Availability
Name
Title
Company
Online Piecemeal Restore
Improving Availability during Recovery
 Almost any component (page, file, filegroup) can
be restored – ONLINE
 If a page is damaged – restore only that page
from a file, filegroup or database backup
 If a file is damaged – restore only that file from a
file, filegroup or database backup
 If a filegroup is damaged – restore only that
filegroup from a filegroup or database backup
 Readonly filegroups can be restored without
rolling forward log changes
 Users can access the database during the
restore
Instant File Initialization
Improving Availability by Reducing Downtime
 SQL Server™ 2000
 All data and log files must be zero initialized
 Downtime during recovery negatively impacted by the
file creation phase of restore
 SQL Server™ 2005
 Only log files must be zero initialized
 Downtime during recovery significantly reduced by
skipping zero initialization during the file creation
phase of restore
 Not only a benefit to Restore
 Database Creation
 All Restores: File, Filegroup and Database Restores
 Database File Changes: autogrow, manual resizing
Improving Data Availability, Part I
Online Piecemeal Restore
Name
Title
Company
Summary: Isolated Failures
Technology
Partial
Database
Availability
Instant File
Initialization
Online
Piecemeal
Restore
Improves
When
 Data Availability
Undamaged data/partitions remains
Upgrade
available
Immediate
 Recovery Time
Recover only that which is damaged
 Database Creation Time
Files are not zero-initialized
 File, Filegroup, and Database Restore
Upgrade
Missing files are created quickly
Immediate
 Autogrow and Manual Growth time
Additional space is quickly added
 Recovery Time – Less time to create files
 Data Availability
Undamaged data/partitions remains
Upgrade
available during recovery
Immediate
 Recovery Time
Recover only that which is damaged – online
Barriers to Availability
Concurrency Requirements
 Database is available but the application/user
cannot complete required operations
 What about operational impacts?
 Maintenance Operations which cause blocking
 New Online Index Rebuilds
 What about application impact?
 Poorly designed and/or long running transactions
 Varying data access patterns
 New Snapshot Isolation options
What happens when…
 Indexes need to be rebuilt
 In SQL Server™ 2000
 Index rebuilds require an exclusive table-level lock,
resulting in offline rebuilds
 Users are unable to access the table
 In SQL Server™ 2005
 Rebuilds of an index can be performed online if a
few simple criteria are met
 Users are able to access the table
Online Index Operations
Improving Concurrency during Index
Maintenance
 SQL Server™ 2000
 Offline Index Rebuilds;
table data is unavailable
during operation
 Rebuild options: DBCC DBREINDEX and CREATE
with DROP_EXISTING
 SQL Server™ 2005
 Includes all of the above offline operations, plus…
 New ALTER INDEX…REBUILD:
 ONLINE – allows concurrent user access (queries as well as
modifications) to the index during rebuild
 OFFLINE – works using locks (same as SQL Server™ 2000)
 If online is not possible by default, consider design
alternatives to fully leverage online index rebuilds
Online Index Rebuilds
Name
Title
Company
What happens when…
 Readers and Writers desire the same data
 In SQL Server™ 2000
 Locking is used to guarantee the intended level of isolation
 Users must wait to access locked data
 Concurrency and performance compromised
 Correctness is compromised when lower isolation levels are
used to avoid locking
 In SQL Server™ 2005
 Locking OR Versioning can be used to guarantee a variety of
isolation levels
 With versioning, Readers won’t block writers and writers won’t
block readers
 Performance improved if contention was primary bottleneck
 Correctness is not compromised due to use of lower isolation
levels
Snapshot Isolation
Improving Concurrency in Mixed
Workloads
 SQL Server™ 2000
 Isolation implemented solely through locking
 Mixed workloads may experience:
 Concurrency problems due to blocking
 The Inconsistent Analysis problem
 SQL Server™ 2005
 Isolation implemented using locking and versioning
 Mixed workloads can improve read consistency and
performance using:
 Read committed with Statement-level snapshot to improve
statement-level consistency
 Snapshot Isolation to improve transaction-level consistency
Snapshot Isolation
Name
Title
Company
Summary: Concurrency Requirements
Technology
Improves
When
Online Index
Operations
 Table concurrency
tables being rebuilt remain available
 Downtime due to Maintenance
no longer required for majority of indexes
Minimal Work
to Leverage
Design and
Architect
Snapshot
Isolation
Statement-level Snapshot
Snapshot
Isolation
Transaction-level Snapshot
for read-consistency
Transaction-level Snapshot
for Update Conflict
Resolution
 Row concurrency
locked rows prior and consistent version
remain accessible
 Accuracy
long running aggregates/statements use
consistent version from statement start
 Analysis/Query Time
Queries do not wait!
 Row concurrency
locked rows prior and consistent version
remain accessible
 Accuracy
long running aggregates/statements use
consistent version from transaction start
 Analysis/Query Time
Queries do not wait!
Minimal Work
to Leverage
Minimal Work
to Leverage
Design and
Architect
Improved Availability
with Standby Technologies
 Downtime is reduced and/or prevented for
these barriers:
 Site is unavailable
 System is unavailable
 Database is unavailable
Barriers to Availability
Catastrophic Failures
 Database is completely unavailable
 Server is unavailable
 Site is unavailable
 Standby Technologies
 Failover Clustering
 Database Mirroring
 Replication
 Log Shipping
 Supporting Technology
 Fast Recovery
 How do they work?
Failover Clustering
Server-level Redundancy
 Established High Availability Technology
 Hot Standby: Automatic Detection and
Automatic Failover
 No work loss exposure and no direct impact
to workload
 Protects against node failures
 Geographically Dispersed
Failover Clusters with
approved hardware
 Recovery on failover
improved by Fast Recovery
Failover Cluster
Failover Clustering
New for SQL Server™ 2005
 Faster Failover through Fast Recovery
 Supports up to an 8-node Failover Cluster with





Enterprise Edition
Supports up to a 2-node Failover Cluster with
Standard Edition
Supports mounted volumes for better explicit
disk usage – helps in server consolidation
Supports dynamic AWE for better memory
utilization
Unattended setup
All SQL Server data services participate
 Database Engine, SQL Server Agent, Full-Text Search
 Analysis Services – Now has multiple instances
Fast Recovery
Improving Availability by Reducing Downtime
 Not only beneficial to Failover Clustering
 On every server startup, Restart Recovery
runs to guarantee consistency
 Restart Recovery has two phases:
 REDO: rolls forward committed transactions
 UNDO: rolls back any incomplete transactions
 In SQL Server™ 2005, users are allowed
access after REDO
SQL Server™ 2005
Redo
ONLINE
SQL Server™ 2000
Undo
ONLINE
Database Mirroring
Database-level Redundancy
 Upcoming High Availability Technology
 Released for testing and prototyping in
SQL Server™ 2005 RTM
 Certified for Production Use in the first half
of 2006
 Supports three configurations:
 High Availability
 High Protection
 High Performance
Database Mirroring
Technology Overview
 Principal Database handles user activity
 Mirror Database receives changes via secure,
dedicated TCP channel
 Server does NOT require a license if the server acts
solely for redundancy
 Optional Witness Server
 Lightweight mechanism to help provide quorum
 Can run on any SQL Server Edition
 Supports three configurations:
 High Availability
 High Protection
 High Performance
Database Mirroring
Basic Principal of Synchronous Mirroring
Acknowledge
Acknowledge
Commit
Constantly
Redoing on
Mirror
Transmit to Mirror
Write to
Local
Log
DB
Committed
in Log
Log
Write to
Remote Log
Log
DB
Database Mirroring
Configuration Summary
High
Availability
High Protection
High
Performance

Automatic
Detection

No Automatic
Detection

No Automatic
Detection

Automatic Failover

Manual Failover

Manual Failover

Uses synchronous
form of mirroring

Uses synchronous
form of mirroring

Uses asynchronous
form of mirroring

Requires Witness

Does not require
Witness

Does not require
Witness
Principal
performance is
 Principal
 Principal
affected by
performance is
performance is
speed
affected within
by
affected by
 network
Mirror database
is available
secondsNOT
of failover
distance
network speed and
network speed and
 and
Mirror
database is available
for read-only analysis
through
distance
distance

the use of Database Snapshots
Database Mirroring
Name
Title
Company
Database Scale Out
Peer to Peer Replication
 Identical databases
continuously
synchronize in near
real time
 Scale query
workloads beyond
what’s possible with a
single database
Example: Distributed Trading System
Chicago
London
Tokyo
Availability through Scalability
Peer to Peer Replication
 Enables load-balancing and improved
availability through scalability
 Database failures shouldn’t bring down the
application system
 Database upgrades should be done without
outages
 Individual databases can be taken
online/offline and maintained without
application downtime
 Warm Standby
 Small possibility of some data loss on failure
Peer-to-Peer Replication
 Based on Established Transaction Replication
Technology
 Based on Bi-directional Transactional Replication
 All participants are peers
 Schema is identical on all sites
 Publish the updates made on “their” data
 Subscribe to others to pick up their changes
 No hierarchy as in “normal” transactional replication
 A given set of data can be updated at only one site at a
time
 Data “ownership” is purely logical; does not prevent conflicts
 SQL Server prevents a change from round-tripping
Peer to Peer Topology
London
Chicago
Logreader
Agent
Logreader
Agent
Peer to Peer
Transactional
Replication
Dist
DB
Dist
DB
Distribution
Agent
Tokyo
Logreader
Agent
Dist
DB
Distribution
Agent
Distribution
Agent
Peer-to-Peer Replication
Name
Title
Company
Log Shipping
Database-level Redundancy
 Established High Availability Technology
 Supports multiple secondary servers
 Secondary for Failover
 Secondary for Reporting
 Secondary with delay for Human Error Recovery
 Can be combined with other technologies such
as Failover Clustering and Database Mirroring
 New for SQL Server™ 2005
 Integration in SQL Server Management Studio
 Log Shipping is not delayed during Database or
Differential Backups
Summary: Standby Technologies
Technology
Standby
Protection
When
Failover Clustering
Hot
Server
Minimal Work to
Leverage
Database Mirroring
Hot
Database
Minimal Work to
Leverage
Log Shipping
Warm
Database
Minimal Work to
Leverage
Database Mirroring
Warm
Database
Minimal Work to
Leverage
Warm
Database
Minimal Work to
Leverage
Replication
Warm
Database
Partial Database Availability
Online
Filegroup/File
Design and
Architect
Upgrade
Immediate
Cold
Database
Filegroup/File
Upgrade
Immediate
High Availability Configuration
High Protection Configuration
Database Mirroring
High Performance Configuration
Backup and Restore
(Publication of objects)
Barriers to Availability
People
I’m going to
modify this
data…right
…
here
!
This job would be great
if it weren’t for…
…the users
…the staff
…us
Barriers to Availability
Many more barriers than discussed
Only some are addressable by database technology
Be sure to consider people, planning, procedures and training
 Microsoft SQL Server™ 2005 gives you greatly
improved tools to overcome these barriers to
availability:
 Database Server Failure or Disaster
 Isolated Disk Failure
 Data Access Concurrency Limitations
 Database Maintenance and Operations
 Availability at Scale
 User or Application Error
Summary
 SQL Server™ 2005 offers greater
availability – immediately
 Many technologies available just by
upgrading!
 Some architected/implemented over time
 SQL Server™ 2005 is more Available
 Partially damaged databases remain available
 Databases being recovered remain available
 Instant File Initialization, Fast Recovery
 New and Improved Replication Alternatives
 SQL Server™ 2005 is more Robust
Take Advantage When?
How much work to leverage the technology?

Partial Database
Availability

Online Piecemeal
Restore


Instant File
Initialization


Online Index
Operations


Snapshot Isolation
Statement-level Snapshot



Failover Clustering

Database Mirroring

Log Shipping

Database Snapshots
Snapshot Isolation
With Update Conflict
Detection
Snapshot Isolation
Transaction-level Snapshot
(RO)
Online Index
Operations
When Criteria NOT Met
(minority)
When Criteria Met
Fast Recovery

Design and
Architect
Minimal Work to
Leverage
Upgrade
Immediate

Replication
Improving Availability from Installation to Design
Availability in Layers to minimize downtime and data loss