Gopal Ashok
Program Manager
Microsoft Corp
What is this talk about?
Ensuring IT services
and operational
continuity in the
enterprise
Protect mission
critical SQL Server
databases using
Always On
Technologies
Maintenance
Analysis
Testing
Solution Design
Implementation
Deployments and Best Practices
Defining HA and DR
High availability is a system design
protocol and associated implementation
that ensures a certain absolute degree
of operational continuity during a given
measurement period
Disaster Recovery involves processes
and procedures designed to restore
business operations after a natural or
human-induced disaster
Typically involves providing redundancy
spanning multiple sites or across
geographic regions
Availability defined in terms of service
level agreements (SLA)
Recovery Time
Data loss during unplanned
downtime
Recovery Time Objective (RTO) guided
by availability requirements
How much downtime can you tolerate?
Recovery Point Objective (RPO) guided
by criticality of application data
How much data can you lose?
Availability Class   Acceptable Downtime (hrs/yr) or RTO   Acceptable Data Loss (time of last copy) or RPO
Tier 1               >99.99% (1 hr or less)                5 min or less
Tier 2               99.9% - 99.99% (1 - 8.5 hrs)          5 mins to 8.5 hrs
Tier 3               <99.9% (hours to days)                Hours to days
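The downtime figures attached to each availability class follow from simple arithmetic on the availability percentage. A minimal sketch, assuming a 365-day year; the function names are illustrative and not part of the talk:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime implied by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def availability_tier(availability_percent: float) -> str:
    """Map an availability percentage to the tier boundaries on the slide."""
    if availability_percent > 99.99:
        return "Tier 1"
    if availability_percent >= 99.9:
        return "Tier 2"
    return "Tier 3"


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% -> {hours:.2f} hrs/yr ({availability_tier(pct)})")
```

With these assumptions, 99.9% allows roughly 8.76 hours of downtime per year and 99.99% allows a little under one hour.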
Protection Levels
Local HA
Protection against resource failures
– Machine
– Database Corruption
– Disk
– Resource Bottlenecks
Location Redundancy
– Building
– < 10 miles
Regional DR
Protection against
– Network Outages
– Site Failures
Location Redundancy
– City, County
– < 100 miles
Geographic DR
Protection against
– Natural Disasters
Location Redundancy
– State, Country
– > 100 miles
SQL Server High Availability Planning
Analysis
Application tiers serviced by the
databases
Protection levels: Local HA,
Regional DR, Geographic DR
Causes of database downtime
Maintenance
Analysis
Solution Design
Which solutions
exist?
What are the characteristics and
cost of the solution?
Implementation
What are the deployment steps
and best practices?
Testing
Solution Design
Implementation
Database Downtime Drivers
Solution Design
Understand the available technology options
and characteristics before making a decision
Solution
Architecture
HA Capabilities
Limitations and
Caveats
Cost Vector
Always On Technologies
Provides a full
range of options
to minimize
operational
downtime and
maintain
appropriate levels
of application
availability.
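Always On Solution Characteristics
Comparison table of solutions (the per-solution check marks did not survive in this transcript)
Column groups:
RPO: No Data Loss (RPO=0)
Failover: Failover Unit (Inst, DB, Tab), Auto Failover (RTO)
Redundancy and Utilization: Sync, Async, Multiple, Read, Write
Cost: Hardware, App Perf Impact, Manageability
Solutions: Log Shipping, DBM, Cluster, Transactional Replication, Peer-Peer Replication
Cost ratings (Hardware, App Perf Impact, Manageability):
Log Shipping and DBM: Low, Low, Low; Low, High, Low; Low, Low, Low (row assignment unclear)
Cluster: High***, Low***, Low***
Transactional Replication: Low, Low, High
Peer-Peer Replication: Low, Low, High
Footnote markers: * on read capability, ** on failover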
* Log Shipping and Database Mirroring can provide point-in-time read capability using STANDBY or
database snapshots, respectively
** Database Mirroring provides the fastest failover to a hot secondary
*** Depends on SAN technology
Increasing Availability: ServiceU
Provides solutions for reserved
seat ticketing, box office
management, event
management and online
payments
No Service = No Revenue
Planned downtime:
RPO = 0 (no data loss)
RTO = 60 seconds maximum;
some database changes may
require a longer downtime than
60 seconds; in those cases every
effort is made to minimize the
service interruption
Unplanned downtime:
Loss of a database server:
RPO = 0; that is, no data loss
RTO = 60 seconds maximum
Loss of the primary data center,
or the entire database storage
unit in the primary data center:
RPO = 3 minutes maximum
RTO = 15 minutes total, including
evaluation of the issue
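The ServiceU targets above can be written down as data and checked against a measured outage. A minimal sketch; the class, dictionary keys, and function name are hypothetical, and only the RPO and RTO values come from the slide:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTarget:
    scenario: str
    rpo_seconds: int  # maximum tolerated data loss
    rto_seconds: int  # maximum tolerated downtime


# Values taken from the slide; the keys are illustrative.
TARGETS = {
    "planned": RecoveryTarget("Planned downtime", 0, 60),
    "server_loss": RecoveryTarget("Loss of a database server", 0, 60),
    "site_loss": RecoveryTarget("Loss of the primary data center", 3 * 60, 15 * 60),
}


def meets_target(target: RecoveryTarget, data_loss_seconds: float,
                 downtime_seconds: float) -> bool:
    """True when a measured outage stays within both RPO and RTO."""
    return (data_loss_seconds <= target.rpo_seconds
            and downtime_seconds <= target.rto_seconds)


if __name__ == "__main__":
    # Example: a server failover that took 45 seconds with no data loss.
    print(meets_target(TARGETS["server_loss"], 0, 45))
```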
ServiceU High Availability Architecture
Basic Principle: Redundancy for
all components
3-node cluster
Redundancy during single node
failure, patching etc
No Majority: Disk Only Quorum
Model
Availability during multi-node
failure
No automatic failback to
preferred node
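The reason a disk-only quorum keeps the cluster available during multi-node failure can be shown with a simplified model. This sketch contrasts it with a node-majority rule, which is not mentioned on the slide and is included only for comparison; real cluster quorum behavior has more cases than these two functions cover:

```python
def has_quorum_node_majority(total_nodes: int, nodes_up: int) -> bool:
    """Node-majority rule: more than half of the nodes must be running."""
    return nodes_up > total_nodes // 2


def has_quorum_disk_only(nodes_up: int, quorum_disk_online: bool) -> bool:
    """Disk-only rule: any single node that can reach the quorum disk suffices."""
    return quorum_disk_online and nodes_up >= 1


if __name__ == "__main__":
    total = 3  # the 3-node cluster from the slide
    for up in (3, 2, 1):
        print(up, "nodes up:",
              "majority =", has_quorum_node_majority(total, up),
              "disk-only =", has_quorum_disk_only(up, quorum_disk_online=True))
```

In this model a 3-node cluster under a majority rule stops when two nodes fail, while the disk-only rule keeps running on the last node. The trade-off is that the quorum disk becomes a single point of failure.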
ServiceU Disaster Recovery Architecture
Using Log Shipping to set up Mirroring
Upgrading to SQL Server 2008
Windows Server 2003\SQL Server 2005
Upgraded both OS and SQL Server to 2008
Had to do this with very little downtime
How much? Let’s find out!!!!
Primary Site Upgrade Process
Application Switch Over to
temp cluster
Establish async DBM
from 2005 to 2008
Block users
Sync mirroring
DBM Failover
Redirection
Remove DBM
Total end user down-time
10 minutes
Temporary SQL Server 2008 Cluster
On Windows Server 2008
Upgraded primary cluster to
2008
Repeated steps above
Downtime 6 minutes
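The switch-over steps above map onto a short sequence of database mirroring commands. A rough sketch that builds the sequence as text: the database name and endpoint address are hypothetical, prerequisites (mirroring endpoints, restoring the database on the mirror, setting the partner on the mirror side) are omitted, and the slides do not say how users were blocked or redirected, so those steps are left as comments:

```python
from typing import List, Tuple


def mirroring_cutover_plan(database: str, mirror_endpoint: str) -> List[Tuple[str, str]]:
    """Return (slide step, action) pairs for the cutover, in order."""
    db = "[" + database.replace("]", "]]") + "]"  # quote the identifier
    return [
        ("Establish async DBM",
         f"ALTER DATABASE {db} SET PARTNER = '{mirror_endpoint}'; "
         f"ALTER DATABASE {db} SET PARTNER SAFETY OFF;"),
        ("Block users",
         "-- application-level step; mechanism not given in the slides"),
        ("Sync mirroring",
         f"ALTER DATABASE {db} SET PARTNER SAFETY FULL;"),
        ("DBM Failover",
         f"ALTER DATABASE {db} SET PARTNER FAILOVER;"),
        ("Redirection",
         "-- repoint application connections to the new principal"),
        ("Remove DBM",
         f"ALTER DATABASE {db} SET PARTNER OFF;"),
    ]


if __name__ == "__main__":
    # Hypothetical database name and endpoint, for illustration only.
    for step, action in mirroring_cutover_plan("Orders", "TCP://temp-cluster.example.com:5022"):
        print(f"{step}: {action}")
```

End-user downtime in this sequence runs from blocking users until redirection completes, which is why the asynchronous phase is established ahead of time.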
Windows Server 2008 & SQL Server 2008 Better Together
Failover Clustering
Rolling upgrade and patching
16 nodes
Database Mirroring
Automatic recovery from page
corruption
Log stream compression
Faster recovery on failover
Resource Governor
Manage SQL Server workloads
and resources by specifying
limits on resource consumption
Backup Compression
Reduce backup and restore time
Log Shipping
Sub-Minute Log Shipping
Backup compression
Replication
Peer-Peer Replication: Hot add new
nodes
Improved performance over WAN
links
Database Mirroring Compression
Benefit
Cost
Automatic Page Repair
Rolling upgrade using Mirroring
Failure is not an option: bWin
Sports betting, Soft & skill
games
1 million bets per day on > 90
sports
The Mission:
Failure is not an option &
Money is not a problem
Rather lose availability and
performance than data
Environment
100+ TB Data
850+ DBs
100 Instances
450K SQL Statements/Sec
bWin High Availability Architecture
Datacenter A
Datacenter B
Principal: 32 IA64 Dual Core CPUs
Mirror: 32 IA64 Single Core
Mirroring: Principal to Mirror
64 Network Ports (1 Gbps)
400 local SAS drives on 16 RAID controllers (for OS, TempDB and Log files – low latency)
16 HBAs for 256 Disk / 256GB cache SAN system
Log Shipping, 1h delay
Log Shipping, no delay
Log backup file server (x2)
Database backup file server (x2)
Scale Out and Availability Scenario
AdventureWorks is
building a new web-based
order management system that
allows customers from all
over the world to access
the system and place
orders
The core group of
customers is in Western
Europe, South East Asia
and North America
Requirements
– Geo Redundancy
– Data Locality
– High Availability
– Local Read-Scale
Workload Characteristics
– Mainly reads
– Few writes
Application Characteristics
– Each user logging in connects to a particular server
  – Partitioned based on user-id and region
  – Writes from a user always happen on one server regardless of the region the user logs in from
– All reads redirected to the closest geolocation
Reasonable tolerance for latency (5-10 minutes)
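The routing rule in this scenario (writes pinned to one server per user, reads served from the closest location) can be sketched as follows. All server names are hypothetical, and the sketch assumes each user is assigned a home region once, for example at registration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: int
    home_region: str  # assumed to be fixed when the user is created


# Hypothetical server names; one writable peer and one read-only server per region.
WRITE_PEERS = {
    "Western Europe": "europe-peer",
    "South East Asia": "asia-peer",
    "North America": "america-peer",
}
READ_ONLY = {
    "Western Europe": "europe-read",
    "South East Asia": "asia-read",
    "North America": "america-read",
}


def write_server(user: User) -> str:
    """Writes always go to the user's home peer, wherever they log in from."""
    return WRITE_PEERS[user.home_region]


def read_server(login_region: str) -> str:
    """Reads go to the read-only server closest to the current login."""
    return READ_ONLY[login_region]


if __name__ == "__main__":
    traveller = User(user_id=42, home_region="Western Europe")
    # The user logs in from Asia: reads stay local, writes go home.
    print(read_server("South East Asia"), write_server(traveller))
```

Because reads may be served by a server that has not yet received the latest writes, this design relies on the stated latency tolerance of 5-10 minutes.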
Replication Topology
Asia1
Peer Nodes
Read-Only Servers
Asia2
Key to Success
It’s not the vendor!
It’s not the technology!
It’s not the features!
Licensing Facts
Passive servers include a mirror, a log shipping
secondary, and a passive cluster node
No license required on passive if it is
truly passive
A passive server does not need a
license if the number of processors in
the passive server is equal to or less
than the number of processors in the
active server.
The passive server can take the duties
of the active server for 30 days.
Afterwards, it must be licensed
accordingly.
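The licensing rules on this slide can be read as a small decision function. A sketch of the slide's statements only, with hypothetical parameter names; actual licensing terms should be checked against the product's license documentation:

```python
def passive_needs_license(truly_passive: bool,
                          passive_processors: int,
                          active_processors: int,
                          days_acting_as_active: int = 0) -> bool:
    """Apply the three conditions stated on the slide, in order."""
    if not truly_passive:
        return True  # a server doing other work is not passive
    if passive_processors > active_processors:
        return True  # passive server is larger than the active one
    # A passive server may cover for the active server for up to 30 days.
    return days_acting_as_active > 30


if __name__ == "__main__":
    print(passive_needs_license(True, 4, 4))       # idle, same size
    print(passive_needs_license(True, 4, 4, 45))   # acting as active for 45 days
```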
HA Features Edition Support
Columns: Feature, Express, Workgroup, Standard, Enterprise, Comments
Database Mirroring (1): Advanced high availability solution that includes fast failover and automatic client redirection
Failover Clustering (2)
Backup Log-shipping: Data backup and recovery solution
Online System Changes: Includes Hot Add Memory, dedicated administrative connection, and other online operations
Online Indexing
Online Restore
Fast Recovery: Database available when undo operations begin
(1) Single thread redo
(2) Limited to 2 node cluster
Resources
Sessions On-Demand & Community: www.microsoft.com/teched
Microsoft Certification & Training Resources: www.microsoft.com/learning
Resources for IT Professionals: http://microsoft.com/technet
Resources for Developers: http://microsoft.com/msdn
Related Content
Breakout Sessions
DAT312 All You Needed to Know about Microsoft SQL Server 2008 Failover Clustering
Hands-on Labs
DAT12-HOL Microsoft SQL Server 2008 Database Mirroring, Part 1
DAT12-HOL Microsoft SQL Server 2008 Database Mirroring, Part 2
DAT05-HOL Microsoft SQL Server 2008 Data Snapshots
DAT07-HOL Microsoft SQL Server 2008 Peer-to-Peer Replication
DAT06-HOL Microsoft SQL Server 2008 Online Operations
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.