Business Continuity


Business Continuity and Disaster Recovery
Andreas Tsangaris, Chief Technical Officer
PERFORMANCE
Disclaimer
This session may contain product features that are currently under development.
This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Agenda
Business Continuity Requirements
Minimizing Downtime in the Datacenter
Providing Effective Disaster Recovery
Summary and Next Steps
Sources of Downtime
Solutions to reduce downtime need to address both planned and unplanned downtime
Planned outages (80-90% of downtime)
• Hardware maintenance
• Firmware upgrades
• Backup windows
Unplanned outages (10-20% of downtime)
• Disasters
• Server failures
• Software failures
• Storage failures
• User error
Eliminating planned downtime can reduce total downtime by up to a full order of magnitude (5x to 10x, given the 80-90% share above)
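The order-of-magnitude figure follows from the percentages above: if a share p of all downtime is planned and is eliminated, total downtime shrinks by a factor of 1/(1 - p). A minimal Python sketch of that arithmetic; the 40-hour annual downtime figure is a hypothetical input, not a measured value.

```python
def downtime_reduction_factor(planned_share: float) -> float:
    """Factor by which total downtime shrinks if all planned downtime is removed.

    planned_share is the fraction of total downtime that is planned, in [0, 1).
    """
    if not 0.0 <= planned_share < 1.0:
        raise ValueError("planned_share must be in [0, 1)")
    return 1.0 / (1.0 - planned_share)


def availability(downtime_hours: float, period_hours: float = 8760.0) -> float:
    """Fraction of the period the system is up (default period: a 365-day year)."""
    return 1.0 - downtime_hours / period_hours


if __name__ == "__main__":
    # 80-90% planned downtime gives a 5x to 10x reduction in total downtime.
    low = downtime_reduction_factor(0.80)
    high = downtime_reduction_factor(0.90)
    print(f"downtime reduced {low:.1f}x to {high:.1f}x")

    # Hypothetical site: 40 h of downtime per year, 90% of it planned.
    total_hours = 40.0
    remaining_hours = total_hours * (1.0 - 0.90)
    print(f"availability {availability(total_hours):.3%} -> {availability(remaining_hours):.3%}")
```

Expressing the result as a reduction factor rather than an availability percentage keeps the comparison independent of how much absolute downtime a given site has.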
Flexible Recovery Point and Minimal Recovery Time
[Timeline figure: last backup or point where data is in a usable state → disaster strikes → systems recovered. Recovery Point: how far back? Recovery Time: how long to recover?]
Data Loss and Time to Recover
Common challenges:
– Data loss of more than 24 hours?
– Recovery time greater than 4 hours?
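Both questions on the timeline can be measured after any incident: data loss is the gap between the last usable point and the disaster, and recovery time is the gap between the disaster and restored service. A small Python sketch, using hypothetical timestamps, that checks one incident against the 24-hour and 4-hour thresholds above.

```python
from datetime import datetime, timedelta

# Thresholds taken from the "common challenges" listed above.
MAX_DATA_LOSS = timedelta(hours=24)
MAX_RECOVERY_TIME = timedelta(hours=4)


def recovery_metrics(last_usable: datetime, disaster: datetime, recovered: datetime):
    """Return (data_loss_window, time_to_recover) for a single incident."""
    if not last_usable <= disaster <= recovered:
        raise ValueError("expected last_usable <= disaster <= recovered")
    return disaster - last_usable, recovered - disaster


if __name__ == "__main__":
    # Hypothetical incident: nightly backup at 01:00, failure at 15:30, service back at 21:00.
    loss, downtime = recovery_metrics(
        datetime(2009, 3, 2, 1, 0),
        datetime(2009, 3, 2, 15, 30),
        datetime(2009, 3, 2, 21, 0),
    )
    print(f"data loss window: {loss} (exceeds 24 h: {loss > MAX_DATA_LOSS})")
    print(f"time to recover: {downtime} (exceeds 4 h: {downtime > MAX_RECOVERY_TIME})")
```

In this hypothetical case a nightly backup keeps data loss under 24 hours (14.5 hours), but the 5.5-hour restore still misses the 4-hour recovery target.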
Requirements for Business Continuity Solutions
• Protection across operating systems and applications
• Protection against a broad spectrum of downtime causes
• Ensure minimum interruption time
• Independent of physical infrastructure
ESX and ESXi – Serious Availability
Proven by customers
– Over 100,000 customers
– Over seven years of maturation
– Over 85% of customers using it for production workloads
– Years of continuous uptime at customer sites
Reliable by design
– ESXi: 32MB on disk
– Less code = fewer bugs, fewer patches, etc.
– No dependence on an OS or arbitrary drivers
2008 Editor’s Choice Awards, Most Reliable Category:
1. VMware ESX
2. IBM mainframe
Agenda
Business Continuity Requirements
Minimizing Downtime in the Datacenter
– Protection against failures
– Eliminating planned downtime
Providing Effective Disaster Recovery
Summary and Next Steps
Hardware Failure Tolerance
Transforming Availability Service Levels
[Chart: availability service level (unprotected; automated restart with VMware HA; continuous with VMware FT) against application coverage, 0% to 100%]
Fast Recovery from Hardware and Software Failures…
[Diagram: App Server, Exchange and File/Print workloads running on VMware Infrastructure, which pools CPU, memory, storage and interconnect resources]
Application vServices
– VMotion, DRS
– Update Manager
– Storage VMotion
– HA (High Availability)
High Availability vServices
Recover from Unplanned Downtime
VMware High Availability protects all servers and applications against component and complete system failure.
Only one click to configure!
VMware HA Enhancements
Automatic restart of virtual machines in case of physical server failures
• 32-node clusters
• Additional isolation addresses
• Configurable failure detection time
• VMs are now restarted on the hosts with the most available resources
• Proactive cluster configuration checks
• VM failure monitoring (experimental):
– Monitors virtual machines for guest OS failures
– Automatically restarts the VM after a specified interval
Simple, cost-effective availability for any workload
• Minimizes unplanned downtime due to hardware and OS failures
[Diagram: Resource Pool]
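The slide says HA now restarts failed VMs on the hosts with the most available resources. The Python sketch below illustrates that idea as a simple greedy placement. The host and VM fields, the numeric restart-priority scale (0 meaning restart disabled), and tie-breaking on free CPU are all hypothetical, and this is not VMware's actual admission-control or placement algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_mem_mb: int
    free_cpu_mhz: int
    vms: list = field(default_factory=list)


@dataclass
class VM:
    name: str
    mem_mb: int
    cpu_mhz: int
    restart_priority: int = 2  # hypothetical scale: 3 high, 2 medium, 1 low, 0 disabled


def place_failed_vms(vms, surviving_hosts):
    """Greedy sketch: restart higher-priority VMs first, each on the surviving host
    with the most free memory (then CPU) that can still fit it.
    Returns the names of VMs that could not be placed."""
    unplaced = []
    for vm in sorted(vms, key=lambda v: v.restart_priority, reverse=True):
        if vm.restart_priority == 0:
            continue  # restart disabled for this VM
        candidates = [h for h in surviving_hosts
                      if h.free_mem_mb >= vm.mem_mb and h.free_cpu_mhz >= vm.cpu_mhz]
        if not candidates:
            unplaced.append(vm.name)
            continue
        target = max(candidates, key=lambda h: (h.free_mem_mb, h.free_cpu_mhz))
        target.free_mem_mb -= vm.mem_mb  # reserve capacity on the chosen host
        target.free_cpu_mhz -= vm.cpu_mhz
        target.vms.append(vm.name)
    return unplaced
```

Placing the highest-priority VMs first means that, if the surviving hosts run short of capacity, the VMs left unplaced are the least important ones.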
High Availability vServices
Virtual Machine Monitoring
• Set at cluster level
• Applies to all VMs in the cluster
• Can be disabled for an individual VM using “Restart priority”
• Uses the VMware Tools heartbeat
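VM Monitoring treats a missing VMware Tools heartbeat as a sign of guest OS failure. The minimal Python sketch below shows that check. It assumes a hypothetical per-VM record and a hypothetical 30-second failure interval, since the slides only say the interval is configurable, and it returns the VMs that would be restarted.

```python
from dataclasses import dataclass


@dataclass
class MonitoredVM:
    name: str
    last_heartbeat: float            # time of last VMware Tools heartbeat, in seconds
    monitoring_enabled: bool = True  # per-VM opt-out, like disabling restart priority


def vms_to_restart(vms, now: float, failure_interval: float = 30.0) -> list:
    """Names of VMs whose guest heartbeat has been silent longer than failure_interval.

    The 30-second default is a hypothetical value; the interval is configurable.
    """
    return [vm.name for vm in vms
            if vm.monitoring_enabled and now - vm.last_heartbeat > failure_interval]
```

Because the check is set at cluster level, one interval applies to every VM. The per-VM flag is the only way to exclude an individual machine.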
Proactively Avoid Planned Downtime
VMotion
Eliminating Downtime for Storage Changes
Examples
– Redistributing load
– Optimizing storage configuration
– Storage refresh
Storage VMotion
– Online migration of virtual machine disks to a new datastore
– Zero downtime for applications and users
[Diagram: VM disks moved from LUN A1 and LUN A2 on Array A (off lease) to LUN B1 and LUN B2 on Array B (new)]
Summary of Availability Functionality
Planned
• Update Manager
• Maintenance mode
• VMotion
• Storage VMotion
• Snapshots
Unplanned
• Network port trunking
• HA
• Site Recovery Manager
• VCB / Snapshots
Availability: New Solutions for Reduced Downtime
[Diagram: three App/OS virtual machines on ESX Server, with new 2009 availability features at the server and storage layers]
• Fault Tolerance: zero downtime, zero data loss continuous availability
• Data Recovery: integrated backup and recovery appliance
vCenter Data Recovery
Agentless, disk-based backup and recovery of your VMs
1. Backup
   1. Schedule backups via VirtualCenter
   2. Snapshots taken
   3. Data de-duplicated and stored on de-duplicated storage
2. Restore
   1. VM goes down
   2. Select VM images/files to recover
   3. Restore: VM running in seconds
• VM or file level restore
• Incremental backups and data de-duplication to save disk space
Copyright © 2005 VMware, Inc. All rights reserved.
• Quick, simple and complete data protection for your VMs
• Centralized management through VirtualCenter
• Cost-effective storage management
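The slide names incremental backups and de-duplication as the way Data Recovery saves disk space. A minimal Python sketch of the general technique: fixed-size chunks are addressed by SHA-256 digest, and each unique chunk is stored once. The chunk size and the in-memory store are hypothetical, and this does not describe how vCenter Data Recovery is implemented internally.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, in bytes


class DedupStore:
    """Content-addressed store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes
        self.backups = {}  # backup name -> ordered list of chunk digests

    def backup(self, name: str, disk_image: bytes) -> int:
        """Record a backup and return how many new bytes actually had to be stored."""
        recipe, new_bytes = [], 0
        for offset in range(0, len(disk_image), CHUNK_SIZE):
            chunk = disk_image[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # unseen content: store it once
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        self.backups[name] = recipe
        return new_bytes

    def restore(self, name: str) -> bytes:
        """Rebuild the full image from its chunk recipe."""
        return b"".join(self.chunks[d] for d in self.backups[name])
```

Each backup is a complete recipe that can be restored on its own, yet a second backup only adds the chunks that changed. This gives full restores at roughly the storage cost of incrementals.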
Futures: VMware Fault Tolerance
Application protection against hardware failures with NO downtime, independent of the application and operating system.
Agenda
Business Continuity Requirements
Minimizing Downtime in the Datacenter
Providing Effective Disaster Recovery
Summary and Next Steps
Virtual Datacenter OS from VMware
VMware Infrastructure -> Virtual Datacenter OS
[Diagram: applications (.Net, Windows, Linux, J2EE, Grid, Web 2.0, SaaS, …) running on the Virtual Datacenter OS, managed by vCenter]
Application vServices: Availability, Security, Scalability
Infrastructure vServices: vCompute, vStorage, vNetwork
Cloud vServices
vCenter Application Management
• Site Recovery Manager
vCenter Infrastructure Management
• Lifecycle Manager
• ConfigControl
• Orchestrator
• Capacity IQ
• Chargeback
Unplanned: Protecting from Hardware Failures
Complex recovery processes and infrastructure
Dependent on perfect training, documentation, and execution
Failure to meet recovery requirements
• Recovery takes days to weeks
• Recovery tests often fail
• Significant IT time and resources consumed
Key Features of Virtualization for DR
• Hardware Independence
• Encapsulation
• Partitioning and Consolidation
• Resource Pooling
Automate the Failover of an Entire Datacenter
[Diagram: production site and recovery site, each running VMware Infrastructure]
Site Recovery Manager transforms disaster recovery
Site Recovery Manager Simplifies and Automates DR
Setup
• Allocates recovery resources
• Integrates with replication
• Helps build recovery plans
Testing
• Creates isolated test environment
• Automates tests of recovery plans
• Cleans up after tests are completed
Failover
• Allocates resources for recovery
• Prepares storage for recovery
• Automates recovery process
Ensure that disaster recovery is rapid, reliable, and manageable
Site Recovery Manager Use Cases
• Target scenarios
– Restart of tens or hundreds of VMs in another datacenter
– Restart can be unplanned (disaster) or planned (migration)
– Can tolerate a recovery time objective (RTO) of minutes to hours
• Requirements
– Second site running VirtualCenter and ESX
– Replicated Fibre Channel or iSCSI LUNs from supported storage vendors
• SRM is not
– A replication product
– Geo-clustering for applications in VMs
So what does it look like?
[Diagram: a protected site and a recovery site, each running VirtualCenter and Site Recovery Manager, with array replication of datastore groups between them. Protected VMs run online at the protected site and are kept offline at the recovery site, where they are powered on if they become unavailable in the protected site.]
Supports bidirectional site protection
Disaster Recovery Setup
• Integrate with replication
– Identify which virtual machines are protected by replication configuration
• Map recovery resources
– Network resources, server resources, management objects
• Create recovery plans
– For virtual machines, applications, business units
– Convert manual runbook to preprogrammed response
– Customizable with scripting and callouts
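The setup steps above, mapping recovery resources and turning a manual runbook into a preprogrammed response with callouts, can be sketched as a small data model. Everything here is hypothetical and for illustration only: the `RecoveryPlan` class, the hook names such as `after-group-1`, and the example networks and scripts. It is not SRM's API or plan format.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryPlan:
    """Hypothetical recovery-plan model: resource mappings plus VMs grouped by restart priority."""
    name: str
    network_map: dict      # protected-site network -> recovery-site network
    resource_map: dict     # protected resource pool -> recovery resource pool
    priority_groups: list  # lists of VM names, highest-priority group first
    callouts: dict = field(default_factory=dict)  # hook name -> list of script names

    def unmapped(self, vm_placement: dict) -> list:
        """VMs whose network or resource pool has no recovery-site mapping yet."""
        return sorted(
            vm for vm, (net, pool) in vm_placement.items()
            if net not in self.network_map or pool not in self.resource_map
        )

    def runbook(self) -> list:
        """Expand the plan into the ordered steps a manual runbook would list."""
        steps = [f"run script {s}" for s in self.callouts.get("before-power-on", [])]
        for n, group in enumerate(self.priority_groups, start=1):
            steps += [f"group {n}: power on {vm}" for vm in group]
            steps += [f"run script {s}" for s in self.callouts.get(f"after-group-{n}", [])]
        return steps


if __name__ == "__main__":
    plan = RecoveryPlan(
        name="finance",
        network_map={"prod-vlan10": "dr-vlan110"},
        resource_map={"Prod": "DR"},
        priority_groups=[["sql01"], ["app01", "app02"]],
        callouts={"after-group-1": ["check_sql.sh"]},
    )
    print(plan.unmapped({"sql01": ("prod-vlan10", "Prod"), "app01": ("prod-vlan20", "Prod")}))
    for step in plan.runbook():
        print(step)
```

Checking for unmapped resources at setup time surfaces gaps before a disaster, rather than during one.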
Failover Automation
• Detect site failures
– Raise alert when heartbeat lost
• Initiate failover
– User confirmation of outage
– Granular failover initiation
• Manage replication failover
– Break replication
– Make replica visible to recovery hosts
• Execute recovery process
– Use pre-programmed plan
– Provide visibility into progress
• Manage networking
– Put VMs on right VLAN
– Change IP addresses
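The failover sequence above can be sketched in Python to show its ordering. A lost heartbeat only raises an alert, and nothing changes until a user confirms the outage. Replication is then broken, replicas are presented to recovery hosts, and each VM is brought up on a mapped VLAN with a re-addressed IP. The function names, the VM dictionaries, and the "keep the host part, swap the subnet" re-IP rule are hypothetical simplifications. Only Python's standard `ipaddress` module is real.

```python
import ipaddress


def heartbeat_lost(last_seen: float, now: float, timeout: float) -> bool:
    """True when the peer site has been silent longer than the timeout (raise an alert)."""
    return now - last_seen > timeout


def remap_ip(old_ip: str, old_subnet: str, new_subnet: str) -> str:
    """Keep the host part of the address and move it into the recovery-site subnet."""
    old_net = ipaddress.ip_network(old_subnet)
    new_net = ipaddress.ip_network(new_subnet)
    if old_net.prefixlen != new_net.prefixlen:
        raise ValueError("this sketch assumes equal-sized subnets at both sites")
    host_part = int(ipaddress.ip_address(old_ip)) - int(old_net.network_address)
    return str(new_net.network_address + host_part)


def failover(vms, confirm, vlan_map, subnet_map, log):
    """Run the failover steps for a list of VM dicts with keys name, vlan, ip, subnet."""
    if not confirm():  # an operator must confirm the outage first
        log.append("failover not confirmed; nothing changed")
        return []
    log.append("break replication")
    log.append("present replica LUNs to recovery hosts")
    recovered = []
    for vm in vms:  # assumed to already be in recovery-plan order
        new_subnet = subnet_map[vm["subnet"]]
        new_vlan = vlan_map[vm["vlan"]]
        recovered.append({
            "name": vm["name"],
            "vlan": new_vlan,
            "ip": remap_ip(vm["ip"], vm["subnet"], new_subnet),
            "subnet": new_subnet,
        })
        log.append(f"power on {vm['name']} on VLAN {new_vlan}")
    return recovered
```

For example, a VM at 10.1.0.25 in 10.1.0.0/24 comes up at 172.16.5.25 if 10.1.0.0/24 is mapped to 172.16.5.0/24. Passing `confirm` in as a function keeps the human decision explicit rather than failing over automatically on a lost heartbeat.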
Testing
• Replication management
– Snapshot replicated LUNs before test
– Delete snapshots of replicated LUNs after test
• Network management
– Change all virtual machines to a test port group before powering them on
• Customization/extensibility
– Same breakpoints and callouts as failover sequence
– Extra breakpoints and callouts around the test bubble
Failback
• Set up DR protection from the DR site back to the primary site
– Failover makes VMs reside at the DR site
– Provide the failed-over VMs with protection
– Same setup as was done for initial protection
– Work with storage to reverse replication
• Test failback
– Test repeatedly, using the same mechanism as test failover
– Only set the failback date after the plan is perfect
• Failback to primary site
– Just hit the failover button: failback is failover in the reverse direction
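"Failback is failover in the reverse direction" can be made concrete by modeling a protection relationship and reversing it. The Python sketch below uses a hypothetical `ProtectionPair` model and hypothetical site and array names. Reversing swaps both the site roles and the replication direction, and the failback steps are then the ordinary test-failover and failover steps run against the reversed pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionPair:
    """Which site runs the VMs and which way the array replicates (hypothetical model)."""
    protected_site: str
    recovery_site: str
    replication_source: str
    replication_target: str


def reverse(pair: ProtectionPair) -> ProtectionPair:
    """Swap site roles and replication direction: the reprotect step before failback."""
    return ProtectionPair(
        protected_site=pair.recovery_site,
        recovery_site=pair.protected_site,
        replication_source=pair.replication_target,
        replication_target=pair.replication_source,
    )


def failback_steps(original: ProtectionPair) -> list:
    """After a failover the VMs run at the original recovery site; fail back by failing over in reverse."""
    reprotect = reverse(original)
    return [
        f"reverse replication: {reprotect.replication_source} -> {reprotect.replication_target}",
        f"test failover {reprotect.protected_site} -> {reprotect.recovery_site} (repeat until the plan is clean)",
        f"failover {reprotect.protected_site} -> {reprotect.recovery_site}",
    ]
```

Because `reverse(reverse(pair))` returns the original pair, the same mechanism handles both directions without any failback-specific logic.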
SRM Benefit Summary
1. Accelerate Recovery
2. Ensure Reliable Recovery
3. Simplify Planning and Recovery
4. Expand Disaster Recovery Protection
5. Reduce Cost
6. Enable Compliance
Agenda
Business Continuity Requirements
Minimizing Downtime in the Datacenter
Providing Effective Disaster Recovery
Summary and Next Steps
VMware Infrastructure: The Safest Place To Run Applications
[Table columns: Prevent Planned Outages; Prevent Unplanned Outages; Minimize Downtime from Unplanned Outages]
Component
• NIC Teaming
• Multipathing
Server
• DRS Maintenance Mode, VMotion
• HA
• Fault Tolerance
Storage
• Storage VMotion
• N/A
Data
• VCB + Backup ISV products
• Data Recovery
Site
• Site Recovery Manager
Virtualisation enables new and easier ways of BC/DR across all available physical hardware, operating systems, and applications.
Next Steps
• Learn more
– Read more about VMware Business Continuity Solutions at http://www.vmware.com/solutions/continuity/
– Find more business continuity customer case studies at http://www.vmware.com/customers/stories/index_continuity.html
• Start your evaluation
– VMware and partners can help you evaluate VMware software