Business Continuity & Disaster Recovery

Lauren Farese – Oracle Corporation
Paul Christman – VERITAS Software
Walter Callahan – State of Ohio
What happened on August 14th, 2003?
Disasters happen every day... it's a fact!
• Disasters cost money, so why suffer by being unprepared?
• Organizations that survive typically have:
– management foresight
– tested procedures
– processes
– back-up facilities
• Business Continuity Planning (BCP)
Downtime Costs Money
Downtime Per Year (7x24x365)

Percentage Availability   Days   Hours   Minutes   Cost$ *
95%                       18     6       0         $250M
99%                       3      15      36        $51M
99.9%                     0      8       46        $5,003,312
99.99%                    0      0       53        $504,136
99.999%                   0      0       5         $47,560
99.9999%                  0      0       1         $9,512

Numbers assume $5B yearly revenue run rate.
* Costs were calculated by Oracle and are not associated with the Standish Group Report.
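The downtime columns follow directly from the availability percentage. The sketch below reproduces that arithmetic in Python, assuming a 365-day year and rounding to the nearest minute. The cost model is an inference from the published figures, not something the slide states: the smaller cost values are consistent with rounded downtime minutes multiplied by $9,512 per minute ($5B divided by 525,600 minutes, truncated to whole dollars). All function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 7x24x365 year


def downtime_minutes(availability_pct):
    """Yearly downtime in minutes, rounded to the nearest whole minute."""
    return round((100 - availability_pct) / 100 * MINUTES_PER_YEAR)


def split_minutes(total_minutes):
    """Break a minute count into (days, hours, minutes)."""
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return days, hours, minutes


def downtime_cost(availability_pct, yearly_revenue=5_000_000_000):
    """Assumed model: rounded downtime minutes times whole-dollar revenue per minute."""
    revenue_per_minute = yearly_revenue // MINUTES_PER_YEAR
    return downtime_minutes(availability_pct) * revenue_per_minute


for pct in (95, 99, 99.9, 99.99, 99.999, 99.9999):
    days, hours, minutes = split_minutes(downtime_minutes(pct))
    print(f"{pct}%: {days} d {hours} h {minutes} min, ${downtime_cost(pct):,}")
```

Under this assumed model the rows from 99.9% upward match the slide to the dollar, and the 95% row rounds to $250M. The 99% row comes out near $50M rather than the $51M shown, so that figure may have been derived or rounded differently.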
Business Continuity Planning vs. Disaster Recovery Planning
• Both are directed at recovery of operations
• Business Continuity Planning is directed at the recovery and resumption of business activities across the entire enterprise
• Disaster Recovery Planning is usually directed at the recovery of information technology systems and business applications, including corporate data
• BCP addresses Processes, People and Property
Business Continuity Planning Phases
• Typically three phases
– Pre-Planning
– Planning
– Post-Planning
• Critical success factor
• Cost is always an issue
• Executive ownership is critical
• Must be a business priority
Phase One: Pre-Planning
• Project initiation and management
– Establish a need
– Executive management ownership
– Time and budget allocation
• Risk evaluation and control
– Events and environment issues
– Facilities and process evaluation
– Cost benefit analysis
• Impact analysis
– Disruption and disaster scenarios
– Critical business functions
– Recovery time analysis
Phase Two: Planning
• Develop continuity strategies
– Alternative organizational recovery
– Operations and information systems
– Adhere to recovery time objectives
• Emergency response and operations
– Procedures for response and stabilization
– Establish operations center
– Emergency command and control
• Developing and implementing the plan
– Plan provides recovery within time objective
Oracle BCM Business Flow
Disaster Recovery - Business Continuity Planning
Roles: Global IT Business Operations; Multi-disciplined Disaster Recovery Planning Team; Global IT Senior Management
(1) Review changes made to Global IT environment
(2) Establish a multidiscipline team
(3) Identify Business Continuity / Disaster Recovery team members
(4) Perform business risk assessment to determine current risk / future risk profile
(5) Document & communicate business risk assessment results & risk profile to Global IT Senior Management Team
(6) Review current Disaster Recovery plan to determine if new risk profile is mitigated within current DR plan
Decision: Does current DR plan require modification? (Y/N)
(7) Identify within current plan areas that require additional work to mitigate new risk
(8) Develop new DR plan
(9) Determine what testing has to be performed on DR plan
(10) Test DR plan
Decision: DR plan passes tests? (Y/N)
(11) Modify DR plan as necessary & re-test plan
(12) Submit DR plan to Senior Management for approval
(13) Review new / changed DR plan
Decision: Approval received? (Y/N)
(14) Modify new DR plan to address reviewers' concerns
(15) Determine if modifications to plan require additional testing
Decision: Plan requires additional testing due to modifications? (Y/N)
(16) Re-submit to Senior Management for approval
Phase Three: Post-Planning
• Awareness and training
– Create organizational awareness
– Enhance skills
• Maintaining and exercising
– Coordinate plan exercises
– Evaluate and document exercise results
– Develop process to maintain the plan
– Report results clearly and concisely
• Coordination and communication
– Communication with media, families, suppliers
– Crisis coordination with first responders, local authorities
What about the technology?
Match the Tools to the Business Needs
Recovery Point (Wks, Days, Hrs, Mins, Secs): Tape or Disk Backup; Async. Replication; Sync. Replication
Recovery Time (Secs, Mins, Hrs, Days, Wks): Clustering; Remote Replication; Online Restore; Tape Restore
Only as Good as the Weakest Link
Diagram components: Clients; Load Balancer; Web Cache; Application Server Tier; Java Clusters; Database Tier
BC/DR Must Address Every Component
• Network Infrastructure
• Data Storage – online, near-line and off-line
• Application servers and their offspring
Any component down = the entire system is unusable
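The "any component down" rule can be made concrete: when every component must be up for the system to work, the overall availability is the product of the component availabilities. The sketch below assumes independent failures, and the per-tier figures are hypothetical values chosen only for illustration.

```python
from math import prod


def series_availability(component_availabilities):
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent of one another.
    """
    return prod(component_availabilities)


# Hypothetical figures for illustration only: five tiers at 99.9% each.
tiers = {
    "network": 0.999,
    "load balancer": 0.999,
    "web cache": 0.999,
    "application server": 0.999,
    "database": 0.999,
}
overall = series_availability(tiers.values())
print(f"overall availability: {overall:.4%}")
```

The result is always lower than the weakest single component, which is why BC/DR planning has to cover every tier rather than only the database.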
Network Infrastructure
• Wide Area Traffic Manager to direct client traffic to proper site
• Network load balancer to distribute incoming requests
• Dedicated, fast link between sites
– Influences production database performance
• Redundant components and paths
– Network paths to the site and within the site
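The decision a wide area traffic manager makes can be sketched in a few lines. This is only an illustration of the routing choice, not how any particular product works: real traffic managers typically rely on DNS or routing changes driven by health probes. The site names, the `choose_site` function and the health table are all hypothetical.

```python
def choose_site(sites_by_priority, is_healthy):
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites_by_priority:
        if is_healthy(site):
            return site
    return None


# Hypothetical site names and health states for illustration.
health = {"primary": False, "secondary": True}
target = choose_site(["primary", "secondary"], lambda site: health[site])
print(target)
```

Passing the health check in as a function keeps the routing decision separate from however site health is actually probed.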
BC/DR Techniques for Data Storage
• Snapshots – frequent, within an array, FC, temporary
• Mirrors – frequent, in a different array, FC, temporary
• Replicas – synchronous or async, remote or local, FC or IP, temporary or semi-permanent
• Near-Line Disk – infrequent, x-platform, FC or IP, BI copy, DLM, or staging for backup
• Tape Backup – infrequent, FC or IP, required best practice for DR
Application Availability with Local Clustering
Diagram: Server 1 (Instance ‘A’) and Server 2 (Instance ‘B’) attached to one Database
Protects from local server failures
Depends on shared available storage
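Both points can be illustrated numerically. The sketch below treats the clustered servers as redundant (the service survives while at least one is up) and the shared storage as a single dependency. It assumes independent failures and instantaneous failover, and the availability figures are hypothetical.

```python
def cluster_availability(server_availabilities, storage_availability):
    """Availability of a local cluster behind shared storage.

    Assumes independent failures and instantaneous failover: the service is up
    when at least one server is up AND the shared storage is up.
    """
    all_servers_down = 1.0
    for availability in server_availabilities:
        all_servers_down *= 1.0 - availability
    return (1.0 - all_servers_down) * storage_availability


# Hypothetical figures: 99% servers sharing 99.9% storage.
single = cluster_availability([0.99], 0.999)
clustered = cluster_availability([0.99, 0.99], 0.999)
print(f"one server: {single:.4%}, two servers: {clustered:.4%}")
```

Adding a second server raises availability, but the result can never exceed the availability of the shared storage, which is the dependency the slide warns about.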
Wide Area Clustering
• Extends local clustering model to several sites
• Requires data mirroring or replication
Sites shown: Cleveland, Columbus, Cincinnati, Sandusky
Wide Area Clustering
Diagram labels: Site Migration, Failover, Replication
Key Steps to Success
• Conduct a Business Impact Analysis
• Identify which processes are truly critical and cost of BC
• Prioritize investments in people and technology
• Plan and Implement
• Test, test, test!!!
• Review the business continuity plan when the business process changes
Real Life Example
Ohio Dept. of Public Safety
• State Highway Patrol
• Bureau of Motor Vehicles
• Emergency Management Agency
• Emergency Medical Services
• Investigative Unit
• Homeland Security
• Administration
Data Center Facilities
• State of Ohio Computer Center
– West campus of Ohio State University
– Primary site
– Full data center facilities, i.e., UPS, Generator, Environmental
– Operates lights-out
• Charles D. Shipley Building – Public Safety Headquarters, 1970 W. Broad Street
– Approximately 4 miles apart
– Secondary site
– Full data center facilities, i.e., UPS, Generator, Environmental
– Remote operations
Features
• OC-48 SONET ring between the buildings
– Moving to Gigabit Ethernet
• Mainframe environment has mirrored disks at primary site, 3rd mirrored leg at secondary site
• Robotic tape silos at primary site, remote tape drives at secondary site
• Redundant server with failover for law enforcement
• Servers at either site, mirror to other site
Decision Factors
• Prioritize business functions
• Work with business units for business continuity to determine IT disaster planning levels
• Determine level of acceptable risks
– Distance for secondary site
– Hot versus cold site
– Mirror data versus backups
– Redundant servers with failover versus build new server at time of disaster
“The pessimist sees difficulty in every opportunity. The optimist sees opportunity in every difficulty.”
- Winston Churchill