Best Practices for Disaster Recovery

Design and Implementation
Damian Walch
Senior Vice President, Professional Services
Comdisco, Inc.
What We’ll Cover…
■ Learning from the response to events of 09-11-01
■ Addressing immediate actions to be taken NOW!
■ Recovering the information flow
■ Testing the strategies and plans
■ Planning for an outage that can be catastrophic
Comdisco 528 Disasters Supported
[Chart: disasters supported, by cause. Categories: Warning, Hurricane, Terrorism, Lightning, Software, Hardware, Tornado, Civil Unrest, Power Outage, Data Center Move, Flood, Environment, Fire, Network, Miscellaneous, Bomb, Earthquake]
While companies think they’re immune to any long-term outage, more than one-fourth of companies have experienced a disruption in the last 5 years, averaging eight hours, or one business day.
Source: Comdisco Vulnerability index
Our Experience in the WTC Disaster
Lesson
■ 94 disasters declared related to the event
• 47 customers
• All platforms – mainframe, distributed, network and work area
■ Communications were very difficult
■ Companies didn’t have backup staff for recovery
■ Lack of “rally points” created more chaos and added time to the recovery process
■ Mobile trailers can be essential for recovery
At-Time-of-Disaster Solutions
Best Practice
■ Speed: Time to Deliver
■ Good Teams: Experience in Crisis
■ Measured Progress: Service Levels
■ Networks: Wireless
■ Innovation: Portal for Communication
Network of Vendors
Where to Find It
Mobile Star: 972.994.4900, www.mobilestar.com
CIT - Technology Rentals & Services (formerly Newcourt Financial): 800.227.5069, www.citgroup.com
Aggreko: 318.367.7884, www.aggreko.com
Data Recovery Group: 888.462.3299, www.datarecoverygroup.com
GE Capital: 800.243.222, www.gecapital.com
Communications Portal
Solution
The efficient and rapid allocation of resources is key to the quick restoration of critical services and networks. To do that, you must have a wealth of current intelligence about your resources and what's really happening in the field.
Recovery Event Sequence
Building Blocks
[Diagram: recovery timeline. Labels: Event; Immediate Response; Offsite Vital Records; Restore Infrastructure; Restore Network; Restore Application; Restore Data to RPO; Synchronize Applications Data; Lost Applications Data; Lost OS Data; Recreate Lost Data Transactions; Process Backlog; Backlogged Transactions; Relocate Business Function; Resume Business; Interim Site; Return Home]
■ Recovery Time Objective
Time required to recover critical systems to a functional state, often assumed to be “back to normal” for those systems designated as mission critical.
■ Recovery Point Objective
Point in time to which the information has been restored when the RTO has elapsed; it depends on what is available from an offsite data storage location.
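The two objectives above can be checked against a test or a real outage with simple timestamp arithmetic. The following is a minimal sketch, not a Comdisco tool: the function name, the targets and all timestamps are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

def recovery_metrics(last_offsite_backup, event, systems_functional):
    """Return (data_loss_window, downtime) for one outage.

    data_loss_window is compared against the RPO target,
    downtime against the RTO target.
    """
    data_loss_window = event - last_offsite_backup
    downtime = systems_functional - event
    return data_loss_window, downtime

# Hypothetical targets agreed with the business.
rto_target = timedelta(hours=24)
rpo_target = timedelta(hours=12)

# Hypothetical timestamps from a recovery test.
last_backup = datetime(2024, 3, 4, 22, 0)   # last backup taken offsite
event = datetime(2024, 3, 5, 9, 0)          # outage begins
functional = datetime(2024, 3, 6, 3, 0)     # critical systems usable again

loss, downtime = recovery_metrics(last_backup, event, functional)
print("data loss window:", loss, "meets RPO:", loss <= rpo_target)
print("downtime:", downtime, "meets RTO:", downtime <= rto_target)
```

Recording these two numbers after every test gives a measured value to compare with the stated objectives, rather than an assumed one.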
What is required for recovery?
Checklist
■ Strategy: Summary of the items below, documented.
■ Data: Applications identified, backed up and taken offsite.
■ People: Knowledgeable staff who understand DR and critical images.
■ Place: Other locations identified, with sufficient capacity, and testable.
■ Network: Capacity, equipment and software to restore connectivity.
■ Procedures: Action-oriented recovery plans.
What to Include in Recovery Strategy?
Good Idea
[Diagram: end-to-end information flow. Labels: End User (PC or Computer Interface); LAN; Router or ISP Portal; Network; Router or ISP Portal; Application / Database Servers; Data Storage Services]
How Do you Pick a Strategy?
Issue
[Diagram: recovery configurations over time. Panels: Yesterday, Today, Tomorrow. Labels: Customer Data Center; Comdisco Recovery Center; Storage Area Networks]
Considerations for Advanced Recovery Solutions
Each customer has unique requirements.
[Diagram: decision point. Factors: Local Performance Impact; Cost; Resynch Impact; Fault Recovery; IO Rate; WIO Rate; Distance; Bandwidth]
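Two of the factors on this slide, distance and bandwidth, lend themselves to a rough first estimate. The sketch below is an illustration only and rests on stated assumptions: light travels roughly 200 km per millisecond in optical fiber, and the write rate and average write size are hypothetical inputs. It ignores equipment latency, protocol overhead and compression, so treat the results as lower bounds rather than a sizing method.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in optical fiber

def round_trip_ms(distance_km):
    """Minimum round-trip propagation delay between two sites, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_MS

def replication_mbps(write_io_per_sec, avg_write_kb):
    """Raw bandwidth needed to ship every write to the remote site, in megabits per second."""
    bytes_per_sec = write_io_per_sec * avg_write_kb * 1024
    return bytes_per_sec * 8 / 1_000_000

# Hypothetical workload: 500 writes per second of 8 KB each, sites 100 km apart.
print(round_trip_ms(100))        # delay added to each remotely acknowledged write
print(replication_mbps(500, 8))  # sustained link capacity before overhead
```

If each write must be acknowledged by the remote site before it completes, the round-trip delay is added to every write, which is one way distance shows up as local performance impact.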
Corporate Recovery Organization
Building Blocks
[Org chart. Labels: Crisis Management; Business Recovery Coordinator; Business Recovery Teams (Critical Business Functions Only); Information Technology; Finance; Office Infrastructure Support]
Looking for Alternative Site?
Checklist
■ How much space is available?
■ Is the security desk manned 24 x 7?
■ When is the space available?
■ Is the building connected to a SONET fiber ring?
■ How much outside parking is there?
■ Does the parking lot have lights?
■ How many entrances are there, and are they secure?
■ Is there security card access into the building?
■ How many hours per year do the tenants experience electrical outages in this building?
■ What network carriers are providing service in the building?
■ Are there telecom rooms on each floor, and are they shared?
■ Is there a generator in the building to provide backup power in the event of power failures? If so, is it available to tenants?
How to Make Plans More Usable
Tip
■ Don’t get into analysis paralysis!
■ Plans should be brief
• Nobody is going to use a plan that requires a binder
• They need to be action oriented
■ You should be able to access or carry them
• Small enough to carry in a briefcase
• Usable on a Personal Digital Assistant (PDA)
• Accessible via the internet
■ Adaptable
• Just like programming, you can’t “hard code” information
• Give the recovery staff guidelines and resources to address the situation
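The programming analogy in the last point can be made concrete. In the sketch below, plan steps refer to roles and places by name, and the actual values come from a separately maintained directory. Every role name, step and directory entry is hypothetical and only illustrates the idea of not hard coding details into the plan text.

```python
from string import Template

# Hypothetical directory, maintained separately from the plan text.
directory = {
    "recovery_coordinator": "on-call coordinator (see current roster)",
    "rally_point": "primary rally point posted for this site",
    "alternate_site": "contracted recovery center",
}

# Plan steps name roles and places instead of embedding specifics.
plan_steps = [
    Template("Report to the $rally_point and check in."),
    Template("Notify the $recovery_coordinator."),
    Template("Relocate critical functions to the $alternate_site."),
]

def render_plan(steps, lookup):
    """Fill in each step from the directory.

    substitute() raises KeyError if a step names something
    the directory does not define, so gaps surface early.
    """
    return [step.substitute(lookup) for step in steps]

for line in render_plan(plan_steps, directory):
    print(line)
```

When a phone roster or site changes, only the directory is updated; the plan steps stay brief and stable.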
Conduct Realistic Testing
Tip
■ Test at least once per year, but do it right!
■ Require involvement from the staff that will actually do the recovery… MAKE THEM AVAILABLE
■ Use backup staff in some tests, or at least keep them informed…
■ Test the information flow
• Storage, databases and backend systems
• Infrastructure, including network, security and middleware
• Include end-users, workstations and servers
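One way to plan a test of the information flow is to write down what each component depends on and derive a verification order from that. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The component names and dependencies are hypothetical, loosely based on the layers listed above, and would need to be replaced with an organization's real inventory.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical map: each component lists what must be working before it can be verified.
depends_on = {
    "storage": set(),
    "network": set(),
    "security and middleware": {"network"},
    "databases and backend systems": {"storage", "network"},
    "application servers": {"databases and backend systems", "security and middleware"},
    "end-user workstations": {"application servers"},
}

def recovery_test_order(dependencies):
    """Return components ordered so each one follows everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())

for step, component in enumerate(recovery_test_order(depends_on), start=1):
    print(f"{step}. verify {component}")
```

`TopologicalSorter` raises `CycleError` if the map contains a circular dependency, which is itself a useful finding before a test.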
Resources for More Information
Resource
BOOKS
■ Windows NT Backup & Recovery by John McMains
■ Disaster Recovery Planning for Networks, Telecommunications and Data Communications by Regis J. Bates
■ Oracle8i Backup & Recovery by Rama Velpuri
■ Disaster Recovery Planning and Resources for Records Managers and Librarians by Jacqueline Virando
■ Blueprints for High Availability: Designing Resilient Distributed Systems by Evan Marcus
WEB SITES
■ www.globalcontinuity.com
■ www.drj.com
■ www.comdisco.com
■ www.survive.com
■ www.gartnergroup.com
■ www.rothstein.com
What Should Executives Ask?
Next Steps
■ What is the state of the recovery plans, and are they comprehensive?
■ Ask the CIO whether backups are completed regularly for critical data on major systems and workstations within the business units.
■ Revisit physical protection, user authentication, access control, encryption and security management for networking and communications.
■ Do you have a command center for the management team to discuss activities and communicate?
■ Discuss possible contracts for replacement equipment or shipping of assets from technology vendors.
■ How would our customers contact us in the event of an outage? Have we redirected call traffic to an alternate number?
■ Do all executives understand the altered role they will perform at time of disaster, and do they know who their successor is?
■ How are critical non-electronic documents protected, and where are they stored? Are they taken off-site?
The 7 Key Points to Take Home
■ Know how you will communicate with
• Employees
• Customers
• Other Corporate Offices
■ Develop and post rally points
■ Develop a “portal” to communicate proactively
■ Follow the “wire” and know the information flow
■ Test with the staff who will recover, and use backups
■ Plans should be brief, adaptable and portable
Your Turn!
Questions and Answers
Damian N. Walch
Senior Vice President, Professional Services
Comdisco, Inc.
847.518.7756