Disaster Recovery - Information Technology at the Johns Hopkins

Learn @ Lunch
Disaster Recovery Coordinator – Architect
Jack of All Trades … Master of One
After a major outage event, restore application functionality to
critical customers in the shortest possible timeframe and with the
least amount of impact.
Disaster scenario is loss of Mt. Washington Data Center for an
extended period of time
Organization
Disaster Recovery Architect – Arnold Jenkins
Disaster Recovery Testing and Recovery Environment
Network Infrastructure Coordination
Rules of Engagement Coordination
Disaster Recovery Coordinator – Dave Brooks
Test Scheduling and Coordination
Rules of Engagement Coordination
Disaster Recovery Testing and Recovery Documentation
What kind of framework have we built to get there?
Rules of Engagement
Testing Guidelines
Critical Applications List
Data Recovery Techniques
Hardware Test/Recovery Environment
Network Test Environment
What types of skill sets are required to accomplish it?
Project Management
Storage Technologies
General Background In I.T. Components
Risk Assessment Strategies
Disaster Recovery
Business Continuity
What kind of framework have we built to get there?
Rules of Engagement
Interview Survey
Inventories (Hardware, software, personnel, network, dependencies)
Testing Guidelines
Test Plans
Test Recaps
Test Timelines
Test Phases (Crawl – Walk – Run)
Test Objectives (Primary – Secondary – Hip Pocket)
Pre and Post Test Debrief Meetings
Applications – Infrastructure - Closed Network
Recovery Plans
Critical Applications List
Recovery Time Objectives
Recovery Point Objectives
Order Of Recovery
Disaster Recovery Plan – Business Continuity Plan Relationship
Data Recovery Techniques
Traditional Tape Restore
Peer To Peer Copy
Site Recovery Manager
SAN to SAN
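The Recovery Time Objectives and Order Of Recovery items above lend themselves to a small worked example. The following is a minimal sketch, not the team's actual method: it assumes that an order of recovery can be derived by recovering prerequisites first and, among components that are ready, picking the one with the tightest RTO. The component names echo the test scope in this deck, but the RTO hours and the dependency edges are hypothetical values made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical RTOs in hours; NOT taken from the real Critical Applications List.
rto_hours = {
    "DNS": 1,
    "Active Directory": 2,
    "Interface Engine": 4,
    "EPIC": 4,
    "Pharmacy": 8,
}

# Hypothetical dependencies: each key needs the listed components recovered first.
depends_on = {
    "Active Directory": {"DNS"},
    "Interface Engine": {"DNS", "Active Directory"},
    "EPIC": {"Active Directory", "Interface Engine"},
    "Pharmacy": {"Active Directory"},
}


def order_of_recovery(rto_hours, depends_on):
    """Return a recovery order that respects dependencies, tightest RTO first."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order, ready = [], []
    while sorter.is_active():
        # Collect everything whose prerequisites are already recovered.
        ready.extend(sorter.get_ready())
        # Among the ready components, recover the tightest RTO next.
        ready.sort(key=lambda name: (rto_hours[name], name))
        nxt = ready.pop(0)
        order.append(nxt)
        sorter.done(nxt)
    return order


recovery_order = order_of_recovery(rto_hours, depends_on)
```

With the sample values, infrastructure comes up before the applications that rely on it, and EPIC is scheduled ahead of Pharmacy because its assumed RTO is shorter.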
What types of skill sets are required to accomplish it?
Project Management
Working Knowledge of:
Storage Technologies (SAN to SAN, PPRC, SRM)
Network
Mainframe
Midrange
WIN-Intel
Virtualization
Risk Assessment Strategies
Single Point of Failure Analysis
Application Criticality
Upstream/Downstream Dependencies
Disaster Recovery
Traditional Disaster Recovery
High Availability
Business Continuity
Event Scenarios
Alternate Resources
Working knowledge of business operations supported by critical
systems (clinical, teaching, research, administration)
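The risk assessment items above (Single Point of Failure Analysis, Upstream/Downstream Dependencies) can be illustrated with a short sketch. It assumes dependencies are recorded as a simple mapping from each item to the components it needs, and it uses "how many things break downstream" as one rough proxy for spotting single points of failure. All names in the sample data (`dns`, `ad`, `app_a`, `app_b`) are hypothetical.

```python
def downstream_impact(depends_on):
    """Map each component to everything that directly or indirectly depends on it."""
    # Invert the upstream edges: component -> items that directly depend on it.
    dependents = {}
    for item, upstream in depends_on.items():
        dependents.setdefault(item, set())
        for component in upstream:
            dependents.setdefault(component, set()).add(item)

    impact = {}
    for component in dependents:
        seen, stack = set(), [component]
        # Depth-first walk over the inverted edges to collect indirect dependents.
        while stack:
            for child in dependents[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        impact[component] = seen
    return impact


def rank_by_impact(impact):
    """List (component, downstream count) pairs, largest impact first."""
    return sorted(
        ((name, len(affected)) for name, affected in impact.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical inventory: item -> upstream components it needs.
sample = {"app_a": {"dns", "ad"}, "app_b": {"ad"}, "ad": {"dns"}}
impact = downstream_impact(sample)
ranking = rank_by_impact(impact)
```

In the sample, `dns` ranks first because losing it takes down `ad` and, through it, both applications. A component near the top of such a ranking with no redundant alternative is a candidate single point of failure.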
Recent Disaster Recovery Test – Oct 14-15, 2010
Disaster Scenario: Loss of Mt. Washington Data Center
Test Window:
8:00am, Thursday, Oct 14th through 8:00am, Friday, Oct 15th
Test Sites:
Sungard Philadelphia Recovery Center
1830 Monument St. local recovery site
Eastern HS remote testing site
Scope of Test:
3 hardware environments (Mainframe, AIX, WIN-Intel)
85+ people (tech support, network, infrastructure, applications, and customers)
20 production applications
5 infrastructure components (HIP, DNS, WINS, AD, SiteMinder)
3 data availability techniques (Tape restore, PPRC, SRM)
77 WIN-Intel Servers
190 Network IP addresses
Ancillary hardware brought in (Equinox, Zebra printer, wrist band printer, Pentax workstation)
Anticipate six customer signoffs for 7 recovered applications
Printed pharmacy labels, EPIC wrist bands
Why the Extensive Background?
Infrastructure
HIP – Recovered and Available
DNS – Recovered and Available
WINS – Recovered and Available
Active Directory – Recovered and Available
SiteMinder – Recovered and Available
Mainframe JHH Regions - Recovered
S-FTP – Recovered and Available – Secure file transfers tested
Pharmacy/BDM – Recovered – Application validation completed
Keane/ADT – Recovered – Back end processing verified. Web front end experienced problems – investigating
Chart Tracking – Recovered – Application and Customer validation completed – Anticipate Customer Signoff
IEPROD (Interface Engine) – Recovered and messaging
EPIC – Recovered – Application validation completed – wrist bands printed – messaged to Interface Engine
ORMIS – Recovered – Could not access reports – no application validation conducted – investigating
POE – Recovered – Application validation completed – Messaged to Interface Engine. Wrist bands printed
EDMS – Recovered – Application validation completed
ISIS – Recovered – Application and Customer validation completed – Anticipate Customer Signoff
VPSX – Recovered and operational
PLUE – Recovered – Application validation – Printing from server successful
VisionChips – Recovered – Application and Customer validation completed – Anticipate Customer Signoff
WF – Recovered – Application and Customer validation completed – Anticipate Customer Signoff
HMED – Recovered – Application and Customer validation completed – Anticipate Customer Signoff
QS (Fetal Monitoring) – Recovered – Application and Customer validation completed – Anticipate Customer Signoff
Pentax – Recovered – Application and Customer validation completed – Anticipate Customer Signoff
TheraDoc – Run – Application verification conducted, but not completed – investigating
BabySentry – Recovered – Application validation completed
Vision – Recovered – Application validation completed
Biosense – Recovered – Messaged to Interface Engine
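The result lines above follow a loose "name, status, notes" pattern, so a test recap tally can be sketched in a few lines. This is only an illustration: the sample lines are adapted from the list above, and the three category labels ("recovered", "signoff anticipated", "open issue") are assumptions of this sketch rather than terms used by the DR team.

```python
import re
from collections import Counter

# A few result lines adapted from the recap above.
RESULT_LINES = [
    "EPIC – Recovered – Application validation completed – wrist bands printed",
    "ORMIS – Recovered – Could not access reports – investigating",
    "ISIS – Recovered – Application and Customer validation completed – Anticipate Customer Signoff",
    "TheraDoc – Run – Application verification conducted, but not completed - investigating",
    "EDMS – Recovered – Application validation completed",
]


def classify(line):
    """Split a result line into (name, category); categories are hypothetical labels."""
    # The slides mix en dashes and hyphens as separators; a separator must be
    # followed by whitespace, so names such as "S-FTP" stay intact.
    name, notes = re.split(r"\s*[–-]\s+", line, maxsplit=1)
    text = notes.lower()
    if "investigating" in text:
        category = "open issue"
    elif "anticipate customer signoff" in text:
        category = "signoff anticipated"
    else:
        category = "recovered"
    return name.strip(), category


summary = dict(classify(line) for line in RESULT_LINES)
tally = Counter(summary.values())
```

A tally like this makes it easy to see at a glance how many items still need follow-up after a test window closes.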
I.T. @ J.H. SharePoint sites for testing documentation
https://collaborate.johnshopkins.edu/sites/DRCustomers/default.aspx
Who Do We Interact With?
Institutional Initiatives
JHH-JHHS Office of Emergency Management – Howie Gwon
(JHH, Bayview, JHU SoM)
JHU – Committee on Crisis Management – Jonathan Links
(Homewood, All Schools-All Locations)
JHMI – Critical Event Preparedness and Response (CEPAR) – Dr. Gabe Kelen and Dianne Whyne
(All of JHMI)
Standards and Procedures Organizations
The Disaster Recovery Institute, International
https://www.drii.org/
Professional Practices
Program Initiation and Management
Risk Evaluation and Control
Business Impact Analysis
Business Continuity Strategies
Emergency Response and Operations
Business Continuity Plans
Awareness and Training Programs
Business Continuity Plan Exercise, Audit and Maintenance
Crisis Communications
Coordination with External Agencies
Certifications
Associate Business Continuity Professional (ABCP)
Certified Business Continuity Vendor (CBCV)
Certified Functional Continuity Professional (CFCP)
Certified Business Continuity Professional (CBCP)
Master Business Continuity Professional (MBCP)
Standards and Procedures Organizations
Degree Pursuits in Emergency Management
Undergraduate
University of Phoenix
University of Maryland University College
University of Maryland Eastern Shore
University of Maryland Baltimore County
University of Maryland College Park
Towson University
University of Baltimore
Salisbury State
Drexel University
University of Richmond
Graduate
Capella University
Virginia Commonwealth University
Colorado State University