Oracle's Recipe for Building an Unbreakable System


Ashish Prabhu
Douglas Utzig
High Availability Systems Group
Server Technologies
Oracle Corporation
Maximum Availability Architecture
Oracle's Recipe For Building An Unbreakable System
Agenda
 Achieving High Availability
 Maximum Availability Architecture (MAA) Overview
 MAA Components
 Performance Considerations
 MAA Test Lab
 Q&A
High Availability is …
Causes of Downtime
Unscheduled Outages
System Faults and Crashes
Data Center Disasters
Human Error
Scheduled Outages
Data and Media Failures
Inadequate System Design, Testing & Process
Maintenance & Continuous Operations
High Availability Goal
 Design and validate the best, integrated High Availability solution
  – Unbreakable Architecture
     Handle all outages at all tiers
  – Best Practices
     Cookbook for prevention, avoidance, mitigation, and recovery
     Configuration, operational, outage solutions, restore fault tolerance
  – Complete out-of-the-box high availability
     Tested and validated solution
Unbreakable Architecture + Best Practices = Maximum Availability
Maximum Availability Architecture
 Best Oracle High Availability Architecture
  – Blueprint for Database and Oracle9iAS
  – Guidelines for hardware and non-Oracle software, but platform, OS, storage, network, … independent
  – Evolves with new Oracle versions and features
 Best Practices
  – Configuration and operational
  – Outages and detailed solutions
  – Restoring fault tolerance after an outage
Maximum Availability Architecture
[Diagram: a Primary Site and a Secondary Site, each with Oracle9iAS and RAC; a WAN Traffic Manager in front of both sites; Data Guard between the sites over a Dedicated Network]
Secondary Site
 Secondary Site is a Mirror of the Primary Site
  – Resolve unscheduled outages quickly and easily
  – Allow site-wide scheduled outages
 Same Service Levels
  – Predictable performance and response time
  – Site transparency
 Consistent Procedures and Processes
  – Reduces administrative complexity
Highly Available Database
Real Application Clusters
 Fast Failover
  – Protection from local site system failures
  – Faster than cold cluster failover solution
  – Fast-start fault recovery (instance failure MTTR)
 Availability and Accessibility
  – Allows for scheduled outages
     Add and remove nodes transparently
  – Transparent Application Failover (TAF) provides uninterrupted service
Highly Available Database
Real Application Clusters
 Higher Scalability
  – All system resources from all nodes are leveraged
  – Cache fusion eliminates need to partition data or modify the application – fully application transparent
  – Connection load balancing distributes connection requests from application tier
 Manageability
  – Provides a single image of the database to manage
Highly Available Database
Oracle Data Guard
 Data Protection
  – Protection from site failures, data failures, human errors, and corruptions
     Protection modes balance availability with performance
     Apply delay prevents user error propagation
  – Greater protection, performance, and manageability compared to remote mirroring solution
  – Offload processing from primary database system
 Role Management
  – Switchover operation for scheduled outages
  – Failover operation for unscheduled outages
Highly Available Application
Oracle9iAS
 Availability
  – Oracle9iAS J2EE (OC4J) and Web Cache clustering for protection against system outages
  – Automatic monitor and restart of failed processes
  – Application state preserved through failures
  – Add and remove nodes transparently
 Scalability
  – Hardware network load balancer distributes client requests to Web Cache
  – Web Cache clustering for distributed caching and load balancing across multiple OC4J instances
Highly Available Application
Oracle9iAS
[Diagram: Clients, Load Balancer, Web Cache, Application Server Tier (OC4J Clusters), Database Tier]
Network Infrastructure
 Wide Area Traffic Manager to direct client traffic to proper site
 Network load balancer to distribute incoming requests
 Dedicated, fast link between sites
  – Influences production database performance
 Redundant components and paths
  – Network paths to the site and within the site
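
The two routing levels on this slide can be sketched as follows. This is a minimal illustration only: the function names, the site names, and the "first healthy site wins" policy are assumptions made for the example, not the behavior of any particular traffic manager or load balancer product.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def pick_site(site_healthy: Dict[str, bool], preferred: List[str]) -> str:
    """Return the first healthy site in preference order (hypothetical policy)."""
    for site in preferred:
        if site_healthy.get(site, False):
            return site
    raise RuntimeError("no healthy site available")


def round_robin(nodes: List[str]) -> Iterator[str]:
    """Yield nodes in turn, as a simple stand-in for a network load balancer."""
    return cycle(nodes)


# Usage: route to the secondary site when the primary is down,
# then spread requests across that site's (hypothetical) nodes.
site = pick_site({"primary": False, "secondary": True}, ["primary", "secondary"])
balancer = round_robin(["webcache1", "webcache2"])
targets = [next(balancer) for _ in range(3)]
```

Round robin is used here only because it is the simplest distribution policy to show; the slide does not say which policy the load balancer applies.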
Best Practices
 Configuration
  – Detailed recommendations for Oracle software
     Features to use, parameters to set
  – Guidelines for hardware and other software
 Operational
  – Technical – e.g. switchover and failover procedures
  – Logistical – e.g. change management considerations
  – Emphasis on outages
     Outages to monitor
     Detailed steps to resolve outages
     How to restore fault tolerance
Best Practices
[Diagram labels: Operational; Configuration; Database, Oracle9iAS, OS, Storage, Network; Monitor for Outage, Detect Outage, Resolve Outage, Restore Fault Tolerance]
HA and Performance
 Combining high availability and performance
  – Secondary site with identical configuration as primary site
  – Network bandwidth and latency between sites
  – Data Guard protection mode
  – Instance recovery time
Network Bandwidth / Latency
 Network bandwidth and latency between sites influence commit response time
 Longer network latency will increase response time
  – Remote write = network round trip time + local write I/O time at secondary site
 Network bandwidth should be greater than maximum redo generation rate
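
The two rules of thumb above can be checked with simple arithmetic. The sketch below is illustrative only: the function names, the units (milliseconds, Mbit/s, MB/s), and the sample figures are assumptions, not measurements from the MAA test lab.

```python
def remote_write_ms(network_rtt_ms: float, standby_write_io_ms: float) -> float:
    """Remote write = network round trip time + local write I/O time at the secondary site."""
    return network_rtt_ms + standby_write_io_ms


def bandwidth_is_sufficient(link_mbit_per_s: float, peak_redo_mbyte_per_s: float) -> bool:
    """True when the link can carry more than the maximum redo generation rate."""
    peak_redo_mbit_per_s = peak_redo_mbyte_per_s * 8  # convert megabytes to megabits
    return link_mbit_per_s > peak_redo_mbit_per_s


# Usage with made-up numbers: a 10 ms round trip plus a 2 ms standby write,
# and a 100 Mbit/s link carrying a 5 MB/s redo peak.
remote_ms = remote_write_ms(10.0, 2.0)
link_ok = bandwidth_is_sufficient(100.0, 5.0)
```

The bandwidth check compares raw rates only; protocol overhead and other traffic on the link would reduce the usable headroom.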
Database Protection Modes
 Balance performance with level of protection from human error, data failures, and disasters
 Maximum Protection and Maximum Availability modes
  – No-data-loss protection, but can have a performance impact on production service levels
 Maximum Performance mode
  – Data loss possible, but less impact on production service levels
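
The trade-off above can be written down as a small lookup. This is a sketch under stated assumptions: the helper names are hypothetical, and the SQL text is the statement form as recalled from the Data Guard documentation, so it should be checked against the documentation for the release in use before running it.

```python
from enum import Enum
from typing import List


class ProtectionMode(Enum):
    MAXIMUM_PROTECTION = "PROTECTION"
    MAXIMUM_AVAILABILITY = "AVAILABILITY"
    MAXIMUM_PERFORMANCE = "PERFORMANCE"


# The two modes the slide lists as no-data-loss.
NO_DATA_LOSS = {ProtectionMode.MAXIMUM_PROTECTION, ProtectionMode.MAXIMUM_AVAILABILITY}


def candidate_modes(require_no_data_loss: bool) -> List[ProtectionMode]:
    """Return the modes that fit the stated data-loss requirement."""
    if require_no_data_loss:
        return [m for m in ProtectionMode if m in NO_DATA_LOSS]
    # Data loss is acceptable, so prefer the mode with less production impact.
    return [ProtectionMode.MAXIMUM_PERFORMANCE]


def set_mode_sql(mode: ProtectionMode) -> str:
    """Build the statement text (assumed syntax; verify for your release)."""
    return f"ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE {mode.value}"


# Usage: list the options when no data loss can be tolerated.
options = candidate_modes(require_no_data_loss=True)
statement = set_mode_sql(options[-1])
```

Choosing between the two no-data-loss modes depends on factors the slide does not spell out, so the sketch returns both rather than picking one.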
Instance Recovery Time
 Balance performance with level of protection from system faults and crashes
 Short instance recovery times can be achieved with negligible impact on performance
  – Provided sufficient I/O capacity exists to handle additional data block writes generated
 Fast-start checkpointing makes instance recovery time-bounded and predictable
Instance Recovery Time
[Chart: writes/sec and tps (vertical axis 0 to 900) for the settings disabled, 300, 180, and 90]
MAA Test Lab
Oracle, Sun, HP, EMC, F5
[Diagram: the MAA topology (Oracle9iAS, WAN Traffic Manager, RAC at the Primary Site and Secondary Site, Data Guard, Dedicated Network) with vendor labels Sun Microsystems, Hewlett-Packard, EMC, and F5 Networks]
Maximum Availability Architecture
 Best Oracle High Availability Architecture
  – What to use
 Best Practices
  – How to build it
  – How to manage it
  – How to fix it
MAA Information Sources
 Oracle Technology Network
  – High Availability Collateral section
     Maximum Availability Architecture - Overview
     Maximum Availability Architecture - The Details
    http://otn.oracle.com/deploy/availability/techlisting.html
 Oracle Consulting – Advanced Technologies Solutions (ATS) Group
    http://otn.oracle.com/consulting/9iServices/content.html
Next Steps
Sessions by Oracle Database Development
Monday
Mon, 1pm -- Moscone Room 103 -- RAC: The Present, The Future, but not Science Fiction
Mon, 11am -- Moscone Room 134 -- Running Your Applications on Oracle Real Application Clusters
Mon, 4pm -- Moscone Room 134 -- Real Customers, Real Application Clusters, Real Results
Mon, 5:30pm -- Moscone Room 134 -- Deploying A Highly Manageable Oracle Real Application Clusters Database
Tuesday
Tue, 11am -- Moscone Room 103 -- Breaking All the Rules with The Unbreakable Database
Tue, 1pm -- Moscone Room 134 -- Oracle's Recipe For Building An Unbreakable System
Tue, 4pm -- Moscone Room 134 -- Bullet-Proof Data Protection with Oracle Data Guard
For More Info On Oracle HA Go To http://otn.oracle.com/deploy/availability/
Next Steps
Sessions by Oracle Database Development
Wednesday
Wed, 8:30am -- Moscone Room 134 -- Getting Under The Hood With Data Guard SQL Apply
Wed, 11am -- Moscone Room 134 -- LogMiner, Flashback Query and Online Redefinition: Power Tools For DBAs
Wed, 4pm -- Moscone Room 252 -- Are You Using The Best To Protect Your Enterprise Data?
Wed, 5:30pm -- Moscone Room 102 -- Oracle LogMiner - Not Just An Error Recovery Tool
Database HA Demos All Four Days In The Oracle Demo Campground
 Real Application Clusters
 Data Guard
 Backup & Recovery with Recovery Manager
 LogMiner, Flashback Query and Online Redefinition
Next Steps
Sessions by Oracle Database Development
Showcase Presentation/Demo
Monday
11:00 AM -- Database High Availability: Data Guard
11:30 AM -- Database High Availability: Backup & Recovery and Recovery Manager
12:00 PM -- Database High Availability: Online Reorg, Flashback Query and LogMiner
Tuesday
11:00 AM -- Real Application Clusters: Scalability
11:30 AM -- Real Application Clusters: High Availability
12:00 PM -- Real Application Clusters: CFS on Linux
Wednesday
11:00 AM -- Real Application Clusters: Scalability
11:30 AM -- Real Application Clusters: High Availability
12:30 PM -- Database High Availability: Data Guard
QUESTIONS
ANSWERS