Oracle's Recipe for Building an Unbreakable System
Ashish Prabhu
Douglas Utzig
High Availability Systems Group
Server Technologies
Oracle Corporation
Maximum Availability Architecture
Oracle's Recipe For Building An Unbreakable System
Agenda
Achieving High Availability
Maximum Availability Architecture (MAA)
Overview
MAA Components
Performance Considerations
MAA Test Lab
Q&A
High Availability is …
Causes of Downtime
Unscheduled Outages
System Faults and Crashes
Data Center Disasters
Human Error
Scheduled Outages
Data and Media Failures
Inadequate System Design, Testing & Process
Maintenance & Continuous Operations
High Availability Goal
Design and validate the best, integrated High Availability solution
– Unbreakable Architecture
    – Handle all outages at all tiers
– Best Practices
    – Cookbook for prevention, avoidance, mitigation, and recovery
    – Configuration, operational, outage solutions, restore fault tolerance
– Complete out-of-the-box high availability
    – Tested and validated solution
Unbreakable Architecture + Best Practices = Maximum Availability
Maximum Availability Architecture
Best Oracle High Availability Architecture
– Blueprint for Database and Oracle9iAS
– Guidelines for hardware and non-Oracle software, but platform, OS, storage, network, … independent
– Evolves with new Oracle versions and features
Best Practices
– Configuration and operational
– Outages and detailed solutions
– Restoring fault tolerance after an outage
Maximum Availability Architecture
[Diagram: a WAN Traffic Manager in front of two sites, each running Oracle9iAS and a RAC database; Data Guard links the Primary Site to the Secondary Site over a Dedicated Network]
Secondary Site
Secondary Site is a Mirror of the Primary Site
– Resolve unscheduled outages quickly and easily
– Allow site-wide scheduled outages
Same Service Levels
– Predictable performance and response time
– Site transparency
Consistent Procedures and Processes
– Reduces administrative complexity
Highly Available Database
Real Application Clusters
Fast Failover
– Protection from local site system failures
– Faster than cold cluster failover solution
– Fast-start fault recovery (instance failure MTTR)
Availability and Accessibility
– Allows for scheduled outages
    – Add and remove nodes transparently
– Transparent Application Failover (TAF) provides uninterrupted service
Highly Available Database
Real Application Clusters
Higher Scalability
– All system resources from all nodes are leveraged
– Cache fusion eliminates need to partition data or modify the application (fully application transparent)
– Connection load balancing distributes connection requests from application tier
Manageability
– Provides a single image of the database to manage
Highly Available Database
Oracle Data Guard
Data Protection
– Protection from site failures, data failures, human errors, and corruptions
    – Protection modes balance availability with performance
    – Apply delay prevents user error propagation
– Greater protection, performance, and manageability compared to remote mirroring solution
– Offload processing from primary database system
Role Management
– Switchover operation for scheduled outages
– Failover operation for unscheduled outages
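The apply delay bullet above can be pictured with a small timing check. This is a minimal sketch, not Oracle code: the function name and the minute-based timings are hypothetical, and it assumes redo generated at the primary is applied at the standby only after the configured delay has elapsed.

```python
def error_reached_standby(error_time_min: float,
                          detected_time_min: float,
                          apply_delay_min: float) -> bool:
    """Return True if the erroneous change was already applied on the standby.

    Assumption (illustrative model): a change made at error_time_min is
    applied on the standby at error_time_min + apply_delay_min.
    """
    applied_at = error_time_min + apply_delay_min
    return detected_time_min >= applied_at


# A user error at minute 0, noticed at minute 30, with a 60-minute delay:
# the standby has not yet applied the bad change.
caught_in_time = not error_reached_standby(0, 30, 60)
```

Under this model, a longer delay widens the window for catching a user error, at the cost of a standby that lags further behind the primary.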
Highly Available Application
Oracle9iAS
Availability
– Oracle9iAS J2EE (OC4J) and Web Cache clustering for protection against system outages
– Automatic monitoring and restart of failed processes
– Application state preserved through failures
– Add and remove nodes transparently
Scalability
– Hardware network load balancer distributes client requests to Web Cache
– Web Cache clustering for distributed caching and load balancing across multiple OC4J instances
Highly Available Application
Oracle9iAS
[Diagram: Clients, Load Balancer, Application Server Tier (Web Cache, OC4J Clusters), Database Tier]
Network Infrastructure
Wide Area Traffic Manager to direct client traffic to proper site
Network load balancer to distribute incoming requests
Dedicated, fast link between sites
– Influences production database performance
Redundant components and paths
– Network paths to the site and within the site
Best Practices
Configuration
– Detailed recommendations for Oracle software
    – Features to use, parameters to set
– Guidelines for hardware and other software
Operational
– Technical, e.g. switchover and failover procedures
– Logistical, e.g. change management considerations
– Emphasis on outages
    – Outages to monitor
    – Detailed steps to resolve outages
    – How to restore fault tolerance
Best Practices
[Diagram: Configuration best practices span Database, Oracle9iAS, OS, Storage, and Network; Operational best practices cover Monitor for Outage, Detect Outage, Resolve Outage, and Restore Fault Tolerance]
HA and Performance
Combining high availability and performance
– Secondary site with the same configuration as the primary site
– Network bandwidth and latency between sites
– Data Guard protection mode
– Instance recovery time
Network Bandwidth / Latency
Network bandwidth and latency between sites influence commit response time
Longer network latency will increase response time
– Remote write = network round trip time + local write I/O time at secondary site
Network bandwidth should be greater than maximum redo generation rate
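The two sizing rules on this slide reduce to simple arithmetic. The following is a minimal sketch rather than an Oracle tool; the function names, units, and sample figures are assumptions chosen for illustration.

```python
def remote_write_ms(network_rtt_ms: float, secondary_write_io_ms: float) -> float:
    """Remote write = network round trip time + local write I/O time at the secondary site."""
    return network_rtt_ms + secondary_write_io_ms


def bandwidth_is_sufficient(link_mbytes_per_sec: float,
                            peak_redo_mbytes_per_sec: float) -> bool:
    """The inter-site link should carry more than the maximum redo generation rate."""
    return link_mbytes_per_sec > peak_redo_mbytes_per_sec


# Hypothetical figures: 10 ms round trip, 2 ms write I/O at the secondary site,
# a 12.5 MB/s link, and a 4 MB/s peak redo rate.
sample_remote_write = remote_write_ms(10.0, 2.0)
sample_link_ok = bandwidth_is_sufficient(12.5, 4.0)
```

Measuring the peak redo rate over a representative busy period, rather than the average, keeps the bandwidth check aligned with the "maximum redo generation rate" wording above.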
Database Protection Modes
Balance performance with level of protection from human error, data failures, and disasters
Maximum Protection and Maximum Availability modes
– No-data-loss protection, but can have a performance impact on production service levels
Maximum Performance mode
– Data loss possible, but less impact on production service levels
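One way to picture the trade-off between the modes is a lower-bound estimate of how long a commit waits on redo writes. This is a simplified, illustrative model and not an Oracle API: the enum, function, and timings are hypothetical, and it assumes the secondary site is reachable and that a no-data-loss commit cannot be acknowledged before the remote write completes.

```python
from enum import Enum


class ProtectionMode(Enum):
    MAXIMUM_PROTECTION = "maximum protection"
    MAXIMUM_AVAILABILITY = "maximum availability"
    MAXIMUM_PERFORMANCE = "maximum performance"


# Modes the slide describes as offering no-data-loss protection.
NO_DATA_LOSS_MODES = {
    ProtectionMode.MAXIMUM_PROTECTION,
    ProtectionMode.MAXIMUM_AVAILABILITY,
}


def min_commit_wait_ms(mode: ProtectionMode,
                       local_write_ms: float,
                       remote_write_ms: float) -> float:
    """Lower-bound estimate of the redo write wait behind one commit."""
    if mode in NO_DATA_LOSS_MODES:
        # The commit waits for both the local and the remote write,
        # so it takes at least as long as the slower of the two.
        return max(local_write_ms, remote_write_ms)
    # Maximum Performance: the commit does not wait on the secondary site.
    return local_write_ms
```

With a 2 ms local write and a 12 ms remote write, this model gives at least 12 ms per commit in the no-data-loss modes and 2 ms in Maximum Performance mode, which is the performance impact the slide refers to.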
Instance Recovery Time
Balance performance with level of protection from system faults and crashes
Short instance recovery times can be achieved with negligible impact on performance
– Provided sufficient I/O capacity exists to handle additional data block writes generated
Fast-start checkpointing makes instance recovery time-bounded and predictable
Instance Recovery Time
[Chart: writes/sec and tps (axis 0 to 900) for four instance recovery settings: disabled, 300, 180, 90]
MAA Test Lab
Oracle, Sun, HP, EMC, F5
[Diagram: the MAA layout (WAN Traffic Manager, Oracle9iAS, RAC at the Primary Site and Secondary Site, Data Guard over a Dedicated Network) labeled with the partner names Sun Microsystems, Hewlett-Packard, EMC, and F5 Networks]
Maximum Availability Architecture
Best Oracle High Availability Architecture
What to use
Best Practices
How to build it
How to manage it
How to fix it
MAA Information Sources
Oracle Technology Network
– High Availability Collateral section
    – Maximum Availability Architecture – Overview
    – Maximum Availability Architecture – The Details
    – http://otn.oracle.com/deploy/availability/techlisting.html
Oracle Consulting – Advanced Technologies Solutions (ATS) Group
– http://otn.oracle.com/consulting/9iServices/content.html
Next Steps
Sessions by Oracle Database Development
Monday
– RAC: The Present, The Future, but not Science Fiction (Mon, 1pm -- Moscone Room 103)
– Running Your Applications on Oracle Real Application Clusters (Mon, 11am -- Moscone Room 134)
– Real Customers, Real Application Clusters, Real Results (Mon, 4pm -- Moscone Room 134)
– Deploying A Highly Manageable Oracle Real Application Clusters Database (Mon, 5:30pm -- Moscone Room 134)
Tuesday
– Breaking All the Rules with The Unbreakable Database (Tue, 11am -- Moscone Room 103)
– Oracle’s Recipe For Building An Unbreakable System (Tue, 1pm -- Moscone Room 134)
– Bullet-Proof Data Protection with Oracle Data Guard (Tue, 4pm -- Moscone Room 134)
For More Info On Oracle HA Go To http://otn.oracle.com/deploy/availability/
Next Steps
Sessions by Oracle Database Development
Wednesday
– Getting Under The Hood With Data Guard SQL Apply (Wed, 8:30am -- Moscone Room 134)
– LogMiner, Flashback Query and Online Redefinition: Power Tools For DBAs (Wed, 11am -- Moscone Room 134)
– Are You Using The Best To Protect Your Enterprise Data? (Wed, 4pm -- Moscone Room 252)
– Oracle LogMiner - Not Just An Error Recovery Tool (Wed, 5:30pm -- Moscone Room 102)
Database HA Demos All Four Days In The Oracle Demo Campground
– Real Application Clusters
– Data Guard
– Backup & Recovery with Recovery Manager
– LogMiner, Flashback Query and Online Redefinition
Next Steps
Sessions by Oracle Database Development
Showcase Presentation/Demo
Monday
11:00 AM -- Database High Availability: Data Guard
11:30 AM -- Database High Availability: Backup & Recovery and Recovery Manager
12:00 PM -- Database High Availability: Online Reorg, Flashback Query and LogMiner
Tuesday
11:00 AM -- Real Application Clusters: Scalability
11:30 AM -- Real Application Clusters: High Availability
12:00 PM -- Real Application Clusters: CFS on Linux
Wednesday
11:00 AM -- Real Application Clusters: Scalability
11:30 AM -- Real Application Clusters: High Availability
12:30 PM -- Database High Availability: Data Guard
QUESTIONS
ANSWERS