Inside the Interconnect
RAC Basics
Julian Dyke
Independent Consultant
Web Version - February 2008
© 2008 Julian Dyke
juliandyke.com
Agenda
Real Application Clusters
The Theory
The Reality
RAC
The Theory
RAC
Redundancy
Single Point of Failure
If component fails, system will be inaccessible
Redundancy
Duplicate components
If component fails another can be used
Active-Active or Active-Passive
Examples include
Power Supplies
RAID
Bonded Networks
IO Multipathing
Oracle RAC
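The benefit of duplicating a component can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes that the copies fail independently and that one surviving copy is enough to keep the system accessible. The function name `combined_availability` and the 99% figure are invented for the example.

```python
def combined_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one of them suffices.

    Assumption: failures are independent, which real hardware only approximates.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one component")
    # The system is inaccessible only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


# Hypothetical component that is up 99% of the time.
for copies in (1, 2, 3):
    print(copies, round(combined_availability(0.99, copies), 6))
```

Shared components such as a common power feed break the independence assumption, so real figures will usually be lower.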
RAC
4-node cluster
[Diagram: four nodes (Node 1 to Node 4), each running one instance (Instance 1 to Instance 4). The nodes are attached to the public network, to the private network (interconnect) and, through the storage network, to the shared storage.]
RAC
Cache Coherency
RAC must ensure changes made by any instance
Are not overwritten by another instance
Maintain ACID properties
Current Blocks
Blocks can be updated by any instance
Only current version of a block can be updated
Only one current version of a block can exist across all instances
Consistent Read Blocks
Can have a theoretically unlimited number of consistent versions of a block, both in each instance and across all instances
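The two rules above (a single current version per block, any number of consistent read versions) can be shown with a toy model. This is a sketch for illustration and not how Oracle implements cache coherency: the class `ToyCacheCoherency`, its methods and the snapshot numbers are all hypothetical.

```python
class ToyCacheCoherency:
    """Toy bookkeeping for current and consistent read (CR) block versions."""

    def __init__(self):
        self.current_holder = {}  # block id -> the one instance holding the current version
        self.changes = {}         # block id -> number of updates applied so far
        self.cr_versions = {}     # (instance, block id) -> list of CR snapshot numbers

    def acquire_current(self, instance, block):
        """Move the current version of `block` to `instance`; return the previous holder."""
        previous = self.current_holder.get(block)
        # A dict holds one value per key, so there is never more than one holder.
        self.current_holder[block] = instance
        return previous

    def update(self, instance, block):
        """Only the holder of the current version may change the block."""
        if self.current_holder.get(block) != instance:
            raise PermissionError(f"{instance} does not hold the current version of block {block}")
        self.changes[block] = self.changes.get(block, 0) + 1

    def add_consistent_read(self, instance, block, snapshot):
        """Any instance may keep any number of CR versions of a block."""
        self.cr_versions.setdefault((instance, block), []).append(snapshot)


cache = ToyCacheCoherency()
cache.acquire_current("PROD1", 7)
cache.update("PROD1", 7)
cache.acquire_current("PROD2", 7)       # PROD1 can no longer update block 7
cache.add_consistent_read("PROD1", 7, 100)
cache.add_consistent_read("PROD2", 7, 100)
```

The model deliberately ignores how versions are shipped between instances; it only records who may update a block.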
RAC
Cluster Manager
All clusters must have cluster management software
Manages node membership and evictions
Oracle Clusterware
Mandatory for RAC in Oracle 10.1 and above
Known as Cluster Ready Services (CRS) in Oracle 10.1 only
Can be combined with vendor clusterware
IBM HA/CMP
HP ServiceGuard
Sun Cluster
Must be running before ASM/RDBMS instances can be started on a node
Can be used with non-RAC databases and applications
Oracle 10.2 and above
RAC
Interconnect
Used for inter-node communication by:
Oracle Clusterware
ASM Instances
RDBMS Instances
Optimally high bandwidth / low latency
Typically 1Gb (Gigabit) Ethernet
Uses TCP / UDP protocols
NIC interfaces often bonded for availability
Other physical networks supported e.g. Infiniband
RAC
Shared Storage
Required for:
Oracle Clusterware Files
Oracle Cluster Registry (OCR)
Voting Disk
Database Files
Control Files
Database
Online Redo Logs
Server Parameter File
Strongly recommended for
Archived redo logs
Backup copies
RAC
Shared Storage
Can use:
Storage Area Network (SAN) e.g.:
EMC Clariion / Symmetrix
HP MSA / EVA / XP series
Hitachi
Fujitsu
Network Attached Storage (NAS) e.g.:
Network Appliance
Pillar Data Systems
Sun StorageTek
EMC Celerra
JBOD (with ASM)
RAC
Shared Storage
Fibre Channel
SCSI protocol - block based
Normally 2Gb or 4Gb
Requires one or more Host Bus Adapters (HBA) per node
Requires fabric switches
iSCSI
SCSI protocol - block based
Packets sent over dedicated IP network
Can use standard network components
Processing often offloaded to NIC firmware
NFS
File-based
Uses standard network components
RAC
Shared Storage
Cluster-aware File Systems:
Automatic Storage Management
Cluster File Systems
Oracle Cluster File System (OCFS/OCFS2)
Red Hat GFS
IBM GPFS
Sun StorEdge QFS
Veritas CFS
Network File System
On supported Network Attached Storage only
RAC
Automatic Storage Management (ASM)
Introduced in Oracle 10.1
Additional functionality in 10.2 and 11.1
Generic code (all supported platforms)
Available for both single-instance and RAC databases
Provides shared storage for RAC
Can optionally provide mirroring:
Normal Redundancy (mirrored)
High Redundancy (triple mirroring)
Useful with JBOD or extended clusters
Mandatory for Oracle 10g Standard Edition RAC
Presents storage as disk groups containing
Physical disks
Logical files
Requires additional ASM instance on each node
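The cost of ASM mirroring in raw disk space can be estimated roughly. The sketch below assumes that normal redundancy keeps two copies of the data and high redundancy three, as the slide states, and uses ASM's "external" level for the case where ASM does no mirroring. It ignores metadata and any free space kept back for re-mirroring after a disk failure. The function name and the 1200 GB figure are invented for the example.

```python
# Copies of the data kept by ASM for each disk group redundancy level.
# "external" means ASM does not mirror and relies on the storage array instead.
COPIES = {"external": 1, "normal": 2, "high": 3}


def usable_capacity_gb(raw_gb: float, redundancy: str) -> float:
    """Rough usable space in a disk group, ignoring metadata and spare headroom."""
    try:
        copies = COPIES[redundancy]
    except KeyError:
        raise ValueError(f"unknown redundancy level: {redundancy!r}") from None
    return raw_gb / copies


# Hypothetical disk group built from 1200 GB of raw disk (for example JBOD).
for level in COPIES:
    print(level, usable_capacity_gb(1200, level))
```

Treat the result as an upper bound when sizing a disk group.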
RAC
Licensing
Standard Edition
RAC option free
Maximum two nodes
Maximum four CPUs
Must use Oracle Clusterware
Must use Automatic Storage Management (ASM)
No extended clusters
Enterprise Edition
RAC option 50% extra (per EE license)
No limit on number of nodes
No limit on number of CPUs
Can use any shared storage (ASM, CFS or NFS)
Can use Enterprise Manager Packs (Diagnostics, Tuning..)
RAC
Process Architecture
[Diagram: two-node cluster (Node 1 and Node 2).]
Each node runs the Oracle Clusterware processes OPROCD, OCSSD, CRSD and EVMD
Each node runs an ASM instance (+ASM1 on Node 1, +ASM2 on Node 2)
Each node runs a database instance (PROD1 on Node 1, PROD2 on Node 2)
Background processes shown for each instance: PMON, SMON, LGWR, DBWn, ARCH
RAC background processes shown for each instance: LMON, LCK0, LMD0, LMSn, DIAG
RAC
Reasons For Deployment
Availability
Node failure
Instance failure
Scalability
Distribute workload across multiple instances
Scale out
Manageability
Economies of scale
Administration / Monitoring / Backups / Standby
Reduction in total cost of ownership
Database consolidation
Commodity hardware
RAC
Availability
Ensure continued availability of database in event of node or instance failure
Automatic failover
No human intervention required
In the event of node or instance failure:
All sessions connected to failed node are terminated
Sessions connected to remaining nodes are temporarily suspended while resources are re-mastered, then resume after the brown-out period
New sessions will be connected to remaining nodes only
Ensuring availability requires spare capacity during normal operations
Either additional node
Or reduction in service level
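The spare capacity requirement can be put into numbers. The sketch below is a simplification: it assumes the workload is spread evenly across identical nodes, that it redistributes evenly over the survivors after a failure, and that RAC itself adds no overhead. The function name `max_safe_utilization` is invented for the example.

```python
def max_safe_utilization(nodes: int, failures_tolerated: int = 1) -> float:
    """Highest average per-node utilization at which survivors can absorb the full load."""
    if nodes < 2:
        raise ValueError("a cluster needs at least two nodes")
    if not 0 <= failures_tolerated < nodes:
        raise ValueError("failures_tolerated must be between 0 and nodes - 1")
    survivors = nodes - failures_tolerated
    # Total work is nodes * utilization; it must fit on `survivors` fully used nodes.
    return survivors / nodes


for nodes in (2, 3, 4):
    print(nodes, max_safe_utilization(nodes))
```

Under these assumptions a 2-node cluster can run at no more than 50% average utilization if it is to survive a node failure without a reduction in service level, and a 4-node cluster at no more than 75%.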
RAC
Availability
[Diagram: 4-node cluster with Node 1 to Node 4 running Instance 1 to Instance 4, attached to the public network, the private network (interconnect) and, through the storage network, the shared storage.]
RAC
Scalability
Workload can be distributed across multiple nodes
Workload can be balanced across all nodes using
connection management
Client-side using Oracle Net
Server-side using listener processes
Workload can be directed to specific nodes using services
Level of scalability dependent on application
[Figure: charts with axes labelled Resources and Throughput.]
RAC
Scalability
Factors that can degrade scalability
Excessive parsing
Consistent reads
SELECT FOR UPDATE / user defined locking
DDL
Object-oriented code
Features that can improve scalability
Services
Automatic Segment Space Management
Partitioning
Sequences
Reverse key indexes
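Reverse key indexes help because they store each key with its bytes reversed, so consecutive values (for example from a sequence) are scattered across the index instead of all landing next to each other. The sketch below illustrates the effect only. It assumes keys stored as fixed-width big-endian integers, which is not Oracle's NUMBER format, and the helper names are invented for the example.

```python
def plain_key(value: int, width: int = 4) -> bytes:
    """Key as stored in an ordinary index (assumed fixed-width, big-endian)."""
    return value.to_bytes(width, "big")


def reversed_key(value: int, width: int = 4) -> bytes:
    """Key as stored in a reverse key index: the same bytes in reverse order."""
    return plain_key(value, width)[::-1]


def distinct_leading_bytes(keys) -> int:
    """Count distinct first bytes, a crude proxy for how widely keys are spread."""
    return len({key[0] for key in keys})


values = range(1000, 1016)  # 16 consecutive values, as a sequence might generate
print(distinct_leading_bytes([plain_key(v) for v in values]))
print(distinct_leading_bytes([reversed_key(v) for v in values]))
```

In the plain form all 16 keys share a leading byte and sort together; reversed, their leading bytes all differ. The trade-off is that reversed keys no longer sort in value order, so range scans on them are not possible.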
RAC
Manageability
Advantages
Consolidation
Economies of scale
Administration
Monitoring
Backup and recovery
Standby database
Disadvantages
Increased planned downtime
Complexity
Dependencies
Skills
RAC
Total Cost of Ownership
Benefits
Lower hardware costs - commodity hardware
Lower support costs
Management economies of scale
Costs
Redundant hardware
Servers, Storage, NIC, HBA, Switches, Fabric
Oracle licenses
Experienced staff
Application modifications
RAC
Applications
Most applications should run on RAC without modification
Performance is not guaranteed
Applications that perform well in single-instance have best chance of scaling in RAC
Applications performing badly in single-instance will perform worse in RAC
Some features do not port easily to RAC e.g.:
DBMS_ALERT, DBMS_PIPE, External files
Applications that can be logically partitioned tend to scale best
Minimize use of interconnect
Maximize use of buffer caches
Implementation more likely to succeed if you have direct or indirect access to source code
RAC
Database Services
Allow sessions with similar workload characteristics to be logically grouped and managed
Services can be assigned to
set of preferred instances - used if available
set of available instances - used if preferred instances not available
failover to available instances is automatic
failback to preferred instances is manual
Services can be configured to maximize instance affinity
Limited statistics reported at service level
Can also be reported at service / module / action level
Trace can be enabled at service level
Can also be enabled at service / module / action level
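The preferred / available rules above can be modelled in a few lines. This is a toy sketch of the behaviour as described on the slide (automatic failover, manual failback) and not Oracle's implementation. The class `ToyService` and its method names are hypothetical; the names SERVICE1, PROD1 and PROD2 are example values.

```python
class ToyService:
    """Toy model of a database service with preferred and available instances."""

    def __init__(self, name, preferred, available):
        self.name = name
        self.preferred = list(preferred)
        self.available = list(available)
        self.down = set()
        self.running_on = list(preferred)  # the service starts on its preferred instances

    def instance_failed(self, instance):
        """Automatic failover: move the service to the first usable available instance."""
        self.down.add(instance)
        if instance in self.running_on:
            self.running_on.remove(instance)
            for candidate in self.available:
                if candidate not in self.down and candidate not in self.running_on:
                    self.running_on.append(candidate)
                    break

    def instance_restarted(self, instance):
        """A restarted instance does not take the service back by itself."""
        self.down.discard(instance)

    def relocate(self, source, target):
        """Manual failback (or any other relocation) requested by an administrator."""
        if source not in self.running_on:
            raise ValueError(f"{self.name} is not running on {source}")
        if target in self.down:
            raise ValueError(f"{target} is down")
        self.running_on.remove(source)
        if target not in self.running_on:
            self.running_on.append(target)


service = ToyService("SERVICE1", preferred=["PROD1"], available=["PROD2"])
service.instance_failed("PROD1")      # the service moves to PROD2 automatically
service.instance_restarted("PROD1")   # the service stays on PROD2
service.relocate("PROD2", "PROD1")    # an administrator moves it back
```

Keeping failback manual avoids disconnecting sessions a second time just because the preferred instance has returned.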
RAC
Database Services
[Diagram: before and after views of SERVICE1 on a two-node cluster, with Listener1 and Listener2 in front of instances PROD1 and PROD2. SERVICE1 is configured with PROD1 as PREFERRED and PROD2 as AVAILABLE.]
RAC
Extended Clusters
Currently the Holy Grail of high availability
RAC nodes located at physically separate sites
Implicit disaster recovery
Requires Enterprise Edition licences + RAC option
In the event of a site failure, database is still available
Storage is duplicated at each site
Can use ASM or vendor-supplied storage technology
Active / Active configuration
Users can access database via either site
Configuration and performance tuning are complex
Cache fusion traffic between sites
RAC
Extended Clusters
[Diagram: extended cluster. Site1 holds Node 1 (Instance 1) and Site2 holds Node 2 (Instance 2), each with its own storage network and copy of the database. The private network and public network span both sites, and the quorum is located at Site3.]
RAC
Disaster Recovery
Data Guard and RAC are fully compatible
Can configure any permutation e.g.
Primary            Standby
Single instance    Single instance
RAC                Single instance
RAC                RAC
Single instance    RAC
All instances can participate in redo log shipping
Only one instance can perform managed recovery
Standby database might be a potential bottleneck
RAC
Alternatives
Single Instance Databases
No RAC overhead
Simpler to install / configure / manage
Single point of failure
Oracle Products
Oracle Streams
Oracle Clusterware
Proprietary Clustering Solutions
HP ServiceGuard
IBM HA/CMP
Sun Cluster
RAC
The Reality
RAC
The Reality
Many sites running RAC
Mostly Oracle 10.2
A few still running Oracle 10.1
Still some Oracle 9.2
Most RAC users develop their own applications or use bespoke applications developed by a third party
Probably around 20 extended clusters in production across Europe
Many Oracle 10.2 sites run ASM
Very few run OCFS or raw devices
Very few use third-party cluster file systems
Most sites using SAN - fewer using NAS
In UK most users currently deploy on Linux x86-64
Solaris very popular in other regions
RAC
The Reality
Few Oracle 10g users run vendor clusterware
Most RAC deployments for availability
Decreased unplanned downtime
Increased planned downtime
Increasing number of deployments for scalability
Workload balancing
Services
Manageability benefits very doubtful
Economies of Scale versus Additional complexity
TCO reductions possible in some circumstances
Replace large SMP boxes
Replace legacy active-passive clusters
RAC
The Reality
Most users run 2-node clusters
Some have 3-node or 4-node clusters
A handful run five nodes or more
Most users only have one database per cluster
Few grids
Oracle Clusterware scales well
Number of nodes does not impact performance
Oracle RAC databases might scale well
Dependent on application
Additional nodes may improve or degrade performance
RAC
The Reality
ASM currently the most popular RAC storage technology
Deployed in numerous Oracle 10.2 RAC production systems
No operating system utilities
ASMCMD in Oracle 10.2 and above
Generally disliked by storage administrators
Gives too much control to DBAs
Acceptable performance
ASM instance provides metadata
RDBMS instances read and write blocks directly from files
Thank you for your interest
References
http://www.juliandyke.com/References/References.html
Questions
[email protected]