Transcript Slide 1

S311441 Practical Performance Management for Oracle Real
Application Clusters
Michael Zoll, Consulting Member of Technical Staff
Barb Lundhild, Product Manager, Oracle Real Application Clusters
The following is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
Agenda
• Oracle RAC Infrastructure and Technical
Fundamentals
• Application and Database Design
• Common Problems and Symptoms
• Diagnostics and Problem Determination
• Appendix
Objective
• Convey a few simple and fundamental concepts of
Oracle RAC performance
• Summarize application level performance and
scalability information
• Provide some simple sizing hints
• Give an illustrative overview of common problems and
solutions
• Builds on similar presentation from OOW 2008
http://www.oracle.com/technology/products/database/clustering/pdf/s298716_oow2008_perf.pdf
Oracle RAC Infrastructure:
Technical Fundamentals,
Sizing and Configuration
Oracle RAC Architecture
[Figure: Node 1, Node 2, …, Node n each run an Oracle instance (instance 1..n), ASM, Oracle Clusterware, and the operating system. Each node exposes a VIP (VIP1, VIP2, …, VIPn), services, a local listener, and a SCAN listener on the public network. All nodes attach to shared storage managed by ASM, which holds the database and control files, the redo and archive logs of all instances, and the OCR and voting disks.]
Global Cache and Global Enqueue Service
Processes and Functions
[Figure: Each instance's SGA contains the buffer cache, library cache, dictionary cache, log buffer, and the Global Resource Directory. The Global Enqueue Service (LMD0) and the Global Cache Service (LMSx, with LMON) run in real-time priority and serve Oracle processes over the cluster's private high-speed network; DBW0 and LGWR handle the local writes.]

Global Cache Access
[Figure legend: (1) the requesting shadow process sends a message; (2) immediate direct send of the block by LMS: > 96% of cases; (3) log write and send: < 4% of cases, where LMS posts LGWR to flush redo and is posted back after the flush (5, 6); (4) the shadow process receives the block into its buffer cache.]
Basic Performance Facts
• Global Cache access takes 100 - 500 usec (round trip); the sketch below checks the averages from GV$SYSSTAT
– Data is served immediately from remote instances via the private, high-speed interconnect
• Redo may have to be written to the log file before the send if the data was changed and has not been committed yet
– Performance varies with network infrastructure and network protocol
• At most 3 network hops (messages) per block access
– For clusters with more than 2 nodes, independent of total cluster size
• CPU cost per OLTP transaction
– Depends on locality of access, i.e. messages per transaction
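The round-trip figures above can be verified from inside the database. A minimal sketch, assuming 11g-era GV$SYSSTAT statistics where receive times are reported in centiseconds:

    -- Average global cache current block receive time (ms) per instance
    SELECT b.inst_id,
           b.value AS blocks_received,
           ROUND(10 * t.value / NULLIF(b.value, 0), 2) AS avg_receive_ms
    FROM   gv$sysstat b,
           gv$sysstat t
    WHERE  b.name = 'gc current blocks received'
    AND    t.name = 'gc current block receive time'  -- centiseconds
    AND    b.inst_id = t.inst_id;

The analogous statistics 'gc cr blocks received' and 'gc cr block receive time' cover consistent-read transfers.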
Basic Performance Facts: Latency (UDP/GbE and RDS/IB)

Round trip (ms) by block size:

Protocol   2K     4K     8K     16K
UDP/GbE    0.30   0.31   0.36   0.46
RDS/IB     0.12   0.13   0.16   0.20

Lower CPU cost, relative to protocol and network infrastructure.
• Actual interconnect latency is generally not the problem unless you have exceeded capacity or you are experiencing errors
Private Interconnect
• The network between the nodes of an Oracle RAC cluster MUST be private/dedicated to traffic between Oracle RAC nodes
• Large (Jumbo) Frames are recommended for GbE
– Avoids fragmentation and reassembly (an 8K block over a 1500-byte MTU = 6 fragments)
• Interconnect bandwidth should be tested with non-Oracle utilities (e.g. iperf)
– Expect no packet loss at 75% - 80% of bandwidth utilization
Interconnect Bandwidth
• Generally, 1 Gb/sec is sufficient for performance and scalability in OLTP
• DSS/DW systems should categorically be designed with more than 1 Gb/sec of capacity
• Prediction of interconnect traffic is difficult
– Depends on transaction instruction length per message
– Empirical rule of thumb: 1 Gb/sec per 32 CPU cores
• InfiniBand and 10GbE are supported for scale-out
Performance and Scalability of
Applications and Database
Design with RAC
General Scalability
• OLTP, DML-intensive workloads
– Scale well if contention is low and database/working set size scales (i.e. add a node when demand grows)
• Read-intensive workloads scale predictably and linearly
– Bigger cache when adding more nodes
– Faster access to the global cache than to disk, less disk IO
• If cluster size and database size grow in balance, the system will perform and scale well
Performance and Scaling in Application and
Database Design
Response Time Impact
• Index contention on INSERTs when the index is right-growing
– System-generated "artificial" keys such as consecutive order numbers, or "natural" keys such as dates
• UPDATEs or DELETEs of rows in a small working set
– Session logging and tracking
– First-in first-out queues
– State of messages in queues
• Bulk INSERTs of large amounts of data
– LOBs
DML Contention and Serialization
[Figure: Modification-intensive operations concentrate on a small set of (cached) "busy blocks" in tables T and T' and their index I, e.g. INSERTs whose key comes from a sequence, or UPDATE T SET … WHERE row in blocks[1..n] and n is a small number.]
Performance and Scaling in Application and
Database Design
CPU Cost due to Inter-Instance Messaging and Non-Linear Scaling
• In-memory databases
– Working set spans multiple buffer caches
– Frequent modifications and reads of recent modifications
• Working set fits into the memory of one instance
– Locality of access worsens when nodes are added and users are load balanced
• Scales as long as sufficient CPU power is available
Read-Intensive Workloads
[Figure: Two instances, each with a 32GB buffer cache, read a 64GB working set from disk; blocks arrive via disk transfer or cache transfer. Eventually all blocks are cached, yielding a larger effective read cache, and with the Read Mostly optimization in 11g no interconnect messages are needed.]
Performance and Scalability
• Good linear or near-linear scaling out of the box for IO- and CPU-intensive applications with large working sets and low proximity of access:
– Self-Service Web Applications (Shopping Carts etc.)
– CRM
– Document storage and retrieval
– Business Analytics and Data Warehousing
Performance and Scalability
• Partitioning or load direction may optimize performance for workloads with high proximity of access, e.g. adding to and removing from message queues
– Advanced Queuing and Workflow
• Batch and bulk processes
– Order processing and Inventory
– Payroll processing
Identifying Performance and Scaling Bottlenecks in
Database Design
The Golden Rules:
• #1: To a first approximation, disregard read-mostly objects and focus on the INSERT-, UPDATE- and DELETE-intensive indexes and tablespaces
• #2: If DML access to data is random, there is nothing to worry about as long as CPU is not an issue
• #3: Standard SQL and schema tuning solves > 80% of performance problems; there are usually only a few problem SQL statements and tables
• #4: Almost everything can be scaled out quickly with load direction and load balancing
Identifying Performance and Scaling Bottlenecks in
Database Design
• Look for indexes with right-growing characteristics
– Keys comprising DATE columns or keys generated by sequence
numbers
• Find frequent updates of “small” and compact tables
– “small”=fits into a single buffer cache
• Identify frequently and concurrently modified LOBs
HOW?
• Look at segment and SQL statistics in the Automatic Workload Repository
• Use Oracle Enterprise Manager Access Advisories and the Automatic Database Diagnostics Monitor (ADDM)
• Instrumentation with MODULE and ACTION helps identify and quantify components of the workload (see the sketch below)
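As a sketch of such instrumentation (the module and action names are hypothetical), a workload component tags itself before doing work so that AWR, ASH and ADDM can attribute its share of the load:

    -- Tag this session's work for workload attribution
    BEGIN
      DBMS_APPLICATION_INFO.SET_MODULE(
        module_name => 'BILLING',         -- hypothetical module name
        action_name => 'POST_INVOICES');  -- hypothetical action name
    END;
    /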
Quick Fixes
Without modifying the application (sketched below)
• Indexes with right-growing characteristics
– Cache sequence numbers per instance
– Hash or range partition the table with LOCAL indexes
• Frequent updates of "small" and compact tables
– Reduce the block size (2K) and the row density of blocks (PCTFREE 99)
• Frequently modified LOBs
– Hash partitions (128 - 256)
– FREEPOOLS
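A minimal sketch of these fixes; all object names are hypothetical and the numbers should be sized for the actual workload:

    -- Per-instance sequence caching keeps each node in its own key range
    CREATE SEQUENCE order_seq CACHE 1000 NOORDER;

    -- Hash partitioning with LOCAL indexes spreads right-growing inserts
    CREATE TABLE orders (
      order_id   NUMBER,
      order_date DATE
    )
    PARTITION BY HASH (order_id) PARTITIONS 16;
    CREATE INDEX orders_i ON orders (order_id) LOCAL;

    -- Lower the row density of a small, frequently updated table
    ALTER TABLE session_state PCTFREE 99;

    -- FREEPOOLS for a frequently modified LOB column
    ALTER TABLE documents MODIFY LOB (doc) (FREEPOOLS 4);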
Quick Fixes
• Application modules which may not scale or cannot be quickly reorganized can be directed to particular nodes via cluster-managed services
– For administrator-managed databases and older releases, create the service with 1 preferred node and the rest available
– For policy-managed databases, use a singleton service
• Some large-scale and high-performance applications may be optimized by data partitioning (range, hash, or composite) and routing by partitioning key in the application server tier
– E.g. hash by CLIENT_ID, REGION etc. (see the sketch below)
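A sketch of the partitioning half of this approach; the table and routing key are hypothetical. The application tier would then route all requests for a given CLIENT_ID to the node that owns that slice of the data:

    -- Hash partitioning by the key the application tier routes on
    CREATE TABLE client_orders (
      client_id NUMBER,
      order_id  NUMBER,
      status    VARCHAR2(30)
    )
    PARTITION BY HASH (client_id) PARTITIONS 64;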
Leverage Connection Pools
UCP: Load Balancing and Affinity
[Figure: An application connects through a UCP connection pool to a RAC database. Runtime connection load balancing distributes work by the load advisory: Instance1 ("I'm busy") receives 30% of the work, Instance2 ("I'm very busy") receives 10%, and Instance3 ("I'm idle") receives 60%.]
Performance and Scalability
Enhancements in 11.1 and 11.2
• Read Mostly
– Automatic policy detects read and disk IO intensive tables
– No interconnect messages when policy kicks in -> CPU savings
• Direct reads for large ( serial and parallel ) scans
– No locks , no buffer cache contention
– Good when IO subsystem is fast or IO processing is offloaded to
storage caches or servers ( e.g. Exadata )
• Fusion Compression
– Reduces message sizes and therefore CPU cost
• Dynamic policies to make trade-off between disk IO and
global cache transfers
Performance Diagnostics
and Checks: Metrics and
Method
Normal Behaviour
• It is normal to see time consumed in
– CPU
– db file sequential/scattered read
– direct read
– gc cr/current block 2-way/3-way (transfer from remote cache)
– gc cr/current grant 2-way (correlates with buffered disk IOs)
• Average latencies should be within baseline parameters (see the sketch below)
• Most problems boil down to CPU, IO, or network capacity, or to application issues
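To check current averages against a baseline, a quick sketch over GV$SYSTEM_EVENT (event names as in 11g):

    -- Average latency (ms) of the common global cache events, per instance
    SELECT inst_id, event, total_waits,
           ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_ms
    FROM   gv$system_event
    WHERE  event IN ('gc cr block 2-way', 'gc cr block 3-way',
                     'gc current block 2-way', 'gc current block 3-way',
                     'gc cr grant 2-way', 'gc current grant 2-way')
    ORDER  BY inst_id, avg_ms DESC;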
Normality, Baselines and Significance

Event                    Waits      Time(s)  Avg(ms)  %Time
db file sequential read  2,627,295  21,808   8        43.2
CPU time                            9,156             18.2
gc current block 3-way   3,289,371  4,019    1        8.0
gc buffer busy acquire   373,777    3,272    9        6.5
gc current block 2-way   3,982,284  3,192    1        6.3
gc current block busy    125,595    2,931    23       5.8

db file sequential read is the most significant response time component. The roughly 1 ms averages of the 2-way and 3-way transfers are normal, and the gc events annotated as contention add up to 20.8% of the time. GC waits are influenced by interconnect or remote effects which are not always obvious.
Distributed Cause and Effect
Example: Cluster-wide Impact of a Log File IO Problem
[Figure: AWR excerpts tracing a cluster-wide symptom back to a log file IO problem on one node. ROOT CAUSE on Node 1: disk capacity, a disk or controller bottleneck.

Global metrics view, WORKLOAD REPOSITORY report for instance OOW8 (host oowdb8). Local symptom: gc current block busy, avg 23 ms.

Cause, WORKLOAD REPOSITORY report for instance OOW4 (host oowdb4): log file parallel write avg 20 ms; avg global cache current block flush time: 21 ms.

Investigate serialization: waits for gc buffer busy (9 ms) and gc current block busy (4 ms) lead to the remote instance table, Global Cache Transfer Stats:

Inst No  Block Class  Blocks Received  % Immed  % Busy
4        data block   114,426          95.9     4.1
7        data block   162,630          76.6     23.4   <- Not OK: log file IO

Avg global cache current block flush time (ms): 3.7 on an unaffected instance.]
Example: Segment Statistics

Segments by Global Cache Buffer Busy:  ES_BILLING (TABLE)  97.41%
Segments by Current Blocks Received:   ES_BILLING (TABLE)  85.81%

ANALYSIS: Table ES_BILLING is frequently read and modified on all nodes. The majority of global cache accesses and serialization can be attributed to it (see the sketch below).
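A sketch of finding such segments without an AWR report, via GV$SEGMENT_STATISTICS (the statistic name is as of the 11g timeframe; some releases report separate acquire/release statistics):

    -- Top 10 segments by global cache buffer busy waits
    SELECT *
    FROM  (SELECT inst_id, owner, object_name, object_type, value
           FROM   gv$segment_statistics
           WHERE  statistic_name = 'gc buffer busy'
           ORDER  BY value DESC)
    WHERE rownum <= 10;

Substituting 'gc current blocks received' or 'gc cr blocks received' yields the transfer-intensive segments instead.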
Comprehensive Cluster-wide Analysis via Global
ADDM
Courtesy of Cecilia Gervasio, Oracle Server Technologies, Diagnostics and Manageability
Common Problems and
Symptoms
Common Problems and Symptoms
• Interconnect or switch problems
• Slow or bottlenecked disks
• High log file sync latency
• System load and scheduling
Symptoms of Interconnect Problems
[Figure: Dropped packets are the ROOT CAUSE; they lead to congestion, congestion hits the capacity limit, and the result is high latencies and serialization.]
Symptoms of an Interconnect Problem
Top 5 Timed Events

Event             Waits    Time(s)  Avg wait(ms)  %Total Call Time  Wait Class
log file sync     286,038  49,872   174           41.7              Commit
gc buffer busy    177,315  29,021   164           24.3              Cluster
gc cr block busy  110,348  5,703    52            4.8               Cluster
gc cr block lost  4,272    4,953    1159          4.1               Cluster
cr request retry  6,316    4,668    739           3.9               Other

gc cr block lost and cr request retry should never be here: lost blocks are always a severe performance problem.
Interconnect or IPC Problems
[Figure: layers where "gc blocks lost" can originate, and where to look:
– Applications: Oracle
– Protocol processing (IP, UDP), visible in netstat -s output:
  UDP: packet receive errors, socket buffer overflows
  IP: 1201 fragments dropped after timeout, 467 reassembly failures, 468 incoming packets discarded
– Device drivers, visible in ifconfig -a output:
  TX errors: 135, dropped, overruns; RX errors: 0, dropped: 27, overruns
– NIC1/NIC2, ports, queues, and the switch]
The sketch below watches for the same problem from inside the database.
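A quick sketch over GV$SYSSTAT; any non-zero, growing values indicate interconnect trouble:

    -- Lost or corrupt global cache blocks, per instance
    SELECT inst_id, name, value
    FROM   gv$sysstat
    WHERE  name IN ('gc blocks lost', 'gc blocks corrupt')
    ORDER  BY inst_id, name;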
Cluster-wide Impact of a Database File IO Problem
[Figure: ROOT CAUSE on Node 2: IO-intensive queries saturate disk capacity (disk or controller bottleneck); Node 1 shows the symptoms.]

Cluster-Wide Disk I/O Impact

Node 1 - Top 5 Timed Events

Event             Waits    Time(s)  Avg wait(ms)  %Total Call Time
log file sync     286,038  49,872   174           41.7
gc buffer busy    177,315  29,021   164           24.3
gc cr block busy  110,348  5,703    52            4.8

CAUSE: an expensive query on Node 2 causes an IO bottleneck:
1. IO on the disk group containing the redo logs is slow
2. Block shipping for frequently modified blocks is delayed by the log flush IO
3. Serialization builds up

Node 2 - Load Profile (per second): Redo size: 40,982.21; Logical reads: 81,652.41; Physical reads: 51,193.37
Log File Sync Latency: Causes and Symptoms
Courtesy of Vinay Srihari, Oracle Server Technologies, Recovery
Causes of High Commit Latency
• Symptom of slow log writes
– An I/O service time spike may last only seconds or minutes
– Threshold-based warning message in the LGWR trace file:
• "Warning: log write elapsed time xx ms, size xxKB"
• Dumped when write latency >= 500 ms
– A large log_buffer makes a bad situation worse
• Fixes (see the sketch below)
– Smooth out log file IO on the primary system and the standby redo apply I/O pattern
– Primary and standby storage subsystems should be configured for peaks
– Apply bug fixes (see appendix)
Courtesy of Vinay Srihari, Oracle Server Technologies, Recovery
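One way to confirm that slow log writes are behind high commit latency is to compare the two events instance by instance; a minimal sketch:

    -- log file sync (foreground commit wait) vs log file parallel write (LGWR IO)
    SELECT inst_id, event,
           ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_ms
    FROM   gv$system_event
    WHERE  event IN ('log file sync', 'log file parallel write')
    ORDER  BY inst_id, event;

If log file parallel write is fast while log file sync stays slow, suspect LGWR scheduling or CPU starvation rather than the IO path.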
Block Server Process Busy or Starved
[Figure: ROOT CAUSE on Node 1: too few LMS processes, LMS not running in high priority, or memory problems (swapping); Node 2 sees the effect.]
Block Server Process Busy or Starved

Top 5 Timed Events

Event                       Waits    Time(s)  Avg wait(ms)  %Total Call Time  Wait Class
gc cr grant congested       26,146   28,761   1100          39.1              Cluster
gc current block congested  13,237   13,703   1035          18.6              Cluster
gc cr grant 2-way           340,281  12,810   38            17.4              Cluster
gc current block 2-way      119,098  4,276    36            5.8               Cluster
gc buffer busy              8,109    3,460    427           4.7               Cluster

On the remote node: Avg message sent queue time (ms): 16.1
"Congested": LMS could not dequeue messages fast enough
Block Server Processes Busy
• Increase the number of LMS processes (see the sketch below) based on
– Occurrence of "congested" wait events
– Heuristics: 75 - 80% busy is OK
– Avg send queue time > 1 ms
• Caveat: the # of CPUs should always be >= the # of LMS processes to avoid starvation
• On NUMA architectures and CMT
– Bind LMS to a NUMA board or to cores in a processor set
– Fence off hardware interrupts from the processor sets
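The LMS count is set via the GCS_SERVER_PROCESSES initialization parameter; a sketch, with the value 4 purely illustrative and subject to the CPU caveat above:

    -- Static parameter: takes effect at the next instance restart
    ALTER SYSTEM SET gcs_server_processes = 4 SCOPE = SPFILE SID = '*';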
High Latencies in Global Cache
[Figure: root-cause overview; details not recoverable from the transcript]
Transient Problems and Hangs

Temporary Slowness and Hang
• Can affect one or more instances in the cluster
• Can be related to
– IO issues at log switch time (checkpoint or archiver slow)
– A process stuck waiting for IO
– A connection storm
• Hard to establish causality with AWR statistics
• Use Oracle Enterprise Manager and Active Session History
Temporary Cluster Wait Spike
[Screenshots: SQL with high global cache wait time; spike in global cache response time]
Courtesy of Cecilia Gervasio, Oracle Server Technologies, Diagnostics and Manageability
Temporary Slowness or Hang
[Screenshot: ASH report for a slowdown from 5:00-5:30, generated with $ORACLE_HOME/rdbms/admin/ashrpt.sql; a direct ASH query is sketched below]
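When the report is not at hand, ASH can be queried directly. A sketch for a known slow window; the timestamps are placeholders standing in for the 5:00-5:30 slowdown:

    -- Top cluster wait events during the slow interval, across all nodes
    SELECT event, COUNT(*) AS samples
    FROM   gv$active_session_history
    WHERE  sample_time BETWEEN TIMESTAMP '2009-10-13 17:00:00'
                           AND TIMESTAMP '2009-10-13 17:30:00'
    AND    wait_class = 'Cluster'
    GROUP  BY event
    ORDER  BY samples DESC;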
Additional Diagnostics
• For all slowdowns with high averages in gc wait time
– Active Session History report (all nodes)
– Set event 10708 on selected processes (see the sketch after this list):
• Event 10708 trace name context forever, level 7
• Collect trace files
– Set event 10899 system-wide
• Threshold-based, i.e. no cost
– Continuous OS statistics
• Cluster Health Monitor (IPD/OS)
– LMS, LMD, LGWR trace files
– DIA0 trace files
• Hang analysis
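A sketch of the event syntax; the 10708 level comes from the slide, while the 10899 level is an assumption since the slide does not give one. For an already-running process, the same 10708 event can be set via oradebug instead of ALTER SESSION:

    -- Trace global cache operations in the current session (from the slide)
    ALTER SESSION SET EVENTS '10708 trace name context forever, level 7';

    -- System-wide, threshold-based diagnostic event (level 1 is an assumption)
    ALTER SYSTEM SET EVENTS '10899 trace name context forever, level 1';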
Conclusions
Golden Rules For Performance and Scalability in Oracle RAC
• Thorough configuration and testing of the infrastructure is the basis for stable performance
• Anticipating application and database bottlenecks, and their possibly magnified impact in Oracle RAC, is relatively simple
• Enterprise Manager provides monitoring and quick diagnosis of cluster-wide issues
• Basic intuitive and empirical guidelines for approaching performance problems suffice for all practical purposes
QUESTIONS & ANSWERS
http://otn.oracle.com/rac
Recommended Sessions

DATE and TIME                     SESSION
Tuesday, October 13, 1:00 PM      Next Generation Database Grid - Moscone South 104
Tuesday, October 13, 2:30 PM      Single Instance Oracle Real Application Clusters - Better Virtualization for Databases - Moscone South 300
Wednesday, October 14, 11:45 AM   Understanding Oracle Real Application Clusters Internals - Moscone South 104
Thursday, October 15, 9:00 AM     Oracle ACFS: The Awaited Missing Feature - Moscone South 305

Visit us in the Moscone West Demogrounds, Booth W-037
Appendix
References
• http://www.oracle.com/technology/products/database/clustering/pdf/s298716_oow2008_perf.pdf
• http://otn.oracle.com/rac