Wait-based tuning
Download
Report
Transcript Wait-based tuning
Resource Mapping
A Wait Time Based Methodology for
Database Performance Analysis
Prepared for NoCOUG, Fall Conference, 2004
Presented by
Matt Larson
Chief Technology Officer
Confio Software
Presentation Agenda
2
Introduction
Conventional Tuning vs. Wait-based Tuning
Foundation: Resource Mapping Methodology
5 Key Steps of Applying RMM
RMM Advantages
Conclusion
Who am I?
3
Former DBA consultant specializing in Oracle
performance tuning
Co-author of three Oracle books (Oracle
Development Unleashed, Oracle Unleashed
2nd Edition, Oracle8 Server Unleashed)
Co-author of two other database related
books
CTO and founder of Oracle performance
software company
Problems with Conventional Tuning Tools:
Like the Drunk Under the Streetlight
4
Conventional Tuning
5
Art, not a science
Ratio-based (cache hit ratios, etc.)
Sometimes fruitless
It’s “tuned” (I guess?)
Different tuning/investigation process for each
DBA/DBA Team/Company
Problems with Conventional Tuning Tools
Optimize systems, not business results
Conventional tools:
• V$ Views: limited visibility & granularity
• Statspack: averages across entire database
• Tracing: Limited to session, excessive volume
Incorrect Data hides real results
• System-wide averages
• Event counters
• Incomplete visibility
6
What Problems are you Trying to Solve?
Methodology addresses a common problem
space:
• I spend the whole week monitoring and optimizing Oracle
configurations, but I have no demonstrable results to show
for it - why?
• Will more hardware make my application run faster? By how
much?
• Will the new application run efficiently on the production
server?
• Why does one application keep impacting my SLA
compliance?
• If I could make one (or 2, 3, or 4) changes to my database
to have the biggest impact, what would they be?
7
You know you are working on the wrong
thing when…
After spending an agonizing week tuning
Oracle buffers to minimize I/O operations,
management typically rewards you with:
•
•
•
•
8
A.
B.
C.
D.
An all expense paid vacation
A free lunch
A stale donut
Reward? Nobody even noticed!
You know you have a visibility problem
when…
You measure database performance based on:
•
•
•
•
9
A.
B.
C.
D.
Increasing trends in user response time
Increasing system down time
Increasing help desk calls
Increasing decibel levels from irate users
Your role is sub-essential to the business
of your organization when…
Your role in the rollout of a new customer
facing application results in:
•
•
•
•
10
A.
B.
C.
D.
Keys to drive the CEO’s Porsche
Keys to use the executive restroom
A mop to use in the executive restroom
Your office has been moved to the restroom
You know you are accustomed to
measuring the wrong thing when…
You measure the commute time to work
based on:
•
•
•
•
11
A.
B.
C.
D.
The time it takes to get there
Counting the times your wheels rotate
Monitoring your tachometer
The number of speeding tickets
Wait-based Performance Tuning
12
Emerging best-practice for database tuning
Proponents include leading consultants,
trainers and authors
Oracle is starting to build wait-based tuning
tools into the database particularly in 10g
Tune by determining where processing time is
spent
Oracle 10g - Moving towards wait-based
Adding wait-based columns to existing views
New wait-based views
Example:
v$session_wait_history
•
•
•
13
Provides the last 10 wait events for a session
Session ID, Username, Event, Wait_Time, etc.
Used to provide wait_time for only a few events
Resource Mapping Methodology
A set of requirements that define what data must be
captured to effectively make tuning decisions and a process
for applying the data to achieve the optimal outcome
RMM
Wait Based Tuning
14
DBA Success Stories using RMM
15
DBA solves a “Cold Case”. Problem unresolved for 1
year with traditional tools; Solution identified in 10
minutes during hands-on training
DBA ends “Crit Sit” 2 week situation ends quickly
after identification of Library Cache pin wait and load
locks. Metalink identifies Oracle bug, patch
successfully applied
DBA saves $700K. 90% CPU capacity initiates
expansion from 12 to 24 CPU server. DBA identifies
parallel queries across 16 parallel threads as source of
bottleneck. CPU eliminated as constraint, no new
server required.
Resource Mapping Methodology
Three Key Principles of RMM
1. SQL View: View all statistics at SQL
statement level
2. Time View: Measure Time, not number of
times a resource is utilized
3. Full View: Separately measure every
resource to isolate source of problems
16
Illustrating example: SQL View Principle
17
Example: ‘CEO’ measuring ‘employee’ output
Averaging over entire company gives no useful data
Must measure each job separately
DBA must manage database similarly
Measure and identify bottlenecks for each SQL independently
Illustrating example: Time View Principle
18
Example: ‘CEO’ counting ‘tasks’ vs. ‘time to complete’
Counting system statistics not meaningful
Must measure Time to complete
System stats (buffer size, hit ratios, I/O counts) do not identify
where database customers are waiting
Identify and optimize Wait Time for each SQL as best indicator of
performance
Illustrating example: Full View Principle
Example: ‘CEO’ measuring results with blind spot hiding key
19
processes
Without direct visibility, valuable info is lost
Must have visibility to every process step
Distinctly identify and measure each Oracle resource for each
distinct SQL
Track SQL Time,
Not System Counters
• Watching Counters leads to wrong conclusions: Time is more relevant
• Total System Counters hide information: Need breakdown to
individual SQLs
Total System 80K Reads
Counter
SQL 1
SQL 2
SQL 3
Resources
20
30 Minutes
5R
15M
25 R
5M
50 Reads
I/O
5K Packets
4M
200 Minutes
5M
Network
125 Attempts 216K Writes
6M
50 A
10 M
35 A
100 Minutes
50 A
Locks
4M
200 Minutes
5M
Redo
Applying RMM for Business Results
Five Step Process focusing on what matters
1. Identify
21
2. Allocate
3. Quantify
4. Prioritize
5. Assign
Step 1: Identify
22
Identify SQL Statements having
largest impact
• (SQL View and Time View principles)
Longest wait times = most
significant “pain points” for
customers
Conversely, low cache hit ratios
or high latch usage may not
impose high wait times for users
(so why fix them?)
SQL statements
prioritized by
Total Wait Time
Step 2: Allocate
Allocate impact to real
customers (internal or
external)
Allocate wait time to
Program, Session, Machine
• SQL View principle makes this
connection
Understanding database
customer and application
23
Programs Prioritized by
Total Wait Time
Step 3: Quantify
How much is save in time/money if fixed?
Enabled by Full View and Time View principles
Soft dollar savings
• Data entry clerks
• DBA time spent in problem resolution
Hard dollar savings
• Reduce hardware upgrades
• Meet SLA’s avoiding penality
• Ensure business isn’t lost due to poor performing or
unavailable system
Quantifiable benefit of
Tuning a
specific statement
24
Step 4: Prioritize
25
If last step properly executed, this step is
fairly straight forward
Allow’s DBA to cut through the clutter of
potential new projects, investigations, and
trials.
Better justification for priorities. (e.g. We
aren’t working on your problem since this
other has a higher demonstrable business
impact)
Step 5: Assign
Assign the right people to the problem
• Log_buffer waits
• Network issues
• Same query 10,000/hour
26
Enabled by Full View principle
Avoid finger-pointing by accurately assigning
quickly
Resource Mapping Methodology
RMM
Wait Based Tuning
27
Network, Storage, Application, Web, etc.
Silo Monitoring
Business Management
LIMITED VIEW
IT Management
LIMITED VIEW
Web Team
Sitescope
Custom App Team
Network Team
Database/OS Teams
Storage/OS Teams
Often No Commercial Tools
Software Layers
Web Server
Custom Biz Logic
Network
HP Openview
Database Server
Wait-based tuning
Storage Box
EMC Control Center
Each team uses their own tool to partially monitor their non-Oracle
layers. No view across layers. Management has no clear view.
28
The Solution - Integrated Vision
Business Management
RMM across the stack
IT Management
Web Team
Web Server
Custom App Team
Custom Biz Logic
Network Team
Network
Database/OS Teams
Database Server
Storage/OS Teams
Storage Box
All teams see a complete picture of all layers and dependencies.
Enables more efficient “Umbrella” solution.
29
RMM Achieved Business Benefits
30
RMM Does:
Business Benefit:
35% reduction in database
capacity requirement
Reduce capital investment
Avoid unnecessary additions
Recovers un-used capacity
Standardizes “expert” analysis
ability across entire DBA team
Reduce training & consulting
costs
Quantifies performance impact
Focus tuning efforts on biggest
business impacts
Identifies problem Root Cause
and resolution
Assign human resources and
responsibility
Anticipates + resolves
performance bottlenecks
Maintain SLA and end user
performance
Example 1: Problem Observed
Critical situation: Secure Service Center
application performance unsatisfactory
• Response time between 2400 and 9000 seconds
• Very high network traffic (3x—4x normal),
indicating time-outs and user refreshes
• “CritSit” declared: major effort to resolve problem
31
Observations using Resource Mapping
Methods
1: Identify accumulated Waits
2: Identify specific resources used
Lib cache pin
wait
Lib cache load
lock
32
Notice scale:
> 8000 secs
Results
Library cache pin nearly
unobservable
Library cache load lock no
longer observable
33
Notice scale:
< 1400 secs max
Results
Response time improvement from 8000
seconds (worst case) to 900 seconds
Variance improvement:
• Before: response time 2400 - 8000 sec
• After:
response time 800 - 900 sec
34
Example 2: Performance Drain – Identify
the Source
Slow response reported
DBA and database focus of
35
delays
Database problem?
No – SQL*Net Message
identified as source of delay
2nd highest wait event
RMM Drill Down identifies source of
problem
Single application generates
all SQL*Net Messages
App on same server as
Oracle!
Answer:
Misconfiguration – TCP/IP
used within server
Change to IPC, eliminate NIC
traffic and 30% of wait time
Solution requires knowing: Which SQL, What Wait Time, Which Resource
36
Example 3: Scattered Reads
Situation: LINS06 database - Hourly profile identifies high wait
anomaly
3-10x higher than other periods – requires investigation
wait time
42,000 seconds
10:00-11:00
37
Drill Down to Key RMM Parameters
Notice scale:
> 6000 secs
Db file
scattered
reads
Db file
scattered reads
38
Conclusion
Look for what has an impact
Resource Mapping is more that Wait Time –
Analysis must include:
• SQL level granularity
• Full Resource granularity
39
Isolating the SQL and Resource allows you to
find and fix the Root Cause
DBAs can have an impact and be heroes!
Thank you for coming
Matt Larson
Contact Information
• [email protected]
• 303-938-8282 ext. 110
• Company website
www.confio.com
40