Transcript Tim Chapman
Troubleshoot Customer Performance
Problems Like a Microsoft Engineer
Tim Chapman
Senior Field Engineer, Microsoft
About me…
Tim Chapman
Sr. Dedicated Premier Field Engineer at
Microsoft
Contributing author
• SQL Server 2012 Bible
• SQL Server MVP Deep Dives 2
@chapmandew
Agenda
A brief introduction to SQLDiag and Diag Manager
A troubleshooting overview
A quick look at Waits and Queues
SQLDiag
Command line utility that ships with SQL Server
Located in the installation Binn directory
Gathers perfmon logs, error logs, profiler traces, blocking information, etc
Requires and XML configuration file
This XML file specifies what to collect
Can add custom collectors – allows you to grab the information you need
You execute a PSSDIAG file, which in turn uses SQLDIAG under the covers
PSSDIAG SQLDIAG Collectors
Diag Manager
What we use to create a pssdiag for
you
GUI tool used to create configuration
file
Free download from Codeplex
The more you configure to trace, the
more impact you may have on
performance
I rarely use Trace
Capturing Custom Data Collections
This is the REAL power of using Diag
Manager & SQL Nexus
Diag Manager can capture any scripts
you specify and SQL Nexus can import
them into a database
Once imported, you can run your own
diagnostic scripts to find problems
SQL Nexus
Tool used to import and report on SQLDiag output
Allows you to develop custom collections and reports
Available on Codeplex: http://sqlnexus.codeplex.com/
This means that the source code is available
RML Utilities must be installed prior to installing SQL Nexus
RML Utilities for SQL Server (x86) – http://www.microsoft.com/enus/download/details.aspx?id=8161
RML Utilities for SQL Server (x64) – http://www.microsoft.com/enus/download/details.aspx?id=4511
For more on capturing data…
My 24 hours of PASS session on PSSDiag & SQL Nexus
http://www.sqlpass.org/24hours/2014/summitpreview/Sessi
ons/SessionDetails.aspx?sid=7293
Troubleshooting Techniques
Define the
problem
There are many different
techniques for troubleshooting
Performance Troubleshooting is
an iterative process
Validate
solution
Implement
Solution
Gather data
Analyze Data
Waits and Queues
A performance tuning method to help identify bottlenecks
Waits
Queues
• Time spent waiting on resources inside the database engine
• Use system DMVs to view this information
• Measure system resource and utilization
• Use perfmon to view this information
Often there is a correlation between wait types and performance
counters
How can Waits & Queues help me?
Avoid troubleshooting the “wrong” bottleneck
Get the biggest payoff for your performance tuning efforts
Find non-obvious performance bottlenecks (ie - Waits on memory
grants)
Excellent first step prior to deep tuning using other tools (SQL
Profiler, logs, DMVs, Execution Plans)
Examples
An end-user or customer tells you about vague performance
issue and you’re not sure where to start
You are looking for ways to speed up a database-driven
application and need to identify the primary bottleneck
Benchmarking and trending (great for before-after testing
during development)
Performance Methodology
Often, there is more than one resource bottlenecks on the
system
Approach
Identify resource that is the main bottleneck
Find the statements using the most of that resource
Tune or optimize usage
Repeat process
How tasks are scheduled…High level
Session
Created
Batch
Submitted
Task(s)
Created
Workers
bound to
Tasks
Task
Executed on
Scheduler
Session States
sys.dm_exec_requests
• Only one session can be running or executing
per scheduler at a time
Running
sys.dm_exec_requests
• Sessions waiting for CPU. Next SPID in the
runnable queue is scheduled to start running
Runnable
• Sessions wait in the waiter list until resources
become available
Waiting
sys.dm_exec_requests
OR
sys.dm_os_waiting_tasks
wait_time_ms versus signal_wait_time
Resource
Wait
Time
Signal
Wait
Time
sys.dm_os_wait_stats
Total
Wait
Time
Example: Mapping Waits to Queues
Wait Type
PAGEIOLATCH
ASYNC_IO_COMPLETION
WRITELOG
Perfmon Counters
Average Disk Queue Length
(consistently high)
Disk sec/read and disk sec/write is
also high
Log Bytes Flushed/sec high
Conclusion?
IO subsystem may be issue
High I/O bandwidth issues
Need to now identify and see if any
un-tuned queries contribute to the IO
How to read sys.dm_os_wait_stats
wait_type – the resource unit that SQL Server is waiting on
waiting_tasks_count – #of tasks that have spent time on this resource
wait_time_ms – total time spent waiting overall
max_wait_time_ms – max time spent waiting
signal_wait_time_ms – time spent in the runnable queue
A few notes on wait stats…
Not all wait types are useful to report on
Waits are accumulated after event occurs
Granularity depends on event type
Viewing as a whole may or may not be useful
Time slicing can be more useful for trending and/or problem
analysis
Common High Waits
If you’re seeing an extremely high % of a single wait on your
system, it is very likely one of the following
WRITELOG
When we flush a database change to the transaction log file
Logical Disk: Avg Disk Sec/Write
Logical Disk: Avg Disk Bytes/Write
Database: Log Flushes/Sec
Database: Log Flush Write Time (ms)
sys.dm_io_virtual_file_stats
sys.dm_io_pending_io_requests
CXPACKET
Parallel queries are happening
Not a good or bad thing
Sometimes less desirable for OLTP workloads
Need to know how the application is supposed to be processing
before you can make a definite judgment
sys.dm_os_waiting_tasks
sys.dm_exec_query_stats
SOS_SCHEDULER_YIELD
When a thread voluntarily releases its hold on the scheduler to
allow another thread to perform its work
Not necessarily a problem unless it consumes a very high % of
wait time on the system
Processor: % User Time
System: Context Switches/sec
sys.dm_os_spinlock_stats
sys.dm_exec_requests
PAGEIOLATCH_*
Latching a buf structure to move a page to memory from disk
Long waits may indicate a disk or memory issue
Logical Disk: Avg Disk Sec/Read
Logical Disk: Disk Bytes/Sec
Buffer Manager: Checkpoint Pages/sec
Buffer Manager: Page Life Expectancy
sys.dm_io_virtual_file_stats
sys.dm_io_pending_io_requests
PAGELATCH_*
A task is waiting for a page latch not associated with an IO
request
Can be caused by inserts into the same page or contention
on allocation pages
sys.dm_exec_requests
sys.dm_os_waiting_tasks
sys.dm_os_buffers_descriptors
ASYNC_NETWORK_IO
Typically occurs because the client requesting data from
SQL Server isn’t processing it fast enough
Look at how the client is processing data
Network Adapter: Current Bandwidth
Network Adapter: Bytes Total/sec
Network Adapter: Output Queue Length
OLEDB
Occurs when SQL calls the OLE DB Provider
Often associated with 3rd party tools that heavily call DMVs
Also can be associated with Linked Server calls, RPC calls,
OpenQuery, OpenRowset or Profiler
sys.dm_exec_requests
LCK_*
Waiting to acquire a lock
We accumulate these AFTER the lock has been released
Access Methods: Table Lock Escalations/sec
Locks: Lock wait time (ms)
Locks: Lock waits/sec
Memory Manager: Lock Memory (KB)
Missing Indexes
sys.dm_db_index_operational_stats
sys.dm_tran_locks
sys.dm_exec_requests
RESOURCE_SEMAPHORE
Waiting for a memory grant due to a high number of
concurrent queries or excessive memory grant requests
Not uncommon for DW workloads
Resource Governor can help
Memory Manager: Memory Grants Pending
Memory Manager: Memory Grants Outstanding
sys.dm_exec_query_memory_grants
sys.dm_exec_query_resource_semaphores
tempdb
A performance talk is not complete with mentioning tempdb
Can become a bottleneck if not properly sized/allocated
Faster drives here are better (hint: SSD)
We use tempdb for a LOT (to name a few):
Temporary tables(#) & table variables (@)
Internal work tables (Spools)
Spills (hash/sort/exchange)
Version Store
tempdb cont.
Make sure all files are equally sized upon creation
For # of files, we recommend:
<8 Cores = use 8 tempdb files
>=8 Cores = use 8 unless you still have latch contention
Then add 4 at a time afterwards
And now…Perfmon.
Incredibly useful for SQL Server troubleshooting
Not enough people use this tool
The “hard” part is knowing what to monitor
…but, we are here to help with that today.
Introduces very little overhead on a SQL Server system
Running Performance Monitor
Start Run perfmon
Causes perfmon to open with only a single counter
Start Run perfmon /sys
Causes perfmon to open with your saved counters
Allows you to save .PerfmonCfg settings, which
allows you to move to other machines
Useful Memory Counters
SQL Server:Buffer Manager
Page Life Expectancy
Checkpoint Pages/Sec
Free Pages
Lazy Writes/Sec
Memory Manager: Memory Grants Pending
Process: Working Set
Memory: Available MBytes
Useful CPU Counters
Very high and sustained CPU usage is a clear sign of an
underlying problem
Processor: % Privileged Time
Processor: % Processor Time
Process: *
SQL Statistics: Batch Requests/Sec
Databases: Transactions/Sec
SQL Statistics: Compiles/Sec
Useful Process Counters
Very useful to debug misbehaving processes on a machine
Always look here to see if processes are competing with SQL
Server – such as Anti-virus
Useful counters:
IO Data Bytes/sec
% Processor Time
Working Set
Private Bytes
Useful IO Counters
Logical Disk
Monitors logical partitions on
the machine (drive letters)
Physical Disk
Monitors physical disks on the
machine
Avg Disk Sec/Read
Avg Disk Sec/Write
% Idle Time
Disk Transfers/sec
Power Settings
The server power settings can be one of the easiest
performance gains
Setting from the default (Balanced) to High Performance
can lead to gains of 20% or more
Processor Information:
% Processor Performance
Application Settings
Look at Process counters for specific processes
Find out if the application is requesting too much data
Application time-outs aren’t necessarily a SQL related
problem
Avoid applications installed on the same machine as SQL
Filter Drivers
Typically exists in the form of anti-virus (AV) software
If you insist on using AV on the database instance, make
sure to exclude the SQL Process and all SQL data types
Look at Process: IO Data Bytes/sec for processes driving
high IO levels other than SQL Server
When to Use Which Tool?
PAL is great for overall system performance ( PAL Download )
Benchmark
Get acquainted with a workload
PSSDIAG/Nexus
More targeted performance analysis
Need to view SQL internal resources (waits, blocking chains, query
plans)
Short timespan for collection