Real World Performance Design and Tuning

Download Report

Transcript Real World Performance Design and Tuning

SQL SERVER CONFIGURATION
OPTIONS AND TRACE FLAG SECRETS
Kevin Kline
SQL Sentry, Director of Engineering Services
@KEKline on Twitter, FB, LI
Blogs at http://KevinEKline.com and http://ForITPros.com
FREE SWAG FROM SQL SENTRY!
•
Free Plan Explorer download:
http://www.sqlsentry.net/plan-explorer/
•
Free query tuning consultations:
http://answers.sqlperformance.com.
•
Free new ebook (regularly $10) to attendees.
Send request me: [email protected]
•
SQL Server educational videos, scripts, and
slides: http://SQLSentry.TV
•
Tuning blog: http://www.sqlperformance.com/
•
Monthly eNews tips and tricks:
http://www.sqlsentry.net/newsletter-archive.asp
AGENDA
• Quick Tips for Bare Metal and Windows Server OS
o CPU
o MEMORY
o STORAGE
•
•
•
•
•
Special Tempdb Considerations
Settings Inside the Engine
High-end Testing with HammerDB
Trace Flags
Resources & Wrap up
BARE METAL TUNING - THE FULL
STACK
DB
SQL Serv.
Windows
CPU
PCI Bus
I/O Controller / HBA
Cabling
Array
Key Takeaway: Maximum performance is NOT going to be easy…
Cache
Spindle
WHY ‘BARE METAL’ TUNING?
• ‘Throwing more money/hardware at the problem’ has its
limitations.
• Different workloads require different hardware
configurations.
• Large number of performance problems caused by
ignorance of hardware design.
• Segregation of job duties often precludes much learning
about hardware.
NON-IO BOTTLENECKS – SCALING EXAMPLE
• Interesting shape, what’s causing it?
Add a disk
RAID
Add a
controller
110
140
140
Backplane
limit
CPU – SPEED & BANDWIDTH
• CPU speed
o In general, expect a 6.5% increase in performance per 10% of added CPU
speed.
• Multiple CPUs
o Each added CPU adds processing power on a declining curve, starting at about
a 40% boost an declining curve to about an average 16% boost >= 8 cpus.
o Keep in mind that many SQL Server operations are single-threaded and do not
benefit from multiple cores
• CPU cache
• Best CPU reference material?
o Glenn Berry at http://sqlserverperformance.wordpress.com/
CPU – POWER
• Default Windows Server CPU power settings are
_______.
stupid
the [stupid] default
• Balanced power is _____________.
AMD CPUs
Intel E7 CPUs
MEMORY
• More is better (usually)
• Default settings may hurt you:
o Specify mix/max ram helps performance on systems dedicated to SQL Server
because it can reduce paging
o Parallelized queries consume more memory than non-parallel queries
o Still seeing AWE in use in 64-bit environments
• Windows OS setting Lock Pages in Memory Controversy
and Access Check Cache (v2008 R2 and later)
o Everything old is new again
o Do you know the workload?
MAX SERVER MEMORY
NOT
• By default, SQL engine will __________
acquire all
requested memory upon startup.
• Remember to leave headroom when configuring Max
Server Memory.
buffer cache
• Memory settings applies only to ____________.
TEMPDB USAGE
• Temporary objects, including table variables
• Version Store counters
• Certain features and operations utilize tempdb (it’s not
just used by temporary objects from T-SQL)
o Index rebuild with sort in tempdb and online index rebuild
o Sevice Broker
o RCSI
o Internal transaction objects (hash tables)
SPECIAL CONSIDERATIONS
Tempdb
• Size appropriately!
o Autogrown tempdb will reset after restart
• Not a bad idea to segregate onto its own array. For
example:
o Set the size to fill most of the drive
• Maybe overkill
• Never need to autogrow
• No performance disadvantage
o Microsoft IT
• 200GB drive
• 180GB TempDB
SETTINGS INSIDE THE ENGINE
Reasons for Poor Database Performance
100
80
60
40
20
0
Hardware
Settings
Reasons for Poor Database Performance
Code
DATABASE AND SERVER CONFIGURATION
Database Configuration – Autogrowth
• Can impact performance and introduce fragmentation
• Set fixed MB growth amounts
o SQL Server can use Windows 2003 instant file initialization so large
MB values are OK
• If the service account isn’t a local admin you need to assign it the ‘Performance
Volume Maintenance Tasks’ security policy
o However, it doesn’t work for Transaction logs so be a bit more
conservative
• Autogrowth is a last resort, size your database appropriately
FILE CONFIGURATION
Reducing allocation contention on Disk
• ¼ to ½ data files per logical cpu
o Maximum of 8
o One set of allocation pages per file
o All the files are equal size:
• Proportional fill algorithm balances the load
• Round-robin approach to inserted data
• Data file improvement is considerable, while log file
counter-productive
improvement is _______________.
DATABASE AND SERVER
CONFIGURATION
Tempdb / UserDB – Allocation contention
TEMPDB
PFS
PFS
64MB
SGAM
SGAM
4GB
Overview by Analogy
17
MORE TIPS ON TUNING STORAGE I/O
• There’s no benefit in multiple T-log files, but there -IS- a
benefit in having multiple data files
o If using multiple data files, ensure that they’re all the same size and have the
same autogrow settings.
o Round-robin, proportional fill algorythm.
o Ratio of CPU to database files.
• Presize database files, especially the T-log
• In SQL Server 2008 and later, data compression works
wonders
18
FILE SPACE ALLOCATION
Proportional Fill in action
File 1
Free Space
File 2
Free Space
File 3
Free Space
4 GB
2 GB
0 GB
FILE SPACE ALLOCATION
Proportional Fill in action
File 1
Free Space
File 2
Free Space
File 3
Free Space
File 4
Free Space
4 GB
2 GB
0 GB
THE TRANSACTION LOG – VIRTUAL LOG FILES (VLFS)
8K
VLF(1)
8K
LOGBUFFER
VLF(2)
8K
VLF(3)
8K
VLF(4)
X 128
8K
VLF(5)
8K
VLF(6)
<=60K
• When switching to new VLF – it has to be ”formatted” with 8K
sync write
o While this happens, transactions are blocked
• Up to 128 Log Buffers per database
o Spawned on demand, will not be released once spawned
o Transactions will wait for LOGBUFFER even when no buffer is available
RULES FOR BETTER STORAGE PERFORMANCE
•
•
•
•
•
•
•
•
•
Put logs and data on separate arrays and, if possible, on independent
channels.
Pre-size your data and log files; Don’t rely on AUTOGROW.
Put SQL Server data devices on a non-boot disk.
RAID 1 and RAID10 are much better than RAID5.
Tune TEMPDB separately.
Create data files all the same size per database and in ratio to CPUs.
Add spindles for read speed, controllers for write speed, SSDs for
ultimate speed.
Segregation … for the highly stressed database
Monitor, tune, repeat…
25
SSD - THE GAME CHANGER!
• Some big advantages:
o
o
o
o
Power consumption often around 20% of traditional drives
Random = Sequential (for reads)!
Extremely low latency on access
There are ”no moving, mechanical parts”
• Some caveats:
o All SSD devices are not create equal
o Beware of non-disk related bottlenecks
o Service processors, bandwidth between host/array, etc.
SQL SERVER VIEW OF STORAGE
Tool
Monitors
Granularity
sys.dm_io_virtual_file_stats
Latency, Number of IO’s, Size,
Total Bytes
Database files
sys.dm_os_wait_stats
PAGEIOLATCH, WRITELOG
SQL Server Instance level
(cumulative since last start – most useful to
analyze deltas over time periods)
sys.dm_io_pending_io_requests
I/O’s currently “in-flight”.
Individual I/O’s occurring in real time.
(io_handle can be used to determine file)
sys.dm_exec_query_stats
Number of …
Reads (Logical Physical)
Number of writes
Query or Batch
sys.dm_db_index_usage_stats
Number of IO’s and type of
access (seek, scan, lookup,
write)
Index or Table
sys.dm_db_index_operational_stats
I/O latch wait time, row & page
locks, page splits, etc.
Index or Table
Xevents
PAGEIOLATCH
Query and Database file
SP_CONFIGURE RED HERRING SETTINGS
Don’t Use These Unless Testing Proves Otherwise
•
•
•
•
•
•
•
Boost SQL Server Priority
Max Worker Threads
Lightweight Pooling
CPU Affinity
Locks
Query Wait
Query Cost Governor Limit
SP_CONFIGURE SETTINGS
Bottom-line Behavior
• Min and Max SQL Server Memory and “Lock Pages in
Memory”
• Fill Factor at 70-80% for OLTP applications
• Parallelism is not the boogyman:
o No kneejerk reactions – thinking about disabling MaxDOP (set to 1)?
o May prefer Cost Threshold for Parallelism, especially for OLTP applications
HIGH-END TESTING
Proving your hypothesis
TOOLS USED TO TEST IO
• HammerDB
• Microsoft tools: IOMeter.exe, SQLIOSim.exe, SQL
Server 2012 Distributed Replay Utility
• Quest Software Benchmark Factory
33
TPC-C WORKLOAD
TPC-C SCHEMA AND RATIOS
TPC-C TRANSACTIONS
• Only five transactions:
o
o
o
o
o
New-order: receive a new order from a customer: 45%
Payment: update the customers balance to record a payment: 43%
Delivery: deliver orders asynchronously: 4%
Order-status: retrieve the status of customer’s most recent order: 4%
Stock-level: return the status of the warehouse’s inventory: 4%
LOAD GENERATION WITH HAMMERDB
• Highly efficient & high-performance for multicore processors
• Configurable from menus or directly against config.xml
• 10% utilization on hammerdb can drive 100% CPU utilisation
on system under test (SUT)
• HammerDB uses ~10MB for the application and 2MB per
virtual user, e.g. 64 virtual users use ~138M RAM
• Virtual Users:
o Enabling Virtual User Show Output increases test latency, but is a lot more informative
o Recommendation is to set number of users on a exponential scale (e.g. 2, 4, 8, 16, 32
… n)
SUT REQUIREMENTS
• Supported against SQL Server 2008 and later
• CPU: to stress CPU on SUT, create TPCC database with
5 warehouses X logical cores of SUT
• Storage: ~100MB per warehouse plus 50% for additional
growth
• For example, a SUT with 64 logical cores:
o 64 * 5 = 320 warehouses
o (320 + 160) x 100MB = 48GB disk space
DEMO
• Let’s play with HammerDB
• Download at http://hammerora.sourceforge.net/
• Discussion forums and support at
http://sourceforge.net/projects/hammerora/
TRACE FLAGS
Breaking and Bending the Rules
WHAT ARE TRACE FLAGS?
• A trace flag is a directive used to “set specific server
characteristics or to enable/disable a particular behaviour”
• Enabled at different scopes {Global | Session} and contexts
{Startup | Query}
• Documentation sources: BOL, KB articles / Service Pack &
Cumulative Update “readme”s, White papers, Blogs / user
groups / “water cooler”
CAVEATS!
No legal action complaining if something goes wrong!
USING TRACE FLAGS IN T-SQL
• DBCC { TRACEON | TRACEOFF | TRACESTATUS }
o Use -1 to turn on trace flag globally
• Within a query using OPTION (QUERYTRACE n)
• -T startup flag
• Sometime trace flags seem to do “nothing”
o DBCC TRACEON (3604): Send output to console
o DBCC TRACEON (3605): Send output to ERRORLOG
OTHER WAYS TO SET TRACE FLAGS
• Microsoft says that you should do it by modifying the
registry
o Use –T# separated by semi-colon (;)
o http://msdn.microsoft.com/en-us/library/ms345416.aspx
• SQL Server Configuration Manager
• Registry Editor
TF’ING–A!
SAFE* IN PRODUCTION
• T610 speeds up high volume data loads by writing less
information to the transaction log for minimally logged
inserts into indexed tables.
• T834 enables SQL Server, on 64-bit systems, to use
large-page allocations for the buffer pool.
• T835 enables “lock pages in memory” for Std Ed
-T1118: BLESSED BY ZEUS
• Tells SQL Server to allocate full extents to each tempdb
objects, rather than mixed extents.
o Less contention on internal structures such as SGAM pages
o Story has improved in subsequent releases of SQL Server
o So represents a “edge case”
TF’ING
DEADLOCKS
• T1204 writes information about deadlocks to the
ERRORLOG in a “text format”. T1222 writes information
about deadlocks to the ERRORLOG in a “XML format”
o Resources
o Types of locks
• Command affected
TF’ING
UP LOCKS
• T1200 returns locking information in real-time as your query
executes
o
o
Use during development / testing phase
Great for learning how SQL Server implements locking
• T1211 disables lock escalation based on memory pressure or
number of locks. T1224 disables lock escalation until 40% of
memory is used and then re-enables escalation
o
o
o
When enabled, MSSQL won’t escalate row or page locks to table locks.
T1211 takes precedence over T1224
Microsoft recommends using T1224
• Trace flag 1211 prevents escalation in every case, even under memory pressure and
may help avoid "out-of-locks" errors when many locks are being used.
• Warning! Can generate excessive number of locks, can slow performance, even cause
1204 errors.
TF’ING
PARALLELIZED OPERATIONS
• T2528 disables parallel checking of objects during DBCC
CHECKDB, DBCC CHECKFILEGROUP and DBCC CHECKTABLE.
o
Typically leaves parallel DBCC checks enabled. DBCC operations can dynamically change
their degree of parallelism.
• T2562 perform the entire set of checks in a single ‘batch’ instead of
multiple batches, and increase the efficient of per-thread calls to get
a new list of pages to read.
• T2549 treats each database file as if it’s on a separate physical drive
for the purposes of driving read-ahead for DBCC CHECKDB
• Alternatives:
o
o
MAXDOP option
Resource Governor
TF’ING
DATA CONSISTENCY
• T806 enables DBCC audit checks to be performed on
pages to test for logical consistency problems.
o These checks try to detect when a read operation from a disk does not
experience any errors but the read operation returns data that is not valid.
o Pages will be audited every time that they are read from disk.
o Since page auditing can affect performance, it should only be used in systems
where data stability is in question.
TF’ING
UP THE DATABASE FILES
• T3004 returns more information about instant file
initialization (IFI). Useful to see if SQL Server has been
configured to take advantage of IFI correctly
• T1117 tells SQL Server autogrow all files in a database
at the same time
TF’ING
UP THE TRANSACTION LOG
• T3422 enables log record auditing to troubleshoot log file
corruption
o Careful! It introduces overhead to each transaction log record write.
o Similarly to trace flag 806, you would only use this to troubleshoot corruption
problems
TF’ING
UP CHECKPOINTS
• T3502 enables detailed tracking of CHECKPOINTs to
the ERRORLOG
• T3505 disables automatic checkpoints:
o Setting trace flag 3505 may increase recovery time and can prevent log space
reuse until the next checkpoint is issued.
o Make sure to issue manual checkpoints on all read/write databases at
appropriate time intervals
• Microsoft recommends that you do not change the
recovery interval because it may affect data safety and
availability.
TF’ING
•
•
•
•
•
UP BACKGROUND PROCESSES
T661 disables the ghost record cleanup process.
T8020 disables working set monitoring
T2330 disables all index usage stats gathering
T2371 enables proportional automatic update statistics
T2389 and T2390 enables auto-quick-statistics update
for ascending keys, whether known or unknown,
respectively
TF’ING
UP THE RING BUFFERS
• T818 enables the ring buffer for tracking the last 2,048
successful write operations, not including sort and
workfile I/Os.
• T8011 disables the ring buffer for Resource Monitor
• T8012 disables the ring buffer for schedulers
• T8018 disables exception ring buffer
• T8019 disables stack collections for the exception ring
buffer
TF’ING
UP THE OPTIMIZER
• T652 disables page pre-fetching scans
• T4199 enables all fixes that were previously made for the
query processor under many trace flags/hot fixes based on
“special” policy
• T2301 enables advanced optimizations specific for BI and
processing very large data sets
• T2312 and T9481 enables or disables the new SQL2014
cardinality estimator, respectively
• T2861 enables caching trivial plans
• T8744 disables pre-fetching ranges for nested loop
operations.
LIQUID TF’ING AWESOME
Others
for SQL2005
SQL2012• T8675
showsto information
on the stages of optimization. Very
useful for SQL query debugging and tuning.
7352 : Final query tree
• T8605
shows
8605
: Converted
tree query initial tree representation created by SQL
8606 Server
: Input, simplified, join-collapsed, and normalized trees
8607
Output
tree
Even
more optimizer-related
flags,
someused
of which
only work
• :T8606
shows
additional logical
trees
during
the on 2012, like:
8608 :optimization
Initial memo process.
8615 : Final2373
memo
: Memory before and after deriving properties and rules (verbose)
• T8607
shows
thehash
optimization
output
8675
: Optimization
stages
and
times
7357
: Unique
optimization
used tree.
8609
: Taskthe
and input
operation
• T8608
shows
treetype
forcounts
cost-based optimization when
8619 : Apply
description
first copied
into rule
thewith
Memo
structure. Final Memo structure is
8620
: Add
memo arguments to 8619
visible
with
T8615.
8621 : Rule with resulting tree
BUT I WANT MORE AWESOME!
• Combine T3604 with DBCC {RULEOFF | RULEON}
o DBCC RULEOFF ('GetToScan')
o DBCC RULEON ('GetToScan')
o DBCC RULEON ('JNtoHS')
o DBCC RULEON ('JNtoSM')
• Use DBCC {SHOWONRULES | SHOWOFFRULES} to
see the access methods chosen by the optimizer.
RESOURCES
• Best blogs:
o http://sqlskills.com/blogs/paul
o http://sqlperformance.com
• Trace Flags: http://www.victorisakov.com
• Look on MSDN for:
o
o
o
o
Storage Top 10 Best Practices
Predeployment I/O Best Practices
SQL Server Physical Database Storage Design
SQLIO Disk Subsystem Benchmark Tool
62
WRAP UP
1. Engage with us on social media.
o
We’re thankful for your word of mouth promotions and endorsements!
2. Share your tough SQL Server problems with us:
http://answers.sqlperformance.net
3. Download SQL Sentry Plan Explorer for free:
http://www.sqlsentry.net/plan-explorer/sql-serverquery-view.asp
4. Check out our other award winning tools:
http://www.sqlsentry.net/download