Single - Table Optimization

Download Report

Transcript Single - Table Optimization

Indexing for Performance
for SQL Server 2005
Single - Table Optimization
Chapter Four
Jeff Garbus – [email protected]
Tony Cannizzo – [email protected]
1
Who We Are, What We Do
18 Years of Solving Customer Problems
Across The Database Life Cycle

People




Solutions




Noted Authors
Hands-on Instructors
Experienced Consultants
Business Continuity
Application Availability
Performance Management
Core Competencies



• Deliverables








Performance Assessments
Architectural Review
P&T Training
Emergency Triage
Troubleshooting
Implementation Assurance
Hardware Validation
Contract DBA
Performance and Tuning • NEW!
 Remote Monitoring
Database Administration
Development Best Practices
Acknowledgements




© Soaring Eagle Consulting, Inc
Microsoft SQL Server and Microsoft
SQL Server Management Studio are
trademarks of Microsoft Inc.
This presentation is copyrighted.
This presentation is not for re-sale
This presentation shall not be used
or modified without express written
consent of Soaring Eagle
Consulting, Inc.
Page 4 - 3
Topics







Examine detailed topics in query optimization
Indexes with SARGs
Improvised SARGs
Clustered vs. nonclustered indexes
Queries with OR
Index covering
Forcing index selection
© Soaring Eagle Consulting, Inc
Page 4 - 4
SQL Server 2005
Search Techniques

SQL Server 2005 uses three basic search techniques for
query resolution
 Table Scans

Index Searches

Covered Index Searches
© Soaring Eagle Consulting, Inc
Page 4 - 5
Table Scans

If SQL Server 2005 can’t resolve a query any other way, it
does a table scan
 Scans are expensive

Table scans may be the best way to resolve a query

If there is a clustered index on the table, SQL Server will
try and use it instead of performing a table scan
Table Scan Search
select * from pt_tx where id = 1
© Soaring Eagle Consulting, Inc
Page 4 - 6
Table Scans (Cont’d)
Query Plan

Verify table scans with:
 set statistics io on
Table 'pt_tx'. Scan count 1, logical reads 38,
physical reads 0, read-ahead reads 0
© Soaring Eagle Consulting, Inc
Page 4 - 7
Table Scan Output: Update
update pt_tx set id = id + 1
showplan
© Soaring Eagle Consulting, Inc
Page 4 - 8
Index Selection
Topics
 Optimizer selection criteria

When indexes slow access

When indexes cause deadlocks

Index statistics and usage
© Soaring Eagle Consulting, Inc
Page 4 - 9
Optimizer Selection Criteria

During the index selection phase of optimization the
optimizer decides which (if any) indexes best resolve
the query

Identify which indexes match the where and join clauses
Estimate rows to be returned
Estimate page reads


© Soaring Eagle Consulting, Inc
Page 4 - 10
SARG Matching

Indexes usually correspond with SARGs

Useful indexes will specify a row or rows or set bounds for
the result set
where dob between '3/3/1941' and '4/4/65'

An index may be used if any column of the index matches
the SARG
create unique index nci on authors
(au_lname, au_fname)
© Soaring Eagle Consulting, Inc
Page 4 - 11
SARG Matching (Cont’d)
create unique index nci on authors
(au_lname, au_fname)

Which of the following queries (if any) could be helped by
the index?
select * from authors
where au_lname = 'Smith' or au_fname = 'Jim'
select * from authors
where au_fname = 'Jim'
select * from authors
where au_fname = 'Jim' and au_lname = 'Smith'

If there are not enough rows in the table, indexes that look
useful may never be used
© Soaring Eagle Consulting, Inc
Page 4 - 12
Index Selection
Topics

Review of index types

Optimizer selection criteria

When indexes slow access

When indexes cause deadlocks

Index statistics and usage
© Soaring Eagle Consulting, Inc
Page 4 - 13
Index Types

SQL Server provides three types of indexes




Clustered
Nonclustered
Full text
One clustered index per table

Data is maintained in clustered index order

248 nonclustered indexes per table

Nonclustered indexes maintain pointers to rows
Full text is beyond scope

© Soaring Eagle Consulting, Inc
Page 4 - 14
Clustered Index
Mechanism



With a clustered
index, there will be
one entry on the last
intermediate index
level page for each
data page
The data page is the
leaf or bottom level of
the index
(Assume a clustered
index on last name)
Intermediate Page
Data Page
Albert
Root Page
Albert
Jones
Mason
Quincy
Brown
Albert, John, ...
Exeter
Alexis, Amy, ...
Houston
Amundsen, Fred, ...
Jones
Jude
Klein
Best, Elizabeth, ...
...
Loon
Mason, Emma, ...
Mason
Masonelli, Irving, ...
Neenan
Narin, Mabelle, ...
Parker
Naselle, Juan, ...
Paul
Neat, Juanita
...
© Soaring Eagle Consulting, Inc
Baker, Joe, ...
...
Page 4 - 15
Nonclustered Index
Mechanism

The nonclustered
index has an
extra, leaf level
for page / row
pointers
Intermediate Page
Leaf Page
Amy
Root Page
Amy


Data placement
is not affected
by non-clustered
indexes
(Assume an NCI
on first name)
George
Sam
Zelda
Bob
Amy
Albert, John, ...
Dave
Amy
Alexis, Amy, ...
Elizabeth
Anabelle
Amundsen, Fred, ...
...
Baker, Joe, ...
George
...
...
...
...
Best, Elizabeth, ...
...
Elizabeth
Emma
Mason, Emma, ...
Sam
...
Masonelli, Irving, ...
...
...
Narin, Anabelle, ...
Naselle, Amy, ...
...
...
Zelda
© Soaring Eagle Consulting, Inc
Data Page
...
Neat, Juanita
...
Page 4 - 16
Clustered vs. Nonclustered

A clustered index tends to be 1 I/O faster than a
nonclustered index for a single-row lookup

Clustered indexes are excellent for retrieving ranges
of data

Clustered indexes are excellent for queries with
order by

Nonclustered indexes are a bit slower, take up much more
disk space, but are the next best alternative to a table scan

Nonclustered indexes may cover the query for maximal
retrieval speed

For some queries; covered queries, nonclustered indexes
can be faster

When creating a clustered index, you need free space in
your database approximately equal to 120% of the total
table size
© Soaring Eagle Consulting, Inc
Page 4 - 17
Using Indexes
Clustered Index Indications
 Columns searched by range of values

Columns by which the data is frequently sorted
(order by or group by)

Sequentially accessed columns

Static columns

Join columns (if other than the primary key)
Nonclustered Index Indications
 NCI selection tends to be much more effective if less than about
20% of the data is to be accessed

NCIs help sorts, joins, group by clauses, etc., if other column(s)
must be used for the CI

Index covering
© Soaring Eagle Consulting, Inc
Page 4 - 18
Other Index Limitations


Maximum 16 columns
Maximum 900 bytes column width
© Soaring Eagle Consulting, Inc
Page 4 - 19
Primary Key vs. Clustering
vs. Nonclustering






A primary key is a logical concept, not a physical concept
Indexes are physical concepts, not logical concepts
There is a strong correlation between the logical concept of
a key and the physical concept of an index
By default, when you define relationships as part of table
design, you will build indexes to support the joins / lookups
By default, when you define a primary key, you will create a
unique clustered index on the table
 Unique is good, clustered isn’t always good
When you define a clustered index, the server automatically
appends the key column(s) (plus a unique identifier, if
necessary) to the nonclustered indexes
© Soaring Eagle Consulting, Inc
Page 4 - 20
Key / index features
(New in SQL Server 2005)




Columns that are not part of the index key can be included in
nonclustered indexes. Including the nonkey columns in the index
can speed queries (Index covering) and can exceed the current
index size limitations of a maximum of 16 key columns and a
maximum index key size of 900 bytes
The new ALLOW_ROW_LOCKS and ALLOW_PAGE_LOCKS
options in CREATE INDEX and ALTER INDEX can be used to
control the level at which locking occurs for the index
The query optimizer can match more queries to indexed views
than in previous versions, including queries that contain scalar
expressions, scalar aggregate and user-defined functions, interval
expressions, and equivalency conditions
Indexed view definitions can also now contain scalar aggregate
and user-defined functions with certain restrictions.
 (More in “Views”)
© Soaring Eagle Consulting, Inc
Page 4 - 21
Optimizer Selection
Criteria

During the index selection phase of optimization the
optimizer decides which (if any) indexes best resolve the
query



Identify which indexes match the clauses
Estimate rows to be returned
Estimate page reads
© Soaring Eagle Consulting, Inc
Page 4 - 22
Index Selection Examples
1. What index will optimize this query?
select title
from titles
where title = ‘Alleviating VDT Eye Strain’
2. What indexes optimize these queries?
select title
from titles
where price between $5. and $10.
3. In the second query, what would the net effect be of
changing the range to this?
between $500 and $600
© Soaring Eagle Consulting, Inc
Page 4 - 23
CI vs. NCI
select title
from titles
where price between $5. and $10.

Table facts:
2,000,000 titles (= 14492 pages)
138 rows / page
1 million rows in the range
Index used
Clustered index
Non-clustered index
No index (table scan)
© Soaring Eagle Consulting, Inc
Page reads
7,247 + index levels
(worst case) 1,000,000 + index
pages
14492
Page 4 - 24
CI vs. NCI


It is feasible, occasionally likely, that a table scan is faster
than using a nonclustered index for specific queries
The server evaluates all options at optimization time and
selects the least expensive query
© Soaring Eagle Consulting, Inc
Page 4 - 25
Or Indexing
select title
from titles
where price between $5. and $10.
or type = 'computing'
Questions
 What indexes should (could) be used?
 Will a compound index help?
 Which column(s) should be indexed?
© Soaring Eagle Consulting, Inc
Page 4 - 26
Or Indexing (Cont’d)

How is the following query different
(from a processing standpoint)?
select title
from titles
where price between $5. and $10.
and type = 'computing'

What is a useful index for?
select *
from authors
where au_fname in ('Fred', 'Sally')
© Soaring Eagle Consulting, Inc
Page 4 - 27
Or Clauses

Format
SARG or SARG
select *
from authors
where au_lname = 'Smith'
or au_fname = 'Fred'
(How many indexes may be useful?)
select *
from authors
where au_lname in ('Smith', 'Jones', 'N/A')
© Soaring Eagle Consulting, Inc
Page 4 - 28
Or Strategy
An or clause may be resolved via a table scan, a multiple
match index or using or strategy
Table Scan





Each row is read, and criteria applied
Matching rows are returned in the result set
The cost of all the index accesses is greater than the cost
of a table scan
At least one of the clauses names a column that is not
indexed, so the only way to resolve the clause is to perform
a table scan
© Soaring Eagle Consulting, Inc
Page 4 - 29
Or Strategy (Cont’d)
Multiple match index



Using each part of the or clause, select an index and
retrieve the row
Only used if the results sets can not return duplicate rows
Rows are returned to the user as they are processed
© Soaring Eagle Consulting, Inc
Page 4 - 30
Or: Query Plan
select company, street2
from pt_sample
where id = 2017
or id = 2163
Query Execution Plan
© Soaring Eagle Consulting, Inc
Page 4 - 31
Index Selection and
the Select List
select *
from publishers
where pub_id = 'BB1111'
Questions
 What is the best index?
 Do the columns being selected have a bearing on the
index?
© Soaring Eagle Consulting, Inc
Page 4 - 32
Index Selection and
the Select List
Question
 Should there be a difference between the utilization of
the following two indexes?
select royalty
from titles
where price between $10 and $20
create index idx1 on titles (price)
/* or */
create index idx2 on titles (price, royalty)
© Soaring Eagle Consulting, Inc
Page 4 - 33
Index Covering

The server can use the leaf level of a nonclustered index
the way it usually reads the data pages of a table: this is
index covering

The server can skip reading data pages

The server can walk leaf page pointers

A nonclustered index will be faster than a clustered index if
the index covers the query for a range of data (why?)

Adding columns to nonclustered indexes is a common
method of reducing query time

This has particular benefits with aggregates
© Soaring Eagle Consulting, Inc
Page 4 - 34
Index Covering (Cont’d)

Beware making the index too wide; As index width
approaches row width, the benefit of covering is reduced
# of levels in the index increases
 Index scan time approaches table scan time
Remember that changes to data will cascade into indexes


© Soaring Eagle Consulting, Inc
Page 4 - 35
Composite Indexes

Composite (compound) indexes may be selected by the
server if the first column of the index is specified in a where
clause, or if it is a clustered index
create index idx1
on employee (minit, job_id , job_lvl)
© Soaring Eagle Consulting, Inc
Page 4 - 36
Composite Indexes (Cont’d)
create index idx1
on employee (minit, job_id , job_lvl)

Which queries may use the index?
select * from employee
where minit = 'A'
and job_id != 4
and job_lvl = 135
select * from employee
where job_id != 4
and job_lvl = 135
select *
from employee
where minit = 'A'
and job_lvl = 135
© Soaring Eagle Consulting, Inc
Page 4 - 37
Composite vs. Many Indexes

Each additional index impacts update performance
select pub_id, title, notes
from titles
where type = 'Computer'
and price > $15.

In order to select appropriate indexes, we need to know
how many indexes the optimizer will use, and how many
rows are represented by the where clause
© Soaring Eagle Consulting, Inc
Page 4 - 38
Options
select pub_id, title, notes
from titles
where type = 'Computer'
and price > $15.

CI or NCI on type

CI or NCI on price

One index on each of type & price

Composite on type, price

Composite on price, type

CI or NCI on type, price, pub_id, title, notes
Which are the best options in which circumstances?
© Soaring Eagle Consulting, Inc
Page 4 - 39
Index Usefulness






It is imperative to be able to estimate rows returned for an
index. Therefore, the server will estimate rows returned
before index assignation
If statistics are available (When would they not be?) the
server estimates number of rows using distribution steps or
index density
SQL Server 2005 automatically generates statistics about
index key distributions using efficient sampling algorithms
If you have an equality join on a unique index, the server
knows only one row will match and doesn't need to use
statistics
The query analyzer index analyzer can analyze a query and
recommend indexes
The more selective an index is, the more useful the index
© Soaring Eagle Consulting, Inc
Page 4 - 40
Data Distribution

You have a 1,000,000 row table. The unique key has a
range (and random distribution) of 0 to 10,000,000
Question
 How many rows will be returned by the following query?
 How does the optimizer know whether to use an index or
table scan?
select *
from table
where key between 1000000 and 2000000
© Soaring Eagle Consulting, Inc
Page 4 - 41
Index Statistics

SQL Server keeps distribution information about indexes in
a “statblob” column in the sysindexes table

There is distribution for every index

The optimizer uses this information to estimate the number
of rows returned for a query

The distribution information is built at index creation time
and maintained by the server if set to automatically do so
© Soaring Eagle Consulting, Inc
Page 4 - 42
Distribution Steps

The server creates the statistics by walking the index, and
storing appropriate key values at each step increment
10,000,000 rows have an integer key.
1 page has (2005 bytes / 4 bytes + 2 between)
=~ 500 steps
10,000,000 rows / 500 steps = 20,000 rows / step
© Soaring Eagle Consulting, Inc
Page 4 - 43
Distribution Steps


The optimizer will walk the index, storing the key value
every 20,000 rows
When a query is executed
select *
from table
where key
between 1000000 and 2005000

The number of keys in the range * 20,000 rows / key is the
approximate number of rows affected
© Soaring Eagle Consulting, Inc
Page 4 - 44
Viewing Index Statistics

Viewed with the dbcc show_statistics
dbcc show_statistics (table_name,index_name)
© Soaring Eagle Consulting, Inc
Page 4 - 45
Continued next page
Viewing Index
Statistics (Cont’d)
© Soaring Eagle Consulting, Inc
Page 4 - 46
Continued next page
Explaining DBCC
Show Statistics








Updated date and time: When the statistics were last
updated
Rows: Number of rows in the table
Rows Sampled: Number of rows sampled for statistics
information
Density: Selectivity of the index
Average key length: Average length of an index row
All density: Selectivity of the specified column prefix in the
index
Columns: Name of the index column prefix for which the all
density is displayed
Steps: Number of histogram values in the current
distribution statistics for the specified target on the
specified table
© Soaring Eagle Consulting, Inc
Page 4 - 47
Estimating Logical Page I/O

If there is no index, there will be a table scan, and the
estimate will be the number of pages in the table

If there is a clustered index, estimate will be the number of
index levels plus the number of pages to scan

For a nonclustered index, estimate will be index levels +
number of leaf pages + number of qualifying rows (which
will correspond to the number of physical pages to read)

For a unique index and an equality join, the estimate will be
1 plus the number of index levels
© Soaring Eagle Consulting, Inc
Page 4 - 48
When to Force
Index Selection
Don't Do it
 With every release of the server, the optimizer gets
better at selecting optimal query paths

Forcing the optimizer to behave in a specific manner
does not allow it the freedom to change selection as
data skews

It also does not permit the optimizer to take advantage
of new strategies as advances are made in the server
software
© Soaring Eagle Consulting, Inc
Page 4 - 49
When to Force
Index Selection (Cont’d)
Exceptions
 When you (the developer) have information about a
table that SQL Server 2005 will not have at the time the
query is processed (i.e., using a temp table in a nested
stored procedure)

Occasions when you've proven the optimizer wrong
© Soaring Eagle Consulting, Inc
Page 4 - 50
How to Force
Index Selection
How to Force Index Selection

To force the server to use a specific index for a specific
table, you must first know the index id of the index you
want to use
select *
from titles (2), publishers (1)
where titles.pub_id = publishers.pub_id

In this example, the titles index with the id of 2 will be used
for the titles table, and the publishers index with an id of 1
will be used for publishers
© Soaring Eagle Consulting, Inc
Page 4 - 51
When to Force
Index Selection (Cont’d)

The following SQL will list all table names and their
corresponding index ids
select 'table'=o.name, 'index'=i.name, indid
from sysindexes i, sysobjects o
where i.id = o.id

Allowing you to use the following syntax to force indexes
select *
from titles (index(titleind)), publishers
(index( UPKCL_pubind) )
where titles.pub_id = publishers.pub_id
Instead, identify why the optimizer picked incorrectly
© Soaring Eagle Consulting, Inc
Page 4 - 52
Summary

The optimizer uses indexes to improve query performance
when possible

Watch out for improvised SARGs

Queries with OR may require a table scan

Try to take advantage of covered queries

Be careful when forcing an index
© Soaring Eagle Consulting, Inc
Page 4 - 53
GOT BLAME?
The
Blame Game: “My piece is fine..”
Database
Network
Application
Storage
Insight: Drill into the Problem
Tier
Case Study #1 – Problem in the
Database

Pilot of a newly developed internet pharmacy
application



System crashing frequently


CPU (per MS Perf Mon) was pegging the box at 100%
Solution after 6 weeks?




SQL Server running on a 2-CPU system
currently supporting 120 users
add 2 more CPUs to double server capacity
performance was “acceptable”
but CPU was still at 95%
Panic had set in



going to 800+ users on initial production roll out
how much more hardware would they need?
so far, this increased their software license costs by $70K
Here
is what we found Monday at 10AM
Time
to Value – 1 Day
Monday
95%
of 4 CPU’s
Tuesday
7%
of 2 CPU’s
Drill Down: Instance>
Database> Statement
Drilling down into root causes
Further drill down
Click
Here
Statement-level
information
Significant
io wait
i3 traps, tracks, and stores all
query plans…
… And will tell you when & why the plans change
Drilling Down into Stored
Procedures
REPLACE
WITH MORE
RECENT PICTURE
Who is Doing the Work?
What tables are being accessed the most?
Drill down and identify the queries hitting them.
You’ve found the high-activity table;
how is it being accessed?
Ever wonder which indexes may be safely removed?
New Wait States
Procedure Cache
Actual Plan




Case Study #2: Not in the
Database
4-processor
SQL Server in an Application Service Provider (ASP)
environment
Key queries (stored procedures) were rewritten to run in under a
half second, from 5-20 seconds.
Further follow-up found that application response time was still
in the 5-second + category
We knew the issue was not in the database; but where was it?
How to demonstrate that it’s not the
database
Where is the application spending
its time?



When the vast majority of elapsed time is not in
the database, you have evidence that it’s not your
fault.
In this case, I had been brought in to tune the
database, knocked all the queries down from 5-10
second range to significantly subsecond.
Performance was still in the 3-6 second range for
much of the application. I knew it was not a
database issue, but what was it?
In this case, it was the result of significant
requests to an SSL layer; it turned out that some
of the screens were taking 5-6 seconds to encrypt
due to the quantity of dynamic data displayed.







Why
APM
is
Important
70% - studies have shown that 70%+ of all application performance issues
are directly attributable to the source code, NOT INFRASTRUCTURE
 It’s critical to monitor and track performance of the application code
components
25% - studies have shown that only 25% of performance issues identified
in production could have been anticipated and resolved in test/dev/QA
 Monitoring production performance is critical to application availability
and performance
60% of i3’s customer’s cancel or defer hardware upgrade purchases within
the first year of ownership
 Multi-tier web-based applications are extremely complex with great
interdependency among hardware/software/application components.
This makes it virtually impossible to determine root cause in a timely
fashion – a typical response to the frustration is to upgrade expensive
hardware. I3 streamlines your application components.
100% - i3’s customers have found that their staff can double their IT
responsibility without additional workload
3 – i3 has an average payback of 3 months, 1.8 months at one large
Federal agency
1-3% - i3 has an average overhead of 1-3% in production environments
24% of IT staff time is devoted to addressing application performance
delays