How SQL Server Indexes Work

Download Report

Transcript How SQL Server Indexes Work

How SQL Server Indexes Work
Sharon F. Dooley
[email protected]
SQL Server Indexes
•
SQL Server indexes are based on B-trees
– Special records called nodes that allow keyed access to data
– Two kinds of nodes are special
• Root
• Leaf Root node A O
A
Intermediate node
Leaf
node
A
C
Data
pages
A
T
E
G
I
O
I
K
M
N
C
O
Q
G
T
T
W
E
I
SQL Server B-Tree Rules
• Root and intermediate nodes point only to other nodes
• Only leaf nodes point to data
• The number of nodes between the root and any leaf is the
same for all leaves
• A node always contains between K and K/2 branches,
where K is the branching factor
– Branching factor is the number of keys in the node
• B-trees are always sorted
• The tree will be maintained during insertion, deletion, and
updating so that these rules are met
– When records are inserted or updated, nodes may split
– When records are deleted, nodes may be collapsed
What Is a Node?
• A page that contains key and pointer pairs
Key
Pointer
Key
Pointer
Key
Pointer
Key
Pointer
Key
Pointer
Key
Pointer
Key
Pointer
Key
Pointer
Splitting a B-Tree Node
Bob
Abby
Bob
Carol
Dave
Root (Level 0)
Abby
Ada
Andy
Ann
Node (Level 1)
Ada
Alan
Amanda
Amy
Leaf (Level 2)
Alan
Amanda
Carol
Amy
Dave
Ada
DB
Let’s Add Alice
• Step 1: Split the leaf node
Ada
Bob
Alan
Alan
Alice
Amanda
Amanda
Carol
Amy
Dave
Amy
Ada
Alice
Adding Alice
• Step 2: Split the next level up
Abby
Ada
Ada
Bob
Alan
Andy
Amanda
Alan
Alice
Amanda
Ann
Amanda
Carol
Amy
Dave
Leaf
Amy
Ada
Alice
DB
Adding Alice
(continued)
• Split the root
Abby
Abby
Ada
Ada
Bob
Andy
Carol
Bob
Andy
Amanda
Alan
Alan
Ann
Amanda
Alice
Amanda
Dave
Carol
Amy
Dave
Amy
Ada
Leaf
Alice
DB
Adding Alice
(continued)
• When the root splits, the tree grows another level
Abby
Abby
Andy
Bob
Abby
Ada
Amanda
Ada
Alan
Alice
Alan
Amanda
Bob
Root (Level 0)
Carol
Carol
Andy
Amy
Node
(Level 2)
Ann
Amanda
Carol
Node
(Level 1)
Dave
Dave
Leaf
(Level 3)
Amy
Ada
Alice
DB
Page splits cause fragmentation
•
Two types of fragmentation
– Data pages in a clustered table
– Index pages in all indexes
•
Fragmentation happens because these pages must be kept in order
•
Data page fragmentation happens when a new record must be added to a page
that is full
– Consider an Employee table with a clustered index on LastName,
FirstName
Adams, Carol
Ally, Kent
Baccus, Mary
David, Sue
Dulles, Kelly
Edom, Mike
Farly, Lee
Frank, Joe
...
Ollen, Carol
Oppus, Larry
Extent
Data Page Fragmentation
Adams, Carol
Ally, Kent
Baccus, Mary
Dulles, Kelly
Edom, Mike
David, Sue
Dent, Peter
Farly, Lee
Frank, Joe
...
...
Ollen, Carol
Oppus, Larry
Extent
Extent
Index Fragmentation
•
Index page fragmentation occurs when a new key-pointer pair must be
added to an index page that is full
– Consider an Employee table with a nonclustered index on Social
Security Number
036-11-9987, pointer
036-33-9874, pointer
038-87-8373, pointer
•
046-11-9987, pointer
048-33-9874, pointer
052-87-8373, pointer
Employee 048-12-9875 is added
116-11-9987, pointer
116-33-9874, pointer
...
124-11-9987, pointer
124-33-9874, pointer
125-87-8373, pointer
Extent
Index Fragmentation
(continued)
036-11-9987, pointer
036-33-9874, pointer
038-87-8373, pointer
048-33-9874, pointer
052-87-8373, pointer
046-11-9987, pointer
048-12-9875, pointer
116-11-9987, pointer
116-33-9874, pointer
...
...
124-11-9987, pointer
124-33-9874, pointer
125-87-8373, pointer
Extent
Extent
•
Studying Fragmentation in SQL Server
2000
To determine if there is fragmentation
– In a clustered table or a nonclustered index
DBCC SHOWCONTIG [([table_id | table_name | view_id |
view_name [, index_id | index_name])]
DBCC SHOWCONTIG scanning 'Employees' table...
Table: 'Employees' (1977058079); index ID: 1, database ID: 7
TABLE level scan performed.
- Pages Scanned................................: 707
- Extents Scanned..............................: 90
- Extent Switches..............................: 572
- Avg. Pages per Extent........................: 7.9
- Scan Density [Best Count:Actual Count].......: 15.53%
[89:573]
- Logical Scan Fragmentation ..................: 39.18%
- Extent Scan Fragmentation ...................: 58.89%
- Avg. Bytes Free per Page.....................: 4338.9
- Avg. Page Density (full).....................: 46.39%
DBCC execution completed. If DBCC printed error messages,
contact your system administrator.
Studying Fragmentation in SQL Server 2000
(continued)
• Unless the table or index spans multiple files
– Extent Switches and Extents Scanned should be approximately
equal
• Scan Density should be close to 100 percent
• Avg. Page Density should be high and Avg. Bytes Free Per Page
should be low
• Logical Scan Fragmentation and Extent Scan Fragmentation should be
as close to 0 as possible
• Clearly the Employees table is terribly fragmented!
Studying Fragmentation in SQL Server
2005 and 2008
SELECT object_name(s.object_id), name,
avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats
(db_id('databasename'), object_id('tablename'),
{indexid | NULL}, {partitionnumber | NULL},
{'LIMITED' | 'SAMPLED' | 'DETAILED' | NULL}) as s
INNER JOIN sys.indexes as i
ON s.object_id = i.object_id
AND s.index_id = i.index_id
•
•
If NULL is supplied for the last argument, LIMITED is assumed
The avg_fragmentation_in_percent should be as close to 0 as possible
Studying Fragmentation in SQL Server
2005 and 2008 (continued)
SELECT object_name(s.object_id), name,
avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats
(db_id('bigwind'), object_id('Employees'), null,
null, null) as s
INNER JOIN sys.indexes as i
ON s.object_id = i.object_id
AND s.index_id = i.index_id
WHERE alloc_unit_type_desc = 'IN_ROW_DATA'
•
Results
Employees
LastName_IDx
0.685
Employees
PK_Employees
3.0303
Employees
City_IDX
0
Employees
Region_IDX
3.922
Repairing Fragmentation
•
Repair index fragmentation by rebuilding index
•
Rebuilding clustered index repairs table fragmentation
•
DBCC DBREINDEX
DBCC DBREINDEX (tablename [, indexname [, fillfactor]])
– Can reorganize indexes that implement primary key and unique constraints
•
CREATE INDEX
•
– DROP_EXISTING causes SQL Server to create and drop the index in a
single step
• Faster than dropping with the DROP INDEX command and then
re-creating
ALTER TABLE … ADD CONSTRAINT … PRIMARY KEY or UNIQUE
Repairing Fragmentation
(continued)
DBCC INDEXDEFRAG does not
– Lock the index
– Do as thorough job as the other methods
– Allow specification of a fill factor
• Uses the fill factor from the last CREATE INDEX for this index
DBCC INDEXDEFRAG
( { database_name | database_id | 0 }
, { table_name | table_id}
, { index_name | index_id }
)
2008
• ALTER INDEX index_name ON table_name REORGANIZE 2005
– Same as DBCC INDEXDEFRAG
• ALTER INDEX index_name ON table_name REBUILD
– Allows concurrent access if you add WITH (ONLINE = ON) to the ALTER
INDEX command
– Uses the version store in tempdb
– Same as DBCC DBREINDEX
•
SQL Server Indexes
• SQL Server indexes come in two flavors
– Clustered indexes
• Database rows are in order on the index key
• The data pages are the leaf nodes of the index
– Nonclustered indexes
• Leaf level is in index order but the data is not
• Leaf nodes contain pointers to rows
• One clustered index per table
– Choose wisely
– Should always have a clustered index
• Allows reorganization of the data pages
• 249 nonclustered indexes per table
Clustered Index
Root
Abby
Ada
Alan
Abby
Bob
Ada
Andy
Amanda
Amy
Carol
Dave
Ann
Database and leaf node
Nonclustered Index
Root
Abby
Ada
Amy
Ada
Alan
Amanda
Abby
Bob
Ada
Andy
Amanda
Alan
Amy
Database
Carol
Ann
Leaf node
Dave
•
•
Clustered and Nonclustered Indexes
Interact
Clustered indexes are always unique
– If you don’t specify unique when creating them, SQL Server may add a
“uniqueifier” to the index key
• Only used when there actually is a duplicate
• Adds 4 bytes to the key
The clustering key is used in nonclustered indexes
– This allows SQL Server to go directly to the record from the nonclustered
index
– If there is no clustered index, a record identifier will be used instead
Leaf node of a nonclustered
index on LastName
Leaf node of a clustered
index on EmployeeID
Adams
3
1
Jones
John
Douglas
4
2
Smith
Mary
Jones
1
3
Adams
Mark
Smith
2
4
Douglas
Susan
Clustered and Nonclustered Indexes Interact
(continued)
• Another reason to keep the clustering key small!
• Consider the following query:
SELECT LastName, FirstName
FROM Employee
WHERE LastName = 'Douglas'
• When SQL Server uses the nonclustered index, it
– Traverses the nonclustered index until it finds the
desired key
– Picks up the associated clustering key
– Traverses the clustered index to find the data
Heaps and Chains
•
When you place a clustered index on a table, the pages are chained together in
a doubly linked list
•
•
SQL Server can follow the pointers to move from page to page
When there is no clustered index, the table is called a heap
– Data is located
• Through nonclustered indexes
• By scanning all the pages in the table
Indexes and Inserts
•
•
•
•
•
•
When there is no clustered index on a table, SQL Server uses the Page Free
Space page to find a page with space for the new record
Inserts into tables with a clustered index can cause page splits or hotspots
When a particular part of the database is particularly popular, it is called a
hotspot
Hotspots create contention problems
Clustered indexes on keys that arrive in random order cause page splits
Clustered indexes on keys that arrive in index order create hotspots
– All inserts are again at the end of the table
• Identities
• Dates
– However, there will be no page splits or collapses
Indexes and Updates
•
•
•
When data is modified, indexes may have to be modified as well
Changing data in a table with no indexes
– Data will be changed in place unless the update means that the row will no
longer fit on the page
– If the row won’t fit, it will simply be moved to a new page
Changing a clustering key column
– The row will be deleted from its original location
– It will be inserted into the new location
– All nonclustered indexes must be maintained
– Exception:
• A change that doesn’t affect the index order
Thomas Thompson Tyne
• Changing “Thompson” to “Thompsen” will be done in place
•
Avoid clustered indexes on columns that frequently change
Indexes and Updates
(continued)
•
•
Changing any non-key column in a table with only nonclustered indexes
– Data will be changed in place unless the row will no longer fit
– If the row will no longer fit
• It will be moved to another page
• A forwarding pointer will be left on the original page that points to the
new location
• Index pointers don’t need to be updated
• An additional I/O will now be required
• There will never be more than one forwarding pointer, no matter how
many times a row moves
To see whether updates have produced forwarding pointers, use
2000
DBCC SHOWCONTIG('tablename') WITH TABLERESULTS
SELECT forwarded_record_count FROM sys.dm_db_index_physical_stats
(db_id('databasename'),
2005
2008
object_id('tablename'), NULL, NULL, NULL)
Indexes and Deletes
•
•
Deleting from a heap
– Row is physically deleted
– Remaining rows on page below the deleted record are not moved up at the
time the delete happens
– Page will be compressed when space is needed for another row on the
page
Deleting from a leaf node
– Data pages of clustered index
– Leaf nodes of nonclustered index
– Records may not be physically deleted at the time the delete is issued
– Records may be marked as ghost records
• Used by lock manager
• Not retrievable by users
– Special SQL Server process cleans up the ghost records
• Won’t clean up records that are part of an active transaction
• Doesn’t compress page
Indexes and Deletes
(continued)
• Deleting from a nonleaf node
– No ghost records
– Page is not compressed
• When rows are deleted, both nonclustered and clustered
indexes must be maintained
• When the last row is deleted from a page (index or data),
the page is deallocated and returned to the free space pool
– Unless it is the only page in the table
– A table always has at least one page, even if it is empty
What Are Index Statistics?
• Metric used by the optimizer in determining
whether or not an index is useful for a particular
query
• Stored in
– An image column named statblob in the
sysindexes
table
2008
– An internal and invisible table 2005
• Essentially a histogram
• Statistics are kept for 200 steps
2000
Selectivity
•
•
•
The statistics allow the optimizer to determine the selectivity of an index
– A unique, single-column index always has a selectivity of 1
• One index entry points to exactly one row
Another term for this is density
– Density is the inverse of selectivity
• Density values range from 0 to 1
• A selective index has a density of 0.10 or less
• A unique, single-column index always has a density of 0.0
When the index is composite, it becomes a little more complicated
– SQL Server maintains detailed statistics only on the leftmost column
– It does compute density for each column
• Assume there is an index on (col1, col2, col3)
• Density is computed for
– Col1
– Col1 + Col2
– Col1 + Col2 + Col3
Exploring Statistics
•
To see the index statistics, use
– DBCC SHOW_STATISTICS ('tablename', {'indexname' |
'statisticsname'})
– DBCC SHOW_STATISTICS ('Employees', 'EmployeeName_Idx')
•
Interpreting the step output
RANGE_HI_KEY
Upper-bound value of a histogram step
RANGE_ROWS
Number of rows from the sample that fall within a
histogram step, excluding the upper bound
EQ_ROWS
Number of rows from the sample that are equal in value
to the upper bound of the histogram step
DISTINCT_RANGE_ROWS Number of distinct values within a histogram step,
excluding the upper bound
AVG_RANGE_ROWS
Average number of duplicate values within a histogram
step, excluding the upper bound
Exploring Index Statistics
(continued)
Statistics for INDEX 'EmployeeName_IDX'.
Updated
Rows
Rows Sampled
Avg key
length
-------------------- ------- ------------- ------ ------------ -----------Jan 27 2002 7:00PM 10009
10009
200
1.3958309E-4 28.475372
All density
-----------1.4271443E-4
9.9940036E-5
9.9910081E-5
Average Length
-------------13.252672
24.475372
28.475372
RANGE_HI_KEY
-----------Aaby
Abrahamson
. . .
Zuran
Zvonek
RANGE_ROWS
---------0.0
59.0
13.0
0.0
Density
Columns
-------LastName
LastName, FirstName
LastName, FirstName, EmployeeID
EQ_ROWS
------2.0
2.0
1.0
1.0
Steps
DISTINCT_RANGE_ROWS
-------------------0
43
10
0
The report for SQL
Server 2005 and
later is slightly
different at the
beginning
AVG_RANGE_ROWS
-------------0.0
1.3409091
1.3
0.0
Distribution
steps
Statistics Maintenance
• By default, SQL Server will automatically maintain the statistics
• Index statistics are computed (or recomputed) when the index is
created or rebuilt
• SQL Server keeps track of the updates to a table
– Each INSERT, UPDATE, or DELETE statement updates a
counter in sysindexes named rowmodctr
• Note that TRUNCATE TABLE does not modify this
counter
– Whenever the statistics are recomputed, the counter is set
back to zero
• When you issue a query, the optimizer checks rowmodctr to
see whether the statistics are up to date
– If they are not, the statistics will be updated
•
Statistics Maintenance
(continued)
Note that this may not always happen at the best time in a production system
– Can turn off automatic update
– Can manually update
– In SQL Server 2005 and later, can set the AUTO_UPDATE_STATISTICS_ASYNC
database option
Table type
•
Empty
condition
Threshold when empty
Threshold when not empty
Permanent < 500 rows
Number of changes >= 500
Number of changes >= 500 +
(Number of rows * 20%)
Temporary
Number of changes >= 6
Number of changes >= 500
< 6 rows
Example:
– Assume that a table has 1,000 rows
– The threshold would be 500 + (.20 * 1000)
– You would expect to see the statistics automatically updated
after about 700 modifications
Estimating Page Accesses
•
•
No index
– Number of data pages in the table
Equality query using a unique index
– Nonclustered index
• Number of index levels + 1 (if there is no clustered index)
• Number of nonclustered index levels + the number of levels in the clustered
index
– Clustered index
• Number of index levels
Estimating Page Accesses for a
Clustered Index
• Number of levels + number of data pages
– Number of data pages = number of qualifying rows /
rows per page
Root
Abby
Ada
Alan
Abby
Bob
Ada
Andy
Amanda
Amy
Carol
Dave
Ann
Database and leaf node
•
Estimating Page Accesses for a
Nonclustered
Index
Number of levels + number of qualifying leaf pages + number of rows or
number of levels + number of qualifying leaf pages + (number of rows * number of
clustered index levels)
– Number of qualifying leaf pages
= number of
Abby
Bob
Carol
Dave Root
qualifying rows / rows per page
– Assumes every row
is on a different page
Abby
Ada
Amy
Ada
Alan
Amanda
Ada
Amanda
Alan
Andy
Amy
Ann
Leaf node
Database
Covering Indexes
•
•
When a nonclustered index includes all the data requested in a query (both the items
in the SELECT list and the WHERE clause), it is called a covering index
With a covering index, there is no need to access the actual data pages
– Only the leaf nodes of the nonclustered index are accessed
Leaf node of a nonclustered
index on LastName, FirstName,
Birthdate
•
Adams
Mark
1/14/1956
3
Douglas
Susan
12/12/1947
4
Jones
John
4/15/1967
1
Smith
Mary
7/14/1970
2
The last column is
EmployeeID.
Remember that
the clustering key
is always included
in a nonclustered
index.
Because the leaf node of a clustered index is the data itself, a clustered index covers all
queries
Covering Indexes
(continued)
•
•
This index covers the following queries:
SELECT EmployeeID, LastName, FirstName, Birthdate
FROM Employees
WHERE Birthdate >= '1/12/1941'
SELECT LastName
FROM Employees
WHERE EmployeeID = 7
SELECT LastName, FirstName
FROM Employees
WHERE LastName BETWEEN 'A' AND 'C'
Remember that the number of accesses for a nonclustered index is the number of
levels + the number of qualifying leaf pages + either the number of rows or (the
number of rows * the number of levels in the clustered index)
– A covering index eliminates the “number of rows” term from the equation
– The optimizer is highly likely to use a nonclustered index because of this
Non-Key Index Columns
•
•
•
SQL Server 2005 and later allow you to include columns in a non-clustered
index that are not part of the key
– Allows the index to cover more queries
– Included columns only appear in the leaf level of the index
– Up to 1,023 additional columns
– Can include data types that cannot be key columns
• Except text, ntext, and image data types
Syntax
CREATE [ UNIQUE ] NONCLUSTERED INDEX index_name
ON <object> ( column [ ASC | DESC ] [ ,...n ] )
[ INCLUDE ( column_name [ ,...n ] ) ]
Example
CREATE NONCLUSTERED INDEX NameRegion_IDX
ON Employees(LastName)
INCLUDE (Region)