SQL Server Storage Engine
Download
Report
Transcript SQL Server Storage Engine
SQL Server Storage Engine
Software architect at Red Gate Software
Responsible for SQL tools:
◦
◦
◦
◦
SQL Compare, SQL Data Compare, SQL Packager
SQL Log Rescue
SQL Refactor
… many others
Events (NxtGenUG, VBUG, SQL Bits, PASS, many
other user groups)
SQL Server Central
Blog:
http://www.simple-talk.com/community/blogs/andras/default.aspx
Articles:
http://www.simple-talk.com/author/andr%c3%a1s-belokosztolszki/
Physical storage
◦ Pages, rows, data types, index structure
Data and schema modifications
◦ What happens when you change the schema
◦ What happens when a row is inserted, delted, etc
SQL Server 2008 features
◦ Compression and file streams
Primary database file (*.mdf)
Secondary database files (*.ndf)
◦ Optional, can be more than one
Log files (not covered)
Primary
Database
Secondary
Secondar
y
Log
Log
Data files are dividied up into
8KB pages
All information is stored in
pages (data, schema, database
information, space
allocation(GAM, SGAM, IAM), dlls)
Identified by fileId:PageId (2+4
bytes)
8 pages = 1 extent
Most important for us is the
data page
1:0 1:1 1:2 1:3
1:4 1:5 1:6 1:7
1:8 1:9 1:A 1:B
1:C 1:D 1:E
1:F
Page header (96 bytes)
Data rows
Offset array
DBCC PAGE
◦ (db,file,page,options)
2 – raw, 3 – row details
◦ Trace flag 3604
Demo
Page header
Stat
A
(1)
Stat
B
(1)
Null
offset
(2)
Fixed
Length
Data
Colum
n
Count
(2)
Null
bitmap
Ceiling(ColCnt/
8)
VarLen
Colum
n
Count
(2)
Var.
Offsets
VarLen
Data
Fixed length data will always use its allocated
space (even when it is null)
Must fit a page (max 8060 bytes)
Demo
◦ Some items can overflow: Overflow space
See sys.types
◦ Fixed length (some can be adjusted (time, decimal,
char(), …)
Always consumes this space
◦
◦
◦
◦
Variable length (varchar, varbinary, …)
Bit (packed)
SqlVariant
Binary large objects (ntext, varchar(max), …)
After a certain size stored on other pages
Heap/Index
1
sys.indexes
N
Partition
sys.partitions
sys.partitions sp
JOIN sys.allocation_units au
ON sp.partition_id = au.container_id
1
3
Allocation
Unit
sys.allocation_uni
ts
In row data
LOB
Row overflow
Root level
Interior levels
Level 0
Leaf level
Row Data
The full row record is at the leaf level
◦ Consequently there can be only one clustered index
In the intermediary and root levels a clustered
key is stored, for the first entries of the next
level pages
If the key row length is e.g. 15 bytes, an
intermediary page can store up to (8096/15
=) 539 rows (reference 539 pages)
Exact space usage in sys.allocation_units
Pages are double linked
Root level/
Interior levels
Leaf level
Row Data
See sys.allocation_units
Max 900 bytes per entry!
Index entry contains the key columns, and
◦ Index key columns
◦ Record locator (nonclusered)
Row ID or clustering key (not stored redundantly)
◦ Down pointer (for non leaf pages)
Stat
A
(1)
Fixed
Length
Data
Colum
n
Count
(2)
Null
bitmap
Ceiling(ColCnt/8
)
VarLen
Colum
n
Count
(2)
Var.
Offsets
VarLen
Data
Motivation:
◦ When using a clustered index on heap, an item is looked
up, then one more page read to retrieve extra data
◦ When using a clustered index on a B-tree, the clustered
index structure is also traversed
You can include extra columns in a nonclustered index
These will not be used to look up rows in the
table
Increases the coverage of an index
Increases the size of an index record -> the total
size
Extra maintenance
Everything is stored on pages
Rows have fixed and variable length portions
◦ Differences between certain data types and their
limitations
Index structures
◦ Size estimates for indexes, page estimates for
queries
The fewer pages we load into memory, the
better?
Schema changes
Data changes
• Adding a column
• Inserting a row
• Changing a
column
• Deleting a row
• Dropping a
column
• Altering a row
◦ What can happen:
No rows are modified, only meta information
All rows are examined
E.g. changing nullability
Int to smallint (wasted space!)
All rows are rebuilt
◦ We may end up wasting a lot of valueable space!
How can we reclaim the space?
Demo
Insert: added where there is space
Delete: removed or marked as ghost
Update: Since indexes refer to file:page:slot if
a row no longer fits on a page, it cannot
easily be moved -> it is moved, but a
reference to it is left (forwarded record)
Insert: Since the rows are ordered, if there is not
enough space on a table, the table is split into
two (can happen many times)
Update:
◦ like inserts, if the new row is too big to fit
◦ Changes to clustering columns = delete+insert
Delete: the row is marked as ghost or is deleted
Pad Index
•Intermediary pages only
•Specified as percentage
Fill Factor
•Leaf pages only
•Specified as
percentage
Only when index is created or rebuilt. The free space is NOT maintained. (see
later index reorganization and rebuilding)
sys.dm_db_index_physical_stats()
Logical fragmentation: next leaf page for index
page is not the next page that is allocated to the
index
Extent fragmentation: extents are not contiguous
Page fill
Drop and create the clustered index
◦ Index is offline
ALTER INDEX REORGANIZE
◦ This is the replacement for DBCC INDEXDEFRAG
◦ Reorganizes index pages (and compacts pages and
LOBs) (NO new pages)
ALTER INDEX REBUILD
◦ This is the replacement for DBCC DBREINDEX
◦ Basically drops and recreates the index
Introduced in SQL Server 2008
Stores fixed length data as variable length
◦ E.g. Integer – can use 1,2,3,4 bytes + bits instead
of 4 bytes + bit
CREATE TABLE RowCompressedTable
(…)
Available
in Enterprise
edition
WITH (DATA_COMPRESSION
= Row);
Stat
A
(1)
Colum
n
Count
(1/2)
CD
Array
(4b/co
l)
Short data
Null
bitmap
Ceiling(ColCnt/
8)
VarLen
Colum
n
Count
(2)
Var.
Offsets
VarLen
Data
CD Array: 0 = null, 1 – 9 number of bytes, 10
– long
Self contained
WITH (data_compression = row)
Row compression
Prefix compression
Dictionary compression
When table created, there is no compression
Row compression kicks in when otherwise a
page split would occur
When table with data converted it is rebuilt
sp_estimate_data_compression_savings
Page header
Page header
aaabcc
aaaacc
abcd
aaabb
aaaab
abcd
4b
4b
[]
aaabcc
bbbb
abcd
[]
0bbbb
[]
aaaccc
aaaacc
bbbb
3ccc
[]
0bbbb
Page header
aaabcc
aaaacc
Page header
abcd
aaabcc
aaaacc
4b
0bbbb
abcd
4b
4b
[]
0
0
[]
[]
0bbbb
[]
[]
1
[]
3ccc
[]
0bbbb
3ccc
[]
1
B-tree structure
Many pages need to be looked up
Smaller BLOBs can be inlined
sp_tableoption <tablename>, ‘text in row’, <length>
Data row
Text Pointer
Root entry
Intermediate node
Data fragment
Data fragment
Intermediate node
Data fragment
Data fragment
When BLOBs are not enough:
◦ Large items (over 1Mb)
◦ Very fast read is needed
◦ 2GB++
Can use T-SQL to access
File stream access vie Win32 API
Static data storage
◦ Table and index rows
◦ The way these are linked together
What happens during schema and data
modifications
Lessons to take away
◦ Minimize the number of pages you need to read or
write
◦ Rebuild your tables and use fill factor, and rebuild
indexes durng off peak hours!
◦ Use the specialized data types and storage options
Thanks to SQL Bits & Sponsors
Blog:
http://www.simpletalk.com/community/blogs/andras/default.
aspx
Email: Andras.Belokosztolszki (at) redgate.com