SQLSaturday Presentation Transcript

Effective Indexing
With Partitioning and Compression
Neil Hambly
SQL Server Practice Lead
PASS Chapters Leader
12.12.2015
Thank you to our AWESOME sponsors!
Please silence cell phones
Explore Everything PASS Has to Offer
• FREE ONLINE WEBINAR EVENTS
• FREE 1-DAY LOCAL TRAINING EVENTS
• VOLUNTEERING OPPORTUNITIES
• LOCAL USER GROUPS AROUND THE WORLD
• ONLINE SPECIAL INTEREST USER GROUPS
• PASS COMMUNITY NEWSLETTER
• FREE ONLINE RESOURCES
• BUSINESS ANALYTICS TRAINING
• BA INSIGHTS NEWSLETTER
Who is Neil Hambly
PASS London Chapter Leader
PASS Professional Development VC Leader
Melissa Data MVP (2015)
Permanent Roles
Northdoor PLC (London) Oct 2014 - Present
SQL Server Technology Evangelist | Practice Lead
Prior Roles
ASOS: Principal Database Engineer
Confio: Senior System Engineer
MDSL: DB Architect
iProfile: DB Architect | DBA | DB Developer (Database Lead)
Accenture: DBA | DB Developer (Global BI - Team Leader)
Contact info
PASS Summit Presenter (2011,2012,2013,2014,2015)
PASS SQLSaturdays (35+) & PASS Virtual Chapters
PASS BA Conference (2012)
PASS SQLRally events in the US & Europe
UK SQL Relay Organiser & Presenter
Regular UK & International presenter (>100 since 2010)
SQLCruise 2013 Technical Lead
SQL Professional working with SQL Server for 18+ years
Just loves attending SQL Events & spending time with #SQLFamily
@Neil_Hambly @SQLNorthdoor
http://www.linkedin.com/in/neilhambly
PASS Chapter Website:
www.sqllondon.sqlpass.org
Personal Blog: http://dataidol.com/NeilHambly
Northdoor Email: [email protected]
Contributor TK70-462
Agenda
How to effectively index with the Partitioning & Compression features
Partitioning
• Partition Functions (PF); Partition Schemes; Filegroups
• Partition Actions - Splitting, Merging, Switching & Sliding Windows
Compression
• Vardecimal; Row & Page; Unicode
• Estimating space savings
• Finding objects for compression
Indexing DMVs
• Usage & Operational
• Partition-Info Script - Buffer Pool; Fragmentation & More
Demos
• Performance Queries; Maintenance Operations; Data Loading; DBCC PAGE {Structures}
Improvements (through Versions)
• Single Partition OIR; Managed Lock Priority
Partitioning – Quick Refresher
Metadata for Partition
Partition Function
Used to define the boundary values {maximum 15,000 since SQL Server 2012; 1,000 before that}
CREATE PARTITION FUNCTION partition_function_name
( input_parameter_type ) AS RANGE [ LEFT | RIGHT ]
FOR VALUES ( [ boundary_value [ ,...n ] ] ) [ ; ]
Permissions
Any one of the following permissions can be used to execute CREATE PARTITION FUNCTION:
• ALTER ANY DATASPACE permission.
This permission defaults to members of the sysadmin fixed server role and the db_owner and db_ddladmin fixed database roles.
• CONTROL or ALTER permission on the database in which the partition function is being created.
• CONTROL SERVER or ALTER ANY DATABASE permission on the server of the database in which the partition function
is being created.
Example: Monthly partitions
--Create date partition function with increment by month.
DECLARE @DatePartitionFunction nvarchar(max) = N'CREATE PARTITION FUNCTION
DatePartitionFunction (datetime2) AS RANGE RIGHT FOR VALUES (';
DECLARE @i datetime2 = '20100101';
WHILE @i < '20160101'
BEGIN
SET @DatePartitionFunction += '''' + CAST(@i as nvarchar(10)) + '''' + N', ';
SET @i = DATEADD(MM, 1, @i);
END
SET @DatePartitionFunction += '''' + CAST(@i as nvarchar(10))+ '''' + N');';
EXEC sp_executesql @DatePartitionFunction;
GO
-- https://msdn.microsoft.com/en-us/library/ms187802.aspx
Partition Scheme
Creates a scheme in the current database that maps the partitions of a
partitioned table or index to filegroups
CREATE PARTITION SCHEME partition_scheme_name AS
PARTITION partition_function_name [ ALL ] TO ( {
file_group_name | [ PRIMARY ] } [ ,...n ] ) [ ; ]
Permissions
The following permissions can be used to execute CREATE PARTITION SCHEME:
• ALTER ANY DATASPACE permission.
This permission defaults to members of the sysadmin fixed server role and the db_owner and db_ddladmin fixed database roles.
• CONTROL or ALTER permission on the database in which the partition scheme is being created.
• CONTROL SERVER or ALTER ANY DATABASE permission on the server of the database in which the partition scheme is
being created.
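As a minimal sketch, the monthly DatePartitionFunction created on the previous slide could be mapped to a filegroup and used by a table like this (the scheme, table, and column names here are illustrative assumptions, not from the deck):

```sql
-- Map every partition of DatePartitionFunction to one filegroup.
-- ALL TO ([PRIMARY]) is used for simplicity; in practice partitions
-- are typically spread across dedicated filegroups.
CREATE PARTITION SCHEME DatePartitionScheme
AS PARTITION DatePartitionFunction
ALL TO ([PRIMARY]);

-- A table created on the scheme is partitioned by its date column.
CREATE TABLE dbo.Transactions
(
    TransactionID   bigint    NOT NULL,
    TransactionDate datetime2 NOT NULL
)
ON DatePartitionScheme (TransactionDate);
```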
Partitioning Actions
Splitting a range
When partitions are split by using the ALTER PARTITION FUNCTION statement, both partitions
inherit the data compression attribute of the original partition.
ALTER PARTITION SCHEME TransactionsPS1 NEXT USED [FG_1]
ALTER PARTITION FUNCTION pfn_YearMonths() SPLIT RANGE('12/31/2015')
Merging a range
When two partitions are merged, the resultant partition
inherits the data compression attribute of the destination partition.
ALTER PARTITION FUNCTION SalesData_Function() MERGE RANGE ('12/31/2014')
Switching partitions
To switch a partition, the data compression property of the partition
must match the compression property of the table.
ALTER TABLE [Transactions] SWITCH PARTITION 1 TO [Transactions_Archive]
Note: To switch a partition, both tables must exist in the same filegroup, and the target table or partition must be empty
Demo #1
Data Loading (Partitions)
{Performance}
Comparing Row Indexing vs Columnstore
4 GB Data + Indexes (Row Compression) vs 0.7 GB Clustered Columnstore Compressed
Demo #2
Partition Switching
{Performance}
Sliding Window example (Left)
Sliding a LEFT Partition Function Window
At the start of a new period with a LEFT partition function, partition 1 contains the old
data, the second-from-last partition contains the current data, and the last partition
(empty) is for future data. The partitions can be maintained as follows:
1) SWITCH out partition 1 to a staging table for archive/purge
2) MERGE the now-empty partition 1 with non-empty partition 2 to form a new partition 1 (with data)
3) SPLIT the last partition so that the last 2 partitions are now empty
4) Repeat steps 1 through 3 at the start of the next period
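The four steps above can be sketched in T-SQL. The function, scheme, table, filegroup names, and boundary dates below are hypothetical, assuming a monthly LEFT partition function:

```sql
-- 1) SWITCH out partition 1 to a staging table for archive/purge.
ALTER TABLE dbo.Sales SWITCH PARTITION 1 TO dbo.Sales_Staging;

-- 2) MERGE the (now empty) partition 1 with partition 2 by removing
--    the oldest boundary value.
ALTER PARTITION FUNCTION pfn_MonthlySales() MERGE RANGE ('2014-01-31');

-- 3) SPLIT the last partition so the last two partitions are empty;
--    NEXT USED must be set on the scheme before the SPLIT.
ALTER PARTITION SCHEME SalesPS NEXT USED [FG_Sales];
ALTER PARTITION FUNCTION pfn_MonthlySales() SPLIT RANGE ('2016-01-31');
```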
Sliding Window example (Right)
Sliding a RIGHT Partition Function Window
The RIGHT partition maintenance strategy is slightly different from the LEFT strategy.
The difference is that partition 2, rather than partition 1, contains the old data at the start
of a new period.
However, just like the LEFT strategy, the second-from-last partition contains the current
data, and the last partition (empty) is for future data. The partitions can be maintained as
follows:
1) SWITCH out partition 2 to a staging table for archive/purge
2) MERGE empty partition 1 with empty partition 2 to form a new partition 1 (empty)
3) SPLIT the last partition so the last 2 partitions are now empty
4) Repeat steps 1 through 3 at the start of the next period
Compression – Quick Refresher
Data Types
Data Compression was introduced in SQL Server 2005 (SP2) through "vardecimal storage"
This was superseded in SQL Server 2008 by row- and page-level compression
The compression feature is only available in the Enterprise and Developer SKUs
Note: NULL and 0 values across all data types are optimized to take 0 bytes, beyond a 4-bit per-column overhead
Data type benefits of compression

Yes - depends on data:
• bit / smallint / int / bigint
• decimal / numeric
• datetime / datetime2 / datetimeoffset
• char / nchar / nvarchar
• binary / timestamp / rowversion
• money / smallmoney
• float / real

No - little benefit (normally):
• tinyint / uniqueidentifier
• smalldatetime / date / time
• varchar / text / ntext / image
• varbinary
• sql_variant / FILESTREAM
• cursor / table / xml
SQL 2005 SP2 (Vardecimal)
Vardecimal storage works for decimal data the same way varchar works for character data: values are stored using only the space they actually need.
HOW TO TURN ON THE VARDECIMAL OPTION
Step 1:
exec sp_db_vardecimal_storage_format '<dbname>', 'ON'
This option will enable the vardecimal storage on the specified database.
Step 2:
exec sp_tableoption '<table>', 'vardecimal storage format', 1
This option will turn on the vardecimal storage for the existing table.
HOW TO ESTIMATE WHETHER IT IS WORTH TURNING
ON THIS OPTION
sys.sp_estimated_rowsize_reduction_for_vardecimal '<table>'
Column     Max original fixed    Max vardecimal      Overhead to store   Max vardecimal
precision  decimal size (bytes)  data area (bytes)   offset (bytes)      storage used (bytes)
1-3        5                     3                   2                   5
4-6        5                     4                   2                   6
7-9        5                     5                   2                   7
10-12      9                     6                   2                   8
13-15      9                     8                   2                   10
16-18      9                     9                   2                   11
19         9                     10                  2                   12
20-21      13                    10                  2                   12
22-24      13                    11                  2                   13
25-27      13                    13                  2                   15
28         13                    14                  2                   16
29-30      17                    14                  2                   16
31-33      17                    15                  2                   17
34-36      17                    16                  2                   18
37-38      17                    18                  2                   20
Page & Row Level Compression
Introduced in SQL Server 2008, two compression algorithms are available: row and page
Compressing the leaf level of tables and indexes with page compression consists of
three operations, these are done in the following order:
1. Row compression
2. Prefix compression
3. Dictionary compression
Page Compression
Note: When using page compression, the non–leaf-level pages of indexes are only row compressed
[Diagram: Row Compression → Prefix Compression → Dictionary Compression]
Row Compression
Row compression reduces the amount of space rows take up on a page in
several ways:
• Metadata overhead is reduced.
• It uses the least amount of storage possible for some data types.
• Some numeric data types are reduced in size.
• Some character data types remove padding.
• NULL and 0 values take up no space on a page.
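Enabling row compression on an existing table is a single rebuild operation. A minimal sketch, with the table and index names assumed for illustration:

```sql
-- Rebuild the table (and its clustered index, if any) with row compression.
ALTER TABLE dbo.Sales
REBUILD WITH (DATA_COMPRESSION = ROW);

-- Nonclustered indexes are compressed separately, one rebuild each.
ALTER INDEX IX_Sales_CustomerID ON dbo.Sales
REBUILD WITH (DATA_COMPRESSION = ROW);
```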
Page Compression
Page compression can further reduce the amount of space rows take up on a
page by applying 2 further compression algorithms:
• Prefix Compression
  For each page that is being compressed, prefix compression uses the following steps:
  • For each column, a value is identified that can be used to reduce the storage space for the values in that column.
  • A row that represents the prefix values for each column is created and stored in the compression information (CI) structure that
    immediately follows the page header.
  • The repeated prefix values in the column are replaced by a reference to the corresponding prefix. If the value in a row does not
    exactly match the selected prefix value, a partial match can still be indicated.
• Dictionary Compression
  • After prefix compression has been completed, dictionary compression is applied.
  • Dictionary compression searches for repeated values anywhere on the page, and stores them in the CI area.
  • Unlike prefix compression, dictionary compression is not restricted to one column; it can replace repeated values that occur
    anywhere on a page.
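Page compression is requested the same way, at table or index level. A hedged sketch with assumed object names:

```sql
-- Apply page compression when creating a new index...
CREATE NONCLUSTERED INDEX IX_Sales_OrderDate
ON dbo.Sales (OrderDate)
WITH (DATA_COMPRESSION = PAGE);

-- ...or when rebuilding an existing table, ONLINE if needed
-- (Enterprise edition).
ALTER TABLE dbo.Sales
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```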
Unicode Compression
Space savings that can be achieved for different locales (15-50%)
Locale       Compression %
English      50%
German       50%
Hindi        50%
Turkish      48%
Vietnamese   39%
Japanese     15%
Standard Compression Scheme for Unicode (SCSU)
The SCSU algorithm compresses Unicode values that are stored in row- or page-compressed objects
Unicode compression supports the fixed-length nchar(n) and nvarchar(n) data types
Data values stored off row or in nvarchar(max) columns are not compressed
SCSU can also switch to UTF-16 internally to handle non-alphabetic languages
Objects to compress
;WITH CTE_Partitions ([ObjectID], [Table_Name], [Index_Name], [Partition], [Index_ID], [Index_Type],
                      [Percent_Update], [Percent_Scan], [Compression_Level],
                      [IN_ROW_DATA], [ROW_OVERFLOW_DATA], [LOB_DATA]) AS
(
    SELECT o.object_id          AS [ObjectID]
         , o.name               AS [Table_Name]
         , i.name               AS [Index_Name]
         , ios.partition_number AS [Partition]
         , i.index_id           AS [Index_ID]
         , i.type_desc          AS [Index_Type]
         , ios.leaf_update_count * 100.0 /
           (ios.range_scan_count + ios.leaf_insert_count + ios.leaf_delete_count
            + ios.leaf_update_count + ios.leaf_page_merge_count + ios.singleton_lookup_count)
                                AS [Percent_Update]
         , ios.range_scan_count * 100.0 /
           (ios.range_scan_count + ios.leaf_insert_count + ios.leaf_delete_count
            + ios.leaf_update_count + ios.leaf_page_merge_count + ios.singleton_lookup_count)
                                AS [Percent_Scan]
         , p.data_compression_desc         AS [Compression_Level]
         , ps.in_row_used_page_count       AS [IN_ROW_DATA]
         , ps.row_overflow_used_page_count AS [ROW_OVERFLOW_DATA]
         , ps.lob_used_page_count          AS [LOB_DATA]
    FROM sys.dm_db_partition_stats ps
    JOIN sys.partitions p
      ON ps.partition_id = p.partition_id
    JOIN sys.indexes i
      ON p.index_id = i.index_id AND p.object_id = i.object_id
    JOIN sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) ios
      ON i.object_id = ios.object_id AND i.index_id = ios.index_id
    JOIN sys.objects o
      ON o.object_id = ios.object_id
    WHERE (ios.range_scan_count + ios.leaf_insert_count + ios.leaf_delete_count
           + ios.leaf_update_count + ios.leaf_page_merge_count + ios.singleton_lookup_count) != 0
      AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
)
SELECT *
FROM CTE_Partitions
ORDER BY [IN_ROW_DATA] DESC, [Index_Name] ASC, [Percent_Update] ASC;
Estimate Space for data in different compression states
To determine the space used for each compression setting, we can use the system procedure
sp_estimate_data_compression_savings
Example
EXEC sp_estimate_data_compression_savings
      'Sales'                 -- [ @schema_name = ] 'schema_name'
    , 'SalesOrderDetailLoad'  -- [ @object_name = ] 'object_name'
    , NULL                    -- [ @index_id = ] index_id
    , NULL                    -- [ @partition_number = ] partition_number
    , 'ROW';                  -- [ @data_compression = ] 'data_compression'
Tip: Use sp_estimate_data_compression_savings to gauge defragmentation savings; just run it
with the same value for @data_compression as is already in use
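To compare all three settings side by side, the procedure can simply be run once per compression level. A sketch, reusing the assumed schema and table names from the example above:

```sql
-- Estimate savings for NONE, ROW and PAGE in turn and compare
-- the size_with_requested_compression_setting columns returned.
EXEC sp_estimate_data_compression_savings 'Sales', 'SalesOrderDetailLoad', NULL, NULL, 'NONE';
EXEC sp_estimate_data_compression_savings 'Sales', 'SalesOrderDetailLoad', NULL, NULL, 'ROW';
EXEC sp_estimate_data_compression_savings 'Sales', 'SalesOrderDetailLoad', NULL, NULL, 'PAGE';
```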
Guide to Resource Requirements when Performing Data Compression
The table shows a summary of workspace, CPU, and I/O requirements for compressing a clustered index, compared to
rebuilding the same index uncompressed.
Measurements used:
• X = number of pages before compression (or rebuild)
• P = number of pages after compression (P < X)
• Y = number of new or updated pages (by a concurrent application; applies only to the ONLINE case)
• M = size of the mapping index (estimate based on guidelines in the tempdb Capacity Planning white paper)
• C = the CPU time taken to rebuild the uncompressed index
Workspace, CPU, and I/O summary for compressing a clustered index

           Workspace                       I/O                              CPU
           TempDB   UserDB   Tran Log      TempDB   UserDB    Tran Log
OFFLINE with BULK_LOGGED or SIMPLE Recovery Model
Rebuild    0        X        ~0            0        X+2X      ~0            C
Compress   0        P        ~0            0        X+2P      ~0            1.5C to 5C
OFFLINE with FULL Recovery Model
Rebuild    0        X        X             0        X+X       X             ~C
Compress   0        P        P             0        X+P       P             1.5C to 5C
ONLINE with FULL Recovery Model
Rebuild    M+Y      X+Y      2X+Y          M+4Y     X+X+Y     2X+Y          ~2C
Compress   M+Y      P+Y      2P+Y          M+4Y     X+P+Y     2P+Y          3C to 10C
Mixed Compression Levels
Each partition can be compressed at a different level if desired
An example table with partitioning based on Date with monthly boundaries
Each year consists of 12 partitions (example: Jan 2014 - Dec 2014)
ALTER TABLE dbo.OurTableName
REBUILD PARTITION = ALL
WITH
(
      DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 8)
    , DATA_COMPRESSION = ROW  ON PARTITIONS (9 TO 11)
    , DATA_COMPRESSION = NONE ON PARTITIONS (12)
);
With the above, the most recent month (Dec) has NO compression applied,
the 3 months before that are compressed with ROW compression,
and the first 8 months before those have PAGE compression applied
Using a 'Sliding Window' strategy, data gets an increased compression level as it ages:
it is less frequently accessed, so a higher compression CPU cost can be allowed to save more space
Demo #3
Partition Elimination
{Query Performance}
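Partition elimination happens when the query predicate is on the partitioning column; the $PARTITION function can confirm which partition a value maps to. The function, table, and column names below reuse the hypothetical monthly-partitioned table from earlier and are assumptions:

```sql
-- A predicate on the partitioning column lets the optimizer touch
-- only the partitions covering this date range (visible as
-- "Actual Partition Count" in the execution plan).
SELECT COUNT(*)
FROM dbo.Transactions
WHERE TransactionDate >= '2015-01-01'
  AND TransactionDate <  '2015-02-01';

-- Check which partition a given value maps to.
SELECT $PARTITION.DatePartitionFunction('2015-01-15') AS PartitionNumber;
```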
Demo #4
Review Page Structure
DBCC Page | fn_physlocformatter
How Compression can affect other SQL Server Components
Bulk import and export operations
When data is exported, even in native format, the data output is in the uncompressed row format.
This can cause the exported data file to be significantly larger than the source data.
When data is imported, if the target table has been enabled for compression, the data is
converted by the storage engine into compressed row format.
This will increase CPU usage compared to when data is imported into an uncompressed table.
When data is bulk imported into a heap with page compression, the bulk import operation will try
to compress the data with page compression as the data is inserted.
Compression does not affect backup and restore.
Compression does not affect log shipping.
Enabling compression may cause query plans to change,
because the data is stored using a different number of pages and a different number of rows per page.
Data compression is supported by SSMS through the Data Compression Wizard.
Some compression considerations…
• Compress on a per-partition level
  • Rebuild each partition separately [ONLINE if needed]
  • Helps keep fragmentation to minimal levels
• Determine effective fill factors
  • Base these on the key size
  • Use compression levels for improved page densities
• Be aware of activity levels when selecting the appropriate
  {None | Row | Page} compression level
• Consider using filtered, multi-column
  user-created statistics on each partition
  • Align them with the PF to ensure each filter applies to exactly 1 partition
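A filtered, multi-column user-created statistic aligned with one partition's boundary range might look like this (the statistic, table, column names, and boundary dates are illustrative assumptions):

```sql
-- The filter matches exactly one monthly partition's boundary range,
-- giving the optimizer per-partition density information for the
-- (OrderDate, CustomerID) column combination.
CREATE STATISTICS st_Sales_2015_12
ON dbo.Sales (OrderDate, CustomerID)
WHERE OrderDate >= '2015-12-01' AND OrderDate < '2016-01-01';
```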
Further Links to Resources on Data Compression in SQL Server
Whitepapers
Data Compression: Strategy, Capacity Planning and Best Practices (SQL 2008)
http://msdn.microsoft.com/en-us/library/dd894051.aspx
Vardecimal Whitepaper (SQL2005)
http://download.microsoft.com/download/2/7/c/27cd7357-2649-4035-84af-e9c47df4329c/vardecimalstorage.docx
SQL Server I/O Reliability Program
http://www.microsoft.com/sqlserver/en/us/solutions-technologies/mission-critical-operations/high-availability.aspx
WIKI
http://en.wikipedia.org/wiki/Standard_Compression_Scheme_for_Unicode
Blogs
MS SQL Server Storage Engine
http://blogs.msdn.com/b/sqlserverstorageengine/archive/tags/data+compression/
KB Articles
Microsoft SQL Server Database Engine Input/Output Requirements KB 967576
Summary & Q&A