ColumnStore Index

Download Report

Transcript ColumnStore Index

Boosting DWH-Performance
with SQL Server 2016
ColumnStore Index
Introduction
Markus Ehrenmüller-Jensen
Business Intelligence Architect
@MEhrenmueller
[email protected]
runtastic
Pluskaufstraße 7
4061 Pasching
Austria, Europe
http://www.runtastic.com
SQL Server 2005 - 2014
BI Developer
Database Developer
Database Admin
Agenda
Column Store
Non-Clustered Column Store Index
Clustered Column Store Index
What’s new in SQL 2016
Evolution of Microsoft Data Platform
XML ● KPIs
SQL Server
2000
Management Studio ● Mirroring
SQL Server
2005
Compression ● Policy-Based Mgmt ● Programmability
SQL Server
2008
PowerPivot ● SharePoint Integration ● Master Data Services
SQL Server
2008 R2
ColumnStore Index ● AlwaysOn ● Data Quality Services ● Power View ● Cloud Connectivity
SQL Server
2012
In-Memory Across Workloads ● Performance & Scale ● Hybrid Cloud Optimized ● HDInsight ● Cloud BI
SQL Server
2014
Ordinary Report
http://www.flickr.com/photos/marts-pics/3554153954/sizes/l/in/photostream/
ColumnStore Index
http://commons.wikimedia.org/wiki/User:Malene
Improve Query Performance with ColumnStore Index
DEMO
How?
xVelocity (VertiPaq, PowerPivot, BISM Tabular)
Compression
Column-Elimination
Segment/Rowgroup-Elimination
Parallel Read Operations
Query Processor in Batch-Mode
Typically 10x faster
xVelocity
Column Store
Power Pivot
Analysis
Services
SQL Server
Non clustered
Index
SQL 2012+
Clustered
Index
SQL 2014+
Compression
X-Velocity In-Memory Compression Engine
Creation takes up to 1.5x longer then b-tree
Algorithms
Value Scale (1023, 1002  scale 1000; 23, 2)
Bit Array (x, y, z, x  100, 010, 001, 100)
Binary Bitmap
Run-Length (x, x, x, y, z, z  3x, 1y, 2z)
Dictionary (x, y, z, x  1, 2, 3, 1)
Huffman
Lempel-Ziv-Welch
Decision after sample (1% / max 1 Mio rows)
No estimation available
sp_estimate_data_compression_savings not supported for
ColumnStore
CREATE INDEX (elapsed time)
NCCI (22943 ms)
CCI (22879 ms)
CCI Archival (22067 ms)
PAGE (5668 ms)
Clustered (4925 ms)
ROW (3282 ms)
INSERT (elapsed time)
PAGE (72946 ms)
CCI Archival (54221 ms)
CCI (54123 ms)
NCCI (42596 ms)
ROW (21652 ms)
Clustered (14905 ms)
Heap (6941 ms)
Compression
Heap & NCCI (1622 MB)
Heap (1382 MB)
ROW (794 MB)
PAGE (202 MB)
CCI (25 MB)
Archival (11 MB)
Dictionary
Primary (global)
Across all segments
Mandatory
Preferable
Secondary (local)
Per segment
Optional
sys.column_store_dictionaries
ARCHIVAL Compression
LZ777 compression on top
Additional ~30% of space-savings
Can be applied per partition
Column-Elimination
Only needed column are read
As opposite to RowStore
Where only whole rows can be read
RowStore vs. ColumnStore
Page 1 of
row store
Page 2 of
row store
Rowgroup
Rowgroup
Key AlternateKey Name
1 AR-5381
Adjustable Race
2 BA-8327
Bearing Ball
3 BE-2349
Ball Bearing Cage
4 BE-2908
Ball Bearing Grease
5 BL-2036
Blade
6 CA-5965
LL Crankarm
7 CA-6738
ML Crankarm
Segment
for
column 1
Segment
for
column 1
Segment
for
column 2
Segment
for
column 2
Segment
for
column 3
Segment
for
column 3
Stock
1000
1000
800
800
800
500
500
Segment
for
column 4
Segment
for
column 4
Segment
Part of row group for a single column
# of rows per row group is the same for all segments
No sort order inside a segment
Max. 1,048.576 rows
Trimmed segement(s) because of to few rows, DOP or
memory pressure
The bigger, the better compression
The smaller, the better segment-elimination
Stored as BLOB (through 8k pages)
Directory
Allocation status, # of rows, min/max value
sys.column_store_segments
Segment-Elimination
Segment is the smallest unit
Reading 1 Mio rows or not
Min/max value
Alignment.sql
Batch-Mode
SQL 2012+
Chunks of 1000 rows per batch
Better CPU efficiency
Vs. Row mode
Every row processed after each other
Typically 10x faster
Heap (4291 ms)
Clustered (2151 ms)
ROW (1695 ms)
NCCI (970 ms)
PAGE (938 ms)
CCI (140 ms)
CCI Archival (99 ms)
Use Cases
> 1 Mio rows
Aggregations, groupings & filters (DWH/OLAP)
Write once, read multiple times
Less distinctive values per column
Sweet spot
Design (eg. matching data type, no functions), star &
snowflake schema, inner joins
Can substitute datamarts/aggregation-tables
ROLAP & Tabular Model DirectQuery
Query Optimizer includes CS Index
Suggested_tables.sql
Restrictions (2014)
Data types:
ntext, text, and image, vardecimal, varchar(max) and
nvarchar(max), rowversion (and timestamp), sql_variant, CLR types
(hierarchyid and spatial types), xml, uniqueidentifier
Page/Row compression
Replication
Change Tracking, Change Data Capture
Filestream
Enterprise Edition only
Nonclustered ColumnStore Index (NCCI)
May combined with other indices
Decide which column to include
Only one NCCI per table
Redundant storage
No sort order
SQL Server 2012+
NCCI Restrictions
No constraints allowed (unique)
Indexed table is read-only
NCCI Best Practice
Memory, Memory, Memory
Include all columns
MAXDOP > 1
Fact-tables and big dimension tables
Choose Clustered Index wise
Update/Insert
Disable/enable index
Partitioning
View/Union all
Clustered ColumnStore Index (CCI)
Physical columnar storage (instead of row based)
Not really clustered (as stored in unsorted segments)
No reduntant storage
All columns included automatically
No other index allowed
UPDATE-able
Supports more data types
Switching between ROW-mode and BATCH-mode allowed
SQL Server 2014+
CCI: Structure
CREATE CLUSTERED
COLUMNSTORE INDEX
Deleted Bitmap
Columnstore
INSERT
DELETE
REORGANIZE
Deltastore(s)
REBUILD
CCI: Deltastore
Ordinary Rowstore
Uncompressed heap
Does not benifit from batch mode
Compression is expensive operation
OPEN / CLOSED / INVISIBLE / TUMBSTONE /
(COMPRESSED)
Tuple-Mover
CCI: Tuple Mover
Closed Delta Store  new Segment (compressed)
Closedown
1,048.576 rows for INSERT
102.400 rows for BULK INSERT
Every 5 minutes
Will pause 15 sec after it has done its job
Invoke REBUILD/REORGANIZE after trickle load
Update/Insert will be blocked
CCI: Deleted Bitmap
Deleted rows of ColumnStore
2 storage format
Bitmap (in memory)
B-Tree (on disk)
Fully deleted segments are not eliminated
REBUILD to get rid of those rows
CCI: Restrictions (2014)
No Constraints (unique, primary, foreign key)
No Triggers
Not all datatypes supported
No ISOLATION LEVEL SNAPSHOT
Enterprise Edition only
CCI Best Practice
Memory, Memory, Memory
MAXDOP > 1
Fact-tables and big dimension tables
suggested_tables.sql
Choose Clustered Index wise
Go for BULK INSERTs
INSERT is locking whole Row group
DELETE is locking whole segment
ColumnStore Index
Non-clustered
Clustered
SQL Server 2012+
Additional index (redundancy)
Read-only
Subset of columns
Max. one NCCI per table
SQL Server 2014+
Master
Update-able
All columns
No additional index allowed
SQL Server v2016
UPDATE-able & Filtered NCCI
Deleted Buffer  deleted Bitmap
Non-Clustered Index (B-Tree) on CCI
Mapping index in between
In-Memory ColumnStore Index
Less restrictions
Primary key, foreign key, all isolation levels, batch
mode support, inline-definition, …
Wrap up
Column Store
Non-Clustered Column Store Index
Clustered Column Store Index
What’s new in SQL 2016
Call to Action
Try
Non-Clustered Column Store Index
Clustered Column Store Index
Questions?
Markus Ehrenmüller-Jensen
Business Intelligence Architect
@MEhrenmueller
[email protected]
runtastic
Pluskaufstraße 7
4061 Pasching
Austria, Europe
http://www.runtastic.com
SQL Server 2005 - 2014
BI Developer
Database Developer
Database Admin