Transcript BDBI

BDBI Radio
Baker’s Dozen Business Intelligence Webcast Radio
13 Weekly tips on Microsoft SQL Server/BI/SharePoint Technologies
http://www.BDBIRadio.com
Week of 02/24/2013: 13 topics for the SQL Server 2012 Columnstore Index
BDBI Radio
• Hosted by Kevin S. Goff, Microsoft SQL Server MVP and
author of CoDe Magazine “Baker’s Dozen” Productivity
Series
• Weekly webcast - Sundays, 11 AM to 12:30 PM EST
• Site: http://www.BDBIRadio.com
• Overall blog/site: http://www.KevinSGoff.net
• Program format:
– First 60 minutes - 13 quick tips on different SQL/BI/SharePoint topics
• In any week, tips might cover one topic (13 features in PowerPivot or
13 SQL Interview topics), or 13 random tips on anything from data
warehousing to MDX/DAX programming, to maybe even a few .NET
tips for scenarios when the .NET and BI worlds collide
– Second 30 minutes, open Q/A (email me at [email protected])
– Webcast recorded, recordings will be available on website
• Once in a while, a special guest interview
7/7/2015
13 topics for Columnstore Index
2
BDBI Radio
• Upcoming community events (confirmed):
– BI User Group in NYC on March 11
• http://msbigdatanyc.com/
• Topic: Differences between SSAS OLAP and SSAS Tabular
– SQL User Group in NYC on April 25
• http://nycsqlusergroup.com/
• Topic TBD
– Northern Virginia SQL User Group on May 20
• http://www.novasql.com
• Topic TBD
• Other News:
– SQL Server 2012 Service Pack 1 available
• http://blogs.msdn.com/b/analysisservices/archive/2012/11/07/announcingmicrosoft-sql-server-2012-service-pack-1-sp1.aspx
7/7/2015
13 topics for Columnstore Index
3
BDBI Radio
•
•
•
•
Today: 13 pieces of information for the Columnstore index
New index in SQL Server 2012
More than just an index
A real game-changer, one of the biggest features in the SQL
database engine of all time
• Some companies upgraded to SQL 2012 just because of this
feature
• Represents another example where MS is devoting serious
attention to the underlying database engine
– Earlier versions of SQL Server (2005) focused largely on language and
developer enhancements
– SQL 2008 and 2012 have seen underlying database management/engine
changes (Change Data Capture and Columnstore Index)
7/7/2015
13 topics for Columnstore Index
4
BDBI Radio
• 13 pieces of information for the SQL 2012 Columnstore index
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
7/7/2015
Quick demonstration
Introduction to the Columnstore Index
Characteristics of the Columnstore Index
Who benefits from this?
Columnstore indexes vs Rowstore Index
Execution plan using the Columnstore index
Batch Mode Processing – new processing mode for the Columnstore index
Where the Columnstore index can’t directly be used
Selective vs non-Selective queries – where Columnstore index isn’t used
General Usage rules
Restriction rules on the Columnstore index
Overall Performance Benchmarks
New database engine features in SQL v.Next
13 topics for Columnstore Index
5
BDBI Radio:
• 1 – Quick Code Example
7/7/2015
13 topics for Columnstore Index
6
BDBI Radio
• 2 - Introduction to the Columnstore Index
• New relational, xVelocity memory-optimized database index in SQL
Server 2012, “baked in” to the database engine
• More and more functionality in DB engine (xVelocity, CDC)
• Potentially Significant performance enhancements for data warehousing
and data mart scenarios – a real game changer
– (not really for OLTP databases, we’ll see why later)
• Best for queries that scan/aggregate large sets of data
• My opinion? One of the coolest things ever in SQL Server
• In a regular index, indexed data from each row kept together on single
page-data in each column spread across all pages of index
• In a columnstore index, data from each column is kept together so each
data page contains data only from a single column (compressed, more
fits in memory, more efficient IO)
7/7/2015
13 topics for Columnstore Index
7
BDBI Radio
•
•
•
•
•
•
•
•
•
3 - Characteristics of the Columnstore Index
Highly compressed - Exploits similarity of data within column
IO Statistics - dramatically reduces # of logical reads!!!
Not stored in standard buffer pools, but rather in a new optimized
cache
Smart IO and caching using read-ahead reads
Part of Microsoft’s xVelocity technology – compression is factor of
8 (and twice as efficient as page compression)
Once posted, only READONLY
Best for data warehouse/mart queries that scan/aggregate large
amounts of data–might lower need for OLAP aggregation
Some queries might run at least 10x faster (or more)
7/7/2015
13 topics for Columnstore Index
8
BDBI Radio
• 4 - Who benefits from this?
• Queries and reports against Data Warehouses/Data Marts (works
best with Fact/Dimension tables modeled in a star schema)
• Load from Data Warehouses/Marts into OLAP Cubes
• SSAS OLAP Databases that use the ROLAP methodology or passthrough mode
• New Analysis Services Tabular Model uses xVelocity engine
• Some companies took the release candidate and put into
production, simply for this feature (some case studies show
queries that went from 17 minutes to 3 seconds!)
7/7/2015
13 topics for Columnstore Index
9
BDBI Radio
• 5 - Columnstore indexes vs Rowstore Index
• Columnstore index stores each column in
separate set of pages (vs. storing multiple
data rows per page)
• Only columns needed are fetched
• Easier to compress redundant column data
• Uses xVelocity found in PowerPivot
• Improved IO scan/buffer hit rates
• Segment elimination: each partition is broken
into million row segments with metadata for
min/max values – segment is not read if query
scope does not include min/max values
• Query will only fetch necessary columns
• In reality, not “really” an index – more like a
compressed “cube”
7/7/2015
13 topics for Columnstore Index
10
BDBI Radio
• 6 - Execution plan using the Columnstore index
• Columnstore index
was 5% of the batch
• Clustered index was
65% of the batch
• Covering index (which
would have been the
best approach prior to
SQL Server 2012) was
35% of the batch
• Time Statistics, 12x
faster than covering
index, 20x faster than
clustered index
7/7/2015
13 topics for Columnstore Index
11
BDBI Radio
• 7 - Batch Mode Processing – new processing mode for the
Columnstore index
• Certain
execution plan
operators (Hash
Match in
particular) use
new Batch
execution mode
• Reads rows in
blocks of 1,000
• Big performance
benefit over
row-based
execution
7/7/2015
13 topics for Columnstore Index
12
BDBI Radio
•
•
•
•
8 – Where the Columnstore index can’t directly be used
Issue w/OUTER JOIN: can’t use directly against table
Will “work”, but will use slower row execution mode
Must pre-aggregate separately and then do OUTER JOIN (will use
batch mode)
7/7/2015
13 topics for Columnstore Index
13
BDBI Radio
• 9 - Selective vs non-Selective queries – where Columnstore index
isn’t used
7/7/2015
13 topics for Columnstore Index
14
BDBI Radio
•
•
•
•
•
•
•
10 – General Usage Syntax and Rules
Syntax is simple: use new COLUMNSTORE keyword
1 Columnstore index per table: cannot be clustered
Order of columns does not matter
Include all columns from table
No INCLUDE statement, No ASC/DESC
General MS recommendation: if queries will frequently use a
certainly column on the predicate, create a clustered index on that
column and then create the columnstore index.
– Even though column store index isn’t “ordered” itself, you’ll get better
segment elimination
CREATE NONCLUSTERED COLUMNSTORE INDEX
[IX_BPO_ColumnStore]
ON [BigPurchaseOrderHeader]
(PurchaseOrderID, VendorID, OrderDate,
ShipMethodID, Freight, TotalDue)
7/7/2015
13 topics for Columnstore Index
15
BDBI Radio
•
•
•
•
11 – Restrictions and Rules
Cannot be clustered, cannot be created against a view
Cannot act as a PK or FK, cannot include sparse columns
Can’t work on tables with Change Data Capture/Change
Tracking or FileStream, can’t participate in replication,
nor when page/row compression exists
• Cannot be used with certain data types, such as binary,
text/image, rowversion/timestamp, CLR data types
(hierarchyID/spatial), nor with data types created with
MAX keyword…e.g. varchar(max
• Cannot be modified with ALTER – must be dropped and
recreated
• It’s a read-only index - cannot insert rows and expect
columnstore index to be maintained.
7/7/2015
13 topics for Columnstore Index
16
BDBI Radio
• 11 – Restrictions and Rules (continued)
• Note: range partitioning is supported….(use partitioning to
load a table, index it with a columnstore index, and switch it
in as newest partition. )
–
–
–
–
Partition by day, split the last partition
Load data into staging table and then create columnstore index
Switch it in
SQL Server 2012 permits 15,000 partitions per table)
• Not optimized for certain statements (OUTER JOIN, UNION,
NOT IN <subquery>)
• Not optimized for certain scenarios (high selectivity, queries
lacking any joining, aggregating)
• Not optimized for a JOIN statement on a composite set of
columns
• Best practice – always use integer keys for FKs
7/7/2015
13 topics for Columnstore Index
17
12 - Columnstore index: Performance
Index
CPU time
(ms)
Total Time
(ms)
Logical Reads
Read-ahead
Reads
Clustered index
4337
3899
27631
0
Non-clustered covering index
2246
2393
21334
8
Column Store index
140
199
4180
12652
30000
25000
Clustered
Index
20000
Non-clustered
covering index
15000
10000
Column Store
Index
5000
0
Total Time (ms)
Logical Reads
BDBI Radio
• 13 - New database engine features in SQL v.Next
• Columnstore indexes will be updatable!!!
– No longer a readonly index
• Will be able to create a columnstore index as a clustered index
• Not related directly to columnstore index, but next major version
of SQL Server will have a new in-memory row store for OLTP
databases (codename “Hekaton”)
7/7/2015
13 topics for Columnstore Index
19