SQL Server 2016 Operational Analytics
Download
Report
Transcript SQL Server 2016 Operational Analytics
SQL Server 2016
Operational Analytics
Sponsorzy strategiczni
Sponsorzy srebrni
Łukasz Grala
Microsoft MVP Data Platform | MCT | MCSE
• Architect - Mentor Data Platform & Business Intelligence Solutions
• Trainer Data Platform and Business Intelligence
• University Lecturer
• Author Webcasts and Publications
• Microsoft MVP Data Platform
• Leader PLSSUG Poznań
• Phd Student on Poznan University of Technology, Faculty of Computing Science
(topics – database and datawarehouse architecture, data mining, machine learning)
[email protected]
[email protected]
Marcin Szeliga
• Data Philosopher
• BI Expert and Consultant
• Data Platform Architect
• 20 years of experience with SQL Server
• Ph.D. Candidate at Politechnika Śląska
• [email protected]
Microsoft platform leads the way on-premises and cloud
Leader in 2014 for Gartner Magic Quadrants
Operational
Database
Management
Systems
Data
Warehouse
Database
Management
Systems
Business
Intelligence
and Analytics
Platforms
x86 Server
Virtualization
Cloud
Infrastructure
as a Service
Enterprise
Application
Platform as a
Service
Public Cloud
Storage
Do more. Achieve more.
SQL Server 2016 improvements
Performance
Operational analytics
Insights on operational data; Works
with in-memory OLTP and disk-based
OLTP
In-memory OLTP
enhancements
Greater T-SQL surface area,
terabytes of memory supported, and
greater number of parallel CPUs
Query data store
Monitor and optimize query plans
Native JSON
Expanded support for JSON data
Temporal database support
Query data as points in time
Security
Always encrypted
Sensitive data remains encrypted
at all times with ability to query
Row-level security
Apply fine-grained access control to
table rows
Dynamic data masking
Real-time obfuscation of data to
prevent unauthorized access
Other enhancements
Audit success/failure of database
operations
TDE support for storage of inmemory OLTP tables
Enhanced auditing for OLTP with
ability to track history of record
changes
Availability
Scalability
Enhanced AlwaysOn
Enhanced database caching
Three synchronous replicas for auto
failover across domains
Cache data with automatic, multiple
TempDB files per instance in multicore environments
Round robin load balancing of
replicas
Automatic failover based on
database health
DTC for transactional integrity across
database instances with AlwaysOn
Support for SSIS with AlwaysOn
Mission-critical performance
Performance
Operational analytics
Insights on operational data; Works
with in-memory OLTP and disk-based
OLTP
In-memory OLTP
enhancements
Greater T-SQL surface area,
terabytes of memory supported, and
greater number of parallel CPUs
Query data store
Monitor and optimize query plans
Native JSON
Expanded support for JSON data
Temporal database support
Query data as points in time
Security
Always encrypted
Sensitive data remains encrypted
at all times with ability to query
Row-level security
Apply fine-grained access control to
table rows
Dynamic data masking
Real-time obfuscation of data to
prevent unauthorized access
Other enhancements
Audit success/failure of database
operations
TDE support for storage of inmemory OLTP tables
Enhanced auditing for OLTP with
ability to track history of record
changes
Availability
Scalability
Enhanced AlwaysOn
Enhanced database caching
Three synchronous replicas for auto
failover across domains
Cache data with automatic, multiple
TempDB files per instance in multicore environments
Round robin load balancing of
replicas
Automatic failover based on
database health
DTC for transactional integrity across
database instances with AlwaysOn
Support for SSIS with AlwaysOn
What does operational mean?
• Refers to Operational Workload (i.e. OLTP)
• Examples:
•
•
•
•
Enterprise Resource Planning (ERP) – Inventory, Order, Sales,
Machine Data – Data from machine operations on factory floor
Online Stores (e.g. Amazon, Expedia)
Stock/Security trades
• Mission Critical
• No downtime (High Availability) – impact on revenue
• Low latency and high transaction throughput
What does analytics mean?
• Analytics
• Studying past data (e.g. operational, social media) to identify potential trends
• To analyze the effects of certain decisions or events (e.g. Ad campaign)
• Analyze past/current data to predict outcomes (e.g. credit score)
• Goals
• Enhance the business by gaining knowledge
to make improvements or changes
Traditional BI architecture
Key Issues
• Complex Implementation
• Requires two Servers (CapEx
and OpEx)
IIS Server
• Data Latency in Analytics
ETL
• More businesses
demand/require real-time
Analytics
Minimizing data latency for analytics
Benefits
• No Data Latency
• No ETL
• No Separate DW
IIS Server
Challenges
• Analytics queries are resource intensive
and can cause blocking
This is
• How toANALYTICS
minimize Impact on Operational
OPERATIONAL
workload
• Sub-optimal execution of Analytics on
relational schema
SQL Server 2016
Quick Recap: Columnstore Index
Data stored as rows
Data stored as columns
C1
C2
C3
C4
C5
…
Ideal for OLTP
Efficient operation on small set of rows
Improved compression:
Data from same domain
compress better
Reduced I/O:
Fetch only columns needed
16
Improved performance:
More data fits in memory
Optimized for CPU utilization
Ideal for DW workload
Clustered Columnstore Performance: TPC-H
17
Operational Analytics with columnstore index
Delete bitmap
Btree Index
Delta rowgroups
•
•
•
•
•
19
Minimizing CSI overhead
DML Operations
Btree Index
Delete bitmap
HOT
•
•
•
Delta rowgroups
Order Management Application –
CREATE NONCLUSTERED COLUMNSTORE INDEX ….. WHERE order_status = ‘SHIPPED’
20
Operational Analytics with columnstore
on In-Memory Tables
Hash Index
Range Index
DRT
In-Memory OLTP Table
Tai
l
No explicit delta rowgroup
Rows (tail) not in columnstore stay in in-memory OLTP
table
No columnstore index overhead when operating on tail
Background task migrates rows from tail to columnstore
in chunks of 1 million rows not changed in last 1 hour
Deleted Rows Table (DRT) – tracks deleted rows
Updateable
CCI
Columnstore data fully resident in memory
Persisted together with operational data
No application changes required
22
Query processing
Demo time
Performance improvments
Scan type
Elapsed time (s)
Speedup
Row store scan, interop
44.441
Row store scan, native
28.445
1.6x
CSI scan, interop
0.802
55.4x
Insert, Update, Delete costs and query time
Operation
Elasped time
(s) with CSI
CSI scan, interop
0.802
Insert 400 000 rows
53.5
CSI scan, interop
0.869
Update 400 000 rows
42.4
CSI scan, interop
1.181
Delete 400 000 rows
38.3
CSI scan, interop
1.231
Elasped time
(s) No CSI
Increase %
Update
Increase % Query
BASE
47.8
11.9%
8.4%
28.9
46.7%
47.3%
30.5
25.6%
53.5%
Single thread insert and update
Operation
Rows affected
Row store (s)
Secondary CSI (s)
Primary CSI (s)
1000 updates
10 000
0.893
1.400
6.866
10% insert
18M
233.9
566
291.4
2% update
3.96M
123.2
314.3
275.9
Single thread scan
Millions of rows
Row store
Secondary CSI
Primary CSI
New built
180
99.1
4.7
1.71
After 1000 updates
180
99.4
5.4
1.75
After 10% inserts
198
108.7
14.5
9.5
After 2% updates
198
109.5
16.8
10.0
Comparing performance
Operation
Billions of value
per second
No SIMD
Billions of value per
second
SIMD
Speedup
Bit unpacking 6bits
2.08
11.55
5.55x
Bit unpacking 12 bits
1.91
9.76
5.11x
Bit unpacking 21 bits
1.96
5.29
2.70x
Compaction 32 bits
1.24
6.70
5.40x
Range predicate 16 bits
0.94
11.42
5.06x
Sum 16 bit values
2.86
14.46
5.06x
128-bit bitmap filter
0.97
11.42
11.77x
64KB bitmap filter
1.01
2.37
2.35x
Query performance (1)
Predicate or aggregation
Duration SQL2014 (ms)
Duration SQL2016 (ms)
Speedup
Billion of
rows per s
Q1-Q4: select count(*) from LINEITEM where <predicate>
L_ORDERKEY = 235236
220
140
1.57x
12.9
L_QUANTITY = 1900
664
68
9.76x
26.5
L_SHIPMODE='AIR'
694
147
4.72x
12.2
L_SHIPDATE between
'01.01.1997' and
'01.01.1998'
512
87
5.89x
20.7
Query performance (2)
Predicate or aggregation
Duration SQL2014 (ms)
Duration SQL2016 (ms)
Speedup
Billion of
rows per s
Q5-Q6: select count(*) from PARTSUPP where <predicate>
PS_AVAILQTY < 10
50
27
1.85x
8.9
PS_AVAILQTY = 10
45
15
3.00x
16
Q7-Q8: select <aggregates> from LINEITEM
avg(L_DISCOUNT)
1272
196
6.49x
9.1
avg(L_DISCOUNT),
min(L_ORDERKEY),
max(L_ORDERKEY)
1978
356
5.56x
5.1
Availability Groups as data warehouse
Always on Availability Group
Secondary
Replica
Primary
Replica
Secondary
Replica
Secondary
Replica
Key points
• Mission Critical Operational
Workloads typically
configured for High
Availability using AlwaysOn
Availability Groups
• You can offload analytics to
readable secondary replica
Minimizing data latency for analytics
IIS Server
SSAS Enterprise Readiness: Tabular
New DirectQuery
DirectQuery for Oracle, Teradata, ASP
DirectQuery support for MDX query(Excel Tools)
High-end Server Hardware
Sponsorzy strategiczni
Sponsorzy srebrni