Transcript Powerpoint
Antonio Abalos Castillo
How to load your data faster
and safer using Change
Tracking in SQL Server
Thank you to our sponsors!
Agenda
1. Why faster data loads?
2. What is Change Tracking?
3. Design overview
4. Demo/implementation
5. Extra hints
Why faster data loads?
Corporations load and replicate data in a variety of ways
They become unreliable or miss data over time
They use unsupported ways to identify increment of data
They are difficult to maintain
Not optimal when identifying the updated data
Need extra programming effort
Do not follow standards
Why faster data loads?
Benefits of using this approach
No programming overhead at the source
Avoid using timestamps, row GUIDs or any other programming artifact
Change Tracking is transparent to applications
Maintenance cost is 0
Very low performance impact in the source database
Multiple target systems can get data from the same source DB using this approach
We get just the latest version, according to our last status. All different row status in the middle
are skipped
Running the delta more often will decrease the execution time
MERGE is the fastest data loading method (SCD remains as a bad example)
Minimally logged operations will help performance (maybe more than you think)
What is Change Tracking?
Change tracking is a lightweight solution that provides an efficient change
tracking mechanism for applications
Available since SQL Server 2008
Requires Standard edition of SQL Server or higher
Lightweight:
The incremental performance overhead that is associated with using change tracking
on a table is similar to the overhead incurred when an index is created for a table and
needs to be maintained
https://technet.microsoft.com/en-us/library/hh710064(v=sql.110).aspx
https://msdn.microsoft.com/en-us/library/bb933875(v=sql.110).aspx
What is Change Tracking?
Each insert/update/delete in each table will be tracked by:
1. The ID columns used in the table
2. [optional] the columns that were updated
Changes are accumulated and reported by SQL Server
according to the last version we got
https://msdn.microsoft.com/en-us/library/hh710064(v=sql.110).aspx
What is Change Tracking?
Enable Change Tracking
Database level
ALTER DATABASE AdventureWorks2012
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
For each audited table
ALTER TABLE dbo.SalesOrderDetail
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)
What is Change Tracking?
Get changes from Change Tracking
Get current version
SET @ver = CHANGE_TRACKING_CURRENT_VERSION();
Get minimum valid version
SET @mvv =
CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.Sales'));
What is Change Tracking?
Get changes from Change Tracking
Get changes for one table
DECLARE @last_ver BIGINT = 82;
SELECT CT.SalesID,
CT.SYS_CHANGE_OPERATION,
CT.SYS_CHANGE_COLUMNS
FROM
CHANGETABLE(CHANGES dbo.Sales,
@last_ver) AS CT
https://technet.microsoft.com/en-us/library/cc280358(v=sql.105).aspx
Source
•
•
Change Tracking enabled
Isolation aware
ETL
•
•
Minimally logged operations
Automatic delta/full load detection
Target
Design overview
•
•
Staging area
MERGE delta over target data
Design overview
Requirements:
• SQL Server source database
• Change Tracking enabled
• Integration Services
• MERGE statements (SQL 2008+)
Other data sources:
• Change Data Capture (Oracle)
Demo
Demo scenario
Windows Azure VNET
Server A
Server B
SQL Server
Source database
Change Tracking
SQL Server
Target database
Logging
SSIS
Data flow
Extra hints – Best practices
Best practices
• Transaction isolation strategy
•
•
•
•
Index maintenance jobs can break big transactions at the
source
Watch out for complex data flows that may need to break
down into simpler ones
•
•
•
Enable SNAPSHOT isolation in the source database
Or create a source snapshot database
The best is to have a one-to-one copy of the source table, but
this is not always possible
How do we deal with deleted rows? (joining tables)
Do we need to track changes in columns?
Extra hints - Trick list
•
•
•
•
•
•
Use trace flag 610 (carefully)
Use tab-lock in destination
Use ORDER hint in
destination
Boost up DFT memory
Boost up DFT number of rows
Run parallel tasks
The Data Loading Performance Guide
https://msdn.microsoft.com/en-us/library/dd425070.aspx
Extra hints - Other tricks
•
•
•
•
•
•
•
•
Databases in “simple” recovery model
Change page torn detection to NONE
Create a DATA file group and set it as DEFAULT
Create as many files as CPU in each file group (depends
on storage)
Separate the log file from the data files in different disks
Consider using heaps for fast-load processes
Consider using partitioned tables for regular tables
Increase parallelism
Extra hints - Security considerations
Catalog views
sys.change_tracking_databases
sys.change_tracking_tables
Permissions
SELECT permission on at least the primary key columns on the
change-tracked table to the table that is being queried
VIEW CHANGE TRACKING permission on the table for which
changes are being obtained
https://msdn.microsoft.com/en-us/library/hh710064(v=sql.110).aspx
Extra hints - Change Tracking Vs. Change
Data Capture
Change data capture (CDC)
Change tracking (CT)
Yes
Yes
Historical data
Yes
No
Whether column was
changed
Yes
Yes
DML type
Yes
Yes
Collects historical values, and
therefore much more data than
CT
You have no idea on how many
updates were made to a row, nor
the values that were updated
Tracked changes
DML changes
Tracked information
https://technet.microsoft.com/en-us/library/cc280519(v=sql.105).aspx
https://msdn.microsoft.com/en-us/library/bb933994.aspx
Other references
Brent Ozar’s guide to Change Tracking
https://www.brentozar.com/archive/2014/06/performancetuning-sql-server-change-tracking/
Good guide for a data load using Change Tracking
implementation
https://www.timmitchell.net/post/2016/01/20/using-sqlserver-change-tracking-for-incremental-loads