Transcript Range: 200
Victor Isakov
Email [email protected]
Blog www.victorisakov.com
Twitter @victorisakov
LinkedIn www.linkedin.com/in/victorisakov
Website www.sqlserversolutions.com.au
• Computing power
holds Moore Law
due to parallelism
• CPU clock
frequency stalled
• Focus
• Cores
• Hyper-threading
• Cache
• RAM prices continue to fall
• 32GB DDR3 RAM modules are affordable
• DDR4 expected to hit mainstream in 2014-2015
$ per GB of PC Class Memory
1000000
100000
1000
100
10
2011
2009
2008
2007
2005
2004
2002
2001
2000
2000
1999
1998
1997
1996
1995
1994
1994
1993
1992
1991
1
1990
US$/GB
10000
• What areas of the database engine
need to be more efficient?
• Relational engine (green)
• Storage engine (blue)
• Focus on new streamline components
for OLTP:
• Memory-optimized engine
• Compiler
Baseline
2,400 tps
-
Memory-Optimised tables
17,000 tps
7x
Memory-Optimised tables + Native SPs
65,000 tps
27x
• Test harness
• 16 cores
• 256GB RAM
• 2 x SSDs
• 80 users
TDS Handler / Session Management
Natively Compiled
SPs and Schema
Hekaton
Interpreter for T-SQL, query
plans, expressions
Access Methods
Query
Interop
Non-durable Table
SQL
Server
Procedure / Plan cache for
ad-hoc T-SQL and SPs
Parser,
Catalog,
Algebrizer,
Optimizer
Hekaton
Compiler
KEY:
T1
T2
T4
T3
Memory Optimized Tables & Indexes
Buffer Pool for Tables & Indexes
Checkpoint / Recovery
T1
T2
T3
T4
T1
T2
T3
T4
Generated
DLL
ROW HEADER
PAYLOAD (table columns)
8 bytes * (IdxLinkCount – 1)
Begin Timestamp
End Timestamp
Statement Id
8 bytes
8 bytes
4 bytes
Index Link Count
2 bytes
Begin Timestamp / End Timestamp determines row’s validity
• There is no data or index page
• Row size limited to 8060 bytes to allow data to be moved to SQL table
• Not every SQL Server table schema is supported
•
1.
2.
3.
4.
Allocate a new row (Transaction: TX2)
Assign old row TX2
Update links to chain into new index
Commit transaction (Timestamp: 200)
S
100
200
TX2
∞
3
0
0
John
Smith
Kirkland
K
TX2
200
TX2
∞
3
2
0
0
John
Smith
Seattle
R
Idx1: LastName
Idx2: City
1. Assign row to transaction TX3
2. Commit transaction (Timestamp: 300)
S
100
200
∞
3
0
0
John
Smith
Smith
Kirkland
K
TX2
200
Idx1: LastName
300
∞
TX3
2
3
0
0
John
Smith
Seattle
Idx2: City R
• Remember we don’t update•
•
•
•
in-place
Or delete immediately
Old versions of records
consuming memory
No active transactions that
qualify to read these older
versions of data
Need to reclaim this “wasted”
memory
100
200
1
John
Smith
Kirkland
200
∞
1
John
Smith
Redmond
100
∞
1
Peter
Spiro
Seattle
50
100
1
Jim
Spring
Kirkland
300
∞
1
Ken
Stone
Boston
• Scanners can remove
expired rows when they
come across them
• Offloads work from GC
thread
• Ensures that frequently
visited areas of the index
are clean of expired rows
TX4: Begin = 210
Oldest Active Hint = 200
100
200
1
John
Smith
Kirkland
200
∞
1
John
Smith
Redmond
100
∞
1
Peter
Spiro
Seattle
50
100
1
Jim
Spring
Kirkland
300
∞
1
Ken
Stone
Boston
SCENARIO
SYMPTOM(S)
Inserting more
rows than can fit
into memory
Hekaton
transactions start
failing
DIAGNOSIS
Read ERRORLOG
Recovering
database cannot fit
into memory
Database does not
come online
Memory pressure
from Hekaton
workload on nonHekaton workloads
Non-Hekaton
transactions start
failing
Query DMVs to
determine whether
Hekaton is the root
cause
RESOLUTION
Free up memory
Add memory
Identify and stop
long running
transactions
DELETE
(TS150)
DELETE
(TS250)
UPDATE
(TS550)
INSERT
(TS560)
Hekaton Checkpoint
Range: 500 -
Range: 400 - 499
Range: 300 - 399
Range: 200-299
Memory Optimized Table
Filegroup
Range: 100 - 199
SQL Server
Transaction Log
• Data file has a preallocated size (128 MB)
• Hekaton Engine
switches to new data
file when it estimates
that current set of log
records will fill the file
• Once a data file is
closed, it becomes
read-only
• Row deletes are tracked
in delta file
• I/O is sequential
Transaction Timestamp
0
Range
Data
File
100
Timestamp (INSERT) TableID RowID
Payload
Timestamp (INSERT) TableID RowID
Payload
Timestamp (INSERT) TableID RowID
Payload
Data file contains rows inserted within a given transaction range
Checkpoint
File Pair
Delta
File
Timestamp (INSERT)
Timestamp (DELETE)
RowID
Timestamp (INSERT)
Timestamp (DELETE)
RowID
Timestamp (INSERT)
Timestamp (DELETE)
RowID
Delta file contains deleted rows within a given transaction range
Range: 400 - 499
Range: 300 - 399
Range: 200-299
Range: 100 - 199
Merge
200-399
Range: 400 - 499
Range: 200 - 399
Range: 300- 399
Memory Optimized Table
Filegroup
Range: 200 - 299
Range: 100 - 199
Timestamp: 500
Timestamp: 600
Memory Optimized Table
Filegroup
• Recovery Steps
Memory Optimized Tables
Recovery Data
Loader
Recovery Data
Loader
Recovery Data
Loader
FILTER
FILTER
FILTER
Delta map
Delta map
Delta map
– Analyze
– Load HK
– REDO
– UNDO (SQL)
• Impact on RTO
Data
Delta
Data
Delta
Data
Delta
File 1
File 1
File 2
File 2
File 3
File 3
Memory Optimized
Container
Memory Optimized
Container
– Load speed of data
– Size of durable
tables
NODE 1
NODE 2
T1
T2
T2
T3
CRASH
T3
• Failover Cluster
Instance starts on the
other node and starts
automatic recovery
• REDO done / Hekaton
tables loaded
• Database available
• Fail-over time will
depend upon size of
durable Hekaton
tables
Secondary
Primary
CRASH
Secondary
Replica
Primary
Replica
Secondary
Primary
Replica
Replica
• Supports full Availability Group topology
• Including multiple secondaries with fast failover
• Secondary has memory resident Hekaton tables
• With non-durable tables only the schema is available
• Data / delta files are “re-constructed” on the secondary
• Only SQL Server tables are supported on readable secondaries
• Under consideration
http://channel9.msdn.com/Events/TechEd/Australia/2013
http://www.microsoftvirtualacademy.com/
http://technet.microsoft.com/en-au/
http://msdn.microsoft.com/en-au/
http://technet.microsoft.com/en-us/evalcenter/dn205290.aspx
www.windowsazure.com
http://www.windowsazure.com/en-us/documentation/services/hdinsight/?fb=en-us
www.powerbi.com