How In-Memory Affects Database Design
Download
Report
Transcript How In-Memory Affects Database Design
How In-Memory Affects Database Design
Louis Davidson
drsql.org
1
Who am I?
drsql.org
• Been in IT for over 19 years
• Microsoft MVP For 11 Years
• Corporate Data Architect
• Written five books on
database design
• Ok, so they were all versions
of the same book. They at least
had slightly different titles each time
• Basically: I love Database Design, and InMemory technologies are changing the
game
2
Contact info
• Louis Davidson - [email protected]
• Website – http://drsql.org <-- Get slides here
• Twitter – http://twitter.com/drsql(@drsql)
• SQL Blog http://sqlblog.com/blogs/louis_davidson
• Simple Talk Blog – What Counts for a DBA
http://www.simple-talk.com/community/blogs/drsql/default.aspx
drsql.org
3
3
Questions are Welcome
• Please limit questions to one’s I know the
answer to.
drsql.org
4
A tasty allegory…
• Bacon is awesome
• Bacon is an extremely powerful tool for rapid
fat and calorie intake
• Even bacon isn't good for everything
drsql.org
http://www.lazygamer.net/general-news/diablo-iii-players-burned-off-820-968-kgs-of-bacon/
https://www.flickr.com/photos/runnerone/6232183896/in/photostream/
5
Attention!
• This presentation was originally based on
SQL Server 2014
• SQL Server 2016 promises to greatly
improve the feature set
• I will note where this does and does not
affect your database design experience as
I go along with asterisks *
drsql.org
6
The process I went through
• Start with basic requirements
– Sales system
– Stream of customer and order data
– Apply In-Memory OLTP to see how it changed
things
– Keep it very simple
• Learn a lot
– This presentation was borne out of what I learned
from that process (and Kalen Delaney’s precon,
whitepaper, and other reading that is linked
throughout the slides)
drsql.org
• Build a test and apply what I have learned and
morph until I get to what works
• Build something real in my day job, if
applicable
7
7
Introduction: What exactly is
In-Memory OLTP in SQL Server 2014+?
• A totally new, revamped engine for data storage, co-located in the same database
with the existing engine
– Obviously Enterprise Only…
• Purpose built for certain scenarios*
• Terminology can be confusing
–Existing tables: Home - On-Disk, but ideally cached In-Memory
–In-Memory tables: Home - In-Memory: but backed up by On-Disk Structures
• If you have enough RAM, On-Disk tables are also in memory
drsql.org
–But the implementation is very very different
• In-Memory is both very easy, and very difficult to use
9
Design Basics
(And no, I am not stalling for time due to lack of material)
• Designing and Coding is Like the Chicken and
the Egg
I was first
–Design is what you do before coding
–Coding patterns can greatly affect design
–Engine implementation can greatly affect design
and coding patterns
–Developing software follows a natural process
As if…
drsql.org
• We will discuss how In-Memory technologies
affect the entire design/development lifecycle
Children
Relics
10
Design Basics - Separate your design mind into (minimally)
three phases
• Conceptual/Logical (Overall data requirements in a data model format)
• Physical Implementation Choice
–Type of database system: Paper, Excel, Access, SQL Server, NoSQL, etc
–Engine choices: In-Memory, On-Disk, Compression, Partitioning, etc
–Note: Bad choices usually involve pointy hair and a magazine article with very little thinking
and testing
• Physical (Relational Code)
drsql.org
• We will look at each of these phases and how in-mem may affect your design of
each output
11
Conceptual/Logical Design
(Though Not Everyone’s Is)
• This is the easiest part of the presentation …to type
• You still need to understand the customers needs and model
–Entities and Attributes
–Uniqueness Conditions
–General Predicates
• As I see it, nothing changes…
drsql.org
12
Logical Data Model
drsql.org
13
Physical Implementation Overview
Client App
TDS Handler and Session Management
No improvements in
communication stack,
parameter passing, result
set generation
10-30x more efficient
(Real Apps see 2-30x)
Reduced log bandwidth &
contention. Log latency
remains
Natively
Compiled SPs
and Schema
Engine for
Memory_optimized Tables
& Indexes
Query
Interop
Proc/Plan cache for ad-hoc
T-SQL and SPs
Interpreter for TSQL, query
plans, expressions
Access Methods
Existing SQL
Component
In-Memory
OLTP
Component
Buffer Pool for Tables &
Indexes
SQL Server.exe
Memory-optimized
Table Filegroup
Transaction Log
Data Filegroup
http://download.microsoft.com/documents/hk/technet/techdays2014/Day2/Session2/DBI394-SQL%20Server%202014%20In-Memory%20OLTP%20-%20Depp%20Dive.pdf
drsql.org
Checkpoints are
background sequential IO
Native
Compiler
Parser,
Catalog,
Algebrizer,
Optimizer
Key
Physical Implementation
(Technically it’s all software!)
• Everything is different, and I am going to give just an overview of physical
details…
• In-Mem data structures coexist in the database alongside On-Disk ones
• Data is housed in RAM, and backed up in Delta Files and Transaction Logs
–Delta files are stored as filestream storage
–The transaction log is the same one as you are used to (with lighter utilization)
• Tables and Indexes are extremely coupled
• MVCC (Multi-Valued Concurrency Control) used for all isolation
drsql.org
15
Physical Design
(No, let’s not get physical)
• Your physical design will almost certainly need to be altered from “normal”
• So much changes, even just changing the internal table structure
• In this section, we will discuss:
–Creating storage objects
• Table Creation
• Index Creation (which is technically part of the table creation)*
• Altering a Table’s Structure*
–Accessing (Modifying/Creating) data
drsql.org
• Using Normal T-SQL (Interop)
• Using Compiled Code (Native)
• Using a Hybrid Approach
• No Locks, No Latches, No Waiting
16
Creating Storage Objects - Tables
• The syntax is the same as on-disk, with a few additional settings
• You have a durability choices
– Individual In-Mem Table: SCHEMA_ONLY or SCHEMA_AND_DATA
– Database level for transactions: Delayed (also for on-disk tables)
• Basically Asynchronous Log Writes
• Aaron Bertrand has a great article on this here: http://sqlperformance.com/2014/04/io-subsystem/delayed-durability-in-sqlserver-2014
• You also have less to work with...
– Rowsize limited to 8060 bytes (Enforced at Create Time)
• Not all datatypes allowed (LOB types,CLR,sql_variant, datetimeoffset, rowversion)*
drsql.org
– No check constraints *
– No foreign keys *
– Just one unique index per table *
• Every durable (SCHEMA_AND_DATA) table must have a unique index/ primary key
• Note: There are memory optimized temporary tables too: See Kendra Little’s article here:
http://www.brentozar.com/archive/2014/04/table-variables-good-temp-tables-sql-2014/
17
Data quality…What if?
Troublesome
Two people are travelling to Indianapolis
via train, and both order chicken from
two different wait persons, but there is
only one order of chicken still available
Extremely Troublesome
Note: The “what if?” test ought to be applied to all of your designs
drsql.org
If Train A is given access to Location L on
Track 1 at 11:30 AM, and Train B is given
access to the same Location at the same
time going in a different direction.
Dealing with Un-Supported Datatypes…
• Say you have a table with 10 columns, but 1 is not allowed in a In-Memory table
• First: Ask yourself if the table really fits the criteria we aren’t done covering
• Second: If so, consider vertically partitioning
–CREATE TABLE In_Mem (KeyValue, Column1, Column2, Column3)
CREATE TABLE On_Disk (KeyValue, Column4)
• It is likely that uses of disallowed LOB types wouldn’t be good for the OLTP
aspects of the table in any case.
drsql.org
• Note: 2016 allows LOB (varbinary(max), nvarchar(max), varchar(max)) but it is
still something you may need to consider, as memory isn’t free…
20
Creating Storage Objects - Index creation
• Syntax is inline with CREATE TABLE
• Indexes are linked directly to the table
– 8 indexes max per table due to internals
– Only one unique index allowed (the primary key) *
– Indexes are never persisted, but are rebuilt on restart
• String index columns must be a binary collation (case AND accent sensitive)*
• Cannot index nullable column *
• Two types
– Hash
drsql.org
• Ideal for single row lookups
• Fixed size, you choose the number of hash buckets (approx 1-2 * # of unique values
http://msdn.microsoft.com/en-us/library/dn494956.aspx)
– Bw Tree
• Best for range searches
• Very similar to a BTree index as you (hopefully) know it, but optimized for MVCC and pointer connection to
table
21
A Taste of the Physical Structures
• Basic data record for a row
Record Header
Data For Columns (Payload)
• Record Header
Begin Timestamp
End Timestamp
StatementId
IndexCount
IndexPointers
...
8
3
2
1
drsql.org
22
22
Hash Index - Simplified
TableNameId
Country
OtherColumns
1
USA
Values
2
USA
Values
3
Canada
Values
Identity Column
Country
1
1
2
0
Φ
1
USA
2
3
3
4
4
0
Φ Φ
2
USA
0
Φ Φ
3
Canada
5
drsql.org
5
6
7
8
9
10
23
23
Hash Index - Simplified
TableNameId
Country
OtherColumns
1
USA
Values
2
Canada
Values
3
Canada
Values
Identity Column
1
Country
0
Φ
1
USA
2
2
3
1
3
0
100
Φ
2
USA
4
4
5
5
100
Φ Φ
2
Canada
Φ
3
Canada
drsql.org
6
7
8
9
0
10
24
24
Bw Tree Index – Even More Simplified
Page 0
Page Mapping
Table
C
R
Z
Non-Leaf Pages
By Page ID
Page 0
Page 1
Page 1
A
B
Page 2
C
D
Page 3
G
J
S
T
Z
Leaf Pages
Page 2
Page 3
0
240
240
Φ
OtherVals
B
DifferentRow
B
JustDifferent
Data Pages
0
Φ
D
drsql.org
0
B
AnotherRow
25
25
Do you want to know more?
• For more in-depth coverage
–check Kalen Delaney's white paper ... http://t.co/T6zToWc6y6
–Or for an even deeper (nerdier?) versions: “Hekaton: SQL Server’s Memory-Optimized
OLTP Engine” http://research.microsoft.com/apps/pubs/default.aspx?id=193594 or The
Bw-Tree: A B-tree for New Hardware Platforms
(http://research.microsoft.com/pubs/178758/bw-tree-icde2013-final.pdf)
–Books Online: http://technet.microsoft.com/en-us/library/dn133186.aspx
–TechDays Presentation:
http://download.microsoft.com/documents/hk/technet/techdays2014/Day2/Session2/DBI394SQL%20Server%202014%20In-Memory%20OLTP%20-%20Depp%20Dive.pdf
drsql.org
–Buy Kalen Delaney’s Ebook:
http://www.amazon.com/gp/product/B00QMWX8PO/ref=docs-os-doi_0
–SQL Server 2016: In-Memory OLTP Enhancementss
http://sqlperformance.com/2015/05/sql-server-2016/in-memory-oltp-enhancements
26
Creating Storage Objects - Altering a Table *
• The is the second easiest slide in the deck
(to write!)
• No alterations allowed - Strictly Drop and
Recreate*
• Cannot rename table
ALTER
drsql.org
27
DEMO IN SLIDES – PREPARING TO (AND ACTUALLY) CREATING TABLES
drsql.org
28
Setting the Database To Allow In-Mem
29
29
drsql.org
CREATE DATABASE HowInMemObjectsAffectDesign
ON PRIMARY
a filegroup
to hold the delta
( NAME = N'HowInMemObjectsAffectDesign',Add
FILENAME
=
N‘Drive:\HowInMemObjectsAffectDesign.mdf' , SIZE = 2GB , files
MAXSIZE = UNLIMITED, FILEGROWTH = 10% ),
FILEGROUP [MemoryOptimizedFG] CONTAINS
MEMORY_OPTIMIZED_DATA
( NAME = N'HowInMemObjectsAffectDesign_inmemFiles',
FILENAME = N'Drive:\InMemfiles' , MAXSIZE = UNLIMITED)
LOG ON
( NAME = N'HowInMemObjectsAffectDesign_log', FILENAME =
N'Drive:\HowInMemObjectsAffectDesign_log.ldf' , SIZE = 1GB ,
MAXSIZE = 2GB , FILEGROWTH = 10%);
GO
Creating a Memory Optimized Permanent Table
CREATE TABLE Customers.Customer
(
CustomerId
integer NOT NULL IDENTITY ( 1,1 ) ,
CustomerNumber char(10)
COLLATE Latin1_General_100_BIN2 NOT NULL,
CONSTRAINT XPKCustomer PRIMARY KEY NONCLUSTERED
HASH ( CustomerId) WITH ( BUCKET_COUNT = 50000),
INDEX CustomerNumber NONCLUSTERED ( CustomerNumber)
drsql.org
) WITH ( MEMORY_OPTIMIZED = ON ,
DURABILITY = SCHEMA_AND_DATA)
go
30
30
Creating a Memory Optimized Permanent Table
CREATE TABLE Customers.Customer
Character column must be binary
(
to index/compare in native code *
Hash Index used for
CustomerId
integer
NOT
NULL IDENTITY ( 1,1 ) ,
Primary Key. Estimated
CustomerNumber
Rows in Table 25000 char(10)
COLLATE Latin1_General_100_BIN2 NOT NULL,
CONSTRAINT XPKCustomer
PRIMARY
Bw Tree
Index on KEY NONCLUSTERED
CustomerWITH
Number
HASH ( This
CustomerId)
( BUCKET_COUNT = 50000),
table is memory
INDEX
drsql.org
optimized
Thisoftable is as durable as
(ok, that was kind
CustomerNumber
NONCLUSTERED
( CustomerNumber)
obvious) the database settings
allow
) WITH ( MEMORY_OPTIMIZED = ON ,
DURABILITY = SCHEMA_AND_DATA)
go
31
31
Accessing the Data - Using Normal T-SQL (Interop)
• Using typical interpreted T-SQL
• Most T-SQL will work with no change (you may need to add isolation level hints,
particularly in explicit transaction)
• A few Exceptions that will not work
–TRUNCATE TABLE - This one is really annoying :)
–MERGE (In-Mem table cannot be the target)
–Cross Database Transactions (other than tempdb)
–Locking Hints
drsql.org
32
Accessing the Data using Compiled Code (Native)
• Instead of being interpreted, the stored procedure is compiled to machine code
• Limited syntax (Like programming with both hands tied behind your back)
• Allowed syntax is listed in what is available, not what isn't
–http://msdn.microsoft.com/en-us/library/dn452279.aspx
• Some really extremely annoying ones:
drsql.org
–SUBSTRING supported; LEFT, RIGHT, not so much
–No Subqueries *
–OR, NOT, IN, not supported in WHERE clause *
–String Comparisons must be with columns of Binary Collation *
–Can’t use on-disk objects (tables, sequences, views, etc)
–Can’t call a stored procedures from another stored procedure *
• So you may have to write some "interesting" code
33
DEMO IN SLIDES – NATIVE STORED PROCEDURE
drsql.org
34
Creating a Natively Optimized Procedure
(I write my C# the new fashioned way, with T-SQL)
Works just like for views
CREATE PROCEDURE Customers.Customer$CreateAndReturnThere is no Ownership
and functions. Can’t
@Parameter1 Parameter1Type = change
'defaultValue1',
chaining. All code
the underlying
@Parameter2
Parameter2Type =object
'defaultValue2',
executes as the
while this object
Alert parser that this will
procedure owner
…
references it
be a natively compiled
@ParameterNobject
ParameterNType = 'defaultValueN‘
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT,
LANGUAGE = N'us_english' )
<code>
drsql.org
END
Procedures are atomic
transactions
35
Accessing Data Using a Hybrid Approach
• Native code is very fast but very limited (* Still true, but less so)
• Use Native code where it makes sense, and not where it doesn’t
• Example: Creating a sequential value
drsql.org
–In the demo code I started out by using RAND() to create CustomerNumbers
and SalesOrderNumbers.
–Using a SEQUENCE is far more straightforward
–So I made one interpreted procedure that uses the SEQUENCE outside of native
code, then calls the native procedure
36
Accessing the Data - No Locks, No Latches, No Waiting
• On-Disk Structures use Latches and Locks to implement isolation
• In-Mem use Optimistic-MVCC
• You have 3 Isolation Levels:
–SNAPSHOT, REPEATABLE READ, SERIALIZABLE
–Evaluated before, or when the transaction is committed
–This makes manual data integrity checking "interesting"
• Essential difference, your code now must handle errors
drsql.org
37
Concurrency is the #1 difference you will deal with
• Scenario1: 2 Connections - Update Every Row In 1 Million Rows
• Any Isolation Level
• On-Disk
–Either: 1 connection blocks the other
–Or: Deadlock
• In-Mem
–One connection will fail, saying: “the row you are trying to update has been updated since
this transaction started” EVEN if it never commits.
drsql.org
38
Another slide on Concurrency
(Because if I had presented it concurrently with the other one, you wouldn’t have liked that)
• Scenario2: 1 Connection Updates All Rows, Another Reads All Rows (In an explicit
transaction)
• On-Disk
–Either: 1 connection blocks the other
–Or: Deadlock
• In-Mem
drsql.org
–Both Queries Execute Immediately
–In SNAPSHOT ISOLATION the reader will always succeed
–In REPEATABLE READ or SERIALIZABLE
• Commits transaction BEFORE updater commits: Success
• Commits transaction AFTER updater commits: Fails
39
The Difficulty of Data Integrity
• With on-disk structures, we used constraints for most issues (Uniqueness, Foreign
Key, Simple Predicates)
• With in-memory code, we have to implement in stored procedures
–Uniqueness on > 1 column set suffers from timing (If N connections are inserting the same
data...MVCC will let them) *
–Foreign Key type checks can't reliably be done because: *
• In Snapshot Isolation Level, the row may have been deleted while you check
• In Higher Levels, the transaction will fail if the row has been updated
–Check constraint style work can be done in stored procedures for the most part.
drsql.org
• Note: Constraints in 2016 will often be more important that for on disk tables
because of the lack of blocking operations
40
Problem: How to Implement Uniqueness on > 1 Column Set:
INDEXED VIEW?
• CREATE VIEW Customers.Customers$UniquenessEnforcement
WITH SCHEMABINDING
AS
SELECT customerId, emailAddress, customerNumber
FROM
customers.Customer
GO
• CREATE UNIQUE CLUSTERED INDEX emailAddress ON
Customers.Customers$UniquenessEnforcement (emailAddress)
GO
drsql.org
• Msg 10794, Level 16, State 12, Line 8
The operation 'CREATE INDEX' is not supported with memory optimized tables.
41
Problem: How to Implement Uniqueness on > 1 Column Set:
Multiple Tables?
drsql.org
• Wow, that seems messy… And what about duplicate customerId
values in the two subordinate tables?
42
Problem: How to Implement Uniqueness on > 1 Column Set:
Simple code
• You can’t…exactly. But what if EVERY caller has to go through the following block:
• DECLARE @CustomerId INT
SELECT @CustomerId = CustomerId
FROM Customers.Customer
WHERE EmailAddress = @EmailAddress
drsql.org
IF @customerId is null… Do your insert
• This will stop MOST duplication, but not all. Two inserters can check at the same
time, and with no blocks, app locks, or constraints even available, you may get
duplicates.
• Remember the term: Optimistic Concurrency Control
• Even still, this sort of code is reducing the value, isn’t it?
43
Foreign Keys and Unique Index/Constraints in 2016
(Pure conjecture based on how things work now)
• In the traditional engine, these are implemented with locks
• In the in-mem engine, you have to expect that it will be implemented much like
the isolation levels
• Basically, if two transactions do operations that would have blocked, the other
connection will likely fail either:
–At COMMIT (Currently PRIMARY KEY Violations fail at COMMIT)
–At first sign of trouble (As is the case when you modify existing resources)
drsql.org
44
When Should You Make Tables In-Memory
Louis's Advice
• Read Microsoft’s Opinion Here:
http://msdn.microsoft.com/en-us/library/dn133186.aspx
• Things to factor in
– High concurrency needs/Low chance of collisions
– Minimal uniqueness protection requirements *
– Minimal data integrity concerns (minimal key update/deletes) *
– Limited searching of data (binary comparisons only) *
– Limited need for transaction isolation/Short transactions
– You are able to answer all “What If?” scenarios successfully.
• Basically, the “very hot” tables in a strict OLTP workload...
drsql.org
– I don’t see this changing, but the scenarios where it fits will expand in 2016
• NOT a way to “FIX” bad code… Not at all…
• In fact, most applications will need to be re-engineered to deal with MVCC.
46
The Choices I made
• Louis has improved his methods for estimating performance, but your
mileage will still vary.
• Louis’ tests are designed to reflect only one certain usage conditions and
user behavior, but several factors may affect your mileage significantly:
–
–
–
–
–
How & Where You Put Your Logs
Computer Condition & Maintenance
CPU Variations
Programmer Coding Variations
Hard Disk Break In
drsql.org
• Therefore, Louis’ performance ratings are a minimally useful tool for
comparing the performance of different strategies but may not
accurately predict the average performance you will get.
• I seriously suggest you test the heck out of the technologies yourself
using my code, your code, and anyone else’s code you can to make sure
you are getting the best performance possible.
• The Choices (For Me) will differ in 2016…
Model Choices – Logical Model
drsql.org
48
Model Choices – Physical Model
drsql.org
49
Model Choices – Tables to Make In-Mem (First Try)
drsql.org
50
Model Choices – Tables to Make In-Mem (Final 2014 Thinking)
drsql.org
The Grand Illusion
(So you think your life is complete confusion)
• Performance gains are not exactly what you may expect, even when they are massive
• In my examples (which is available on my website), I discovered when loading 20000
rows (10 connections of 2000 each)
– (Captured using Adam Machanic's http://www.datamanipulation.net/SQLQueryStress/ tool)
– On-Disk Tables with FK and Instead Of Trigger - 0.0472 seconds per row - Total Time – 1:12
– On-Disk Tables without FK or Instead Of Trigger - 0.0271 seconds per row - Total Time – 0:51
– In-Mem Tables using Interop code - 0.0202 seconds per row - Total Time 0:44
– In-Mem Tables with Native Code - 0.0050 second per row - Total Time – 0:31
– In-Mem Tables, Native Code, SCHEMA_ONLY – 0.0003 seconds per row - Total Time – 00:30
– In-Mem Tables (except CustomerAddress), Hybrid code – 0.0163 – Total Time – 0:42
• But should it be a lot better? Don't forget the overhead... (And SQLQueryStress has extra
for gathering stats)
drsql.org
– In-Mem Tables using 2016 enhancements – Coming soon to a SQLblog near you when enough features
available
Contact info
• Louis Davidson - [email protected]
• Website – http://drsql.org <-- Get slides here
• Twitter – http://twitter.com/drsql (@drsql)
• SQL Blog http://sqlblog.com/blogs/louis_davidson
• Simple Talk Blog – What Counts for a DBA
http://www.simple-talk.com/community/blogs/drsql/default.aspx
drsql.org
53
53