How In-Memory Affects Database Design

Download Report

Transcript How In-Memory Affects Database Design

How In-Memory Affects Database Design
Louis Davidson
Certified Nerd
drsql.org
1
Silver
Gold
Platinum
SQLSaturday Orlando Sponsors
drsql.org
SQL Server vNext
Wednesday, October 15
6:00 PM
Nova Southeastern University
4850 Millenia Blvd
Mark Souza
General Manager, Cloud and
Enterprise Engineering Team
drsql.org
http://magicpassopass.eventbrite.com
Follow along!
#SQLSaturdayOrlando
orlando.sqlsaturday.com
@SQLSaturdayOrlando
@MagicPASS
magicpass.sqlpass.org
@OrlPASS
orlando.sqlpass.org
drsql.org
Who am I?
•
•
•
•
drsql.org
Been in IT for over 19 years
Microsoft MVP For 10 Years
Corporate Data Architect
Written five books on
database design
• Ok, so they were all versions
of the same book. They at least
had slightly different titles each time
• Basically: I love Database Design,
and In-Memory technologies are
changing the game
5
Questions are Welcome
• Please limit questions to one’s I
know the answer to.
drsql.org
6
A tasty allegory…
• Bacon is awesome
• Bacon is an extremely powerful
tool for rapid fat and calorie
intake
• Even bacon isn't good for
everything
drsql.org
http://www.lazygamer.net/general-news/diablo-iii-players-burned-off-820-968-kgs-of-bacon/
7
https://www.flickr.com/photos/runnerone/6232183896/in/photostream/
The process I went through
• Start with basic requirements
– Sales system
– Stream of customer and order data
– Apply In-Memory OLTP to see how it changed
things
– Keep it very simple
• Learn a lot
– This presentation was borne out of what I
learned from that process (and Kalen
Delaney’s precon, whitepaper, and other
reading that is linked throughout the slides)
drsql.org
• Build a test and apply what I have learned
and morph until I get to what works
• Build something real in my day job, if
applicable
88
Attention: There Is Homework
(lots of it)
• I can’t teach you everything
about In-Memory in 1 mere
hour, particularly the internals
• The code will be
available/demonstrated, but it
is still very rudimentary
• It will get you started, but is
only just the tip of the iceberg
drsql.org
9
Introduction: What exactly is
In-Memory OLTP in SQL Server 2014?
• A totally new, revamped engine for data storage, co-located in
the same database with the existing engine
– Obviously Enterprise Only…
• Purpose built for certain scenarios
• Terminology can be confusing
–Existing tables: Home - On-Disk, but ideally cached In-Memory
–In-Memory tables: Home - In-Memory: but backed up by On-Disk
Structures
• If you have enough RAM, On-Disk tables are also in memory
drsql.org
–But the implementation is very very different
• In-Memory is both very easy, and very difficult to use
10
Design Basics
(And no, I am not stalling for time due to lack of material)
• Designing and Coding is Like the Chicken
and the Egg
I was first
As if…
–Design is what you do before coding
–Coding patterns can greatly affect design
–Engine implementation can greatly affect
design and coding patterns
–Developing software follows a natural process
Children
Relics
11
drsql.org
• We will discuss how In-Memory
technologies affect the entire
design/development lifecycle
Design Basics - Separate your design mind into
three phases
1. Logical (Overall data requirements in a data model format)
2. Physical Implementation Choice
A. Type of database system: Paper, Excel, Access, SQL Server, NoSQL, etc
B. Engine choices: In-Memory, On-Disk, Compression, Partitioning, etc
Note: Bad choices usually involve pointy hair and a magazine article with
very little thinking and testing
3. Physical (Relational Code)
drsql.org
• Before the engine choice I always suggested 3 before 2
• We will look at each of these phases and how in-mem may affect
your design of each output
12
Logical Design
(Though Not Everyone’s Is)
• This is the easiest part of the presentation
• You still need to model
–Entities and Attributes
–Uniqueness Conditions
–General Predicates
• As I see it, nothing changes…
drsql.org
13
Logical Data Model
drsql.org
14
Physical Implementation Overview
Client App
TDS Handler and Session Management
No improvements in
communication stack,
parameter passing, result
set generation
10-30x more efficient
(Real Apps see 2-30x)
Reduced log bandwidth &
contention. Log latency
remains
Natively
Compiled SPs
and Schema
Engine for
Memory_optimized Tables
& Indexes
Proc/Plan cache for ad-hoc
T-SQL and SPs
Interpreter for TSQL, query
plans, expressions
Access Methods
Query
Interop
Existing SQL
Component
Hekaton
Component
Generated
.dll
Buffer Pool for Tables &
Indexes
SQL Server.exe
Memory-optimized
Table Filegroup
Transaction Log
Data Filegroup
http://download.microsoft.com/documents/hk/technet/techdays2014/Day2/Session2/DBI394-SQL%20Server%202014%20In-Memory%20OLTP%20-%20Depp%20Dive.pdf
drsql.org
Checkpoints are
background sequential IO
Hekaton
Compiler
Parser,
Catalog,
Algebrizer,
Optimizer
Key
Physical Implementation
(Technically it’s all software!)
• Everything is different, and I am going to give just an overview of
physical details…
• In-Mem data structures coexist in the database alongside OnDisk ones
• Data is housed in RAM, and backed up in Delta Files and
Transaction Logs
–Delta files are stored as filestream storage
–The transaction log is the same one as you are used to (with lighter
utilization)
drsql.org
• Tables and Indexes are extremely coupled
• MVCC (Multi-Valued Concurrency Control) used for all isolation
16
Physical Design
(No, let’s not get physical)
• Your physical design will almost certainly need to be affected
• So much changes, even just changing the internal table structure
• In this section, we will discuss:
–Creating storage objects
• Table Creation
• Index Creation (which is technically part of the table creation)
• Altering a Table’s Structure
–Accessing (Modifying/Creating) data
drsql.org
• Using Normal T-SQL (Interop)
• Using Compiled Code (Native)
• Using a Hybrid Approach
• No Locks, No Latches, No Waiting
17
Creating Storage Objects - Tables
• The syntax is the same as on-disk, with a few additional settings
• You have a durability choices
– Individual In-Mem Table: Schema_Only or Schema_and_Data
– Database level for transactions: Delayed (also for on-disk tables)
• Basically Asynchronous Log Writes
• Aaron Bertrand has a great article on this here: http://sqlperformance.com/2014/04/io-subsystem/delayed-durability-in-sqlserver-2014
• You also have less to work with...
– Rowsize limited to 8060 bytes (Enforced at Create Time)
• Not all datatypes allowed (LOB types,CLR,sql_variant, datetimeoffset, rowversion)
– No check constraints
– No foreign keys
– Limited unique constraints (just one unique index per table)
•
drsql.org
• Every durable (Schema_and_Data) table must have a primary key
Note: There are memory optimized temporary tables too: See Kendra Little’s article here:
http://www.brentozar.com/archive/2014/04/table-variables-good-temp-tables-sql-2014/
18
Dealing with Un-Supported Datatypes…
drsql.org
• Say you have a table with 10 columns, but 1 is not allowed in a
In-Memory table
• First: Ask yourself if the table really fits the criteria we aren’t
done covering
• Second: If so, consider vertically partitioning
• CREATE TABLE In_Mem (KeyValue, Column1, Column2, Column3)
CREATE TABLE On_Disk (KeyValue, Column4)
• It is likely that uses of disallowed types wouldn’t be good for the
OLTP aspects of the table in any case.
19
Creating Storage Objects - Index creation
• Syntax is inline with CREATE TABLE
• Indexes are linked directly to the table
– 8 indexes max per table due to internals
– Only one unique index allowed (the primary key)
– Indexes are never persisted, but are rebuilt on restart
• String index columns must be a binary collation (case AND accent sensitive)
• Cannot index nullable column
• Two types
– Hash
• Ideal for single row lookups
• Fixed size, you choose the number of hash buckets (approx 1-2 * # of unique values
http://msdn.microsoft.com/en-us/library/dn494956.aspx)
drsql.org
– Bw Tree
• Best for range searches
• Very similar to a BTree index as you (hopefully) know it, but optimized for MVCC and pointer connection to
table
20
A Taste of the Physical Structures
• Basic data record for a row
Record Header
Data For Columns (Payload)
• Record Header
Begin Timestamp
End Timestamp
StatementId
IndexCount
IndexPointers
...
8
3
2
1
drsql.org
2121
Hash Index - Simplified
TableNameId
Country
OtherColumns
1
USA
Values
2
USA
Values
3
Canada
Values
Identity Column
Country
1
1
2
0
Φ
1
USA
2
3
3
4
4
5
0
Φ Φ
2
USA
0
Φ Φ
3
Canada
5
6
7
drsql.org
8
9
10
2222
Hash Index - Simplified
TableNameId
Country
OtherColumns
1
USA
Values
2
Canada
Values
3
Canada
Values
Country
Identity Column
1
1
Φ
0
USA
2
2
3
1
3
0
Φ
100
2
USA
4
4
5
5
6
100
Φ Φ
2
Canada
Φ
3
Canada
7
drsql.org
8
9
0
10
2323
Bw Tree Index – Even More Simplified
Page 0
Page Mapping
Table
C
R
Z
Non-Leaf Pages
By Page ID
Page 0
A
B
Page 3
Page 2
Page 1
Page 1
D
C
S
J
G
T
Z
Leaf Pages
Page 2
Page 3
0
0
Φ
OtherVals
B
DifferentRow
B
JustDifferent
Data Pages
0
Φ
D
AnotherRow
drsql.org
240
240
B
2424
Do you want to know more?
• For more in-depth coverage
– check Kalen Delaney's white paper ... http://t.co/T6zToWc6y6
– Or for an even deeper (nerdier?) versions: “Hekaton: SQL Server’s MemoryOptimized OLTP Engine”
http://research.microsoft.com/apps/pubs/default.aspx?id=193594 or The BwTree: A B-tree for New Hardware Platforms
(http://research.microsoft.com/pubs/178758/bw-tree-icde2013-final.pdf)
– Books Online: http://technet.microsoft.com/en-us/library/dn133186.aspx
drsql.org
– TechDays Presentation:
http://download.microsoft.com/documents/hk/technet/techdays2014/Day2/Session
2/DBI394-SQL%20Server%202014%20In-Memory%20OLTP%20%20Depp%20Dive.pdf
25
Creating Storage Objects - Altering a Table
• The is the second easiest slide
in the deck
• No alterations allowed - Strictly
Drop and Recreate
– You can rename a table, which
makes this at east easier
ALTER
drsql.org
26
DEMO IN SLIDES – PREPARING TO (AND ACTUALLY) CREATING
TABLES
drsql.org
27
Setting the Database To Allow In-Mem
drsql.org
CREATE DATABASE HowInMemObjectsAffectDesign
ON PRIMARY
( NAME Add
= N'HowInMemObjectsAffectDesign',
FILENAME =
a filegroup to hold the
N‘Drive:\HowInMemObjectsAffectDesign.mdf'
, SIZE = 2GB , MAXSIZE = UNLIMITED,
delta files
FILEGROWTH = 10% ),
FILEGROUP [MemoryOptimizedFG] CONTAINS MEMORY_OPTIMIZED_DATA
( NAME = N'HowInMemObjectsAffectDesign_inmemFiles', FILENAME =
N'Drive:\InMemfiles' , MAXSIZE = UNLIMITED)
LOG ON
( NAME = N'HowInMemObjectsAffectDesign_log', FILENAME =
N'Drive:\HowInMemObjectsAffectDesign_log.ldf' , SIZE = 1GB , MAXSIZE = 2GB ,
FILEGROWTH = 10%);
GO
2828
Creating a Memory Optimized Permanent Table
CREATE TABLE [Customers].[Customer]
Character column must be
binary to index/compare in
( Hash Index used for Primary
Key. Estimated Rows in
Table
[CustomerId]
integer
NOT NULL IDENTITY ( 1,1 ) , native code
25000-50000
[CustomerNumber]
char(10)
COLLATE Latin1_General_100_BIN2 NOT NULL,
Bw Tree Index on Customer
CONSTRAINT [XPKCustomer]
PRIMARY
KEY NONCLUSTERED
Number
HASH ( [CustomerId])
WITH ( BUCKET_COUNT = 50000),
This table is memory
INDEX
optimized
(ok, that was kindNONCLUSTERED
of obvious)
[CustomerNumber]
This table is as durable as the
database settings allow
( [CustomerNumber])
drsql.org
) WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA)
go
2929
Accessing the Data - Using Normal T-SQL (Interop)
• Using typical interpreted T-SQL
• Most T-SQL will work with no change (you may need to add
isolation level hints, particularly in explicit transaction)
• A few Exceptions that will not work
drsql.org
–TRUNCATE TABLE - This one is really annoying :)
–MERGE (In-Mem table cannot be the target)
–Cross Database Transactions (other than tempdb)
–Locking Hints
30
Accessing the Data using Compiled Code (Native)
• Instead of being interpreted, the stored procedure is compiled to machine
code
• Limited syntax (Like programming with both hands tied behind your back)
• Allowed syntax is listed in what is available, not what isn't
– http://msdn.microsoft.com/en-us/library/dn452279.aspx
• Some really extremely annoying ones:
– SUBSTRING supported; LEFT, RIGHT, not so much
– No Subqueries
– OR, NOT, IN, not supported in WHERE clause
– Can’t use on-disk objects (tables, sequences, views, etc)
– String Comparisons must be with columns of Binary Collation
drsql.org
• So you may have to write some "interesting" code
31
DEMO IN SLIDES – NATIVE STORED PROCEDURE
drsql.org
32
Creating a Natively Optimized
(I write my C# the new fashioned way, with T-SQL)
CREATE PROCEDURE Customers.Customer$CreateAndReturn
Works just like for views and
There is no Ownership
@Parameter1 Parameter1Type =functions.
'defaultValue1',
Can’t change the
chaining. All code executes as
object while this
@Parameter2 Parameter2Type =underlying
'defaultValue2',
the procedure owner
object references it
…
Alert parser that this will be a
@ParameterN
ParameterNType
= 'defaultValueN‘
natively compiled
object
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT,
LANGUAGE = N'us_english' )
<code>
drsql.org
END
Procedures are atomic
transactions
33
Accessing Data Using a Hybrid Approach
• Native code is very fast but very limited
• Use Native code where it makes sense, and not where it
doesn’t
• Example: Creating a sequential value
drsql.org
–In the demo code I started out by using RAND() to create
CustomerNumbers and SalesOrderNumbers.
–Using a SEQUENCE is far more straightforward
–So I made one Interpreted procedure that uses the SEQUENCE
outside of native code, then calls the native procedure
34
Accessing the Data - No Locks, No Latches, No
Waiting
• On-Disk Structures use Latches and Locks to implement
isolation
• In-Mem use Optimistic-MVCC
• You have 3 Isolation Levels:
–SNAPSHOT, REPEATABLE READ, SERIALIZABLE
–Evaluated before, or when the transaction is committed
–This makes data integrity checking "interesting"
drsql.org
• Essential difference, your code now must handle errors
35
Concurrency is the #1 difference you will deal with
• Scenario1: 2 Connections - Update Every Row In 1 Million Rows
• Any Isolation Level
• On-Disk
–Either: 1 connection blocks the other
–Or: Deadlock
• In-Mem
drsql.org
–One connection will fail, saying: “the row you are trying to update has
been updated since this transaction started” EVEN if it never commits.
36
Another slide on Concurrency
(Because if I had presented it concurrently with the other one, you wouldn’t have liked that)
• Scenario2: 1 Connection Updates All Rows, Another Reads All
Rows (In an explicit transaction)
• On-Disk
–Either: 1 connection blocks the other
–Or: Deadlock
• In-Mem
–Both Queries Execute Immediately
–In SNAPSHOT ISOLATION the reader will always succeed
–In REPEATABLE READ or SERIALIZABLE
drsql.org
• Commits transaction BEFORE updater commits: Success
• Commits transaction AFTER updater commits: Fails
37
The Difficulty of Data Integrity
• With on-disk structures, we used constraints for most issues
(Uniqueness, Foreign Key, Simple Predicates)
• With in-memory code, we have to implement in stored
procedure
–Uniqueness on > 1 column set suffers from timing (If N connections are
inserting the same data...MVCC will let them)
–Foreign Key can't reliably be done because:
• In Snapshot Isolation Level, the row may have been deleted while you check
• In Higher Levels, the transaction will fail if the row has been updated
drsql.org
–Check constraint style work can be done in stored procedures for the
most part.
38
Problem: How to Implement Uniqueness on > 1
Column Set: INDEXED VIEW?
• CREATE VIEW Customers.Customers$UniquenessEnforcement
WITH SCHEMABINDING
AS
SELECT customerId, emailAddress, customerNumber
FROM
customers.Customer
GO
• CREATE UNIQUE CLUSTERED INDEX emailAddress ON
Customers.Customers$UniquenessEnforcement (emailAddress)
GO
drsql.org
• Msg 10794, Level 16, State 12, Line 8
The operation 'CREATE INDEX' is not supported with memory optimized tables.
39
Problem: How to Implement Uniqueness on > 1
Column Set: Multiple Tables?
drsql.org
• Wow, that seems messy… And what about duplicate customerId
values in the two subordinate tables?
40
Problem: How to Implement Uniqueness on > 1
Column Set: Simple code
• You can’t…exactly. But what if EVERY caller has to go through the
following block:
• DECLARE @CustomerId INT
SELECT @CustomerId = CustomerId
FROM Customers.Customer
WHERE EmailAddress = @EmailAddress
drsql.org
IF @customerId is null… Do your insert
• This will stop MOST duplication, but not all. Two inserters can check at
the same time, and with no blocks, app locks, or constraints even
available, you may get duplicates.
• Remember the term: Optimistic Concurrency Control
41
When Should You Make Tables In-Memory Microsoft's Advice
• From http://msdn.microsoft.com/en-us/library/dn133186.aspx
Implementation Scenario
Benefits of In-Memory OLTP
High data insertion rate from multiple
concurrent connections.
Primarily append-only store.
Unable to keep up with the insert workload.
Eliminate contention.
Reduce logging.
Read performance and scale with periodic
batch inserts and updates.
High performance read operations,
especially when each server request has
multiple read operations to perform.
Unable to meet scale-up requirements.
Eliminate contention when new data arrives.
Lower latency data retrieval.
Minimize code execution time.
Intensive business logic processing in the
database server.
Insert, update, and delete workload.
Intensive computation inside stored
procedures.
Read and write contention.
Eliminate contention.
Minimize code execution time for reduced
latency and improved throughput.
Low latency.
Require low latency business transactions
which typical database solutions cannot
achieve.
Eliminate contention.
Minimize code execution time.
Low latency code execution.
Efficient data retrieval.
Session state management.
Frequent insert, update and point lookups.
High scale load from numerous stateless
web servers.
Eliminate contention.
Efficient data retrieval.
Optional IO reduction or removal, when
using non-durable tables
drsql.org
Implementation Scenario
42
When Should You Make Tables In-Memory
Louis's Advice
• More or less the same as Microsoft's really (duh!)
• Things to factor in
–High concurrency needs/Low chance of collisions
–Minimal uniqueness protection requirements
–Minimal data integrity concerns (minimal key update/deletes)
–Limited searching of data (binary comparisons only)
–Limited need for transaction isolation/Short transactions
drsql.org
• Basically, the “hot” tables in a strict OLTP workload...
43
The Choices I made
drsql.org
Louis has improved his methods for estimating performance, but your mileage will still vary.
• Louis’ tests are designed to reflect only one certain usage conditions and user behavior, but
several factors may affect your mileage significantly:
• How & Where You Put Your Logs
• Computer Condition & Maintenance
• CPU Variations
• Programmer Coding Variations
• Hard Disk Break In
• Therefore, Louis’ performance ratings are a minimally useful tool for comparing the
performance of different strategies but may not accurately predict the average performance
you will get.
• I seriously suggest you test the heck out of the technologies yourself using my code, your
code, and anyone else’s code you can to make sure you are getting the best performance
possible.
Model Choices – Logical Model
drsql.org
45
Model Choices – Physical Model
drsql.org
46
Model Choices – Tables to Make In-Mem (First Try)
drsql.org
47
Model Choices – Tables to Make In-Mem (Final)
drsql.org
The Grand Illusion
(So you think your life is complete confusion)
• Performance gains are not exactly what you may expect, even when they are
massive
• In my examples (which you have seen), I discovered when loading 20000 rows
(10 connections of 2000 each)
– (Captured using Adam Machanic's http://www.datamanipulation.net/SQLQueryStress/ tool)
A. On-Disk Tables with FK, Instead Of Trigger - 0.0472 seconds per row - Total Time – 1:12
B. On-Disk Tables withOUT FK, Instead Of Trigger - 0.0271 seconds per row - Total Time – 0:51
C. In-Mem Tables using Interop code - 0.0202 seconds per row - Total Time 0:44
D. In-Mem Tables with Native Code - 0.0050 second per row - Total Time – 0:31
E. In-Mem Tables, Native Code, SCHEMA_ONLY – 0.0003 seconds per row - Total Time – 00:30
F. In-Mem Tables (except CustomerAddress), Hybrid code – 0.0163 – Total Time – 0:55
drsql.org
• But should it be a lot better? Don't forget the overhead... (And SQLQueryStress
has extra for gathering stats)
Contact info
• Louis Davidson - [email protected]
• Website – http://drsql.org <-- Get slides here
• Twitter – http://twitter.com/drsql
• SQL Blog http://sqlblog.com/blogs/louis_davidson
drsql.org
• Simple Talk Blog – What Counts for a DBA
http://www.simple-talk.com/community/blogs/drsql/default.aspx
5050
Demo
As Much Code Review As We Have Time For!
drsql.org