Know What Your Code is Doing to SQL Server
Presentation by Kevin G. Boles
[email protected]
Kevin G. Boles
SQL Server Consultant Extraordinaire
“The Wolfe” for the Relational Engine
Indicium Resources, Inc.
AIM/Twitter: TheSQLGuru
GTalk/GMail: [email protected]
MVP 2007-2012, MCT, MCITP, yada-yada-yada
World-class (OK, better-than-average) relational
engine expert
Almost 45,000 hours invested in SQL Server
Absolutely LOVE doing targeted performance
analysis and tuning work, and designing and
building high-scale, highly-performing
database applications
VERY good at making others better at
interacting with SQL Server
Couldn’t SSAS my way out of a paper bag
◦ … with both ends open!
GURUISM:
The ease with which developers can
slap code together quickly
is inversely proportional to the
performance and scalability
you will get from that code!!
You can be WAY more productive as a coder using Entity Framework
Unfortunately, several traps that are easy to fall into have given it a
reputation for performing poorly
Many of these problems center around database access, but not all
RTFM!!!
◦ Read The “Fine” Manual
Other ORMs have identical/similar issues
Disclaimer: I am NOT an EF/Nhibernate/Pick-Your-Favorite-ORM Guru!!
At its heart, Entity Framework is a way of exposing .NET objects
without actually knowing their values, but then fetching / updating
those values from the database behind the scenes when you need
them.
It’s important to be aware of when EF is going to hit the database – a
process called materialization.
Let’s say we have a database with an entity db.Schools. We might
choose to write something like:
string city = "New York";
List<School> schools = db.Schools.ToList();
List<School> newYorkSchools = schools.Where(s => s.City == city).ToList();
On line 2 when we do .ToList(), Entity Framework will go out to the
database to materialize the entities, so that the application has
access to the actual values of those objects, rather than just having
an understanding of how to look them up from the database. It’s
going to retrieve every row in that Schools table, then filter the list in
.NET.
NO WHERE CLAUSE!!
LET SQL SERVER DO THE FILTERING!!
We can do that either with ...
List<School> newYorkSchools = db.Schools.Where(s => s.City == city).ToList();
... or even ...
IQueryable<School> schools = db.Schools;
List<School> newYorkSchools = schools.Where(s => s.City == city).ToList();
In our database, every Pupil belongs to a School, referencing the Schools table
using a foreign key on the SchoolId column. Equivalently, in our EF model, the
School entity has a virtual Pupils property.
We want to print a list of how many pupils attend each school:
string city = "New York";
List<School> schools = db.Schools.Where(s => s.City == city).ToList();
var sb = new StringBuilder();
foreach(var school in schools)
{
sb.Append(school.Name);
sb.Append(": ");
sb.Append(school.Pupils.Count);
sb.Append(Environment.NewLine);
}
If we look in ANTS at what happens when this code runs, we see a query run once
to get a list of schools in New York, but another query is also run 500 times to
fetch Pupil information.
This happens because BY DEFAULT, EF uses a loading strategy called
Lazy Loading, where it doesn’t fetch any data associated with the
virtual Pupils property on the School object when the first query is
run.
This leads to the name “N+1 select problem”, because N plus 1
queries are executed, where N is the number of objects returned by
the original query.
If you know that you’re definitely going to want the Pupil data, you’d
be better doing things differently – especially if you want it for a
large number of School objects.
This is particularly important if there is high latency between your
application and the database server (“CLOUD” ANYONE?!?)
Use the Eager Loading data access strategy, which fetches the related data
in a single query when you use an Include() statement.
List<School> schools = db.Schools
.Where(s => s.City == city)
.Include(x => x.Pupils)
.ToList();
But that gets them all. Make sure you really NEED them all. Isn't it silly to
get ALL rows from the database just to count them in the client?
Sproc?
Other? The key is: DON'T GET DATA ITERATIVELY, AND DON'T GET DATA YOU
DON'T NEED!!
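For the school/pupil counting example above, one way to follow that rule is to push the COUNT down to SQL Server with a projection. A minimal sketch (it reuses the Name and Pupils members from the model shown earlier; EF translates Pupils.Count() into a server-side COUNT, so no Pupil rows are materialized):

// Ask for name + count in a single query instead of loading Pupil rows
var schoolCounts = db.Schools
    .Where(s => s.City == city)
    .Select(s => new { s.Name, PupilCount = s.Pupils.Count() })
    .ToList();

var sb = new StringBuilder();
foreach (var school in schoolCounts)
{
    sb.Append(school.Name);
    sb.Append(": ");
    sb.Append(school.PupilCount);
    sb.Append(Environment.NewLine);
}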
Let’s say we want to print the name of every pupil at a certain
SchoolId
int schoolId = 1;
List<Pupil> pupils = db.Pupils
.Where(p => p.SchoolId == schoolId)
.ToList();
foreach(var pupil in pupils)
{
textBox_Output.Text += pupil.FirstName + " " + pupil.LastName;
textBox_Output.Text += Environment.NewLine;
}
But a LOT more data than the first and last names (FirstName and
LastName) has been retrieved
The problem here is that, at the point when the query is run, EF has
no idea what properties you might want to read, so its only option is
to retrieve all of an entity’s properties, i.e. every column in the table.
That causes two problems:
◦ We’re transferring more data than necessary. This impacts EVERYTHING in the
application stack - from SQL Server I/O, CPU, RAM, network, right through to
memory usage in our client application. EVERYTHING is retrieved, jpegs, XML,
comments, etc!
◦ By selecting every column (effectively running a “Select * From…” query), we
make it almost impossible to index the database usefully. Covering indexes are
not available as a tuning mechanism.
Fortunately we can tell Entity Framework to select only certain
specific columns
We can either select a dynamic object:
var pupils = db.Pupils
.Where(p => p.SchoolId == schoolId)
.Select(x => new { x.FirstName, x.LastName })
.ToList();
Or we could choose to define a separate class, sometimes called a DTO (Data Transfer
Object), to select into
List<PupilName> pupils = db.Pupils
.Where(p => p.SchoolId == schoolId)
.Select(x => new PupilName
{
FirstName = x.FirstName,
LastName = x.LastName
})
.ToList();
…
public class PupilName
{
public string FirstName { get; set; }
public string LastName { get; set; }
}
Data types matter, and if not enough attention is paid to them, even
disarmingly simple database queries can perform surprisingly poorly
We want to search for Pupils with zip code 90210. Easy:
string zipCode = "90210";
var pupils = db.Pupils
.Where(p => p.PostalZipCode == zipCode)
.Select(x => new {x.FirstName, x.LastName})
.ToList();
Unfortunately it takes a very long time for the results to come back
from the database. There are several million rows in the Pupils table,
but there’s an index covering the PostalZipCode column which we’re
searching against, so it should be quick to find the appropriate rows.
Indeed the results are returned instantly if we directly query the
database from SQL Server Management Studio using
SELECT FirstName, LastName FROM Pupils p WHERE p.PostalZipCode = '90210'
What gives?!?
BAD STUFF DUE TO
WRONG DATA TYPE!!
Query Plan Warning:
Type conversion: Seek Plan for CONVERT_IMPLICIT(nvarchar(20),
[Extent1].[PostalZipCode],0)=[@p__linq__0]
So [Extent1].[PostalZipCode] was implicitly converted to NVARCHAR(20)
If we look back at the complete query which was run we can see why. Entity
Framework has declared the variable as NVARCHAR, which seems sensible as strings
in .NET are Unicode, and NVARCHAR is the SQL Server type which can represent
Unicode strings.
But looking at the Pupils table we can see that the PostalZipCode
column is VARCHAR(20)
So SQL Server is being FORCED to:
Evaluate EVERY PostalZipCode in the table!!
◦ INDEX SCAN
lock/blocking/latches
IO
Buffer pool slammed, forcing other useful pages out and harming performance for every application on the server
Etc.
◦ Then CONVERT the data type
CPU BURN
PREVENTS THE USE OF ACCURATE COLUMN STATISTICS!!!!
All THAT happened because YOU the developer slapped some code
together quickly and didn’t RTFM! :-D
Solution: You just need to edit the model to explicitly tell Entity
Framework to use VARCHAR, using a [Column] annotation (from the
System.ComponentModel.DataAnnotations.Schema namespace)
public string Adderss2 { get; set; }
[Column(TypeName = "varchar")]
public string PostalZipCode { get; set; }
After making this trivial change, the parameter will be sent to SQL
Server as VARCHAR, so the data type will match the column in the
Pupils table, and an Index Seek operator can be used
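If you prefer not to put attributes on the model class, the same mapping can be expressed with the EF6 Fluent API instead. A minimal sketch, assuming a code-first DbContext that exposes the Pupil entity (IsUnicode(false) maps the string property to VARCHAR rather than NVARCHAR):

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Send PostalZipCode parameters as VARCHAR so they match the column's data type
    modelBuilder.Entity<Pupil>()
        .Property(p => p.PostalZipCode)
        .IsUnicode(false);

    base.OnModelCreating(modelBuilder);
}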
Generally, these data type mismatches don't happen if EF creates the
database for you and is the only tool to modify its schema
BUT THAT ISN’T ALWAYS THE CASE! And there is lots of debate about
whether that is even a good idea or not. ;-)
Often we want to do a search that is based on several criteria. For
example, we might have a set of four search boxes for a user to
complete, where empty boxes are ignored, so write something like:
//Search data as input by user
var searchModel = new Pupil
{
FirstName = "Ben",
LastName = null,
City = null,
PostalZipCode = null
};
Only 1 of the 4 possible filters has a value
List<Pupil> pupils = db.Pupils.Where(p =>
(searchModel.FirstName == null || p.FirstName == searchModel.FirstName)
&& (searchModel.LastName == null || p.LastName == searchModel.LastName)
&& (searchModel.City == null || p.City == searchModel.City)
&& (searchModel.PostalZipCode == null || p.PostalZipCode ==
searchModel.PostalZipCode)
)
.Take(100)
.ToList();
We HOPE that the LastName, City, and PostalZipCode clauses, which
all evaluate to true because in this case they are null, will be
optimized away in .NET, leaving a query along the lines of ...
DECLARE @p__linq__0 NVARCHAR(20) = 'Ben'
SELECT TOP 100
PupilId ,
FirstName ,
LastName,
etc...
FROM
dbo.Pupils
WHERE
FirstName = @p__linq__0
-- Generated by ANTS Performance Profiler
-- Executed against .\SQL2014
USE [EFSchoolSystem]
DECLARE @p__linq__0 NVarChar(4000) SET @p__linq__0 = 'Ben'
DECLARE @p__linq__1 NVarChar(4000) SET @p__linq__1 = 'Ben'
DECLARE @p__linq__2 NVarChar(4000) SET @p__linq__2 = ''
DECLARE @p__linq__3 NVarChar(4000) SET @p__linq__3 = ''
DECLARE @p__linq__4 NVarChar(4000) SET @p__linq__4 = ''
DECLARE @p__linq__5 NVarChar(4000) SET @p__linq__5 = ''
DECLARE @p__linq__6 NVarChar(4000) SET @p__linq__6 = ''
DECLARE @p__linq__7 NVarChar(4000) SET @p__linq__7 = ''
-- Executed query
SELECT TOP (100)
[Extent1].[PupilId] AS [PupilId] ,
[Extent1].[FirstName] AS [FirstName] ,
[Extent1].[LastName] AS [LastName] ,
[Extent1].[Address1] AS [Address1] ,
[Extent1].[Adderss2] AS [Adderss2] ,
[Extent1].[PostalZipCode] AS [PostalZipCode] ,
[Extent1].[City] AS [City] ,
[Extent1].[PhoneNumber] AS [PhoneNumber] ,
[Extent1].[SchoolId] AS [SchoolId] ,
[Extent1].[Picture] AS [Picture]
FROM
[dbo].[Pupils] AS [Extent1]
WHERE
(@p__linq__0 IS NULL OR [Extent1].[FirstName] = @p__linq__1)
AND (@p__linq__2 IS NULL OR [Extent1].[LastName] = @p__linq__3)
AND (@p__linq__4 IS NULL OR [Extent1].[City] = @p__linq__5)
AND (@p__linq__6 IS NULL OR [Extent1].[PostalZipCode] = @p__linq__7)
BAD BAD BAD!!!
Sproc?!?
Conditionals in EF logic (see the sketch after the interceptor example below)
Make SQL Server recompile the plans each time – from within EF
Write a custom database command interceptor to modify the EF-generated SQL before it's run, to add an "option(recompile)" hint. You
can write a class a little like this:
public class RecompileDbCommandInterceptor : IDbCommandInterceptor
{
public void ReaderExecuting(DbCommand command,
DbCommandInterceptionContext<DbDataReader> interceptionContext)
{
if(!command.CommandText.EndsWith(" option(recompile)"))
{
command.CommandText += " option(recompile)";
}
}
//and implement other interface members
}
And use it like this:
var interceptor = new RecompileDbCommandInterceptor();
DbInterception.Add(interceptor);
var pupils = db.Pupils.Where(p => p.City == city).ToList();
DbInterception.Remove(interceptor);
Note that this interception is enabled globally, not for the specific
instance of the context, so you probably want to disable it again so
that other queries aren’t affected
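For the "Conditionals in EF logic" option mentioned above, the idea is to compose the query so that a Where() is only added for criteria the user actually supplied; EF then generates SQL containing only the predicates it needs, with no catch-all "IS NULL OR" pattern. A minimal sketch using the same searchModel:

IQueryable<Pupil> query = db.Pupils;

// Append a filter only when a value was actually provided
if (searchModel.FirstName != null)
    query = query.Where(p => p.FirstName == searchModel.FirstName);
if (searchModel.LastName != null)
    query = query.Where(p => p.LastName == searchModel.LastName);
if (searchModel.City != null)
    query = query.Where(p => p.City == searchModel.City);
if (searchModel.PostalZipCode != null)
    query = query.Where(p => p.PostalZipCode == searchModel.PostalZipCode);

List<Pupil> pupils = query.Take(100).ToList();

Each distinct combination of filters still gets its own plan, but each of those plans can actually use an index on the filtered column.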
The reuse of execution plans is often a good thing because it avoids
the need to regenerate a plan each time a query is run
In order for a plan to be reused, the statement text must be
identical, which as we just saw, is the case for parameterized queries
So far we’ve seen that Entity Framework usually generates
parameterized queries when we include values through variables,
But with .Skip() or .Take() this doesn’t happen
When implementing a paging mechanism we might choose to write
the following:
var schools = db.Schools
.OrderBy(s => s.PostalZipCode)
.Skip(model.Page * model.ResultsPerPage)
.Take(model.ResultsPerPage)
.ToList();
Looking at the executed query we see that the ResultsPerPage (100)
and Page (417*100) integers are part of the query text, not
parameters
Next time we run this query for, say, page 567, a very slightly
different query will be run with a different number, but it will be
different enough that SQL Server won’t reuse the execution plan
Enable the SQL Server setting 'optimize for ad hoc workloads'.
This makes SQL Server less aggressive at caching plans, and is
generally a good thing to enable, but it doesn't address the
underlying issue.
The problem occurs in the first place because (due to an
implementation detail) when passing an int to the Skip() and Take()
methods, Entity Framework can’t see whether they were passed
absolute values like Take(100), or a variable like
Take(resultsPerPage), so it doesn’t know whether the value should be
parameterized.
But there’s an easy solution in EF 6, which includes versions of Skip()
and Take() which take a lambda instead of an int, enabling it to see
that variables have been used, and parameterize the query
int resultsToSkip = model.Page * model.ResultsPerPage;
var schools = db.Schools
.OrderBy(s => s.PostalZipCode)
.Skip(() => resultsToSkip) //must pre-calculate this value
.Take(() => model.ResultsPerPage)
.ToList();
When modifying data in SQL Server, Entity Framework will run
separate INSERT statements for every row being added.
The performance consequences of this are not good if you need to
insert a lot of data! (round trip, tlog buffer flush, etc).
You can use a NuGet package, EF.BulkInsert, which batches up Insert
statements instead, in much the way that the SqlBulkCopy class
does. This approach is also supported out of the box in Entity
Framework 7 (released Q1 2016).
If there’s a lot of latency between the application and the database,
this problem will be more pronounced.
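As a rough contrast, the sketch below shows the default row-by-row behavior next to the bulk approach. The BulkInsert extension method is assumed to come from the EF.BulkInsert NuGet package mentioned above, so treat the exact call as an assumption and check the package documentation:

var newPupils = Enumerable.Range(1, 10000)
    .Select(i => new Pupil { FirstName = "Test", LastName = "Pupil" + i, SchoolId = 1 })
    .ToList();

// Default EF behavior: SaveChanges issues one INSERT (and one round trip) per row
db.Pupils.AddRange(newPupils);
db.SaveChanges();

// With the EF.BulkInsert package the rows are streamed in a single bulk
// operation, much like SqlBulkCopy (extension method name assumed):
// db.BulkInsert(newPupils);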
Kevin G. Boles
◦ @TheSQLGuru
◦ [email protected]
Don’t forget about the #sqlhelp hash tag
This deck is based on an AWESOME Simple-Talk post by Ben
Emmett, which has a LOT MORE great advice on EF performance
problems beyond those that hit SQL Server!!!
It shows screenshots from RedGate's AWESOME ANTS Performance
Profiler
https://www.simple-talk.com/dotnet/.net-tools/entity-framework-performance-and-what-you-can-do-about-it/
Sample Code:
https://github.com/bcemmett/EntityFrameworkSchoolSystem –
setup instructions are included in the readme