SQL CLR Demystified

A look at the integration of the CLR into SQL Server
Matt Whitfield
Atlantis Interactive UK Ltd
What will be covered
• Overview of CLR integration
• SqlContext
• Stored Procedures
• Scalar & Table Functions
• CLR DML Triggers
• CLR DDL Triggers
• Aggregate Functions
• CLR Types
• Visual Studio Database Projects
• A look at permission sets
• Examples:
– Data manipulation for performance
– Environment manipulation for maintenance
Overview
• CLR integration into SQL Server allows us to run
CLR code that conforms to certain restrictions
• CLR integration means that we can use common
business rule validation code in the middle tier
and the database
• For some operations, CLR integration allows us
to extract maximum performance
• CLR integration allows us to perform
maintenance tasks that would have been very
difficult or impossible before
CLR Disabled by Default
• CLR integration is disabled by default in
SQL Server, and therefore must be
enabled:
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE WITH OVERRIDE;
GO
• This can also be done using the
Surface Area Configuration tool.
What Framework version?
• In SQL Server 2005 and 2008, the CLR is always loaded as the
2.0 runtime
• Until .NET 4.0, a single CLR host process could
only load a single version of the .NET runtime
• It is possible future versions of SQL Server may
allow more up-to-date runtime versions to be
loaded
• We can confirm the loaded version with:
select * from sys.dm_clr_properties
• However, we can use types from .NET 3.0 and
3.5, as these releases did not update mscorlib
Some Things To Note
• When SQL Server loads assemblies, they are cached in
memory. When the O/S signals memory pressure to SQL
Server, explicit garbage collection may be run, and the
assemblies may be unloaded. This can cause
performance issues if it happens frequently.
• SQL CLR code cannot be costed accurately by the query
optimiser, as it does not look at what the code actually
does – this can affect execution plans.
• SQL CLR code can sometimes prevent parallelism, and
SQL CLR procedures are usually single threaded.
Sometimes this can hurt performance, other times you
may find you achieve the same amount of work in the
same time using 1 thread instead of 8.
SqlContext
• SqlContext is a class that gives us access
to four members:
– IsAvailable
– Pipe
– TriggerContext
– WindowsIdentity
• IsAvailable indicates whether the other
members can be used
SqlContext.Pipe
• Pipe is the main point of interaction with
the SQL Server when returning result sets
or messages
• Messages can be sent to the client using
the Send(string) function:
SqlContext.Pipe.Send("Hello World");
• Single row result sets can be sent using
the Send(SqlDataRecord) function.
Multi-row Result Sets
• SendResultsStart is used
to specify the schema of
the result set
SendResultsStart(SqlDataRecord)
– No data is sent by the
SendResultsStart call
• 0 or more calls to
SendResultsRow are
then sent with data
SendResultsRow(SqlDataRecord)
– Empty result sets can be
sent by making no calls to
SendResultsRow
• SendResultsEnd marks
the end of the result set
SendResultsEnd()
SqlContext.WindowsIdentity
• This returns a standard
System.Security.Principal.WindowsIdentity
object
• The property can only be used under
EXTERNAL_ACCESS or UNSAFE
permission sets
• For 99% of CLR work, using this property
is not required
SqlContext.TriggerContext
• TriggerContext gives you access to:
– ColumnCount – number of columns in the target table
– EventData – SqlXml that returns the same as the
EVENTDATA T-SQL function
– TriggerAction – a TriggerAction enumeration value
that specifies which action caused the trigger to fire
– IsUpdatedColumn – method that determines if a
column was updated – same as the T-SQL UPDATE()
function
– …but what about the inserted and deleted virtual
tables that are available in normal triggers?
Context Connection
• To access data from within CLR code, the
special ‘context connection’ connection string is
used…
using (SqlConnection connection =
    new SqlConnection("context connection=true"))
{
    connection.Open();
    ...
}
• We can then run any SQL statement as we
would normally from .NET
• The inserted and deleted virtual tables are
available through this method when in a DML
trigger
Stored Procedures
• What is a stored procedure?
– A procedure that contains a sequence of
operations, which may affect existing data,
return result sets or both
– CLR Stored Procedures are no different
– CLR Stored Procedures are implemented as
static members of a class
– The containing class is irrelevant to the
execution of the procedure, and is an
organisational unit only
Hello World
public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void HelloWorld()
    {
        SqlContext.Pipe.Send("Hello World!");
    }
}
• Procedure is just a method, decorated with the SqlProcedure attribute.
• Procedure is registered in SQL Server as follows:
CREATE PROCEDURE [dbo].[HelloWorld]
AS EXTERNAL NAME [SQLAssembly].[StoredProcedures].[HelloWorld]
GO
Simple Data Access
• In part 1:
– we will open a context connection
– we will execute a command
– we will read the results, and use those to populate a Dictionary
object
• In part 2:
– We will create record metadata
– We will create a record set
– We will populate the record set from the Dictionary
• While a very mundane example, and not something you
would usually use the CLR for, it shows the basic
mechanics of moving data into and out of the CLR
Simple Data Access (1/2)
public static void RowCounter()
{
    // Maps each object type code to the number of objects of that type
    Dictionary<string, int> typeDictionary = new Dictionary<string, int>();
    using (SqlConnection connection = new SqlConnection("context connection=true"))
    {
        connection.Open();
        using (SqlCommand command = new SqlCommand("SELECT [type] from [sys].[objects];", connection))
        {
            using (SqlDataReader reader = command.ExecuteReader())
            {
                // Forward-only read, counting rows per type
                while (reader.Read())
                {
                    string type = reader.GetString(0);
                    if (typeDictionary.ContainsKey(type))
                    {
                        typeDictionary[type] = typeDictionary[type] + 1;
                    }
                    else
                    {
                        typeDictionary.Add(type, 1);
                    }
                }
            }
        }
    }
...
Simple Data Access (2/2)
...
    SqlDataRecord record =
        new SqlDataRecord(new SqlMetaData[] {
            new SqlMetaData("Type", SqlDbType.NChar, 2),
            new SqlMetaData("Count", SqlDbType.Int)
        });
    SqlContext.Pipe.SendResultsStart(record);
    foreach (KeyValuePair<string, int> kvp in typeDictionary)
    {
        record.SetString(0, kvp.Key);
        record.SetInt32(1, kvp.Value);
        SqlContext.Pipe.SendResultsRow(record);
    }
    SqlContext.Pipe.SendResultsEnd();
}
Things to note
• That we did not create a new
SqlDataRecord object for every row, we
just populated the existing object
• That the name exposed to SQL Server
does not have to match the name of the
implementing method
• The return type of the stored procedure
can be int to return a value, as with T-SQL
stored procedures
Scalar Functions
• What is a scalar function?
– A statement or set of statements that return a
single value of a specific type based on zero
or more parameters
– CLR Scalar Functions are no different
– CLR Scalar Functions are implemented much
the same as stored procedures – a single
static method of a class
String Length Scalar Function
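[Microsoft.SqlServer.Server.SqlFunction]
public static long fn_LengthTestCLR(string input)
{
    // A NULL argument arrives as a null reference and would throw here;
    // creating the function WITH RETURNS NULL ON NULL INPUT avoids the call.
    return input.Length;
}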
CREATE FUNCTION [dbo].[fn_LengthTestCLR]
(@input nvarchar (MAX))
RETURNS bigint
AS EXTERNAL NAME
[SQLAssembly].[UserDefinedFunctions].[fn_LengthTestCLR]
GO
Performance Comparison
• Consider the equivalent T-SQL function, and the following SQL
Statement:
CREATE FUNCTION dbo.fn_LengthTest(@input varchar(MAX))
RETURNS [bigint]
AS
BEGIN
RETURN LEN(@input)
END
SELECT SUM(dbo.fn_LengthTest(o.[name]))
FROM [sys].[objects] [o] CROSS JOIN
[sys].[all_columns] [ac] CROSS JOIN
[sys].[all_columns] [ac1]
CLR would be slower, right?
• No!
• On a dual core 2GHz CPU –
– SQL Function – 9.036 seconds
– SQL CLR Function – 3.918 seconds
• Overhead of calling a SQL CLR function is
lower than that of calling a T-SQL function
Table Functions
• What is a table function?
– A statement or set of statements that return a single
result set with a defined schema based on zero or
more parameters
– CLR Table Functions can only be multi-statement
equivalents, there is no concept of an ‘inline CLR
table function’
– Implemented as a pair of methods – one that returns
an IEnumerable, and one that sets outbound
parameters for each object – both are static members
Regular Expression Function
• In Part 1:
– We will implement an object that we can return to hold
the data required to fill each row
• In Part 2:
– We will create a list of our storage object
– We will run a regular expression match based on the
parameters
– We will populate our list with the results of the regular
expression
– We will return the list
• In Part 3:
– We will set the values to be stored in each row
A Regular Expression Function (1/3)
private struct _match
{
    public readonly int MatchNumber;
    public readonly int GroupNumber;
    public readonly string CaptureValue;
    public _match(int matchNumber,
                  int groupNumber,
                  string captureValue)
    {
        MatchNumber = matchNumber;
        GroupNumber = groupNumber;
        CaptureValue = captureValue;
    }
}
A Regular Expression Function (2/3)
[SqlFunction(FillRowMethodName = "FillRow",
             TableDefinition = "MatchNumber int, " +
                               "GroupNumber int, " +
                               "CaptureValue nvarchar(MAX)")]
public static IEnumerable GetCaptureGroupValues(string input,
                                                string pattern)
{
    List<_match> matchList = new List<_match>();
    int matchIndex = 0;
    foreach (Match m in Regex.Matches(input, pattern))
    {
        int groupIndex = 0;
        foreach (Group g in m.Groups)
        {
            matchList.Add(new _match(matchIndex, groupIndex++, g.Value));
        }
        matchIndex++;
    }
    return matchList;
}
A Regular Expression Function (3/3)
public static void FillRow(Object obj,
                           out int MatchNumber,
                           out int GroupNumber,
                           out SqlString CaptureValue)
{
    _match match = (_match)obj;
    MatchNumber = match.MatchNumber;
    GroupNumber = match.GroupNumber;
    CaptureValue = match.CaptureValue;
}
Creation SQL
CREATE FUNCTION [dbo].[GetCaptureGroupValues]
(@input [nvarchar] (MAX),
@pattern [nvarchar] (MAX))
RETURNS TABLE
([MatchNumber] [int] NULL,
[GroupNumber] [int] NULL,
[CaptureValue] [nvarchar] (MAX) NULL)
AS EXTERNAL NAME
[SQLAssembly].[UserDefinedFunctions].[GetCaptureGroupValues]
GO
• Again, the name exposed to SQL Server does not have to match the
name of the main body function
• The SQL definition can specify different column names and
parameter names to the main method and fill row method
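To make the shape of the result set concrete, here is a minimal sketch that runs the same match and group loop outside SQL Server and formats each row as text. The class name CaptureGroupSketch and the comma-separated row format are assumptions made for illustration only; the real function returns typed columns through the fill row method.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-alone helper: same numbering logic as
// GetCaptureGroupValues, but rows are returned as "match,group,value" strings.
public static class CaptureGroupSketch
{
    public static List<string> Rows(string input, string pattern)
    {
        List<string> rows = new List<string>();
        int matchIndex = 0;
        foreach (Match m in Regex.Matches(input, pattern))
        {
            int groupIndex = 0;
            foreach (Group g in m.Groups)
            {
                // Group 0 is always the whole match; 1..n are the capture groups
                rows.Add(matchIndex + "," + groupIndex++ + "," + g.Value);
            }
            matchIndex++;
        }
        return rows;
    }
}
```

For the input "a1b2" and the pattern ([a-z])(\d) this yields six rows: 0,0,a1 then 0,1,a then 0,2,1 for the first match, and 1,0,b2 then 1,1,b then 1,2,2 for the second.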
Things to note
• That we used a struct for the storage object,
rather than a class, because of reduced
instantiation costs
• That our function was decorated with a
SqlFunction attribute that named the fill row
method and specified the shape of the output
table
• That the fill row method took an object as its first
parameter, and had an outbound parameter for
each of the columns we wanted to fill
CLR Triggers
• Known bug in Visual Studio 2005 and
2008 that means SqlTrigger attribute does
not allow you to specify the schema of the
target table
• Fixed in Visual Studio 2010
• Workaround is to comment out the
SqlTrigger attribute, and deploy manually
DML Triggers
• What is a DML Trigger?
– A statement or set of statements run when
data in a target table is modified
– CLR DML Triggers are no different
– CLR DML Triggers still have access to the
inserted and deleted virtual tables
– CLR DML Triggers are implemented as static
methods in a class
A Simple DML Trigger
• We will specify the event type and target
object using the SqlTrigger attribute
• We will connect to the database using the
context connection
• We will select the rows from the inserted
virtual table, and count them
• We will send a message to the SqlContext
pipe stating the number of rows
A Simple DML Trigger
[Microsoft.SqlServer.Server.SqlTrigger(Target="CallLog",
                                       Event="FOR UPDATE")]
public static void TestDMLTrigger()
{
    int i = 0;
    using (SqlConnection connection = new SqlConnection("context connection=true"))
    {
        connection.Open();
        using (SqlCommand command = new SqlCommand("SELECT * from inserted;", connection))
        {
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    i++;
                }
            }
        }
    }
    SqlContext.Pipe.Send(i.ToString() + " rows in inserted");
}
DDL Triggers
• What is a DDL Trigger?
– A statement or set of statements run when a
schema modification event occurs
– CLR DDL Triggers are no different
– CLR DDL Triggers have access to the event
data via SqlContext.TriggerContext
– CLR DDL Triggers are implemented as static
methods in a class
A Simple DDL Trigger
• We will specify the target and the event
type using the SqlTrigger attribute
• We will create an XmlDocument based on
the TriggerContext object from the
SqlContext object
• We will then use XPath to find the created
object’s schema and name
• We will send a message to the SqlContext
pipe stating the name of the created object
A Simple DDL Trigger
[Microsoft.SqlServer.Server.SqlTrigger(Target="DATABASE",
                                       Event="FOR CREATE_TABLE")]
public static void TestDDLTrigger()
{
    XmlDocument document = new XmlDocument();
    document.LoadXml(SqlContext.TriggerContext.EventData.Value);
    string objectName = document.SelectSingleNode("//SchemaName").InnerText +
                        "." +
                        document.SelectSingleNode("//ObjectName").InnerText;
    SqlContext.Pipe.Send("Table " + objectName + " was created.");
}
Aggregate Functions
• What is an aggregate function?
– A function that takes a series of values and
accumulates them into a single, aggregated result
– CLR Aggregate functions are no different
– Implemented as a class with four methods:
• Init
• Accumulate
• Merge
• Terminate
– Aggregate functions in SQL Server 2005 can take
only one parameter, in SQL Server 2008 this
restriction is lifted
Aggregate Function Methods
• Init method is called first, here we initialise
our member fields
• Accumulate is called to add a value to the
aggregate
• Merge is called to add together two
aggregates (for example, as a result of
parallelism)
• Terminate is called to return our value
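The four-method lifecycle can be tried outside SQL Server with a minimal sketch. The names MinSketch and MinSketchDemo are hypothetical, plain int? values stand in for SqlInt32, and the two-partition split is only an assumed stand-in for what a parallel plan might do; it is not how SQL Server actually schedules the calls.

```csharp
using System;

// Hypothetical stand-in for a CLR aggregate, using plain .NET types
public struct MinSketch
{
    private int minValue;
    private bool hasValues;

    public void Init()
    {
        minValue = Int32.MaxValue;
        hasValues = false;
    }

    public void Accumulate(int? value)
    {
        if (!value.HasValue) return;   // NULLs do not affect a minimum
        hasValues = true;
        if (value.Value < minValue) minValue = value.Value;
    }

    public void Merge(MinSketch other)
    {
        if (!other.hasValues) return;  // nothing to take from an empty partial
        hasValues = true;
        if (other.minValue < minValue) minValue = other.minValue;
    }

    public int? Terminate()
    {
        return hasValues ? (int?)minValue : null;
    }
}

public static class MinSketchDemo
{
    // Aggregates two partitions separately, then merges the partial results
    public static int? MinOfTwoPartitions(int?[] left, int?[] right)
    {
        MinSketch a = new MinSketch();
        a.Init();
        foreach (int? v in left) a.Accumulate(v);

        MinSketch b = new MinSketch();
        b.Init();
        foreach (int? v in right) b.Accumulate(v);

        a.Merge(b);
        return a.Terminate();
    }
}
```

The point of the sketch is that Merge must give the same answer as accumulating every value into a single instance, including when one side saw no values at all.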
A Simple Aggregate
• We will use the SqlUserDefinedAggregate
attribute to specify the serialisation format of our
aggregate
• We will track the minimum value in Accumulate,
and also track that we have seen values
• In Merge, we will check if the other aggregate
class has seen values - if it has, then we take
the minimum value seen
• In Terminate we will return the minimum value
we have seen if we have seen any values, or
NULL if we have not seen any values
A Simple Aggregate (1/2)
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)]
public struct MinValue
{
    int minValue;
    bool hasValues;
    public void Init()
    {
        minValue = Int32.MaxValue;
        hasValues = false;
    }
    public void Accumulate(SqlInt32 Value)
    {
        // Skip NULL inputs; reading Value.Value on a NULL would throw
        if (Value.IsNull)
        {
            return;
        }
        hasValues = true;
        if (Value.Value < minValue)
        {
            minValue = Value.Value;
        }
    }
...
A Simple Aggregate (2/2)
...
    public void Merge(MinValue Group)
    {
        if (Group.hasValues)
        {
            if (Group.minValue < minValue)
            {
                minValue = Group.minValue;
            }
            hasValues = true;
        }
    }
    public SqlInt32 Terminate()
    {
        if (hasValues)
        {
            return minValue;
        }
        else
        {
            return SqlInt32.Null;
        }
    }
}
SqlUserDefinedAggregate
• This attribute contains parameters which can
affect the query optimiser, and can cause
incorrect results if set incorrectly:
– IsInvariantToDuplicates – should only be true if
aggregating the same value many times does not
affect the result
– IsInvariantToNulls – should only be true if aggregating
NULL values does not affect the result
– IsInvariantToOrder – should only be true if the result
is not dependent on the order of the aggregated
values
– IsNullIfEmpty – should only be true if the aggregate of
0 values is NULL
Native and UserDefined
Serialisation
• If your aggregate contains reference types then you
must implement custom serialisation, and specify
Format.UserDefined in the SqlUserDefinedAggregate
attribute
• When specifying UserDefined format, your aggregate
must implement the IBinarySerialize interface, and
specify its maximum size in the
SqlUserDefinedAggregate attribute
• Maximum byte size can be 1 to 8000 in SQL Server
2005; SQL Server 2008 also accepts -1, which allows any
size from 8001 bytes up to 2 GB.
• The same serialisation interface applies to both
Aggregates and User Defined Types
CLR Types
• What is a user defined type?
– A type based on a system type, which may be
bound to old-style rules and defaults
– CLR Types are different
– CLR Types can be complex, and can have
properties and methods
– CLR Types can have static methods (and the
built in CLR Types in SQL Server 2008 have
some)
A Simple CLR Type
• Part 1:
– We will define the serialisation format of the type
– We will provide the IsNull property and the Null static property to
return a NULL instance
• Part 2:
– We will define methods to convert the type to and from strings
• Part 3:
– We will define a property for the symbol, a method to change the
symbol and a static method to create a new currency with a
defined symbol and value
• Part 4:
– We will declare our member fields and provide implementation of
the IBinarySerialize interface to load and store our type
A Simple CLR Type (1/4)
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedType(Format.UserDefined,
                                               MaxByteSize=24)]
public struct Currency : INullable, IBinarySerialize
{
    public bool IsNull
    {
        get
        {
            return _null;
        }
    }
    public static Currency Null
    {
        get
        {
            Currency h = new Currency();
            h._null = true;
            return h;
        }
    }
A Simple CLR Type (2/4)
    public override string ToString()
    {
        return _symbol + _value.ToString();
    }
    public static Currency Parse(SqlString s)
    {
        if (s.IsNull ||
            string.IsNullOrEmpty(s.Value) ||
            s.Value.Length < 2 ||
            string.Equals(s.Value, "NULL",
                          StringComparison.OrdinalIgnoreCase))
        {
            return Currency.Null;
        }
        Currency u = new Currency();
        u._symbol = s.Value[0];
        u._value = Decimal.Parse(s.Value.Substring(1));
        return u;
    }
A Simple CLR Type (3/4)
    public char Symbol
    {
        get
        {
            return _symbol;
        }
    }
    [SqlMethod(IsMutator=true)]
    public void ChangeSymbol(char newSymbol)
    {
        _symbol = newSymbol;
    }
    public static Currency CreateCurrency(char symbol, Decimal value)
    {
        Currency c = new Currency();
        c._symbol = symbol;
        c._value = value;
        return c;
    }
A Simple CLR Type (4/4)
    private char _symbol;
    private Decimal _value;
    private bool _null;
    public void Read(BinaryReader r)
    {
        _null = r.ReadBoolean();
        _symbol = r.ReadChar();
        _value = r.ReadDecimal();
    }
    public void Write(BinaryWriter w)
    {
        w.Write(_null);
        w.Write(_symbol);
        w.Write(_value);
    }
}
The CLR Type in use
• Using our static method to create an instance:
DECLARE @i [Currency]
SET @i = [Currency]::[CreateCurrency]('$', 0.53)
PRINT CONVERT([varchar], @i)
• Using our instance mutator method to change the symbol:
DECLARE @i [Currency]
SET @i = [Currency]::[CreateCurrency]('$', 0.53)
SET @i.ChangeSymbol('£')
PRINT CONVERT([varchar], @i)
• Retrieving the value of a property:
DECLARE @i [Currency]
SET @i = [Currency]::[CreateCurrency]('$', 0.53)
PRINT @i.Symbol
Things to note
• We can call static methods and properties on CLR types
with [type-name]::[member-name]
• IBinarySerialize must be used to serialise types that
cannot be serialised natively
• A CLR type should always be able to parse its own
string representation
• Methods that change the state of the type must be
marked as mutator methods
• Although in .NET we can define operator overloads to
specify how operators (e.g. +, -, *, /) are applied to our
classes, SQL Server does not use these.
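The point about a type parsing its own string representation can be checked with a small round-trip sketch. CurrencySketch is a hypothetical plain .NET stand-in for the Currency type, with no NULL handling. Using the invariant culture in both directions is an assumption added here so that the decimal separator is the same when writing and when reading; the Currency type in this deck relies on the current culture instead.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for the Currency type: symbol followed by a decimal
public struct CurrencySketch
{
    public char Symbol;
    public decimal Value;

    public override string ToString()
    {
        // Same layout as Currency.ToString(), but culture-independent
        return Symbol + Value.ToString(CultureInfo.InvariantCulture);
    }

    public static CurrencySketch Parse(string s)
    {
        // Assumes s is non-null and at least two characters long
        CurrencySketch c = new CurrencySketch();
        c.Symbol = s[0];
        c.Value = Decimal.Parse(s.Substring(1), CultureInfo.InvariantCulture);
        return c;
    }
}
```

Parsing "$0.53" and calling ToString() on the result gives "$0.53" again, which is the round trip the bullet above asks for.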
Permission Sets
• Each assembly in SQL Server has an
associated ‘permission set’ – one of:
– SAFE
– EXTERNAL_ACCESS
– UNSAFE
• The permission set assigned to the assembly
determines how limited the CLR functionality is
• This, in turn, determines how much damage we
can do to the stability of the host process (i.e.
SQL Server itself)
Allowed operations
                                  SAFE          EXTERNAL_ACCESS                         UNSAFE
Code access security permissions  Execute only  Execute + access to external resources  Unrestricted
Programming model restrictions    Yes           Yes                                     No
Verifiability requirement         Yes           Yes                                     No
Local data access                 Yes           Yes                                     Yes
Ability to call native code       No            No                                      Yes
Programming model restrictions
• A few of the programming model restrictions for SAFE
and EXTERNAL_ACCESS assemblies are as follows:
– No static fields may be used to store information
– PEVerify type safety checking test is passed
– No synchronization may be used
– Finalizer methods may not be used
– No threading may be used
– No self-affecting code may be used
• There are more requirements that must be met under
the above two permission sets – the above is just
some of the more common restrictions.
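As an illustration of the static field restriction, here is a minimal sketch of the usual alternative: a read-only static that is initialised once and never reassigned. The class CountryLookupSketch and its contents are hypothetical, and the assumption is that the check targets code that stores to a static field after initialisation.

```csharp
using System;

public static class CountryLookupSketch
{
    // Read-only static: set once by the initialiser, never stored to again
    private static readonly string[] Names = { "Unknown", "UK", "US" };

    // A mutable static like the one below is the kind of field the
    // restriction is aimed at, so it is left commented out:
    // private static int callCount;

    public static string NameFor(int code)
    {
        // Fall back to the first entry for codes outside the table
        return (code > 0 && code < Names.Length) ? Names[code] : Names[0];
    }
}
```

State that has to survive between calls belongs in a table rather than in a static field.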
Visual Studio Database Projects
• Visual Studio offers us the ability to create
a ‘CLR Database Project’ which enables
simplified deployment and debugging
• Some issues present, though these are
mostly solved in VS2010
• Database projects handle the deployment
of the objects that we create simply, which
can be really helpful during
implementation
Performance Example
Phil Factor Speed Phreak Competition 4
The Log Parsing Problem
• Producing aggregated results from an IIS log file
• Made difficult by the need to determine
‘sessions’ – i.e. groups of rows from the same IP
within a set time-span
• Requirement to produce three aggregated result
sets – visitor summary by day, visitor summary
and page summary by day
How did the CLR help?
• CLR methods can yield excellent results when
doing forward-only navigation through result
sets, performing aggregation and calculation
along the way
• Iteration over the log allowed the data to be
parsed into ‘visits’ by applying the session
timeout
• Iteration over collection of visits allowed the data
to be aggregated into country and page
summaries
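The forward-only pass described above can be sketched in plain .NET. This is not the competition entry: SessionSketch is a hypothetical name, the log is assumed to be ordered by time already, and a "visit" is assumed to start whenever an IP is seen for the first time or after a gap longer than the session timeout.

```csharp
using System;
using System.Collections.Generic;

public static class SessionSketch
{
    // Counts visits in a single forward pass over (IP, request time) pairs
    public static int CountVisits(
        IEnumerable<KeyValuePair<string, DateTime>> requests,
        TimeSpan timeout)
    {
        // Last request time seen so far for each IP address
        Dictionary<string, DateTime> lastSeen = new Dictionary<string, DateTime>();
        int visits = 0;
        foreach (KeyValuePair<string, DateTime> request in requests)
        {
            DateTime previous;
            if (!lastSeen.TryGetValue(request.Key, out previous) ||
                request.Value - previous > timeout)
            {
                visits++;   // new IP, or the previous session has timed out
            }
            lastSeen[request.Key] = request.Value;
        }
        return visits;
    }
}
```

Each row is touched once and the only state kept is one timestamp per IP, which is the pattern that is awkward to express as set-based T-SQL.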
What was the result?
• Execution in 301ms, as opposed to the
closest T-SQL approach which took
1779ms.
• Efficiency gains through the fact that the
CLR procedure would only occupy 1
thread
• Maintenance benefits in readability and
simplicity
Maintenance Example
• CLR Procedure to retrieve the free disk space
on any specified server – by Tara Kizer
• Uses System.Diagnostics namespace to
interface with WMI to retrieve results
• Requires UNSAFE assembly permission set
because PerformanceCounterCategory is
synchronised
• Simple use of framework classes to retrieve the
information required
When should we use the CLR?
• To improve performance:
– In situations where a forward-only scan over a result set requires
row by row processing to produce an aggregated result set
– In situations where complex logic cannot simply be expressed in
T-SQL
• To improve maintenance:
– In situations where information required can be simply obtained
from any .NET process
– In situations where interaction with the O/S is required (will
require elevated permission set)
• For functionality:
– In situations where commonly available .NET Framework
classes provide desired functionality (classic example is Regular
Expression handling)
– To bring business logic from the middle tier into the database
And when should we not?
• When the functionality presented offers little over
its T-SQL equivalent
• Instead of regular DML statements (i.e. a CLR
procedure to insert a row is not the way to go!)
• To replicate row by row update logic from a
cursor – conversion to set-based T-SQL yields
far higher performance
• When the desired operation really does not
belong in the database
• When the database has to be portable to other
types of database server
Summary
• We learned:
– How to implement each type of CLR object
– How to get data out of and back into SQL Server
– That the only SQL CLR object type which differs
significantly from its T-SQL equivalent is the CLR
User Defined Type
– That the CLR can offer significant performance
improvements, particularly for forward-only scans and
scalar functions
– That we are restricted as to what we can achieve in a
CLR method based on the permission set of the
assembly