SQL 2016 R Services aka leveraging your local data lake

Download Report

Transcript SQL 2016 R Services aka leveraging your local data lake

SQL 2016 R Services
January 10, 2017
a.k.a. leveraging your local data lake
Consultant for over 20 years with experience in:
• BI development (primarily MS SQL Server)
• Application development (primarily .NET)
• SCRUM Master
• Project Management
• Business Analysis
• Quality Assurance
• Team Development
Kevin Queen
Director at Oakwood Systems Group Inc.
[email protected]
[email protected]
Agenda
1
Overview and Why R in SQL
2
Architecture and Setting Up
3
Demo using stored procedures
4
Demo using remote client
General Process Overview
Obtain data
Define the
problem and
desired outcome
Model, Train, Test
and Evaluate
Data Munge
Present and
Incorporate into
Solutions
These are iterative processes. However your organization depicts this, the end goal is Actionable Intelligent Information.
Building results into solutions enhances the return on investment.
Why R in SQL Server?
Traditional R development has commonly been
done in a stand alone desktop environment.
• Limited memory (and R works “in memory”)
• Have to download data
• Need to individually update packages and R
Why R in SQL Server? – bring the code to the data
SQL Server may already have your data
• Larger memory than work station (hopefully)
• Ability to use SQL and R in combination
• Ease of integration into applications to leverage and
apply models, predictors, etc.
• Still have to manage packages and updates (but more on
this later)
R Server can also run stand alone, or on Hadoop, and other
platforms – but this presentation focuses on SQL Server R
Development Environment (Client Server)
Microsoft OpenR is required – ScaleR handles distributing the workload
between the client and the server as well as parallel processing – ‘rx’
Production Environment
No matter how you develop the R scripts, in
database R services provides ease of execution
for production use.
Keys to Using SQL R Services
Leverage stored procedures to run R
scripts (similar to AzureML R module)
• Import table, view or query as a data set
• Perform R commands
• Export results to database
•
•
•
•
•
R script development is still on the
developer desktop
Need Server admin privileges to install
packages on server.
R versions are controlled by SQL updates.
Graphics outputs are downloaded from DB
Graphics can be incorporated into SSRS
Leverage Open R Server / Client to run
client side scripts on the server
•
•
•
•
•
•
R script development is still on the developer
desktop but accesses full database which
remains on server
Need Server admin privileges to install
packages on server – though a fix is promised.
Must sync Open R versions client/server
ScaleR package (comes with Open R) required
as this handles splitting tasks on client vs
server
Can use full database datasets
SQLBindR may address problem with R sync
9
Getting Set Up on the Server
The easy part: Install R Services as an option in SQL
Server 2016 SP1 (any edition should work with SP1)
Almost as easy: Go to your R Service install for the
specific SQL instance and install any needed packages.
•
Admin console go to C:\Program Files\Microsoft SQL
Server\MSSQL13.MSSQLSERVER\R_SERVICES\bin> (default instance and
location)
•
Run R.exe and install.packages() with lib = the corresponding library folder.
Each SQL instance has its own R environment!
10
Getting Set Up on the Client
Install an IDE (RStudio*, Microsoft R Client, or RTools
for Visual Studio 2015)
* You will also need to install Open R and set that as
your R environment
You may have to update R Server to V 9.0
https://support.microsoft.com/en-us/kb/3210262 -- contains fix
https://msdn.microsoft.com/en-us/library/mt590540.aspx -- more info
11
Demo – using R in Stored Procedures
SSMS – SQL-R-Exploration.sql
SQL Name
R Module Name in SQL
@input_data_1
InputDataSet
@output_data_1_name=N’name’
Dataframe in R named ‘name’
@params = N’@model varbinary(max)
Dataframe called “model”
Stored procedure returns
OutputDataSet
https://msdn.microsoft.com/en-us/library/mt683486.aspx
Homework: follow the above but actually score the model
Demo – using Remote Client
VS 2015 – SQLRWalk.R
•
•
•
•
Requires ScaleR library
Compute Context set up to manage client-server transfers
Notice impact of “marshalling” between client and server
Use the .R file you download, not what is shown on the web
pages as they fixed errors in the file only.
https://msdn.microsoft.com/en-us/library/mt634286.aspx
Homework: follow the above but actually score the model
Conclusions
Pros
•
•
•
•
Ability to run against larger data sets using server
Ability to leverage existing data in SQL
Ability to leverage SQL and R both in solutions
Ability to incorporate R in other applications
Cons
• SQL Server will need more memory
• Maintaining libraries and version synch is more
difficult – though a fix is promised and SQLBindR
may be an interim help
• RevoScaleR is a whole new way of life
Resources – edX.org
Good all around capabilities reference with tutorials
• https://msdn.microsoft.com/en-us/library/mt590811.aspx
Known Issues
• https://msdn.microsoft.com/en-us/library/mt590540.aspx
Synching R Server and R Client
• https://msdn.microsoft.com/en-us/library/mt590540.aspx
• https://support.microsoft.com/en-us/kb/3210262
• https://msdn.microsoft.com/microsoft-r/rserver-install-windows
• https://msdn.microsoft.com/en-us/library/mt791781.aspx
ScaleR Tutorial
• https://msdn.microsoft.com/microsoft-r/scaler-getting-started
Set Up
• https://msdn.microsoft.com/en-us/library/mt604885.aspx
15