As people look to incorporate R code into their SQL Server environment there are a lot of questions surrounding how it works. As R is very memory intensive, is it possible that people running R code will prevent other processes from running? What processes should be monitored to determine the impact of R code on SQL Server performance? Does the server need to have internet access to run R? Is it possible with R to consume server memory without even loading something onto the server? Why do you have to run R code as an external script within SQL Server? You can start to get these questions answered by understanding how R runs within SQL Server.
SQL Server R Install
In the SQL Server 2016 installation there are two different R components, R Services (In-Database) and R Server (Standalone). Microsoft bought Revolution Analytics (and the company’s flagship software, Revolution R Enterprise) on April 6, 2015. Revolution R Enterprise became R Server. Included in the SQL Server installer, R Server runs on Linux or Windows and is designed to improve the performance of R code not only SQL Server, but also on data stored in Hadoop or Teradata. If you are interested in performing the R code on another server so that you do not tax the resources of your SQL Server database, you can.
However, if the data being used in R comes from SQL Server, the latency increases for sending all of the data from SQL Server to the stand-alone R server needs to be considered. Moving it off can decrease the performance of the R code. The licensing costs of installing R Server standalone also need to be evaluated with a Microsoft representative as well. An evaluation of the resource load on SQL Server with R Services (In-Database) should be performed prior to the decision to move and install R Server Standalone.
Installing R Components and Internet Access
For the first time, Microsoft is including an open source product, with a different licensing agreement, within SQL Server. In addition to the prompt for the R license, the source executable is available on the internet, not from the SQL Server Install. If the server does not have access to the internet, the Open Source R Package and Microsoft R Package components must be downloaded to the server. The install recognizes this and will provide two links for the install and a window for providing the install path.
What is Installed
When R Services is installed on SQL Server, it installs the R libraries, R documentation, R tools, ScaleR Libraries, and several SQL Specific Binaries including a service executable Launchpad.exe. If someone ever wants a quick check to see if the R Services were installed, look for the SQL Server Launchpad service in the list of running services. The Launchpad.exe is how Microsoft initiates calls to R and other languages.
Configuring SQL Server to Run R
Once R has been installed there are some configuration steps which must be completed to permit R to run on SQL Server. From within an SSMS query window, the following script needs to be run to enable R
sp_configure 'external scripts enabled', 1 GO Reconfigure GO
After this step completes successfully, a restart of SQL Server Services is required. After the restart, to check to see if R is working properly, run the following code from within and SSMS query window.
EXEC sp_execute_external_script @language =N'R', @script=N'OutputDataSet <-InputDataSet', @input_data_1 =N'SELECT 1 as CheckToSeeIfRIsWorking' WITH result sets (([CheckToSeeIfRIsWorking] int not null)); GO
When run successfully, this script will return a 1. SQL Server is now ready to run R.
Different methods of using the SQL Server R Resources
There are two different methods of running R within SQL Server. The first is to call the system procedure sp_execute_external_script with the language parameter set to R, as shown in the code listed above. R code can be added to the script section and can be run within SQL Server.
The second method is not run from the server at all, but on any client which can access the server. Using the ScaleR functions included in SQL Server, it is possible for a client computer to use the server resources to run the R code on the client computer. This context switching feature allows clients to use the memory and resources of the server to run their code, which in many cases may greatly improve the performance.
Preventing Users from using Context Switching to Run on Servers
There may be situations and servers where DBAs managing the servers would prefer that no one use SQL Server resources to run R code. This feature is easily disabled by opening the SQL Server Configuration Manager and right clicking on SQL Server Launchpad. This will display a properties window, which is shown below.
Clicking on the Advanced Tab shows an entry for External Users Count, which is shown highlighted. This value is set by default to 20 users. This means that 20 different users can change the context of the code they are running on their client computers to run on the server. To prevent anyone from running code on the server, this value should be set to 0. Once the OK button is clicked, no one will be able to run R code on SQL Server. No restart is required.
SQL Server and R Integration
The attached diagram illustrates how R has been implemented with SQL Server. When a request to run R code either through a stored procedure or through context switching is created, using a named pipe, SQL Server calls the Launchpad.exe. Every time a stored procedure or call to run R is requested an rlauncher process is run. Five Windows job objects to process R are also created if none exist, but if there are unused windows job objects initiated by a previous call and not presently in use they will be utilized.
The job objects containers will execute the R code using the rterm.exe and call rxlink.dll. This dll processes messages to the BxlServer to process any ScaleR functions written in the R code, send monitoring information to the SQLOS, create XEvents and call the sqlsatellite.dll to send and retrieve data from SQL Server.
SQL Server Resource Allocation
SQL Server manages all resources using the application layer, SQLOS. SQLOS is the interface between SQL Server and all of the underlying hardware resources, including memory. Using the Resource Governor within SQL Server it is possible to allocate the resources used by specific processes to ensure that no single process, for example, will use all the memory, starving out other processes running on the machine. Configuring and using Resource Pools provides more important functions such as production applications to be allocated the majority of the SQL Server resources used by the SQLOS. This will ensure that an ad-hoc reporting query will not adversely impact the primary application.
Since R does not run within the SQLOS, SQL Server 2016 created a new External Resource Pool to manage the server resources used by R. By configuring and the Resource Governor and creating external resources pools, the resources used by R can be configured and monitored to determine the ongoing resource load.
R Resource Allocation within SQL Server
The allocations for the Resource Governor for all SQLOS functions can be found by running
SELECT * FROM sys.resource_governor_resource_pools WHERE name = 'default'
By default, the max cpu, memory and cpu cap are all set to 100 percent. To look at the resource allocation for R, the external resource pools need to be examined.
SELECT * FROM sys.resource_governor_external_resource_pools WHERE name = 'default'
By default, the maximum memory that R can use, outside of the memory that has been allocated to SQL Server, is 20% of the remaining memory. If the server needs to support more R processing on SQL Server, the values for SQLOS memory and external pool resources may need to be adjusted. The following settings will decrease the overall memory settings for SQLOS and increase the memory allocated to external processes.
ALTER RESOURCE POOL "default" WITH (max_memory_percent = 60);
ALTER EXTERNAL RESOURCE POOL "default" WITH (max_memory_percent = 40);
ALTER RESOURCE GOVERNOR reconfigure;
The values listed here are suggested if you want to configure SQL Server so that it has more resources for running R. Prior to making these changes, it may be best to create an external resource pool for all of the R code and monitor its usage over time. The following code will create an external resource pool for processes running R. The new pool is called R_Resources. The maximum memory was set to the value of the default, as this step should ideally be performed prior to changing the external resource values as shown earlier.
CREATE EXTERNAL RESOURCE POOL R_Resources WITH (max_memory_percent = 20);
The next step in the process is to create a workload group. The workload group, named RworkloadGroup in the code, is used as a container to hold processes which have been classified as R processes.
CREATE WORKLOAD GROUP RworkloadGroup WITH (importance = medium) USING "default", EXTERNAL "R_resources";
The next step is to create a function for classifying processes running as R so that they can be monitored in the workload group
CREATE FUNCTION is_R_app()
IF program_name() in ('Microsoft R Host', 'RStudio') RETURN 'RworkloadGroup';
Once the function has been created, then the Resource Governor is directed to use the function so that all of the R functions are going to be monitored in the external resource pool and turns on the Resource Governor with the reconfigure command.
ALTER RESOURCE GOVERNOR WITH (classifier_function = dbo.is_R_app);
ALTER RESOURCE GOVERNOR reconfigure;
Going forward, all processes running R will be classified and use the external governor setting specified.
This completes all of the steps required to ensure that SQL Server is configured to optimally run R. Special thanks to Bob Ward of Microsoft for taking the time to improve my understanding of the underlying processes running R.