none
How to pass a TSQL parameter into a Dynamic Query used by R Script to Insert Data into a table? RRS feed

  • Question

  • I have written a parameterized SQL Server 2017 stored procedure that requires an input parameter @StudyID.  The goal of the stored procedure is to group responses by Product Number and StudyID and output the mean and standard deviation for each response.  The challenge I'm having is that the current stored procedure below throws an error stating 

    The parameterized external script expects the parameter '@StudyID', which was not supplied.

    The parameter is supplied when calling the stored procedure with the following syntax:

    exec spCodeMeans
    @StudyID = 22

    The stored procedure is as follows:

    ALTER PROCEDURE [dbo].[spCodeMeans]
    	-- Add the parameters for the stored procedure here
    	@StudyID int
    
    AS
    BEGIN
    
    	
    	-- SET NOCOUNT ON added to prevent extra result sets from
    	-- interfering with SELECT statements.
    	SET NOCOUNT ON;
    
    	
    
        -- Insert statements for procedure here
    	Declare @sStudy varchar(50)
    	Set @sStudy = Convert(Varchar(50),@StudyID)
    	Declare @inquery nvarchar(max) = N'Select
    			c.StudyID, c.RespID, c.ProductNumber, c.ProductSequence, c.BottomScaleValue, 
    			c.BottomScaleAnchor, c.TopScaleValue, c.TopScaleAnchor, c.StudyDate,
    			c.DayOfWeek, c.A, c.B, c.C, c.D, c.E, c.F,
    			c.DependentVarYN, c.VariableAttributeID, c.VarAttributeName, c.[1] as c1, 
    			c.[2] as c2, c.[3] as c3, c.[4] as c4, c.[5] as c5, c.[6] as c6, c.[7] as c7, c.[8] as c8
    			from ClosedStudyResponses c
    			--Sensory Value Attributes only for mean and standard deviation analytics.
    			where VariableAttributeID = 1
    			and c.StudyID =' +@sStudy ;
    			
    BEGIN TRY
    			Insert into CodeMeans
    			exec sp_execute_external_script
    			@language = N'R',
    			@script = N'
    			library(dplyr)
    			function code_means(StudyID){
    				OutputDataSet <- as.data.frame(InputDataSet) %>%
    					group_by (StudyID, ProductNumber) %>%
    					summarise_all(.funs=c(mean, sd)) %>%
    					setNames(c("StudyID","ProductNumber",
    					paste0("c",1:8, "_mean"),
    					paste0("c",1:8, "_sd")))
    				}
    				',
    @params = N'@StudyID int',
    @input_data_1 = @inquery;
    	
    	END TRY
    	
    	BEGIN CATCH
    		THROW;
    	END CATCH
    
    	Select * from CodeMeans;
    END

    So, I'm not sure how to resolve the error that's being thrown.  Any help and guidance would be much appreciated.  A bigger question is whether SQL Server Machine Learning with R using stored procedures is capable of working with a parameterized stored procedure in this manner?  StudyID must be parameterized as the stored procedure will be called from within an ASP.NET web application and the user will select the Study for which they will want to view the requested analytics.  

    Thanks again for your assistance.

    Saturday, June 9, 2018 3:05 AM

Answers

  • Hmm, maybe try and do something similar with minimal input data and to begin with just group the data.

    Also, there should be RevoScaleR functions for what you try to do. Look at rxDataStep and also this link: https://docs.microsoft.com/en-us/machine-learning-server/r/how-to-revoscaler-data-transform.

    Niels


    http://www.nielsberglund.com | @nielsberglund

    • Marked as answer by SChilders Tuesday, June 12, 2018 4:53 AM
    Tuesday, June 12, 2018 4:44 AM

All replies

  • Hey,

    You are almost there. The reason you get the error is that you need to declare @StudyID as well as set the value of the @StudyID parameter when you execute sp_execute_external_script.

    If @StudyID is sent in in your outer procedure, you would change the EXECUTE sp_execute_external_sript to something like so (I only show the pertinent parts):

    @params = N'@StudyID',

    @StudyID = @StudyID,

    @input_data_1 = @inquery

    In the code above I declare @StudyID in the @params parameter (exactly as you did), but after that, you need to indicate what @StudyID is, by adding it as a parameter and setting its value: @StudyID = @StudyID.

    It is not necessary to name the input parameter (the one holding the value) the same as the declared parameter, so let us say you named the parameter @myStudyID in your outer proc, then your code would look like so: @StudyID = @myStudyID.

    You can read more about all this here: http://www.nielsberglund.com/2018/03/11/microsoft-sql-server-r-services-sp-execute-external-script-ii/.

    Hope this helps!


    http://www.nielsberglund.com | @nielsberglund

    Saturday, June 9, 2018 4:44 AM
  • Hi Niels,

    Thanks so much for your assistance.  I find it a bit strange to add @StudyID = @StudyID to the outer portion of the procedure, but in a way understand why it needs to be that way.  So, I added this to my stored procedure so that it's now as follows:

    ALTER PROCEDURE [dbo].[spCodeMeans]
    	-- Add the parameters for the stored procedure here
    	@StudyID int
    
    AS
    BEGIN
    
    	
    	-- SET NOCOUNT ON added to prevent extra result sets from
    	-- interfering with SELECT statements.
    	SET NOCOUNT ON;
    
    	
    
        -- Insert statements for procedure here
    	Declare @sStudy varchar(50)
    	Set @sStudy = Convert(Varchar(50),@StudyID)
    	Declare @inquery nvarchar(max) = N'Select
    			c.StudyID, c.RespID, c.ProductNumber, c.ProductSequence, c.BottomScaleValue, 
    			c.BottomScaleAnchor, c.TopScaleValue, c.TopScaleAnchor, c.StudyDate,
    			c.DayOfWeek, c.A, c.B, c.C, c.D, c.E, c.F,
    			c.DependentVarYN, c.VariableAttributeID, c.VarAttributeName, c.[1] as c1, 
    			c.[2] as c2, c.[3] as c3, c.[4] as c4, c.[5] as c5, c.[6] as c6, c.[7] as c7, c.[8] as c8
    			from ClosedStudyResponses c
    			--Sensory Value Attributes only for mean and standard deviation analytics.
    			where VariableAttributeID = 1
    			and c.StudyID =' +@sStudy ;
    			
    BEGIN TRY
    			Insert into CodeMeans
    			exec sp_execute_external_script
    			@language = N'R',
    			@script = N'
    			library(dplyr)
    			function code_means(StudyID){
    				OutputDataSet <- as.data.frame(InputDataSet) %>%
    					group_by (StudyID, ProductNumber) %>%
    					summarise_all(.funs=c(mean, sd)) %>%
    					setNames(c("StudyID","ProductNumber",
    					paste0("c",1:8, "_mean"),
    					paste0("c",1:8, "_sd")))
    				}
    				',
    @params = N'@StudyID int',
    @input_data_1 = @inquery,
    @StudyID = @StudyID
    
    	
    	END TRY
    	
    	BEGIN CATCH
    		THROW;
    	END CATCH
    
    	Select * from CodeMeans;

    I then execute it by this syntax:

    EXEC spCodeMeans
    @StudyID = 22

    The resulting error states: Procedure or function  has too many arguments specified.  Should I define a new parameter @StudyID_outer to resolve this issue, or is there something else that needs to be adjusted?

    Thanks,

    Sid

    Saturday, June 9, 2018 2:48 PM
  • Hi Sid,

    Sorry, my code is slightly off. Move @input_data_1 above @params, in my example like so:

    , @input_data_1 = N'...'

    , @params = N'@StudyID int'

    , @StudyID = @StudyID

    That should do it.

    /Niels


    http://www.nielsberglund.com | @nielsberglund

    Saturday, June 9, 2018 5:59 PM
  • Hi Niels,

    Thanks for this!  Is there documentation that speaks to the order in which these need to be placed?  

    I've modified the stored procedure so that it now looks like this:

    BEGIN TRY
    			Insert into CodeMeans
    			exec sp_execute_external_script
    			@language = N'R',
    			@script = N'
    			library(dplyr)
    			codemeans <- function(StudyID){
    				OutputDataSet <- as.data.frame(InputDataSet) %>%
    					group_by (StudyID, ProductNumber) %>%
    					summarise_all(.funs=c(mean, sd)) %>%
    					setNames(c("StudyID","ProductNumber",
    					paste0("c",1:8, "_mean"),
    					paste0("c",1:8, "_sd")))
    				}
    				',
    @input_data_1 = @inquery,
    @params = N'@StudyID int',
    @StudyID = @StudyID
    
    	
    	END TRY
    	
    	BEGIN CATCH
    		THROW;
    	END CATCH
    
    	Select * from CodeMeans;


    I now receive the following:

    Msg 11536, Level 16, State 3, Line 3
    EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.
    STDERR message(s) from external script: 

    Attaching package: 'dplyr'

    The following objects are masked from 'package:stats':

        filter, lag

    The following objects are masked from 'package:base':

        intersect, setdiff, setequal, union

    Does this mean that I need to add the WITH RESULT SETS clause at the end and define each field returned?  I recall having to remove this last night because I'm inserting into a physical table, CodeMeans.  How should I proceed?

    Your help is greatly appreciated!!

    Saturday, June 9, 2018 6:35 PM
  • Hi Sid,

    I think that your function: function(StudyID) causes issues. You do not - AFAIK - have to declare a WITH RESULT SETS clause, for this to work. You get the error you refer to if there is no OutputDataSet defined and in your code, the OutputDataSet is "hidden" inside the function. 

    My assumption is that in your R script "codemeans <- function..." is a data frame. If that is the case you can assign a name to the output data through the @output_data_1_name parameter, something like so:

    , @input_data_1 = N'...'
    , @output_data_1_name = N'codemeans'
    , @params = N'@StudyID int'
    , @StudyID = @StudyID;

    Notice how the @output_data_1_name comes after the @input_data_1 parameter but before the @params parameter.

    /Niels


    http://www.nielsberglund.com | @nielsberglund


    • Edited by nielsbMVP Saturday, June 9, 2018 7:24 PM
    Saturday, June 9, 2018 7:22 PM
  • Hi Niels,

    Thanks so much.  This resolved the previous error, and WITH RESULT SETS was not needed.  The R script portion is as follows:

    BEGIN TRY
    			--Insert into CodeMeans
    			exec sp_execute_external_script
    			@language = N'R',
    			@script = N'
    			library(dplyr)
    			codemeans <- function(StudyID){
    				OutputDataSet <- data.frame(InputDataSet) %>%
    					group_by (StudyID, ProductNumber) %>%
    					summarise_all(.funs=c(mean, sd)) %>%
    					setNames(c("StudyID","ProductNumber",
    					paste0("c",1:8, "_mean"),
    					paste0("c",1:8, "_sd")))
    				}
    				',
    @input_data_1 = @inquery,
    @output_data_1_name = N'codemeans',
    @params = N'@StudyID int',
    @StudyID = @StudyID
    	
    	END TRY

    So, the intent here is to use the @inquery which is set to InputDataSet and pipe together with dplyr the mean and standard deviation grouped by StudyID and Product Number.  The stored procedure currently throws the error: The output dataset must be of type data frame; consider using data.frame() to convert it.

    How do I pipe together the OutputDataSet, the mean and standard deviation for each group by record within a data.frame?  

    Thanks,

    Sid

    Saturday, June 9, 2018 9:08 PM
  • Hi Sid,

    What do you want the output to look like, can you provide an example?

    /Niels


    http://www.nielsberglund.com | @nielsberglund

    Sunday, June 10, 2018 4:27 AM
  • Hi Niels,

    We would like the mean, and standard deviation, calculated for each c1 through c8 value grouped by product number as per below:

    So, for each unique ProductNumber, we want to pivot out the mean for c1 through c8 as well as the standard deviation for c1 through c8.  This will then need to be inserted into a table, CodeMeans, for regression analysis.  Is this feasible?

    Thanks,

    Sid

    Sunday, June 10, 2018 12:05 PM
  • Hi Niels,

    Just checking to see if what I've been asked to do is feasible?  Greatly appreciate your input.

    Take Care,

    Sid

    Monday, June 11, 2018 6:37 PM
  • Hi Sid,

    Sorry - been swamped. Yes, I believe wh1t you try to do should be feasible. However, my R is not that good - is it as easy as:

    myds <- function(StudyID) {your function code here}
    codemeans <- data.frame(myds)


    /Niels


    http://www.nielsberglund.com | @nielsberglund

    Tuesday, June 12, 2018 3:03 AM
  • Hi Niels,

    Great minds think alike!  So, I modified the R portion of the stored procedure today as follows:

    library(dplyr)
    			codemeans <- function(StudyID){
    				res <- InputDataSet %>%
    					group_by (StudyID, ProductNumber) %>%
    					summarise_all(.funs=c(mean, sd)) %>%
    					setNames(c("StudyID","ProductNumber",
    					paste0("c",1:8, "_mean"),
    					paste0("c",1:8, "_sd")))
    				df <- data.frame(res)
    				}

    This resolves the error, but no data is being returned in the result set.  Do you think the setNames portion of the script might be at issue, or something else?

    Thanks,

    Sid

    Tuesday, June 12, 2018 4:34 AM
  • Hmm, maybe try and do something similar with minimal input data and to begin with just group the data.

    Also, there should be RevoScaleR functions for what you try to do. Look at rxDataStep and also this link: https://docs.microsoft.com/en-us/machine-learning-server/r/how-to-revoscaler-data-transform.

    Niels


    http://www.nielsberglund.com | @nielsberglund

    • Marked as answer by SChilders Tuesday, June 12, 2018 4:53 AM
    Tuesday, June 12, 2018 4:44 AM