none
Cannot coerce class rxSummary to data.frame RRS feed

  • Question

  • I've just started using the RevoScaleR library to calculate summaries such as mean, standard deviation and regression.  My stored procedure is as follows:

    ALTER PROCEDURE [dbo].[spCodeMeansRevo]
    @StudyID int
    AS
    BEGIN
    
    Declare @sStudy varchar(50)
    Set @sStudy = Convert(varchar(50), @StudyID) 
    
     Declare @inquery nvarchar(max) = N'
    Select
    			c.ProductNumber, c.[1] as c1, 
    			c.[2] as c2, c.[3] as c3, c.[4] as c4, c.[5] as c5, c.[6] as c6
    			from ClosedStudyResponses c
    			--Sensory Value Attributes only for mean and standard deviation analytics.
    			where 
    			--VariableAttributeID = 1
    			--and 
    			c.StudyID = ' + @sStudy ;
    
    EXEC sp_execute_external_script @language = N'R'
        , @script = N'
    
    studies <- InputDataSet
    studiesdf <- data.frame(studies)
    sumOut <- rxSummary(~ ProductNumber, data = studiesdf, summaryStats = c("Mean", "StdDev"))
    res <- data.frame(sumOut)
    '
        
    	
    	,@input_data_1 = @inquery
    	,@output_data_1_name = N'res'
    	,@params = N'@StudyID int'
    	,@StudyID = @StudyID
    
    END;

    My goal is to return a data.frame that contains a unique list of ProductNumbers with the mean of values c1 through c6 and standard deviation of c1 through c6.  Columns c1 through c6 may contain nulls.  Initially, I set the @output_data_1_name to sumOut but received an error stating that I had to have the results in data.frame.  Unfortunately, when I create the results data.frame, I now receive the error above.

    So, how do I structure my rxSummary syntax to group by ProductNumber and output mean and standard deviation in a data.frame for columns c1 through c6?

    Thanks,

    Sid

    Friday, June 15, 2018 6:45 PM

All replies

  • Their is an easier way to return the results from the rxSummary() call as a dataframe. The rxSummary() function itself returns a 'sDataFrame' component as part of it's return value. So you can do the following:

    res <- sumOut$sDataFrame

    Friday, June 15, 2018 8:13 PM
  • This now works.  However, I would like the mean and standard deviation calculated for each c1 through c6 value group by product number.  How do I accomplish this?
    Friday, June 15, 2018 8:24 PM
  • I think that you should be able to using the rxExecBy() function here.

    Try the following:

    rxExecBy(studiesdf, keys = "ProductNumber",  func = mean)

    Friday, June 15, 2018 9:08 PM
  • You can also use more than one key in the rxExecBy() call, which you may need here.

    Friday, June 15, 2018 9:10 PM
  • So, I see in this link that separate functions need to be created for mean and standard deviation.  However, it's not clear whether both functions can be passed into rxExecBy e.g. func = c(".mean",".sd") .

    Since I want both mean and sd calculated in one resultset, is the above syntax appropriate?  Are there system functions for mean and sd that can be passed into func instead of creating each one as per the link?

    Friday, June 15, 2018 9:26 PM
  • I was able to revise the R logic so that the error does not occur.  Unfortunately, rxSummary is not grouping by ProductNumber when calculating mean and standard deviation.  My programming logic is as follows:

    ALTER PROCEDURE [dbo].[spCodeMeansRevo]
    @StudyID int
    AS
    BEGIN
    
    Declare @sStudy varchar(50)
    Set @sStudy = Convert(varchar(50), @StudyID) 
    
     Declare @inquery nvarchar(max) = N'
    Select
    			c.ProductNumber, c.[1] as c1, 
    			c.[2] as c2, c.[3] as c3, c.[4] as c4, c.[5] as c5, c.[6] as c6
    			from ClosedStudyResponses c
    			where
    			c.StudyID = ' + @sStudy ;
    
    EXEC sp_execute_external_script @language = N'R'
        , @script = N'
    
    studies <- InputDataSet
    studiesdf <- data.frame(studies)
    
    sumOut <- rxSummary(~c1:ProductNumber, data = studiesdf, summaryStats = c("Mean","StdDev"))
    
    res <- sumOut$sDataFrame
    '

    The results are as follows

    At this point, I'm not sure how to obtain the mean and standard deviation for c1 through c6 for each Product Number.  Can rxSummary do this, or are additional methods needed?

    Tuesday, June 19, 2018 3:41 AM
  • rxExecBy() should still work for you. You should be able to pass in the rxSummary() call as input to rxExecBy(). Give it a try and let me know if you have problems.
    Tuesday, June 19, 2018 4:48 PM
  • I tried implementing rxExecBy as you suggested.  My current stored procedure is as follows:

    ALTER PROCEDURE [dbo].[spCodeMeansRevo]
    @StudyID int
    AS
    BEGIN
    
    Declare @sStudy varchar(50)
    Set @sStudy = Convert(varchar(50), @StudyID) 
    
     Declare @inquery nvarchar(max) = N'
    Select
    			c.ProductNumber, c.[1] as c1, 
    			c.[2] as c2, c.[3] as c3, c.[4] as c4, c.[5] as c5, c.[6] as c6
    			from ClosedStudyResponses c
    			where
    			c.StudyID = ' + @sStudy ;
    
    EXEC sp_execute_external_script @language = N'R'
        , @script = N'
    
    studies <- InputDataSet
    studiesdf <- data.frame(studies)
    
    # sumOut <- rxSummary(~c1:ProductNumber, data = studiesdf, summaryStats = c("Mean","StdDev"))
    result <- rxExecBy(inData = studiesdf, keys = c("StudyID", "ProductNumber"), func = mean)
    '
        
    	
    	,@input_data_1 = @inquery
    	,@output_data_1_name = N'result'
    	,@params = N'@StudyID int'
    	,@StudyID = @StudyID
    
    END;

    Note that I used rxExecBy to calculate the mean, but am thinking a custom function is needed that implements rxSummary to calculate the mean and standard deviation for each column.  Am I heading in the proper direction?

    Tuesday, June 19, 2018 9:35 PM
  • Yes, you would need to create and call a custom R function to compute both the mean and standard deviation simultaneously here.
    Thursday, June 21, 2018 10:32 PM