Asked by:
Cannot coerce class rxSummary to data.frame
Question

I've just started using the RevoScaleR library to calculate summaries such as mean, standard deviation and regression. My stored procedure is as follows:
ALTER PROCEDURE [dbo].[spCodeMeansRevo] @StudyID int AS BEGIN Declare @sStudy varchar(50) Set @sStudy = Convert(varchar(50), @StudyID) Declare @inquery nvarchar(max) = N' Select c.ProductNumber, c.[1] as c1, c.[2] as c2, c.[3] as c3, c.[4] as c4, c.[5] as c5, c.[6] as c6 from ClosedStudyResponses c Sensory Value Attributes only for mean and standard deviation analytics. where VariableAttributeID = 1 and c.StudyID = ' + @sStudy ; EXEC sp_execute_external_script @language = N'R' , @script = N' studies < InputDataSet studiesdf < data.frame(studies) sumOut < rxSummary(~ ProductNumber, data = studiesdf, summaryStats = c("Mean", "StdDev")) res < data.frame(sumOut) ' ,@input_data_1 = @inquery ,@output_data_1_name = N'res' ,@params = N'@StudyID int' ,@StudyID = @StudyID END;
My goal is to return a data.frame that contains a unique list of ProductNumbers with the mean of values c1 through c6 and standard deviation of c1 through c6. Columns c1 through c6 may contain nulls. Initially, I set the @output_data_1_name to sumOut but received an error stating that I had to have the results in data.frame. Unfortunately, when I create the results data.frame, I now receive the error above.
So, how do I structure my rxSummary syntax to group by ProductNumber and output mean and standard deviation in a data.frame for columns c1 through c6?
Thanks,
Sid
All replies

Their is an easier way to return the results from the rxSummary() call as a dataframe. The rxSummary() function itself returns a 'sDataFrame' component as part of it's return value. So you can do the following:
res < sumOut$sDataFrame
 Proposed as answer by stevefwMicrosoft employee Friday, June 15, 2018 8:13 PM
 Unproposed as answer by SChilders Tuesday, June 19, 2018 3:36 AM




So, I see in this link that separate functions need to be created for mean and standard deviation. However, it's not clear whether both functions can be passed into rxExecBy e.g. func = c(".mean",".sd") .
Since I want both mean and sd calculated in one resultset, is the above syntax appropriate? Are there system functions for mean and sd that can be passed into func instead of creating each one as per the link?

I was able to revise the R logic so that the error does not occur. Unfortunately, rxSummary is not grouping by ProductNumber when calculating mean and standard deviation. My programming logic is as follows:
ALTER PROCEDURE [dbo].[spCodeMeansRevo] @StudyID int AS BEGIN Declare @sStudy varchar(50) Set @sStudy = Convert(varchar(50), @StudyID) Declare @inquery nvarchar(max) = N' Select c.ProductNumber, c.[1] as c1, c.[2] as c2, c.[3] as c3, c.[4] as c4, c.[5] as c5, c.[6] as c6 from ClosedStudyResponses c where c.StudyID = ' + @sStudy ; EXEC sp_execute_external_script @language = N'R' , @script = N' studies < InputDataSet studiesdf < data.frame(studies) sumOut < rxSummary(~c1:ProductNumber, data = studiesdf, summaryStats = c("Mean","StdDev")) res < sumOut$sDataFrame '
The results are as follows
At this point, I'm not sure how to obtain the mean and standard deviation for c1 through c6 for each Product Number. Can rxSummary do this, or are additional methods needed?


I tried implementing rxExecBy as you suggested. My current stored procedure is as follows:
ALTER PROCEDURE [dbo].[spCodeMeansRevo] @StudyID int AS BEGIN Declare @sStudy varchar(50) Set @sStudy = Convert(varchar(50), @StudyID) Declare @inquery nvarchar(max) = N' Select c.ProductNumber, c.[1] as c1, c.[2] as c2, c.[3] as c3, c.[4] as c4, c.[5] as c5, c.[6] as c6 from ClosedStudyResponses c where c.StudyID = ' + @sStudy ; EXEC sp_execute_external_script @language = N'R' , @script = N' studies < InputDataSet studiesdf < data.frame(studies) # sumOut < rxSummary(~c1:ProductNumber, data = studiesdf, summaryStats = c("Mean","StdDev")) result < rxExecBy(inData = studiesdf, keys = c("StudyID", "ProductNumber"), func = mean) ' ,@input_data_1 = @inquery ,@output_data_1_name = N'result' ,@params = N'@StudyID int' ,@StudyID = @StudyID END;
Note that I used rxExecBy to calculate the mean, but am thinking a custom function is needed that implements rxSummary to calculate the mean and standard deviation for each column. Am I heading in the proper direction?
