Standardize columns across mean in XDF files

  • Question

  • Hi,

    I have been trying to standardize all the variables (about 70) in my data so that I can build an XGBoost (XGB) model on it. Because an XDF file is processed in chunks, I have to use the transformObjects parameter of rxDataStep() to pass in the mean and standard deviation of the data. This is what I'm doing in MRS:

    ## Calling file path of input and output file
    fileXdf <- file.path(home,'yellowData.xdf')
    fileXdf1 <- file.path(home,'yellowDataStd.xdf')

    ## Compute the summary statistics
    dataSummary <- rxSummary(~., data = fileXdf, summaryStats = c("Mean", "StdDev"))

    ## Extract Mean and StdDev from Summary
    meanData <- dataSummary$sDataFrame$Mean
    sdData <- dataSummary$sDataFrame$StdDev

    ## If null values, then replace mean with 0 and StdDev with 1
    meanData[is.na(meanData)] <- 0
    sdData[is.na(sdData)] <- 1

    ## Create a function to compute the scaled variable
    scaleData <- function(dataFile){
      for(i in 1:length(colnames(dataFile))) {
        if(class(dataFile[,i]) %in% c("numeric", "integer")) {
          dataFile[,i] <- (as.numeric(dataFile[,i]) - as.numeric(myCenter[i])) / as.numeric(myScale[i])
        } else {
          dataFile[,i] <- dataFile[,i]
        }
      }
      return(dataFile)
    }

    ## Run it with rxDataStep
    rxDataStep(inData = fileXdf, outFile = fileXdf1,
               transformFunc = scaleData,
               transformObjects = list(myCenter = meanData, myScale = sdData),
               overwrite = T)

    I get the following error:

    ERROR: The sample data set for the analysis has no variables.

    Caught exception in file: /builddir/ExaRoot/ExaCore/CxAnalysis.cpp, line: 3848. ThreadID: -1858330496 Rethrowing.

    Caught exception in file: /builddir/ExaRoot/ExaCore/CxAnalysis.cpp, line: 5375. ThreadID: -1858330496 Rethrowing.

    Error in doTryCatch(return(expr), name, parentenv, handler): ERROR: The sample data set for the analysis has no variables.

    I can't get to the root of the issue. Where am I going wrong? Any help would be appreciated.
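
    For reference, the per-chunk computation I am after can be sketched in plain base R, outside RevoScaleR. This is only an illustration under stated assumptions: it treats a chunk as a named list of equal-length vectors, it looks up the mean and standard deviation by variable name rather than by position, and the names scaleChunk, centers, and scales are made up for the example (they are not part of any library).

```r
# Base-R sketch only; scaleChunk, centers, and scales are hypothetical names.
# Assumption: a chunk is a named list of equal-length vectors.
scaleChunk <- function(chunk, centers, scales) {
  for (nm in names(chunk)) {
    col <- chunk[[nm]]
    # Standardize only numeric/integer columns that have a usable mean and sd
    if (is.numeric(col) &&
        nm %in% names(centers) && nm %in% names(scales) &&
        !is.na(scales[[nm]]) && scales[[nm]] != 0) {
      chunk[[nm]] <- (col - centers[[nm]]) / scales[[nm]]
    }
    # Non-numeric columns (character, factor) are left untouched
  }
  chunk
}

# Tiny in-memory example standing in for one chunk of the data
chunk   <- list(fare = c(10, 20, 30), vendor = c("a", "b", "a"))
centers <- c(fare = mean(chunk$fare))
scales  <- c(fare = sd(chunk$fare))
scaled  <- scaleChunk(chunk, centers, scales)
```

    Matching by name avoids relying on the summary rows being in the same order as the columns of the chunk, which the positional myCenter[i] / myScale[i] lookup above takes for granted.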

    Thanks.

    Monday, September 24, 2018 11:11 AM