Creating an array of lagged values with rxDataStep and mclapply

  • Question

  • Hello, I have an xdf file named probe03Seq that contains a list of events.

    Variable information: 

    Var 1: cm_mac_address       55718 factor levels: 
    Var 2: time, Type: POSIXct
    Var 3: status       3 factor levels:
    Var 4: duration_disc       10 factor levels:
    Var 5: down_power_disc       10 factor levels: 
    Var 6: down_snr_disc       10 factor levels:
    Var 7: down_speed_disc       10 factor levels
    Var 8: latency_disc       1 factor levels:
    Var 9: up_power_disc       10 factor levels:
    Var 10: up_speed_disc       10 factor levels:
    Var 11: Sequence       34777 factor levels:

    I then split this large xdf file by the column cm_mac_address into many small xdf files and sort each one by time.

    nocSplit <- rxSplit(inData = "probe03Seq.xdf",
                         outFilesBase = file.path(tempdir(), "MACAddress"),
                         splitByFactor = "cm_mac_address")

    library(parallel)  # provides mclapply()

    mclapply(nocSplit, FUN = function(xdf) {

        rxSort(inData = xdf,
               outFile = xdf,
               sortByVars = "time",
               overwrite = TRUE)
    })

    I am now trying to create a new variable across these small xdf files. I would like to set a window size and build an array of the previous values in the Sequence column. For example, with one observation per hour and a window size of 10 hours, I would like to see something like the SequenceList column below.

    window.size = 10 hrs

    time  Sequence  SequenceList
       1         6  NA
       2         5  NA
       3         7  NA
       4         8  NA
       5         4  NA
       6         2  NA
       7        10  NA
       8         9  NA
       9         4  NA
      10         6  NA
      11         4  {6,5,7,8,4,2,10,9,4,6}
      12         3  {5,7,8,4,2,10,9,4,6,4}
      13         8  {7,8,4,2,10,9,4,6,4,3}
      14         3  {8,4,2,10,9,4,6,4,3,8}
      15         9  {4,2,10,9,4,6,4,3,8,3}
      16         1  {2,10,9,4,6,4,3,8,3,9}
      17         7  {10,9,4,6,4,3,8,3,9,1}
      18         3  {9,4,6,4,3,8,3,9,1,7}
      19         8  {4,6,4,3,8,3,9,1,7,3}
      20        10  {6,4,3,8,3,9,1,7,3,8}
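
    To make the target concrete, here is a minimal base R sketch (no xdf involved) that builds such a list for the 20 example values. The names sequence, window_size and sequence_list are made up for illustration; for row i it simply collects rows i - window_size to i - 1 and leaves NA where there is not enough history yet.

```r
# Sequence values from the example (hours 1 to 20).
sequence <- c(6, 5, 7, 8, 4, 2, 10, 9, 4, 6, 4, 3, 8, 3, 9, 1, 7, 3, 8, 10)
window_size <- 10  # hypothetical name: number of previous rows to keep

# For row i, collect rows (i - window_size) to (i - 1);
# rows without a full window of history get NA.
sequence_list <- lapply(seq_along(sequence), function(i) {
  if (i <= window_size) {
    NA
  } else {
    sequence[(i - window_size):(i - 1)]
  }
})

sequence_list[[11]]  # 6 5 7 8 4 2 10 9 4 6
```

    This only works when all rows are in memory at once, so it illustrates the desired result rather than a solution for chunked xdf data.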

    Matt Parker from the Azure team shared some useful code for lagging a variable by one row, shown below.

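    lagVar <- function(dataList) {

        if (.rxStartRow == 1) {
            # First chunk: there is no previous value, so start with NA.
            dataList[[newName]] <- c(NA, dataList[[varToLag]][-.rxNumRows])
        } else {
            # Later chunks: start with the last value of the previous chunk.
            dataList[[newName]] <- c(.rxGet("lastValue"),
                                     dataList[[varToLag]][-.rxNumRows])
        }

        # Remember this chunk's last value for the next chunk.
        .rxSet("lastValue", dataList[[varToLag]][.rxNumRows])

        dataList

    }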


    # Now I'll apply the lagging function in the same way - I'll just wrap it with lapply()
    lapply(djiaSplit, FUN = function(xdf) {
      
        rxDataStep(inData = xdf, 
                   outFile = xdf,
                   transformObjects = list(
                       varToLag = "Open", 
                       newName = "previousOpen"), 
                   transformFunc = lagVar,
                   # append = "cols",
                   overwrite = TRUE)
        
    })

    I think the same approach, wrapping rxDataStep with a custom transform function inside mclapply, can be used again, but I am having trouble writing that function. Any help would be appreciated! I have looked at the window() function, but since I am working with an xdf file and not a ts object, I couldn't get it to work.
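
    One possible starting point, sketched in plain base R rather than RevoScaleR: a windowed lag has to carry the last window.size values from one chunk to the next, in the same way lagVar carries lastValue. The helper lag_window_chunk and the names carry and window_size are hypothetical, and the loop below only simulates chunked reading. Inside a real transformFunc the carry would presumably be kept with .rxSet and .rxGet, which is an untested assumption here.

```r
# Hypothetical helper: build lagged windows for one chunk, given the
# tail of the values seen in earlier chunks.
lag_window_chunk <- function(values, carry, window_size) {
  history <- c(carry, values)  # earlier tail followed by this chunk
  offset <- length(carry)      # number of carried-over values
  windows <- lapply(seq_along(values), function(i) {
    end <- offset + i - 1      # index in history of the row just before row i
    if (end < window_size) {
      NA                       # not enough history yet
    } else {
      history[(end - window_size + 1):end]
    }
  })
  # Keep only the last window_size values for the next chunk.
  list(windows = windows, carry = tail(history, window_size))
}

# Simulate chunked processing: 20 rows read as chunks of 8, 8 and 4 rows.
sequence <- c(6, 5, 7, 8, 4, 2, 10, 9, 4, 6, 4, 3, 8, 3, 9, 1, 7, 3, 8, 10)
chunks <- split(sequence, rep(1:3, times = c(8, 8, 4)))

carry <- numeric(0)
chunked <- list()
for (chunk in chunks) {
  step <- lag_window_chunk(chunk, carry, window_size = 10)
  chunked <- c(chunked, step$windows)
  carry <- step$carry
}
```

    With this carry-over, row 17 (the first row of the third simulated chunk) still receives the ten values from rows 7 to 16, even though they were read in earlier chunks.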

    Friday, May 13, 2016 12:19 AM

All replies

  •  I have worked out a loop that does this for a regular data frame:

    set.seed(100)
    mydf <- data.frame(time = 1:1000, event = sample(1:10, 1000, replace = TRUE))

    w <- 10
    for (i in 1:nrow(mydf)) {
      if (i <= w) {
        # Not enough history yet
        mydf$eventList[i] <- NA
      } else {
        # Previous w values, excluding the current row
        mydf$eventList[i] <- list(mydf$event[(i - w):(i - 1)])
      }
    }

    However, when I modify this to work with an xdf file I get an error.

    lagVarWindow <- function(dataList) {

        for (i in 1:.rxNumRows) {
            if (i <= window.size) {
                dataList[[newName]][i] <- NA
            } else {
                # Previous window.size values, excluding the current row
                dataList[[newName]][i] <- list(dataList[[varToLag]][(i - window.size):(i - 1)])
            }
        }

        dataList

    }


    mclapply(nocSplit, FUN = function(xdf) {

        rxDataStep(inData = xdf,
                   outFile = xdf,
                   transformObjects = list(
                       window.size = 10,
                       varToLag = "Sequence",
                       newName = "Sequence2"),
                   transformFunc = lagVarWindow,
                   # append = "cols",
                   overwrite = TRUE)

    })

    Error in doTryCatch(return(expr), name, parentenv, handler) : 
      Found list tag in the middle of data: '<list=Sequence2&2190:1>'
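
    The message points at the list element that the transform function put into Sequence2. Assuming the xdf writer only accepts atomic columns (an assumption, not something confirmed in this thread), one workaround might be to store each window as a single comma-separated string instead of a list. The helper window_strings below is hypothetical, written in base R, and does not handle chunk boundaries.

```r
# Hypothetical helper: one comma-separated string per row instead of a list element.
window_strings <- function(values, window_size) {
  vapply(seq_along(values), function(i) {
    if (i <= window_size) {
      NA_character_  # not enough history yet
    } else {
      paste(values[(i - window_size):(i - 1)], collapse = ",")
    }
  }, character(1))
}

sequence <- c(6, 5, 7, 8, 4, 2, 10, 9, 4, 6, 4, 3, 8, 3, 9, 1, 7, 3, 8, 10)
encoded <- window_strings(sequence, window_size = 10)
encoded[11]  # "6,5,7,8,4,2,10,9,4,6"

# Decode a row back into numbers when needed.
decoded <- as.numeric(strsplit(encoded[11], ",", fixed = TRUE)[[1]])
```

    Because Sequence is a factor in the xdf file, it would probably need as.character() before pasting so that the strings hold the level labels.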



    Friday, May 13, 2016 8:26 PM