How do .rxSet and .rxGet work in a distributed environment?

  • Question

  • For instance, how (or does) the following work in an R multi-node/multi-server environment?

    Does R actually send the value stored with .rxSet across machines when a chunk boundary falls between two machines?

    lagVar <- function(dataList) {

        # .rxStartRow returns the overall row number of the first row in this
        # chunk. So - the first row of the first chunk is equal to one.
        # If this is the very first row, there's no previous value to use - so
        # it's just an NA.
        if (.rxStartRow == 1) {

            # Put the NA out front, then shift all the other values down one row.
            # newName is the desired name of the lagged variable, set using
            # transformObjects - see below
            dataList[[newName]] <- c(NA, dataList[[varToLag]][-.rxNumRows])

        } else {

            # If this isn't the very first chunk, we have to fetch the previous
            # value from the previous chunk using .rxGet, then shift all other
            # values down one row, just as before.
            dataList[[newName]] <- c(.rxGet("lastValue"),
                                     dataList[[varToLag]][-.rxNumRows])

        }

        # Finally, once this chunk is done processing, set its lastValue so that
        # the next chunk can use it.
        .rxSet("lastValue", dataList[[varToLag]][.rxNumRows])

        # Return dataList with the new variable
        dataList

    }
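
    To make the question concrete, here is a minimal base-R sketch of the behaviour I would expect on a single machine, where chunks arrive strictly in row order. The names stateSet, stateGet and lagChunk are hypothetical stand-ins of my own (a plain environment instead of .rxSet/.rxGet, and arguments instead of .rxStartRow/.rxNumRows); this is not how RevoScaleR is implemented.

```r
# Hypothetical stand-ins for .rxSet/.rxGet: a plain environment that
# persists between chunk calls. This is NOT RevoScaleR's implementation.
state <- new.env()
stateSet <- function(name, value) assign(name, value, envir = state)
stateGet <- function(name) get(name, envir = state)

# Same logic as lagVar, with the chunk's start row passed as an argument
# instead of being read from .rxStartRow.
lagChunk <- function(x, startRow) {
    n <- length(x)
    # First chunk has no predecessor; later chunks read the carried value.
    previous <- if (startRow == 1) NA else stateGet("lastValue")
    # Store this chunk's last value for whichever chunk runs next.
    stateSet("lastValue", x[n])
    c(previous, x[-n])
}

x <- c(10, 20, 30, 40, 50, 60, 70)
chunks <- split(x, ceiling(seq_along(x) / 3))   # chunks of up to 3 rows
startRows <- c(1, 4, 7)

# Map() visits the chunks one after another, in order.
lagged <- unlist(Map(lagChunk, chunks, startRows), use.names = FALSE)
print(lagged)
```

    The carried value is only right because each chunk runs after the one that precedes it in the data; that ordering is exactly what I am unsure about once several nodes are involved.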


    • Edited by AceHack Thursday, August 11, 2016 1:56 AM
    Thursday, August 11, 2016 12:12 AM

All replies

  • Also, if this is the case, how is threading handled? In a multithreaded/multi-node system, how does this value not get stomped on by chunks being processed in parallel?

    Thursday, August 11, 2016 12:15 AM
  • Hi AceHack,

    I realize you asked this question nearly two years ago, but I came across it while searching for somewhat related information. As of now, it seems that .rxSet, .rxGet, and friends only work in a local compute context: https://docs.microsoft.com/en-us/machine-learning-server/r-reference/revoscaler/rxtransform

    So it appears that the answer to your question is "no, it does not send the values to remote nodes". Implementing lead or lag in a distributed environment is not a trivial undertaking. If the data resides on a DB server (MS SQL Server, Azure DW, Snowflake...), it would perhaps be best to let the DB engine handle any operations that it offers as analytic functions.
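
    To illustrate why this is not trivial, here is a minimal base-R sketch (no RevoScaleR calls). The helpers carryLag and statelessLag are hypothetical names made up for this example. It assumes two chunks that may be processed in either order, and compares a state-carrying lag with one that looks up the preceding row by position, which in turn assumes each worker can read the row just before its chunk.

```r
# Hypothetical illustration in base R; no RevoScaleR calls are used.
x <- c(10, 20, 30, 40, 50, 60)
chunks <- list(x[1:3], x[4:6])
startRows <- c(1, 4)

# State-carrying lag: correct only if chunks run in row order, one at a time.
carryLag <- function(order) {
    last <- NA
    out <- vector("list", length(chunks))
    for (i in order) {
        chunk <- chunks[[i]]
        previous <- if (startRows[i] == 1) NA else last
        out[[i]] <- c(previous, chunk[-length(chunk)])
        last <- chunk[length(chunk)]   # carried to whichever chunk runs next
    }
    unlist(out)
}

inOrder <- carryLag(c(1, 2))      # chunk 1 runs first, then chunk 2
outOfOrder <- carryLag(c(2, 1))   # chunk 2 runs before chunk 1

# Stateless lag: each chunk fetches the row before its start by position,
# so the result does not depend on processing order.
statelessLag <- function(i) {
    start <- startRows[i]
    previous <- if (start == 1) NA else x[start - 1]
    chunk <- chunks[[i]]
    c(previous, chunk[-length(chunk)])
}

# Process chunk 2 first, then put the results back in row order.
stateless <- unlist(lapply(c(2, 1), statelessLag)[c(2, 1)])

print(inOrder)
print(outOfOrder)
print(stateless)
```

    In the out-of-order run, the lagged value for row 4 is lost because the value it needs has not been stored yet. The stateless version avoids that, at the cost of needing access to one row outside the chunk.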

    Monday, July 30, 2018 11:10 AM
  • Hi Timo,

    Thanks for your response. I read the article you linked, but I did not come to the same conclusion that they are only available in a local compute context. My understanding from the article is that the only ones restricted to a local context are .rxStartRow, .rxChunkNum, .rxNumRows, .rxReadFileName, .rxIsTestChunk, .rxIsPrediction, and .rxTransformEnvir.


    • Edited by AceHack Monday, July 30, 2018 1:23 PM
    Monday, July 30, 2018 1:23 PM