How to select certain blocks from an XDF file?

  • Question

  • Is there any way to select certain blocks from an XDF file?

    For example, how can we get block No. 8, or blocks 8, 20, and 33?

    Friday, November 11, 2016 3:30 AM

All replies

  • In a local compute context, an automatic variable called .rxChunkNum is available in the transform function. Assuming you set blocksPerRead, you can use it in conditional logic to select specific blocks to operate on. See the help on rxTransform for more information.
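
    A minimal sketch of that idea, not a tested recipe: it tags each row with the chunk number and then keeps only the wanted chunks. The file names, the chunkNum column, and the wantedBlocks vector are hypothetical, and it assumes that the automatic variables .rxChunkNum and .rxNumRows can be referenced inside transforms and that row selection is applied after the transforms, as the rxTransform help describes. With blocksPerRead = 1, each chunk should correspond to one block of the input file.

    ```r
    library(RevoScaleR)

    # Hypothetical file names; substitute your own XDF files.
    dataXdf      <- RxXdfData("data.xdf")
    dataXdfBlock <- RxXdfData("dataBlocks.xdf")

    # Blocks to keep (the numbers from the question).
    wantedBlocks <- c(8, 20, 33)

    # This approach relies on the local compute context.
    rxSetComputeContext("local")

    rxDataStep(
      inData = dataXdf,
      outFile = dataXdfBlock,
      # Tag every row of the current chunk with the chunk number.
      transforms = list(chunkNum = rep(.rxChunkNum, .rxNumRows)),
      # Keep only the rows whose chunk number is in the wanted set.
      rowSelection = chunkNum %in% wantedBlocks,
      # Make wantedBlocks visible to the rowSelection expression.
      transformObjects = list(wantedBlocks = wantedBlocks),
      # Read one block per chunk so that chunk numbers line up with blocks.
      blocksPerRead = 1,
      overwrite = TRUE
    )
    ```

    Note that the output file will also contain the helper column chunkNum.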

    Monday, November 14, 2016 12:33 AM
  • Thanks! As you said, this works in a local compute context. What if we use a Hadoop or Spark compute context? 

    I tried using rxDataStep to get a specific block:

    rxDataStepXdf( inFile = dataXdf, outFile = dataXdfBlock, startBlock = 10, numBlocks = 1, overwrite = TRUE )

    This works only in a local compute context. In a Hadoop compute context, the output is identical to inFile.

    Is there any way to select certain blocks or .xdfd files with a Hadoop context? 

    Monday, November 14, 2016 7:50 AM
  • Not currently, because the .xdfd files are not an ordered collection. It's no different from reading a collection of CSV files distributed across HDFS. However, another approach might be to use rowSelection, if there is a field within each file that can be used to identify that block or the rows you need.
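
    As a rough illustration of the rowSelection approach, assuming the data already contains a column that identifies the block each row belongs to. The column name blockId, the HDFS paths, and the wantedBlocks vector are hypothetical, and a Hadoop or Spark compute context is assumed to be set already.

    ```r
    library(RevoScaleR)

    # Assumption: rxSetComputeContext() has already been called with a
    # Hadoop or Spark compute context.
    hdfs <- RxHdfsFileSystem()

    # Hypothetical HDFS locations of the composite XDF data.
    dataXdf      <- RxXdfData("/share/data", fileSystem = hdfs)
    dataXdfBlock <- RxXdfData("/share/dataBlocks", fileSystem = hdfs)

    rxDataStep(
      inData = dataXdf,
      outFile = dataXdfBlock,
      # blockId is a hypothetical column that must already exist in the data.
      rowSelection = blockId %in% wantedBlocks,
      # Make wantedBlocks visible to the rowSelection expression.
      transformObjects = list(wantedBlocks = c(8, 20, 33)),
      overwrite = TRUE
    )
    ```

    Because the selection depends only on a value stored in each row, it does not rely on the order in which the .xdfd files are read.
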
    Tuesday, November 15, 2016 4:58 AM