read.table: Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'builtin'

  • Hi,

    I've been getting this:

    #Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
    #Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'builtin'

    error frequently in RStudio when performing fairly basic operations, specifically 'read.table'.

    I originally had R 3.3, RStudio 3.3 and MRAN 3.3 installed on Windows 10.

    This seems to be a problem with base R functionality, as opposed to the code I am using.

    I only have around 4 GB of RAM, but the files I'm importing with 'read.table' are usually 100 MB at most. A single 'read.table' call works fine; one of the first things I noticed was that the error only seemed to occur when I ran several 'read.table' calls in succession, either written out one after another in the code or inside a loop.
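
    One way to check whether back-to-back 'read.table' calls alone are enough to trigger the error is to strip the loop down to nothing but repeated reads of one file. The following is a minimal sketch, assuming a synthetic '$'-delimited file is an adequate stand-in for the real data. make_test_file() and read_dollar() are hypothetical helpers; the file dimensions and the 20 repetitions are arbitrary, and n_rows can be raised toward the size of the real files.

    ```r
    # Hypothetical helpers for a stripped-down reproduction; sizes are arbitrary.
    make_test_file <- function(path, n_rows = 1000L, n_cols = 25L) {
      # Character matrix of cell labels such as "r1c1", "r1c2", ...
      cells <- outer(seq_len(n_rows), seq_len(n_cols),
                     function(i, j) paste0("r", i, "c", j))
      # Write one '$'-delimited line per row
      writeLines(apply(cells, 1, paste, collapse = "$"), path)
      invisible(path)
    }

    # Same read.table() arguments as the import loop in this post
    read_dollar <- function(path) {
      read.table(path, header = FALSE, sep = "$", quote = "", row.names = NULL,
                 na.strings = "", fill = TRUE, strip.white = TRUE,
                 blank.lines.skip = TRUE, allowEscapes = FALSE,
                 stringsAsFactors = FALSE)
    }

    path <- make_test_file(tempfile(fileext = ".txt"))
    for (k in seq_len(20L)) {
      # Stop with the iteration number if any single read errors
      res <- tryCatch(
        read_dollar(path),
        error = function(e) stop(sprintf("read %d failed: %s", k, conditionMessage(e)),
                                 call. = FALSE)
      )
    }
    unlink(path)
    ```

    If this loop never fails even with a large n_rows, the trigger is more likely something else in the real loop than read.table itself.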

    I have also tried explicitly removing any temporary files generated and immediately calling gc(). The error appears even after a full restart of the computer with nothing but RStudio running.

    The problem also seems to be timing-related. If I manually select each import and click Run, it's usually fine. This led me to insert Sys.sleep(20) pauses after the imports, which almost entirely removes the issue. But with hundreds of files to get through, RStudio ends up spending most of its time asleep.
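
    Rather than sleeping after every import, the pause can be spent only when a read actually fails. This is a minimal sketch, assuming the failure is transient (as the effect of the pauses suggests), that it surfaces as an ordinary R error that tryCatch() can catch (as the "Error in scan(...)" message suggests), and that simply re-running the read is safe. read_with_retry() is a hypothetical helper, and the retry count and wait time are arbitrary. It only works around the symptom and does nothing about the underlying cause.

    ```r
    # Hypothetical helper: call read_fun(path); on error, wait and retry,
    # up to max_tries times, instead of pausing after every successful read.
    read_with_retry <- function(path, read_fun, max_tries = 3L, wait_s = 20) {
      for (attempt in seq_len(max_tries)) {
        result <- tryCatch(read_fun(path), error = function(e) e)
        if (!inherits(result, "error")) {
          return(result)
        }
        message(sprintf("Attempt %d on %s failed: %s",
                        attempt, path, conditionMessage(result)))
        if (attempt < max_tries) Sys.sleep(wait_s)
      }
      stop(sprintf("Giving up on %s after %d attempts", path, max_tries),
           call. = FALSE)
    }

    # Demonstration with a stand-in reader that fails once, then succeeds.
    # No file is read; "example.txt" is only passed through.
    calls <- 0L
    flaky_reader <- function(path) {
      calls <<- calls + 1L
      if (calls == 1L) stop("simulated transient failure")
      data.frame(file = path, stringsAsFactors = FALSE)
    }
    out <- read_with_retry("example.txt", flaky_reader, wait_s = 0)
    ```

    In the import loop, the read would then become read_with_retry(i, function(p) read.table(p, ...)) with the same read.table arguments as before, so the 20-second wait is only paid for files that actually hit the error.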

    I've since reinstalled the computer, installed only RStudio 3.4 and MRAN 3.4, and rerun the code. Although the error appears less frequently, it still appears half an hour or more into a run.

    quarterly.reports.to.modify = c('DEMO14Q3.txt', 'DEMO14Q4.txt', 'DEMO15Q1.txt', 'DEMO15Q2.txt', 'DEMO15Q3.txt', 'DEMO15Q4.txt',
                                    'DEMO16Q1.txt', 'DEMO16Q2.txt', 'DEMO16Q3.txt', 'DEMO16Q4.txt')
    for (i in quarterly.reports.to.modify) {
      # Read the raw '$'-delimited file, keeping text columns as character
      temp = read.table(i, header = FALSE, sep = '$', quote = '', row.names = NULL, na.strings = '', fill = TRUE, strip.white = TRUE,
                        blank.lines.skip = TRUE, allowEscapes = FALSE, stringsAsFactors = FALSE)
      #Sys.sleep(20)
      temp = temp[-c(1), ]      # drop the first row
      temp = temp[-c(26:100)]   # drop columns 26 to 100
      temp[26:30] = NA          # add columns 26 to 30 (auto-named V26 to V30), filled with NA
      # Reorder the columns
      temp = temp[, c('V1', 'V2', 'V3', 'V4', 'V26', 'V27', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10', 'V11', 'V12', 'V13', 'V14', 'V15', 'V16',
                      'V17', 'V18', 'V19', 'V20', 'V21', 'V23', 'V28', 'V22', 'V29', 'V24', 'V25')]
      temp[] = lapply(temp, tolower)   # lower-case every column (same effect as mutate_each(temp, funs(tolower)))
      # as.numeric() only warns on non-numeric IDs, so try() is not needed; those IDs become NA and the rows are dropped
      temp$V1 = suppressWarnings(as.numeric(temp$V1))
      temp = temp[!is.na(temp$V1), ]
      # Prepend the new header row
      temp = rbind(c('primaryid', 'caseid', 'caseversion', 'i.f.code', 'foll.seq', 'image', 'event.dt', 'mfr.dt', 'init.fda.dt', 'fda.dt',
                     'rept.cod', 'auth.num', 'mfr.num', 'mfr.sndr', 'lit.ref', 'age', 'age.cod', 'age.grp', 'sex', 'e.sub', 'wt',
                     'wt.cod', 'rept.dt', 'occp.cod', 'death.dt', 'to.mfr', 'confid', 'reporter.country', 'occr.country'), temp)
      write.table(temp, i, sep = '$', quote = FALSE, na = 'NA', row.names = FALSE, col.names = FALSE, fileEncoding = 'UTF-8')
      rm(temp)   # note that I am purposefully removing the temp object here
      gc()       # and calling the garbage collector to try to ensure the memory is freed; the error appears without these two lines too
    }
    rm(quarterly.reports.to.modify, i)
    gc()

    I'm at a loss as to why this is occurring.

    The code itself has also gone through multiple iterations over the last six months, and it has consistently done the same thing, even with entirely different operations being performed. It imports most of the files and works on them, but almost at random returns:

    #Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'builtin'

    One pattern I've noticed is that it tends to happen with the files closer to 100 MB in size (about 20 columns) rather than the single-digit-MB files (just three or four columns).
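
    To check that pattern more systematically, each file's size and widest line can be logged and lined up against the files that fail. This is a minimal sketch; file_profile() is a hypothetical helper built on base R's file.info() and count.fields(), using the same '$' separator and no quoting as the read.table() call.

    ```r
    # Hypothetical helper: size in MB and maximum field count per file.
    file_profile <- function(paths, sep = "$") {
      rows <- lapply(paths, function(p) {
        # Number of fields on each line, with the same separator and quoting as read.table()
        n_fields <- count.fields(p, sep = sep, quote = "")
        data.frame(file = basename(p),
                   size_mb = file.info(p)$size / 1024^2,
                   max_fields = max(n_fields, na.rm = TRUE),
                   stringsAsFactors = FALSE)
      })
      do.call(rbind, rows)
    }

    # Demonstration on a small temporary file with a ragged second line
    tmp <- tempfile(fileext = ".txt")
    writeLines(c("a$b$c", "d$e$f$g"), tmp)
    prof <- file_profile(tmp)
    print(prof)
    unlink(tmp)
    ```

    On the real data this would be called as file_profile(quarterly.reports.to.modify) before the loop, so the output can be compared with the files that report the error.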

    Ironically, data.table's fread never returns the error, even though it is written to read files in as quickly as possible. I have to use 'read.table', though, because the original files are too messy for fread to handle initially.

    An equally odd trait is that the error can actually get worse the less I'm doing on the computer; for example, it has just completed all the loops successfully while I was browsing the web at the same time.

    Saturday, May 27, 2017 4:41 PM