MRO reads files more slowly than the standard R distribution

  • Question

  • I have both standard R 3.3.3 and MRO 3.3.3 on my Fedora machine.

    I tested how fast each of them reads a .txt.gz file into a data.frame:

    MRO : 111 secs

    standard R: about 81 secs

    The file is about 180 MB gzipped.

    When multiplying matrices, MRO beats plain R, roughly 6.1 vs. 8.9 seconds, so it does show some advantage there :D

    But why is reading from disk so slow?


    • Edited by zbiggy Friday, April 7, 2017 5:05 PM
    Friday, April 7, 2017 5:03 PM

All replies

  • There's no obvious reason why they would be different. 

    One thought: did you run the tests more than once? Disk caching may be having an impact here, which would make the second run likely to be faster.

    Friday, April 7, 2017 5:19 PM
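Following the suggestion to run the tests more than once, here is a minimal R sketch that times the same read.csv() call several times in one session, so that a cold-cache first run can be told apart from later warm-cache runs. The helper name time_reads and the example path are hypothetical, not part of the original thread.

```r
# Time the same read.csv() call `runs` times and return the elapsed seconds
# of each run. If the first value is clearly larger than the rest, disk
# caching is likely affecting the comparison.
# time_reads is a hypothetical helper written for this illustration.
time_reads <- function(path, runs = 3) {
  elapsed <- numeric(runs)
  for (i in seq_len(runs)) {
    elapsed[i] <- system.time(read.csv(path))[["elapsed"]]
  }
  elapsed
}

# Usage (hypothetical path):
# time_reads("/path/to/file.csv.gz", runs = 5)
```

Running the same helper in both CRAN R and MRO against the same file gives per-run timings that can be compared directly, rather than relying on a single measurement.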
  • Can you provide us with some sample R code that you are using?

    Thanks!

    Stephen Weller

    Microsoft R Product Team

    Friday, April 7, 2017 6:28 PM
  • Only this:

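    # directory containing the gzipped CSV ('gdzie' is Polish for 'where')
    gdzie <- '/home/z/Documents/R_nauka/grande/'
    # time the full read of the gzipped CSV into a data.frame
    system.time(
      train <- read.csv(paste0(gdzie, 'grande_CENTRAL_TRAIN.csv.gz'))
    )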

    This time I ran the original R first and then MRO, so that the original R could not benefit from file contents already in the disk cache.

    plain old R:

       user  system elapsed
     78.719   1.193  80.201

    MRO:

       user  system elapsed
    109.465   1.121 111.015

    Timing summary() on that data frame, i.e. system.time(summary(train)):

    plain R:

       user  system elapsed
      6.652   0.301   6.999

    Open R (MRO):

       user  system elapsed
     13.857   0.132  14.066

    Friday, April 7, 2017 6:43 PM
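Since the summary() timings above, which involve no disk access, also differ by about 2x, one way to narrow down where the extra time goes is to time decompressing and reading the raw lines separately from parsing them. This is a minimal sketch: split_timing is a hypothetical helper, it assumes there is enough memory to hold the whole decompressed text, and because read.csv(text = ...) goes through a text connection, its "parse" time is not exactly the parsing share of a plain read.csv(path) call. It is only meant for comparing the same stage across the two R builds.

```r
# Stage 1: decompress the gzipped file and read it into a character vector.
# Stage 2: parse those lines into a data.frame with read.csv().
# file() (used by readLines on a path) detects gzip compression automatically.
# split_timing is a hypothetical helper written for this illustration.
split_timing <- function(path) {
  t_read <- system.time(lines <- readLines(path))[["elapsed"]]
  t_parse <- system.time(read.csv(text = lines))[["elapsed"]]
  c(read = t_read, parse = t_parse)
}

# Usage (hypothetical path), run once in CRAN R and once in MRO:
# split_timing("/path/to/file.csv.gz")
```

If the "read" stage matches between the two builds but "parse" does not, the difference is more likely in R-level or compiled parsing code than in disk I/O or gzip decompression.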
  • I ran it several times, maybe four or five times in all.

    I just tried running MRO second, and the result is similar.

    I ran both in a terminal, like this:

    R

    <insert code>

    q()

    so there should be minimal interference from other processes, such as two open RStudio sessions.

    Friday, April 7, 2017 6:50 PM
  • I tried both under Windows, and both R versions perform the same, so it looks like something is going on with Fedora Linux (either my installation or in general).

    Monday, April 10, 2017 9:09 AM
  • Thank you for reporting this issue.

    I have confirmed that there is a difference in execution time between MRO and CRAN-R, but this difference seems to exist on Linux platforms only; on Windows the timings are the same.

    This is definitely a problem. The strange thing is that there really is no difference between MRO and CRAN-R in how these functions are declared, so this may point to a difference in how MRO is compiled on Linux compared with how CRAN-R is compiled.

    I will log a bug and we can investigate this issue some more on our end.

    Thanks for the report.

    Stephen Weller

    Microsoft R Product Team

    Tuesday, April 11, 2017 9:31 PM
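Because the reply points to a possible difference in how the two builds are compiled on Linux, a quick check is to collect build details from each installation and diff them side by side. This is a diagnostic sketch, not a known explanation of the slowdown: build_info is a hypothetical helper, and it assumes the compiler flags can be read from R_HOME/etc/Makeconf, which is where they live on typical Linux installs (the location may differ on other platforms, in which case the flags field is left empty). extSoftVersion() reports, among others, the zlib version in use, which is relevant to reading .gz files.

```r
# Gather build-related details of the running R session so the output of
# CRAN R and MRO can be compared. build_info is a hypothetical helper.
build_info <- function() {
  makeconf <- file.path(R.home("etc"), "Makeconf")
  # Compiler flag lines such as "CFLAGS = -O2 ...", if Makeconf exists here
  flags <- if (file.exists(makeconf)) {
    grep("^(CFLAGS|CXXFLAGS|FFLAGS) *=", readLines(makeconf), value = TRUE)
  } else {
    character(0)
  }
  list(
    version  = R.version$version.string,
    platform = R.version$platform,
    ext      = extSoftVersion(),  # zlib, bzlib, xz, PCRE, ICU, iconv, ...
    lapack   = La_version(),
    flags    = flags
  )
}

# Run in each installation and compare the printed output:
str(build_info())
```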