Optimizing R Open - Does adding parallel functions (foreach, mclapply, etc.) improve performance on an AWS Linux server?

  • Question

  • Hi!

    I am trying to optimize performance for a multi-step script of approximately 300 lines. I am fortunate in that I am not resource constrained since I am running this on an AWS server with 128G memory and 16 cores.

    I messed around with the foreach function (vs. for) for looping and saw only modest improvement, even though foreach was accessing 14 cores vs. the 8 cores R Open uses on its own. As a result, my impression is that R Open is already basically optimized for multicore, and that there is only marginal benefit to rewriting all the code with parallel functions like foreach (with doParallel) and mclapply.

    Is that correct, or should I go through the brain damage of rewriting? My script currently takes roughly 3 hours to run on the above system.

    Thursday, November 10, 2016 7:32 PM

All replies

  • It depends what you're doing. If the body of your loop is making heavy use of the Math Kernel Library (matrix operations in particular), it's already running multithreaded and you won't get any benefit from using foreach.

    On the other hand, if the body of your loop isn't consuming much CPU and is running as a single thread (using non-BLAS R functions), you might see a benefit from parallelizing it.
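
    One rough way to check which case a given step falls into is to time it serially and in parallel on a small input. The sketch below assumes a Linux machine (mclapply forks, so it does not parallelize on Windows) and uses only the parallel package that ships with R. The function slow_step and the value of n_workers are hypothetical stand-ins, not anything from the original script.

    ```r
    library(parallel)  # ships with R; mclapply() relies on forking (Linux/macOS)

    # Hypothetical stand-in for a loop body that runs as a single thread:
    # plain R arithmetic, no BLAS/MKL calls.
    slow_step <- function(i) {
      x <- 0
      for (k in seq_len(2e5)) x <- x + sqrt(k * i)
      x
    }

    n_workers <- 2  # assumption: set this to the number of cores you want to use

    # Case 1: single-threaded loop body, run serially and then across workers.
    t_serial   <- system.time(res_serial   <- lapply(1:16, slow_step))
    t_parallel <- system.time(res_parallel <- mclapply(1:16, slow_step,
                                                       mc.cores = n_workers))

    # The parallel run should reproduce the serial results.
    stopifnot(isTRUE(all.equal(res_serial, res_parallel)))

    # Case 2: a BLAS-heavy step. With an MKL-linked R this call is already
    # multithreaded internally, so wrapping it in workers is unlikely to help.
    m <- matrix(rnorm(500 * 500), nrow = 500)
    t_blas <- system.time(p <- m %*% m)

    # Elapsed wall-clock seconds for each variant (values vary by machine).
    print(c(serial   = t_serial[["elapsed"]],
            parallel = t_parallel[["elapsed"]],
            blas     = t_blas[["elapsed"]]))
    ```

    If the parallel timing is not clearly lower than the serial one for your real loop body, the rewrite is probably not worth it. Keep in mind that running many workers on top of a multithreaded BLAS can oversubscribe the cores and slow things down.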

    Thursday, November 10, 2016 7:51 PM
  • Thanks! That matches my experience.

    I just wanted to make sure.

    Thursday, November 10, 2016 11:44 PM