DeployR in heavy parallel scenario exhausting resources

  • Question

  • Hello Everybody,

    I'm facing a new DeployR scenario and would like some help or insights from the community to solve a few issues.

    I have an Enterprise DeployR instance on a big server (128 GB of RAM, lots of processors, etc.). We have created a scoring algorithm that works similarly to the scoring model example. I have an RData model, which I load initially, and then I call an R script that uses this model.

    I have a set of 200 client apps that might call this script. Max concurrency would be between 15 and 20.

    To call the script, I'm using the DeployR APIs. The problem is that the scoring needs to work in near real time. When using the API, every time I want to call the scoring algorithm, the API creates a transient session, loads the model, then runs the script. The loading time is between 15 and 20 seconds, which is really unacceptable for my application. The script itself runs in 2 seconds, which is almost acceptable.

    So, using the API, the problem I have is that creating a session, running the script, and closing the session takes almost 25 seconds (roughly the flow sketched below).
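
    To make it concrete, this is roughly what each call looks like with the DeployR Java client library. The endpoint, credentials, script and directory names are placeholders, and the method names are from memory of the client examples, so they should be double-checked against the javadoc for your DeployR version:

    import com.revo.deployr.client.RClient;
    import com.revo.deployr.client.RProject;
    import com.revo.deployr.client.RUser;
    import com.revo.deployr.client.auth.basic.RBasicAuthentication;
    import com.revo.deployr.client.factory.RClientFactory;

    public class TransientScoringCall {
        public static void main(String[] args) throws Exception {
            RClient rClient = RClientFactory.createClient("http://deployr-host:8000/deployr");
            RUser rUser = rClient.login(new RBasicAuthentication("scoring-user", "secret"));

            // Every scoring request pays this full cycle:
            RProject rProject = rUser.createProject();       // new transient R session
            rProject.executeScript("score.R",                // script loads the big .RData model (15-20 s)
                    "scoring-example", "scoring-user", null); // ... and then scores (~2 s)
            rProject.close();                                // session discarded, preloaded model gone

            rClient.logout(rUser);
        }
    }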

    I tried to avoid closing the session. This has the benefit that only the first call takes long, as the subsequent calls run fast. This option has two problems: 1) session timeouts, and 2) lots of sessions left open on the DeployR server kill the server's memory, because each session takes 1 GB of memory (the model is big).

    So, my question is simple (I think): is there any way I can call the model and script using only a handful of sessions that are loaded only once (to avoid the loading time)?

    Reading some examples, I spotted the idea of using RBroker instead of the DeployR API, with the Pooled Task Runtime. But my question here is: where do the pooled sessions live? I mean, I have 200 clients. If all 200 clients create a pool that preloads the model, this scenario will kill the server again. The question here is: are the sessions of a pooled task runtime shared across different clients?

    If that is the case, I think I should change my client apps and just use RBroker. If it is not the case, I have a third option.

    My third option would be to create a web service of my own that concentrates all calls from my clients. This service would have a Pooled Task Runtime with 15 sessions that run "fast" because they have a preloaded model (a sketch of what I have in mind follows below). This looks like I'm reinventing the wheel, so I wanted to check against your previous experience and understanding of the underlying architecture.
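
    Based on the RBroker examples, what I have in mind for that service's pool would be roughly the following. The endpoint, credentials, file and directory names are placeholders, and the option/field names are as I recall them from the RBroker example apps, so they need to be verified against the javadoc for your DeployR version:

    import com.revo.deployr.client.auth.basic.RBasicAuthentication;
    import com.revo.deployr.client.broker.RBroker;
    import com.revo.deployr.client.broker.config.PooledBrokerConfig;
    import com.revo.deployr.client.broker.options.PoolCreationOptions;
    import com.revo.deployr.client.broker.options.PoolPreloadOptions;
    import com.revo.deployr.client.factory.RBrokerFactory;

    public class ScoringPool {
        public static void main(String[] args) throws Exception {
            RBasicAuthentication rAuth = new RBasicAuthentication("scoring-user", "secret");

            // Preload the big .RData model into every session when the pool is
            // built, so individual scoring calls no longer pay the 15-20 s load.
            PoolPreloadOptions preload = new PoolPreloadOptions();
            preload.filename = "model.rData";
            preload.directory = "scoring-example";
            preload.author = "scoring-user";

            PoolCreationOptions poolOptions = new PoolCreationOptions();
            poolOptions.preloadWorkspace = preload;

            PooledBrokerConfig config = new PooledBrokerConfig(
                    "http://deployr-host:8000/deployr",
                    rAuth,
                    15,            // pool size: 15 preloaded sessions shared by all my clients
                    poolOptions);

            // One broker instance owned by the web service; every incoming client
            // request would be turned into a task submitted to this broker.
            RBroker rBroker = RBrokerFactory.pooledTaskBroker(config);

            // ... serve scoring requests here ...

            rBroker.shutdown();    // releases the pooled sessions on the server
        }
    }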

    Looking forward to your comments.

    Regards,

    GV


    Saturday, June 24, 2017 2:26 AM

All replies

  • Hey GV,

    In your post you mention that you use DeployR. Have you had a look at mrsdeploy, which is the successor to DeployR? There have been improvements over DeployR when it comes to latency and load performance.

    Have a look at this link: https://msdn.microsoft.com/en-us/microsoft-r/operationalize/data-scientist-manage-services and read, in particular, the part about Realtime Web Services.

    Niels


    http://www.nielsberglund.com | @nielsberglund

    Monday, June 26, 2017 3:59 AM
  • Dear Niels,

    Thank you for your reply. I did indeed review mrsdeploy, but it requires too many changes on my side. For Realtime Web Services to work you need to use the supported functions, and my R code does not use them.

    I prefer reviewing the possibilities of DeployR.

    Can you tell me if the scenarios I described can be solved?

    Regards,

    Monday, June 26, 2017 1:44 PM
  • Hi German,

    OK, I understand that you cannot use mrsdeploy, "bummer".

    My assumption is that the RBroker Pooled Task Runtime (PTR) is an in-memory pool of some sort (somewhat like the .NET thread-pool). 

    The PTR has a maxConcurrency configuration setting to limit the number of tasks executing concurrently. I would try that and set it to your expected concurrency (I believe you mentioned 15-20).
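
    Just to illustrate: assuming a broker created along the lines of the pool sketch in your post (so the names below are the same placeholders, and the maxConcurrentTaskLimit you pass to the broker config is what caps how many of these run at once), submitting a scoring request could look roughly like this. I believe getResult() blocks until a pooled session is free and the task has finished, but check the RBroker javadoc:

    import com.revo.deployr.client.broker.RBroker;
    import com.revo.deployr.client.broker.RTask;
    import com.revo.deployr.client.broker.RTaskResult;
    import com.revo.deployr.client.broker.RTaskToken;
    import com.revo.deployr.client.factory.RTaskFactory;

    public class ScoringTaskSubmitter {

        // Submit one scoring request to the shared pool and wait for the result.
        public static RTaskResult score(RBroker rBroker) throws Exception {
            RTask task = RTaskFactory.pooledTask(
                    "score.R", "scoring-example", "scoring-user", null, null);
            RTaskToken token = rBroker.submit(task);
            return token.getResult();
        }
    }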

    Another idea is to have a couple of servers behind a load balancer, to try and even out the workload.

    Niels


    http://www.nielsberglund.com | @nielsberglund

    Tuesday, June 27, 2017 3:01 AM