mrsdeploy publishService never finishes with a Large Random Forest

  • Question

  • I'm trying to publish a web service for my random forest model, but the call never finishes.

    The model has only 400 trees, but it is 243 MB in size.

    A glm on the same sample had the same problem before I used strip() to reduce the size.
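
    As an illustration of why stripping helps, here is a minimal base-R sketch that removes the per-observation components from a fitted glm by hand and checks that predictions on new data are unchanged. It is not the strip() function mentioned above: slim_glm and the simulated data are hypothetical, and the list of components to drop is an assumption that should hold for plain predict() calls without standard errors.

```r
# Simulated data, for illustration only.
set.seed(1)
n <- 5000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.5 * d$x1 - 0.25 * d$x2))

full <- glm(y ~ x1 + x2, data = d, family = binomial())

# Hypothetical helper: drop the parts of a glm that grow with the number of
# training rows. Coefficients, terms, family and the QR pivot are kept,
# because predict() on new data still needs them.
slim_glm <- function(fit) {
  per_row_parts <- c("data", "model", "y", "residuals", "fitted.values",
                     "effects", "linear.predictors", "weights",
                     "prior.weights")
  for (part in per_row_parts) {
    fit[[part]] <- NULL
  }
  fit$qr$qr <- NULL  # the n-by-p QR matrix; only needed for standard errors
  fit
}

slim <- slim_glm(full)

new_rows <- data.frame(x1 = c(-1, 0, 1), x2 = c(0.5, 0, -0.5))
p_full <- predict(full, new_rows, type = "response")
p_slim <- predict(slim, new_rows, type = "response")

# Compare in-memory sizes and confirm that the predictions agree.
print(object.size(full), units = "KB")
print(object.size(slim), units = "KB")
print(all.equal(p_full, p_slim))
```

    The same idea does not carry over directly to a randomForest object, where most of the size is in the trees themselves rather than in copies of the training data.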

    Any tips?


    • Edited by Doktoren Wednesday, August 22, 2018 11:58 AM
    Wednesday, August 22, 2018 11:00 AM

All replies

  • You don't say which function you are using to fit the random forest model, but if you use rxDForest() in RevoScaleR, there are control parameters you can set to constrain the fit and reduce the complexity of the trees that are grown, while keeping the same predictors in your model.

    I would try setting the 'maxDepth' parameter to a value of 15 or less. You can also try setting the value of 'minSplit' to determine how many observations must be in a node before a split is attempted.
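
    A minimal sketch of such a call, assuming RevoScaleR is installed (it ships with Machine Learning Server rather than CRAN). The iris data and the specific values for nTree, maxDepth and minSplit are placeholders chosen for illustration, not recommended settings.

```r
# Illustrative settings only; tune them against your own validation data.
n_trees   <- 400  # same number of trees as in the question
max_depth <- 15   # cap on tree depth, as suggested above
min_split <- 50   # assumed value: observations required in a node before a split

if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)
  forest <- rxDForest(
    Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
    data     = iris,
    nTree    = n_trees,
    maxDepth = max_depth,
    minSplit = min_split
  )
  # Shallower trees have fewer nodes, so the fitted object should be smaller.
  print(object.size(forest), units = "MB")
} else {
  message("RevoScaleR is not installed; skipping the rxDForest() example.")
}
```

    Refitting with a few different maxDepth values and comparing object.size() against validation accuracy is one way to find how small the model can get before predictions degrade.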

    See this page for more information:

    https://docs.microsoft.com/en-us/machine-learning-server/r/how-to-revoscaler-decision-forest

    Hope this helps. 

    Wednesday, August 22, 2018 6:00 PM
  • Thanks for your help!

    I was using the randomForest package, which obviously creates enormous models. 

    I did try rxFastForest and that worked, but I miss some of the features from randomForest, e.g. the possibility to set class weights (disclaimer: I might just not have found it yet).

    Still, isn't it interesting that mrsdeploy fails with big models?

     
    Thursday, August 23, 2018 5:18 AM
  • Hello,

    Which version of Machine Learning Server are you using, and which database have you configured (SQL Server, PostgreSQL, or the default SQLite)?

    Saturday, September 1, 2018 12:17 AM
  • I have the same problem. I have three large random forest models in my param.RData file, which I pass as the value of the model parameter in the publishService function. After roughly 10 to 15 minutes, I get a status_code 500 back from the server. The error message contains no additional information.
    Thursday, October 11, 2018 4:05 PM