none
Facebook Prophet for R RRS feed

  • Question

  • Does anyone (or Microsoft) know if Microsoft's R Server can run Facebook's Prophet that they recently open sourced for R? 

    If so, does anyone have a guide on how to set it up so that it works both locally and remotely in SQL?

    I am new to R and just starting out, using the visual studio R Tools...

    Thanks,

    Jim

     

    Thursday, May 18, 2017 4:39 PM

All replies

  • Hi Jim!

    Yes you can! There are a couple of caveats though:

    • Prophet requires R 3.2.3 or above, SQL Server 2016 R Services runs on 3.2.2, so you either need to upgrade the R version, or run SQL Server 2017. That is of course if you want to run it inside SQL Server. If you have a fairly recent Microsoft R Server, you should be good.
    • You need to install Prophet from the CRAN repo - for some reason the Prophet package from MS repo is not working properly.
    • When running Prophet inside SQL Server, dates seem to be handled incorrectly (or at least output incorrectly).

    I have a blog-post about it with code etc., at: http://www.nielsberglund.com/2017/05/20/facebook-prophet-and-microsoft-r-server/

    Niels


    http://www.nielsberglund.com | @nielsberglund

    • Proposed as answer by nielsbMVP Sunday, May 21, 2017 6:28 PM
    Saturday, May 20, 2017 10:04 AM
  • Hi Niels,

    Excellent blog post. I was able to get everything running both in my visual studio tools for R, as well as from SQL Manager, running the commands, and everything works just like your article describes.

    I had SQL Server 2016 Standard edition SP1, and was able to update everything following along.

    Obviously, I am waiting to hear what you find out about the date issue, and data discrepancies when running within SQL Services.

    I also had a follow up question, if I want to use this data, in real time, from a c# or web client, how would I return the plots so that they can be returned from a stored procedure as well?  Any ideas?

    Again.. excellent post!

    Thanks for all the information so far!

    Jim

    Saturday, May 20, 2017 9:40 PM
  • Hi Jim!

    Thanks for the kind words re. the post!

    So, the problem with the dates seems to happen when the model is created: m <- prophet(df). When I chek the return values, the dates are all wrong. I do not undestand why this happens, s it works OK with the stab-alone installation of R Server. Anyay, I'll try and chek more.

    When yoou as about operationalize this, what do you want to do?

    Niels


    http://www.nielsberglund.com | @nielsberglund

    Sunday, May 21, 2017 6:32 PM
  • I plan on using this in production in two ways:

    1.) Run a periodic 'batch' operation that uses the R Prophet stored procedure to check each of our products that we manufacture with their sales history (which are all stored in SQL), to forecast the appropriate amounts of each product to manufacture for the next four weeks.   This data will feed into a .NET based application (either windows forms, or intranet web based application) to present the results.   This will also allow the 'business expert' the ability to override the number and add to it, if they know that some big new customers, etc. will be coming over the next 4 weeks.

    2.) Additionally, I hope to be able to allow the user to 'click through' on a product, to see the historical 'plots' that are generated like in the Prophet examples, and hope to generate those in real time, when requested, and display them.  Which is why I was asking about how I could take those plots, and actually return them from the stored procedure so that I could display them in my application, without needing to get the raw data and plot them all myself.

    I also thought of another question, as I was playing around further with this over the weekend.

    Prophet allows us to pass additional data (i.e. Holidays) into the analysis; but I don't see how I can have two separate queries as separate inputs within an R Server stored procedure so that I can query my business holidays table, and include and use them with prophet, along with the main data query?  Is there a way to do that?

    Did I answer your question?

    Thanks,

    Jim

    Monday, May 22, 2017 1:09 PM
  • Hey Jim,

    For the two first questions, let me think a bit, and come back.

    What you said about multiple input data sources, you are correct - only one is supported. You can though execute some code in your R script which would retrieve data from the db:

    sqlConnString <- "Driver=SQL Server;Server=.;Database=some_db;uid=some_uid;pwd=some_pwd"
    
    sqlComputeTrace <- RxInSqlServer(connectionString = sqlConnString)  
    
    rxSetComputeContext( sqlComputeTrace)
    
    ds <- RxSqlServerData(connectionString=sqlConnString, 
    sqlQuery="SELECT * FROM rsome_table")

    That should allow you to get whatever data you need.

    Niels


    http://www.nielsberglund.com | @nielsberglund

    Tuesday, May 23, 2017 1:51 AM
  • OK.. thanks!

    I have run into a bit of a problem; after a couple of days, the launchpad service would't start - I kept getting a timeout error.

    After a bunch of reading and research, this could mean that somehow something was installed that through the standalone version out of sync with the in-sql-version.

    But, I have been having difficulty getting the KB4019916 to run again as a 'reinstall' against my SQL Server 2016 Server.  

    Do you have any ideas, before I move forward and completely remove SQl from my desktop, and reinstall it from scratch?

    Wednesday, May 24, 2017 7:08 PM
  • Not good :(

    I have no idea, but perhaps the log files can tell you more:

    1. Stop the launchpad service (it probably already is stopped)
    2. Open the rlauncher.config file and change TRACEVERSION to 4 (do it as admin)
    3. Delete all files in: "C:\sql_instance_patch>\MSSQL\Log\ExtensibilityLog"
    4. Restart the launchpad service
    5. Does the logfiles (created in the path above) tell you anything

    I wrote about the config and logfiles in my Internals - III post, so if you need more info about that, please have a read.

    Sorry I can't come p with anything else, but please let us know what you find.

    Niels


    http://www.nielsberglund.com | @nielsberglund

    Thursday, May 25, 2017 3:33 AM
  • Jim,

    There are a few different things that can lead to the Launchpad service not starting so it's hard to say exactly what your problem is.

    Here are a few tips that I've had to use when I had configuration issues.

    1) Check Windows Event Viewer for any details about why Launchpad wasn't starting

    2) Try and start the Launchpad service manually under your personal account; this is a good way to get a more detailed error message. Start -> Run -> services.msc -> SQL Server Launchpad -> Properties -> Copy the "Path to executable" and run this in an admin command prompt window.

    3) It's possible that you have group policy restrictions in place getting in the way. I'm not an expert in group policy but the launchpad service account needs 4 permissions listed here. In particular you need "Log on as a service" for NT SERVICE\MSSQLLaunchpad.

    3a) You might be able to temporarily "fix" the NT SERVICE account permissions by doing the following.

    Open Sql Server Configuration Manager -> Open Right click Properties on the SQL Server Launchpad Service -> Change the "Log on as" dropdown to "Local Serivce" and click Apply

    Now, reset the Account name field to NT SERVICE\MSSQLLaunchpad and do not include a password and click Apply again. I think this will basically reconfigure Launchpad permissions for the NT SERVICE\MSSQLLaunchpad account.

    Good luck!

    Bob

    Thursday, May 25, 2017 9:11 PM
  • Neils,

    I tried using method that you show above to get the list of holidays / special event days per product, and load into the Holidays area within prophet, and it works - but it is very slow.

    When I had code the holidays entry, as in the examples on the Prophet pages, the entire process runs in under 3 seconds to generate the forecast using all my historical data.

    When I use the method outlined above, I correctly retrieve and use the data, and get the same results, but the process runs for 30 seconds.

    Why does using the Rx and rx functions add so much overhead to the run time, and what can I do to get this back down to a reasonable level using these functions?

    Thanks,

    Jim

    Friday, December 1, 2017 4:46 PM
  • I should further qualify, the sql retrieved holidays, running at 30 seconds isn't slow in itself, but that is for one part number, and we I will have over 500 products I need to run this for, at 30 seconds per part, that is a significant amount of time.  If it was running at 3 seconds per product, it is easy to see how much better that would be!

    Thanks,

    Jim

    Friday, December 1, 2017 5:53 PM
  • Jim,

    The code you're running - was that from the Prophet pages? I'll grab it and see what I can find out.

    Niels


    http://www.nielsberglund.com | @nielsberglund

    Saturday, December 2, 2017 4:36 AM
  • Niels,

    The example of the hard coded history was.  I had 52 entries I placed in a sql database that has two columns, a Holiday field (which is a date field) and a description field which has the name of the business holiday (think New Years Eve, New Years Day, Christmas Eve, Christmas Day, Day before Thanksgiving, Thanksgiving, etc.).

    I then tried one version using your example above, which use the Rx functions to read this table into the R script, and pass the records to the Holidays function in Prophet.

    I then ran this in Sql Management Studio v17.3

    I then changed the code to not use any of the Rx functions, but instead hardcoded the holidays entries, as shown on this page:  

    https://facebook.github.io/prophet/docs/seasonality_and_holiday_effects.html

    like this:

      'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
                            '2010-01-24', '2010-02-07', '2011-01-08',
                            '2013-01-12', '2014-01-12', '2014-01-19',
                            '2014-02-02', '2015-01-11', '2016-01-17',
                            '2016-01-24', '2016-02-07']),


    (note, those aren't the actual dates, just how they example shows to have them in the settings for the holidays)

    I used a lower_window of 0, and an upper window of 0 along with the holidays.

    When I run the hardcoded version, processing the results on 7 years of data runs in 3 seconds.  When I use the Rx functions to retrieve the data directly from the SQL database and use that in the history settings, it takes 30 seconds to run.

    I am struggling in understanding what is taking so long when I use the Rx functions from within R in Sql.

    Also, note that I created a stored procedure, and am actually running the stored procedure from the SQL management studio when I do these timings.

    Does that help?

    Thanks,

    Jim

    Monday, December 4, 2017 2:14 PM
  • One more thing, I am also now using SQL 2017.

    Thanks,

    Jim

    Monday, December 4, 2017 6:20 PM
  • Thank Jim!

    That helps a lot!!

    Niels


    http://www.nielsberglund.com | @nielsberglund

    Tuesday, December 5, 2017 6:26 AM
  • I thought I would include my entire stored procedure, so you see everything exactly as I have it (currently running in sql development server 2017, but will be running on standard edition 2017 in production):

    ALTER PROCEDURE [dbo].[PartForecast]
    @PartNumber nvarchar(30)
    AS
    BEGIN

    EXEC sp_execute_external_script @language = N'R'
        , @script = N'
    # @InputDataSet: input data frame, result of SQL query execution
    # @OutputDataSet: data frame to pass back to SQL

    # load prophet
    library(prophet)

    # load dplyr, needed for mutate()
    library(dplyr)

    # Set up the database connection for script database reads
    conn <- "Driver=SQL Server;
    Server=.;
        Database=Test;
        Trusted_Connection=True;"

    query <- "SELECT convert(varchar(10), Holiday, 101) as Holiday FROM BusinessHolidays;"

    bHolidays <- RxSqlServerData(connectionString = conn, sqlQuery = query, rowsPerRead = 100)

    bcHolidays <- rxDataStep(inData = bHolidays)

    print(bcHolidays)

    # set up the business holidays
    holidays <- data_frame(
    holiday = "closed",
    ds = as.Date(bcHolidays$Holiday, format="%m/%d/%y"),
    lower_window = 0,
    upper_window = 0
    )

    # read in the SQL data from input_data_1 parameter
    df <- InputDataSet

    # create the model
    m <- prophet(df, interval.width = 0.80, uncertainty.samples = 0, holidays=holidays)

    # create a dataframe to fit a forecast into
    future <- make_future_dataframe(m, periods = 12, freq=''week'')

    # do the forecast
    forecast <- predict(m, future)

    # print out some data
    OutputDataSet <- tail(forecast[c(''ds'', ''yhat'', ''yhat_lower'', ''yhat_upper'')], 12)

    # R
    #plot(m, forecast)

    # R
    #prophet_plot_components(m, forecast)
    '
        , @input_data_1 = N'-- Place SQL query retrieving data for the R stored procedure here
    select ShipDate as ds, Quantity as y
    from PartSalesWeeklyFacts
    where PRTNUM = @Part'
    , @params = N'@Part nvarchar(30)'
    , @Part = @PartNumber 
    --- Edit this line to handle the output data frame.
        WITH RESULT SETS ((ds DateTime, yhat float, yhat_lower float, yhat_upper float));
    END;


    Wednesday, December 6, 2017 4:10 PM
  • Thanks for that Jim!! At the moment I am overseas, and haven't had time to look into it yet. I'll tryto have a look this weekend.

    Niels


    http://www.nielsberglund.com | @nielsberglund

    Thursday, December 7, 2017 2:59 PM
  • If you need any more information, let me know...

    Thanks,

    Jim

    Tuesday, December 12, 2017 5:58 PM
  • I took my sql stored procedure, and added some timings in it to see what statement, exactly, is causing the delay in running the stored procedure, and it turns out it is the:

    bcHolidays <- rxDataStep(inData = bHolidays)

    I have tried converting this to using rxImport, etc., without any success.  The interesting thing is that when I run the stored procedure using the rxImport function instead of the rxDataStep, I get a chunk message saying the data was read in .001 seconds; but the step that does the rxImport still takes over 17 seconds to complete, and i have no idea why.

    I thought maybe it was the dates that were causing an issue (i.e. converting slow or something) so I tried adding various date handling routines, like added the lubridate library, and tried using the ymd function to convert the date, and that didn't help at all.

    Niels, have you had a change to take a look at this any more?

    I am even trying to convert it, and try it in python script to see if maybe that will run faster (no success in getting it set up yet).

    Not quite sure what to try next!

    Thanks,

    Jim

    Wednesday, December 20, 2017 3:08 PM
  • Hey Jim,

    Would you by any chance have an insert script for your table with holidays?

    Niels


    http://www.nielsberglund.com | @nielsberglund

    Saturday, December 23, 2017 4:58 AM
  • Hey Jim,

    I created a table with holidays and in ran your select code together with the rxDataStep part. For me the rxDataStep executes at between 2 milliseconds and 10 milliseconds. My holiday table have 90 rows.

    Niels


    http://www.nielsberglund.com | @nielsberglund

    Saturday, December 23, 2017 6:01 AM