Prediction producing really low numbers ...

  • Question

  • Hey Folks: 

    I am new to SQL Server Machine Learning.  (I have completed the Microsoft certificate in Data Science.  That certificate did not touch SQL Server, so I'm pursuing that on my own.  My immediate goal is to do a User Group presentation; that is the stick keeping me on task.)

    I am using AdventureWorks2012 and am building a simple classification model: is a new customer likely to buy a bike?  I built a model in Visual Studio, tuned it, then moved it to a stored procedure.  The R code in the stored procedure is:

    PurchasedBikeModel = glm(
        PurchasedBike ~
            Gender + Age + Education + MaritalStatus + HomeOwnerFlag + CommunityKey +
            CardType + LowSalaryRange + HighSalaryRange,
        family = binomial, data = CustomerBike
    )

    TrainedModel = as.raw(serialize(PurchasedBikeModel, NULL))

    CustomerBike$Probability = predict(PurchasedBikeModel, newdata = CustomerBike, type = "response")

    OutputDataSet = data.frame(CustomerBike)


    Notice I build the model and serialize it to return the model to the stored procedure, where it is saved to a table.  I then take the training data and score it, and dump that to a result set.  In the result set (after training) the scoring looks realistic: numbers like .4537, .2549, .8461, etc.

    So, on to my prediction stored procedure.  In that, I am accepting feature values as parameters and building a select 'M' as Gender .... type query.  The R is very simple:

    Features <- InputDataSet

    model <- unserialize(as.raw(model))

    print(summary(model))

    Prediction <- predict(
        model,
        newdata = Features,
        type = "response"
    )
    '

    Notice I am printing a summary of the model after I unserialize it.  That summary looks realistic.  Based on that - I'm concluding the handling of the model was done properly - but again I am new.  A fumble is likely.
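
    A realistic summary shows that the coefficients survived the round trip, but it does not by itself confirm that scoring behaves the same afterwards, or that the incoming row has the column types the model was trained on. A minimal sketch of both checks follows, assuming synthetic stand-in data: the `train` frame, its two features, and `new_row` are hypothetical, not the AdventureWorks columns.

```r
# Hypothetical stand-in for CustomerBike (synthetic data, two features only)
set.seed(42)
n <- 200
train <- data.frame(
  Gender = sample(c("M", "F"), n, replace = TRUE),
  Age    = sample(18:80, n, replace = TRUE),
  stringsAsFactors = TRUE
)
train$PurchasedBike <- rbinom(
  n, 1, plogis(-2 + 0.03 * train$Age + 0.5 * (train$Gender == "M"))
)

fit <- glm(PurchasedBike ~ Gender + Age, family = binomial, data = train)

# Round trip through raw bytes, the form a varbinary(max) column would hold
blob     <- serialize(fit, NULL)
restored <- unserialize(blob)

# Scoring the training data must give identical results before and after
before <- predict(fit,      newdata = train, type = "response")
after  <- predict(restored, newdata = train, type = "response")
stopifnot(isTRUE(all.equal(before, after)))

# Compare what the model was trained on with a one-row scoring frame
new_row <- data.frame(Gender = "M", Age = 35L)
print(attr(terms(restored), "dataClasses"))  # classes seen at training time
print(restored$xlevels)                      # factor levels seen at training time
print(sapply(new_row, class))                # classes of the incoming row
```

    If a column that was a factor or character during training arrives as a number in the scoring query (or the other way round), or arrives on a different scale, the comparison above is one place where that mismatch would show up.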

    My call to the code is also simple ...

        exec sp_execute_external_script
            @language = N'R',
            @script = @RCode,
            @input_data_1 = @Query,
            @params = N'@model varbinary(max), @Prediction float output',
            @model = @Model,
            @Prediction = @Prediction out

    Everything in my scenario occurs as I expect, with one exception.  I do get a Prediction, but it is an extremely small number, not what I would expect: for example 0.00000756.  Nothing that looks like a realistic probability.  I can change the parameters to the call and the number changes, but it stays on the order of 1e-6.

    Any ideas??  I'd love to get beyond this one. 

    Thank You!!



    John

    Monday, December 17, 2018 7:40 PM

All replies

  • My first thought: have you looked at the summary() output from the model to see whether the prediction makes sense for the new data?  I would take a look at the coefficient estimates and try computing the prediction by hand for the new data, to see whether it makes sense or not.
    Tuesday, December 18, 2018 6:49 PM
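
    The by-hand computation suggested above could look roughly like the sketch below. It uses a small synthetic model as a stand-in, so the data, the two features, and the row being scored are all assumptions, not the AdventureWorks model. For a binomial glm with the default logit link, the predicted probability is the inverse logit of the linear predictor (intercept plus each coefficient times its feature value).

```r
# Hypothetical stand-in model (synthetic data, two features only)
set.seed(1)
n <- 300
train <- data.frame(
  Gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  Age    = sample(18:80, n, replace = TRUE)
)
train$PurchasedBike <- rbinom(
  n, 1, plogis(-1.5 + 0.02 * train$Age + 0.6 * (train$Gender == "M"))
)
fit <- glm(PurchasedBike ~ Gender + Age, family = binomial, data = train)

# Row to score: a 35-year-old male customer
new_row <- data.frame(Gender = factor("M", levels = levels(train$Gender)), Age = 35)

# Linear predictor by hand: intercept + GenderM dummy (1 for "M") + Age term
b   <- coef(fit)
eta <- unname(b["(Intercept)"] + b["GenderM"] * 1 + b["Age"] * 35)

# Inverse logit turns the linear predictor into a probability
by_hand      <- plogis(eta)
from_predict <- predict(fit, newdata = new_row, type = "response")
stopifnot(isTRUE(all.equal(by_hand, unname(from_predict))))

# Working backwards: the linear predictor implied by the reported value
print(qlogis(0.00000756))
```

    A probability near 0.00000756 corresponds to a linear predictor of roughly -11.8, so listing each coefficient-times-value term for the scored row should show which feature is pulling the sum that far negative.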