DMX Query, the most influential variable

    Question

  • Hello,

    I'm doing data mining with decision trees, and I already get the expected result: the predicted probability of dropping out. However, I would like the DMX query to also return the variable that most influenced that probability.

    For example, if Marco has a 90% chance of dropping out, I want the query to tell me which variable was most influential for that probability: Sex, Age, City, etc.

    My mining model, named Tree, has variables such as sex, age, marital status, district, etc., and my data view, Vista_Datos, has fields such as customer name, phone, email, province, etc. My DMX query is as follows.

    SELECT
      t.[CustomerName] AS [Customer],
      t.[Date] AS [Date of Appointment],
      t.[Department],
      t.[Province],
      t.[District],
      t.[Street address] AS [Address],
      t.[telephone] AS [Phone Fixed],
      t.[Cell_Phone] AS [Cell Phone],
      t.[Email] AS [Email],
      PredictProbability([Tree].[Desertion], 0) * 100 AS [Probability of Defection]
    FROM
      [Tree]
    PREDICTION JOIN
      OPENQUERY([Source Mining],
        'SELECT
          [CustomerName],
          [Date],
          [Department],
          [Province],
          [District],
          [Street address],
          [telephone],
          [Cell_Phone],
          [Email]
        FROM
          [dbo].[Vista_Datos]
        ') AS t
    ON
      [Tree].[Department] = t.[Department] AND
      [Tree].[Province] = t.[Province] AND
      [Tree].[District] = t.[District]
    WHERE PredictProbability([Tree].[Desertion], 0) * 100 > 50

    I need to modify the query to get the most influential variable in the prediction result.

    I hope I have been clear enough for you to help me.

    Thank you for your attention.

      
    Thursday, July 10, 2014 2:27 PM

Answers

  • select * from [Model_Name].content

    returns all the information about the model (all the nodes in the tree).

    select NODE_UNIQUE_NAME from [Model_Name].content where NODE_TYPE=2

    returns the NODE_UNIQUE_NAME of the root node. If you want to find all children of the root node (all the first splits), you can execute a query like

    select * from [Model_Name].content where [PARENT_UNIQUE_NAME]='000000004'

    where '000000004' is NODE_UNIQUE_NAME of the root node.
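
    As a sketch of how the first split could be read from that result, the query below narrows the child rows to a few columns of the model content. It assumes the model is named [Tree], as in the question, and '000000004' is only a placeholder for whatever root NODE_UNIQUE_NAME the previous query returns. NODE_CAPTION and NODE_DESCRIPTION hold the split condition, so the attribute they name is the variable used in the first split.

    ```
    // Children of the root node: one row per branch of the first split.
    // '000000004' is a placeholder; substitute the root's NODE_UNIQUE_NAME.
    SELECT
      NODE_UNIQUE_NAME,
      NODE_CAPTION,
      NODE_DESCRIPTION,
      NODE_SUPPORT
    FROM [Tree].CONTENT
    WHERE [PARENT_UNIQUE_NAME] = '000000004'
    ```

    Every child row should name the same attribute in its caption, with a different value or range per branch.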


    Tatyana Yakushev [PredixionSoftware.com]

    Saturday, July 12, 2014 1:02 AM

All replies

  • The most influential variable in the tree is the one used in the first split. It is the same for all rows. Do you need a DMX query to get the tree splits?
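
    The first split is model-wide, but the leaf each row lands in differs per customer. As a possible complement (an assumption on my part, not something proposed in this thread), the DMX function PredictNodeId can return that leaf for each row. The sketch reuses the model, data source and column names from the question.

    ```
    // Return, for each customer, the node the case is classified into.
    SELECT
      t.[CustomerName],
      PredictProbability([Tree].[Desertion], 0) * 100 AS [Probability of Defection],
      PredictNodeId([Tree].[Desertion]) AS [Leaf Node]
    FROM
      [Tree]
    PREDICTION JOIN
      OPENQUERY([Source Mining],
        'SELECT [CustomerName], [Department], [Province], [District]
         FROM [dbo].[Vista_Datos]') AS t
    ON
      [Tree].[Department] = t.[Department] AND
      [Tree].[Province] = t.[Province] AND
      [Tree].[District] = t.[District]
    ```

    The returned node ID can then be looked up in [Tree].CONTENT by NODE_UNIQUE_NAME; the NODE_DESCRIPTION of that row lists the split conditions on the path to the leaf.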


    Tatyana Yakushev [PredixionSoftware.com]

    Thursday, July 10, 2014 5:02 PM
  • Yes please, thank you very much.
    Thursday, July 10, 2014 9:44 PM