I am quite unfamiliar with SSAS at the moment. I am aware of the classical implementation of Naive Bayes; I learned about it from
here. However, what I am looking for is a complete walkthrough of how to use this particular algorithm with SSAS.
For simplicity, let us assume we need to classify a news write-up as containing positive criticism or negative criticism. In positive articles we can observe words like
good, awesome, super, recommended, love, like, etc. occurring frequently. In negative articles we mostly observe words like
bad, poor, unsatisfactory, unsatisfied, pathetic, etc. There are only two possible outcomes (positive
or negative), hence generalizing on patterns is fairly simple.
To start with, we have a few write-ups with their corresponding outcomes, which are
mostly in accordance with the patterns we've generalized above. If we were to do this without the help of a data mining tool, we would do the following:
1. Take the first write-up (assume this one is a positive article).
2. Split the whole write-up into words.
3. Remove the stopwords, like the, this, that, etc. (These words give the write-up its grammatical structure, but since they occur frequently in every article we get rid of them.) We now have a corpus of words.
4. Assign this corpus to the outcome positive. We simply note how many times the outcome positive appears, and also how often each individual word occurs with that outcome.
5. Take the next write-up (assume this one is negative).
6. Repeat steps 2-5 and update the respective frequencies each time.
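The manual counting described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the stopword list, the sample write-ups, and the function names tokenize and count_frequencies are made up for the example, and each word is counted once per write-up so that a word's count can never exceed the count of its outcome.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "this", "that", "is", "a", "an", "and", "it", "was", "i"}

def tokenize(text):
    """Lowercase the write-up, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def count_frequencies(labelled_writeups):
    """Return (outcome_counts, word_counts) for (text, outcome) pairs."""
    outcome_counts = Counter()          # how often each outcome appears
    word_counts = defaultdict(Counter)  # per outcome: write-ups containing each word
    for text, outcome in labelled_writeups:
        outcome_counts[outcome] += 1
        # Assumption: count a word once per write-up, not once per occurrence.
        for word in set(tokenize(text)):
            word_counts[outcome][word] += 1
    return outcome_counts, word_counts

# Made-up training data, not taken from any real corpus.
training = [
    ("This is a good and awesome article, I love it", "positive"),
    ("The writing was bad and the plot pathetic", "negative"),
    ("A good read, recommended", "positive"),
]

outcome_counts, word_counts = count_frequencies(training)
print(outcome_counts["positive"])       # number of positive write-ups
print(word_counts["positive"]["good"])  # positive write-ups containing "good"
```

Counting per write-up rather than per occurrence is a design choice made here for simplicity; counting every occurrence is an equally common variant.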
So once we have looked into all documents, we can actually prepare test cases.
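The formula that the next paragraph refers to is not reproduced in this text. Judging by the symbols it explains (nc, p and n), it is presumably the m-estimate of probability. The equivalent sample size m is an assumption here, since the post does not mention it:

```latex
P(\text{word} \mid \text{outcome}) = \frac{n_c + m\,p}{n + m}
```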
In accordance with the formula above, nc is the number of times
good actually gives the outcome positive, p is a prior estimate (= 0.5, since there are only 2 outcomes), and
n is the number of times the positive outcome appears in our corpus.
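To make a test case concrete, here is a minimal Python sketch of how one could be checked by hand. It assumes the formula in question is the m-estimate (nc + m*p) / (n + m); the parameter m (equivalent sample size), the function names, and the counts below are all hypothetical and chosen only for illustration.

```python
def m_estimate(nc, n, p=0.5, m=1.0):
    """Assumed m-estimate: (nc + m*p) / (n + m). m is not given in the post."""
    return (nc + m * p) / (n + m)

def score(words, outcome, outcome_counts, word_counts, p=0.5, m=1.0):
    """Unnormalized Naive Bayes score of one outcome for a list of words."""
    n = outcome_counts[outcome]               # times this outcome appears
    total = sum(outcome_counts.values())
    result = n / total                        # prior of the outcome
    for word in words:
        nc = word_counts[outcome].get(word, 0)  # times word occurs with outcome
        result *= m_estimate(nc, n, p, m)
    return result

# Made-up frequencies, as if gathered from three write-ups.
outcome_counts = {"positive": 2, "negative": 1}
word_counts = {
    "positive": {"good": 2, "love": 1},
    "negative": {"bad": 1, "pathetic": 1},
}

test_words = ["good", "love"]  # test case: stopwords already removed
scores = {o: score(test_words, o, outcome_counts, word_counts)
          for o in outcome_counts}
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

The outcome with the larger score is the predicted one; the scores themselves are not normalized probabilities.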
How can I use SSAS to do this, and how do I go about verifying these kinds of test cases manually?
I am a bundle of mistakes intertwined together with good intentions
Edited by deostroll, Saturday, June 29, 2013 4:48 AM (formula explanation)