Friday, April 21, 2006 11:53 AM
Hi all, I found that when I trained my data mining models, the model cover rate was very low: the training data set has 82 rows, but only 25 cases occur in the models I trained. How can I improve the cover rate to improve the quality of the models, if that is possible? I am using SQL Server 2005.
Sunday, April 23, 2006 3:14 PM
Please explain what you mean by "cover rate" and what you are trying to accomplish. 82 rows is a pretty small data set in general, especially if you have many attributes.
Sunday, April 23, 2006 3:46 PM
Hi, yes, my training data set is quite small. It is university marks data that I want to analyse.
The inputs in the data set are mark1, mark2, marker1 and marker2. The output I want to predict is the agreed mark, based on mark1 and mark2 as given by marker1 and marker2 respectively.
To make the classification task easier, I have discretized the continuous marks into categorical values.
By "cover rate" I mean the number of cases covered by the mining model. The training data set has 82 rows, but the mining model covered only 25 of them.
I hope this explanation is clear enough for you to help.
Thanks a lot.
Wednesday, April 26, 2006 12:47 AM
Do you mean it accurately classified 25 cases?
Wednesday, April 26, 2006 8:11 AM
Hi, Jamie, the number is how many cases each model covered during training. I have 82 rows in the training set, but the clustering model, for example, grouped only 25 cases (rows) into clusters.
Thanks a lot.
Wednesday, April 26, 2006 6:21 PM
I'm not sure what's going on here - I think what you are saying is that the cluster support of all clusters, when added together, is only 25 when it should be 82. All the cases should be accounted for. You can always go to the prediction query tab and run a prediction query for Cluster() against your training data. You may also want to check the cluster probability. If the cluster probability is very close to 1 / (number of clusters), then it's likely the clusters aren't very strong. This isn't unexpected for such a small data set, but I'm not sure why you're only getting 25 altogether.
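The "close to 1 / (number of clusters)" check mentioned above can be illustrated outside SQL Server with a small Python sketch. It assumes the per-case cluster membership probabilities have already been exported into plain lists; the function name `weak_assignments`, the `tolerance` value and the sample numbers are all hypothetical and are not part of Analysis Services.

```python
def weak_assignments(probabilities, tolerance=0.1):
    """Return indices of cases whose best cluster probability is near uniform (1/k)."""
    weak = []
    for index, probs in enumerate(probabilities):
        k = len(probs)  # number of clusters
        # A top probability barely above 1/k means the case fits no cluster well.
        if max(probs) - 1.0 / k <= tolerance:
            weak.append(index)
    return weak


# Hypothetical membership probabilities for three cases and three clusters.
cases = [
    [0.34, 0.33, 0.33],  # almost uniform: weak assignment
    [0.90, 0.05, 0.05],  # clearly belongs to the first cluster
    [0.40, 0.35, 0.25],  # top probability only slightly above 1/3
]

print(weak_assignments(cases))
```

If most cases come back as weak, the clusters are probably not telling you much, which would not be surprising for 82 rows.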
Thursday, April 27, 2006 8:59 AM
Hi, Jamie, is it possible that the model won't cover all the records during its training process?
Thanks a lot
Friday, April 28, 2006 4:59 AM
No - it covers all of the data. If you can post the 82 rows we can take a look at it here.
Friday, April 28, 2006 10:08 AM
Hi, Jamie, below is the training data I used to build the mining models.
The first column contains the attribute names.
Thanks a lot for that.
Friday, April 28, 2006 5:39 PM
Does your data have a key column? The algorithms need a unique key column to determine that each row is a case. The "Agreed" column has 25 distinct values, so it looks like that is what you are using for your key.
I think you need to add an additional "ID" column to uniquely identify each row.
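As a rough illustration of why a non-unique key shrinks the case count, here is a minimal Python sketch. It is not how SQL Server does it internally; the sample rows, the grade values and the helper names `distinct_key_count` and `add_row_id` are made up for the example.

```python
def distinct_key_count(rows, key):
    """Count distinct values in the key column, i.e. how many cases a model would see."""
    return len({row[key] for row in rows})


def add_row_id(rows, id_name="ID"):
    """Return copies of the rows with a sequential ID column that is unique per row."""
    return [{id_name: number, **row} for number, row in enumerate(rows, start=1)]


# Hypothetical sample: four rows, but only two distinct "Agreed" values.
rows = [
    {"Mark1": "B", "Mark2": "B", "Agreed": "B"},
    {"Mark1": "A", "Mark2": "B", "Agreed": "B"},
    {"Mark1": "A", "Mark2": "A", "Agreed": "A"},
    {"Mark1": "C", "Mark2": "B", "Agreed": "B"},
]

# Keyed on "Agreed", rows with the same value collapse into one case.
print(len(rows), distinct_key_count(rows, "Agreed"))

# Keyed on the new ID column, every row is its own case.
rows_with_id = add_row_id(rows)
print(distinct_key_count(rows_with_id, "ID"))
```

Running the same distinct-count check on the real key column before training is a quick way to confirm that the number of cases will match the number of rows.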
Tuesday, May 2, 2006 3:29 PM
Hi, Jamie, thanks a lot. I got it working by following your suggestion.
Yes, the problem was that the column I used as the key column had only 25 distinct values, so only 25 cases were covered by the trained model.
Thanks a lot.