Size of table data impacts data mining performance?


  • I've had great success using the association algorithm for data mining over the past couple of years. I have a typical market basket analysis going on.

    We decided to explore some new customer data which has about 4x more input than our previous model (40m vs. 10m rows from the database), with the same type of discrete text data. Due to the nature of the data, the number of resulting itemsets and rules is about the same for the new model as for the one already in place, even with the increase in rows.

    Everything is fine and performance seems normal in data tools. However, whenever I run a simple DMX query, performance is about 100x worse: the current model takes about 200ms to return results, while the new one takes about 22 seconds. When I look at the size on disk, the new one is 600MB while the old one is 500MB.
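For reference, the kind of "simple DMX query" I mean is an ordinary prediction query against the association model, along these lines (model, column, and product names here are placeholders, not our actual schema):

```sql
-- Hypothetical sketch of the query being timed: ask the association
-- model for the top 5 recommended items given one item in the basket.
SELECT
  PREDICT([Market Basket Model].[Products], 5) AS [Recommendations]
FROM
  [Market Basket Model]
NATURAL PREDICTION JOIN
  (SELECT
     (SELECT 'Product A' AS [Product]) AS [Products]) AS t
```

Both models are queried with the same shape of statement; only the underlying model (and its training row count) differs.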

    Through experimentation, I find that anything over 1.5m rows of input starts to slow it down. At 1.5m rows, processing is still very quick (about 1 minute).

    Does anyone have an idea what could be causing such terrible query performance?

    Friday, June 14, 2013 1:14 PM

All replies