Wednesday, February 20, 2013 10:02 AMHi all,
The following is keeping me awake at night. I've got a sales DB with records for each date, store id, product id, volume and sales. There are ~300 stores, ~300 days and ~3,000 products sold per day, hence the database has ~270 million records, which are unsorted.
Since I'd like to run queries on a specific day, store and product ID, I'd like to create an index of this table on these columns. However, I've tried to create an index (both clustered & non-clustered) but I had to terminate after several hours (up to almost a day) because I didn't know how long it was going to take. I am looking for a better way of creating an index, which has less complexity. I think that my current complexity is n^2 for non-clustered and n log n for clustered indexes.
Does anyone have any tips?
Wednesday, February 20, 2013 10:13 AM
If you using SQL Server enterprise edition , then use ONLINE operation
Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker
Wednesday, February 20, 2013 10:26 AMAnswerer
If you specify all needed columns(values) in the WHERE condition how much data does it return?
If you use EE then yes create an index by using ONLINE clause, but as alternative I would suggest probably partition your table..
Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/
MS SQL optimization: MS SQL Development and Optimization
MS SQL Blog: Large scale of database and data cleansing
Remote DBA Services: Improves MS SQL Database Performance
Wednesday, February 20, 2013 10:40 AM
So the issue is the amount of time required to create this index...
If you are going to build nonclustered indexes.. then make sure that the clustered index is built first (should you choose to have one) - or else the nonclustereds will all have to be rebuilt as part of moving from, heap (16byte page key) to clustered (clustered key).
For your covering nonclustered index maybe you could consider a FILTERED index - which only includes a particular slice or the rows... of course this will only be helpful if you query also only requires that same particular slice of the rows...
There maybe a proportion of rows which are always eliminated by your query which you could reflect in your Filtered index...
Wednesday, February 20, 2013 10:59 AM
I think that my current complexity is n^2 for non-clustered and n log n for clustered indexes.
If the engine can create a clustered index in n*log(n), then there is no reason why creating a nonclustered index would take more.
You could create a few tables with different amount of rows and create indexes on them afterwards. For example a table with 1,000,000 rows, 2,000,000 rows, 4,000,000 rows and 8,000,000 rows. Based on the numbers you can extrapolate how long index creation on the big table will take.
When you time the index creation for these test tables, make sure that the table's data is not in cache, because I am quite sure that your 270 million row table will not entirely be in memory either.
- Proposed As Answer by Naomi NMicrosoft Community Contributor, Moderator Wednesday, February 20, 2013 1:22 PM
Wednesday, February 20, 2013 1:12 PM
Guys, thanks for your replies.
I'm assuming building a clustered index takes n log n time, as for each of n unsorted records, it takes log n steps to find the correct position in the ordered table.
As for non-clustered, I assume that for each record in a lookup table with size n, it takes on average 0.5n to find its position in the unordered table.
@LAF: I'm not using EE. Unfortunately I potentially require all data
@UD: combinations between date, store id and product id are unique (i.e., the table consist of the total volume and sales generated by each product, per store per day)
@GJS: thanks for the tip. I think that's the right way to make an estimation of the process time.
Wednesday, February 20, 2013 1:40 PMModerator
You can speed up index creation with fast, dedicated tempdb configuration:
Both tempdb data and tempdb log must be on dedicated fast devices.
Specify the SORT_IN_TEMPDB = ON option in CREATE INDEX:
BOL:"The intermediate sort results that are used to build the index are stored in tempdb. This may reduce the time required to create an index if tempdb is on a different set of disks than the user database. However, this increases the amount of disk space that is used during the index build."
Kalman Toth Database & OLAP Architect
Paperback / Kindle: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012