none
what exactly is index rebuild doing to my column store clustered index values?

    Question

  • Hi. I'm looking at our maint job for indexes.  Its a lot of code but I think its determining how "fragged" each index is before applying maintenance to that index.

    From what I can tell the possible commands are index rebuild and index reorg.  I'm assuming they manipulate the order of values in each "vertical table" analogous to the way a conventional index's order would be re built.

    But what is that order say on 5 columns?  let's say the 1st column is a unique key.  So they probably order that column first.  So what happens to the remaining 4 columns.  Are they ordered independently of the first column?  Or does sql try to keep them aligned with the first column regardless of their values?     

      
    • Edited by db042190 Tuesday, July 3, 2018 11:10 AM focus on rebuild to keep it simple
    Tuesday, July 3, 2018 11:03 AM

All replies

  • https://blog.sqlauthority.com/2007/12/22/sql-server-difference-between-index-rebuild-and-index-reorganize-explained-with-t-sql-script/

    https://docs.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-2017

     All rebuilding the index does is lay the pages back down.  The rows may move around on the clustered index pages

    If you  rebuild CI  none of NCI are affected 


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Tuesday, July 3, 2018 11:13 AM
    Answerer
  • Are you talking about Ola Hallengren script ? Can you put the part which seems confusing to you in the question. I have no idea about the script

    Cheers,

    Shashank

    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it

    My TechNet Wiki Articles

    MVP

    Tuesday, July 3, 2018 11:15 AM
    Moderator
  • thx Uri, I already read that and pretty much get it.  But in a clustered cs index I'm wondering about the order sql tries to achieve in an index rebuild. 

    Im assuming sql has the flexibility to treat each column's order independently of every other in a clustered cs index, maintaining pointers to keep relationships to the same "row" intact.

    Tuesday, July 3, 2018 11:20 AM
  • Just moving data around the index pages , probably start from the first index in the clustered index 

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Tuesday, July 3, 2018 11:26 AM
    Answerer
  • thx shashank.  with this question I'm assuming an index rebuild gets run against the clustered cs index via that script which makes its decisions based on dmv/canned views like queries and not really worried about understanding the script.  I'm sure it determines how fragmented an index is (if at all) before taking an action.  

    I'm really curious what a rebuild does order wise to my columns.  If sql has to keep the 2-4 th columns aligned order wise with my unique key (column 1) and with the order it chose to apply to column 1, you can imagine there would be no use unless the 1st column were something less granular.  If on the other hand each column's order is independent of every other's, I can move forward in my quest to better understand this technology and perhaps make better decisions on how (if at all) to use it. 

    Tuesday, July 3, 2018 11:37 AM
  • thx, "first index in the clustered index"?  R u suggesting each column is like an index within the clustered cs index?  If yes, does each have its own independent order?
    Tuesday, July 3, 2018 1:05 PM
  • No I   meant \guess that it starts from the first column that is part of the CI 

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence


    Tuesday, July 3, 2018 1:11 PM
    Answerer
  • I just spoke with one of our dba's (just in passing) who I think is admitting he's not an expert.  he feels that the compression cs is famous for is horizontal not vertical and re occurs in index rebuilds.  And he feels that the unique key I mentioned shouldn't matter where it falls ordinal "ly". 

    I told him that I thought the whole performance gain in cs was in vertical compression which leverages redundant data allowing elimination to occur in dramatic fashion. 

    But if he's right on the horizontal compression thing, then I'm thinking an index rebuild probably doesn't help cs performance in a big way.    

    Tuesday, July 3, 2018 7:27 PM
  • Dear db042190,

    Actually, SQL Server uses the VertiPaq in-memory columnstore engine to create Columnstore Indexes. Columnstore indexes work well for mostly read-only queries that perform analysis on large data sets. For traditional table, dataset will be stored in disk, but for Columnstore indexes, the dataset is stored in memory, as a result, the compression will be used to reduce the usage of memory.

    >> I told him that I thought the whole performance gain in cs was in vertical compression 

    I agree with this, the vertical compression can reduce the memory usage of each column. There are many method to compress the column data, for example, value encoding, dictionary encoding and Run Length Encoding.

    >> you can imagine there would be no use unless the 1st column were something less granular

    Generally, this is true, Columnstore Index benefits Columns with more similar data, which results in high compression rates, but as mentioned above, there are more than one method to compress data. 

    >> R u suggesting each column is like an index within the clustered cs index?  If yes, does each have its own independent order?

    No, each column will use the same order, all the columns in the same table are sorted in the same way because, at some point during the querying, VertiPaq might have to match different columns for the same row. When we create the index, the engine will read all the table, it tries different sort orders to find a best way to compress the dataset, then each column will be compressed with the same order.

    Best Regards,
    Teige


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.


    Wednesday, July 4, 2018 5:48 AM
  • http://www.nikoport.com/2018/06/21/columnstore-indexes-part-123-clustered-columnstore-index-online-rebuild/

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Wednesday, July 4, 2018 5:51 AM
    Answerer
  • Thx Teige.  I'm disappointed about the sort order thing, but not surprised and I think one of the white papers I read a while ago eluded to what you say. 

    If we assume this table can fit into memory, does it reload into memory every time a query is run against it?  Or kind of as sql needs space etc?  If its as needed is the load time significant and perhaps part of the "start up" we hear about?  

    Also, the comment about the index is very interesting to me.   Since the index rebuild presumably has no knowledge of our most common predicates, and because with 100 columns the rebuild would likely find "best columns" to sort on that have no relevance to the app, is there a way we can tell sql at index rebuild time (and maybe during inserts) which cols to use for sort order that best helps the app?   

    I'm already thinking the most important date / time col needs to be split into date and time separately.  Its probably the most important filter of the bunch.  

    As I think about advice I've gotten in these forums and things I've read, I don't think column store is in memory unless you direct sql to make it so.  I've always thought that while the engine is similar to vertipac, it isn't exactly vertipac. 

    • Edited by db042190 Thursday, July 5, 2018 2:22 PM questioning answerer
    Thursday, July 5, 2018 11:54 AM
  • Hi db042190,

    I did some further researching on this, the columnstore index is a tool providing a very high level of data compression for data warehouse storage, it can reducing the memory footprint but is not stored in memory fully.

    The compression engine is similar to vertipac but other part is different. I would suggest you referring to the article mentioned by Uri Dimant.

    Best Regards,

    Teige


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, July 6, 2018 7:18 AM
  • thx, can we tell sql server which col(s) is/are the best for sorting at index rebuild time?  There is no way it would pick the best.  And I hate to say this but I think different cols would be best for different parts of the app.  And as I recall (I have to double check) only one nc cs index is allowed per table. 
    Monday, July 9, 2018 11:14 AM
  • Teige, this article implies your theory that the physical order of cols has to align isn't true.   And it resonates better with me if its true that the query predicate determines which cols are processed first. 

    https://logicalread.com/sql-server-columnstore-index-w02/#.W0c3mJWWzVg

    Thursday, July 12, 2018 2:48 PM