How to deal with too many columns?

  • Question

  • Hi!

    1. I have an Excel table containing around 300 or more columns for each row.

    2. I read the data, and I need to store each row in an MS SQL database.

    3. Each row has a unique key.

    4. IMPORTANT: The 300 columns can be divided functionally (let's say 1 to 20, 21 to 50, 51 to 70, etc.).

    Which is the best way to design the DB: one table, or a few tables sharing the same primary key?

     

    Thanks!

    Tuesday, September 20, 2011 2:50 PM

Answers

  • It really depends on the nature of the system, the queries, and other factors. If you store everything in one table, your data rows become quite wide. That can make scans inefficient (a bigger data row -> fewer rows per page -> more data pages -> more IO operations during scans). Typically it's not an issue when your queries are optimized with non-clustered indexes and you don't have scans. Also, don't forget about the 8,060-byte data page size limitation (a row should fit in the data page), so be careful with fixed-width types.
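
    For what it's worth, once the data is loaded you can check how close the rows actually get to that limit. A minimal sketch, assuming a hypothetical dbo.ExcelData table (the DMV itself is standard SQL Server):

        -- Average and maximum row sizes; these columns are populated
        -- in SAMPLED and DETAILED modes only.
        SELECT index_id,
               avg_record_size_in_bytes,
               max_record_size_in_bytes
        FROM sys.dm_db_index_physical_stats(
                 DB_ID(), OBJECT_ID(N'dbo.ExcelData'), NULL, NULL, 'SAMPLED');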

    At the same time, if you split your data across multiple tables, you'll pay a performance penalty for the joins whenever you need to select all 300 attributes. You'll also have some overhead in the code to support data integrity.
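
    As a rough illustration of that trade-off (the table and column names below are made up, not from the original question), the functional groups can become tables that share the same primary key, and reassembling a full row then requires joins:

        CREATE TABLE dbo.ItemCore (
            ItemKey INT NOT NULL PRIMARY KEY,
            Col001  VARCHAR(50) NULL,
            -- ... columns 1-20 ...
            Col020  VARCHAR(50) NULL
        );

        CREATE TABLE dbo.ItemGroup2 (
            ItemKey INT NOT NULL PRIMARY KEY
                REFERENCES dbo.ItemCore (ItemKey),
            Col021  VARCHAR(50) NULL,
            -- ... columns 21-50 ...
            Col050  VARCHAR(50) NULL
        );

        -- Selecting all attributes pays the join cost described above.
        SELECT c.*, g2.*
        FROM dbo.ItemCore AS c
        JOIN dbo.ItemGroup2 AS g2 ON g2.ItemKey = c.ItemKey;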

    Again, analyze your queries and think about the implications. Also, if most of the attributes are not populated, consider SPARSE columns.
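
    A minimal sketch of the SPARSE option (hypothetical names again), useful when a large share of the values in a column are NULL:

        CREATE TABLE dbo.ItemWide (
            ItemKey  INT NOT NULL PRIMARY KEY,
            Attr001  INT          SPARSE NULL,  -- rarely populated attributes
            Attr002  VARCHAR(100) SPARSE NULL,
            -- optional: expose all sparse columns as a single XML column
            AllAttrs XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
        );

    Note that sparse columns save space only when most values are NULL; non-NULL values stored in them take a few extra bytes each.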


    Thank you!

    My blog: http://aboutsqlserver.com

    • Proposed as answer by Sandeep Dasam Wednesday, September 21, 2011 11:00 AM
    • Marked as answer by Peja Tao Tuesday, September 27, 2011 8:59 AM
    Tuesday, September 20, 2011 4:12 PM

All replies

  • To add to Dmitri's comments:

    Sparse columns are the best option if you have too many NULLs in the table. If you have data in all the rows and columns, consider some of the points below.

    You can evaluate your queries against this table and separate out the less frequently used columns (like text descriptions, etc.) and redundant columns into a separate table.
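
    For example (purely illustrative names, in the same spirit as the sketch above), a wide, rarely queried description column could live in its own table keyed by the same primary key, so the frequent queries never touch it:

        CREATE TABLE dbo.ItemDescription (
            ItemKey     INT NOT NULL PRIMARY KEY
                REFERENCES dbo.ItemCore (ItemKey),
            Description NVARCHAR(MAX) NULL
        );

        -- Frequent queries read only the core table; the join is paid
        -- only when the description is actually needed.
        SELECT c.ItemKey, c.Col001
        FROM dbo.ItemCore AS c;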

    Wednesday, September 21, 2011 11:04 AM
  • Are all the columns always used? When you say they can be divided functionally, is that because some rows use some columns and some use others?

    An example of the data and rows would be helpful.

    Wednesday, September 21, 2011 12:38 PM
    Answerer