none
Why designers add an ID column for almost every table?

    General discussion

  • Hi friends,

    I have seen people do this, assume I have created a table called products ( for a supermarket system) some design it

    Column Name         Data type
    =============================
    ProductID(PK)            int
    SKU                  char(10)

    some design it as

    Column Name     Data type
    =========================
    ProductID       int (PK)
    SKU             int
    .....

    some design it as

    ColumnName      Data Type
    ==========================
    SKU(PK)             int
    ProductName         char(20)
    .......

    Typically SKU is always unique, But why people still use ProductID which is nothing but a column of int values set with an auto increment starts from 1?

    Is this some kind of the DB search speed up thing?

    What is the recommended way to designing tables like this?

    thanks


    I use Visual studio 2012 Ultimate and SQL server 2008 developer edition!



    I use Visual studio 2012 Ultimate and SQL server 2008 developer edition!


    Wednesday, December 04, 2013 3:34 AM

All replies

  • Hi friends,

    I have seen people do this, assume I have created a table called products ( for a supermarket system) some design it

    Column Name         Data type
    =============================
    ProductID(PK)            int
    SKU                  char(10)
    

    some design it as

    Column Name     Data type
    =========================
    ProductID       int (PK)
    SKU             int
    .....

    some design it as

    ColumnName      Data Type
    ==========================
    SKU(PK)             int
    ProductName         char(20)
    .......

    Typically SKU is always unique, But why people still use ProductID which is nothing but a column of int values set with an auto increment starts from 1?

    Is this some kind of the DB search speed up thing?

    What is the recommended way to designing tables like this?

    thanks


    I use Visual studio 2012 Ultimate and SQL server 2008 developer edition!

    Wednesday, December 04, 2013 3:20 AM
  • Check this discussion and especially few last answers and references

    http://beyondrelational.com/quiz/sqlserver/general/2010/questions/sqlserver-quiz-general-2010-madhu-k-nair-surrogate-key-vs-natural-key.aspx


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Wednesday, December 04, 2013 3:51 AM
  • Hi,

    Yes, using identity columns helps with the performance as the keys thus created as compact. Please refer to the following links for more detailed discussion on this topic.

    https://www.simple-talk.com/sql/t-sql-programming/identity-columns/

    http://social.msdn.microsoft.com/Forums/sqlserver/en-US/1374322a-3c58-4738-8ac0-d8072a077c02/how-should-an-identity-column-be-used?forum=transactsql


    Vatsalya - MSFT The views and opinions expressed herein are those of the author and do not necessarily reflect the views of Microsoft.

    Wednesday, December 04, 2013 3:55 AM
  • Identity columns can be used for generating key values. The identity property on a column guarantees the following:

    • Each new value is generated based on the current seed & increment.

    • Each new value for a particular transaction is different from other concurrent transactions on the table.


    Many Thanks & Best Regards, Hua Min

    Wednesday, December 04, 2013 4:07 AM
  • If SKU is unique. You can replace ProductID with SKU. A unique key can have NULL values however a Primary Key can not.

    http://www.dotnet-tricks.com/Tutorial/sqlserver/V2bS260912-Difference-between-Primary-Key-and-Unique-Key.html

    SKU is not a user friendly name whereas ProductID is.


    Regards, RSingh


    Wednesday, December 04, 2013 4:59 AM
  • One reason is when there isnt any other Unique column in the table ( RSingh already explained it).

    Another reason is replacement/alternative for Primary Key ( mostly to reduce CI size).

    In the first example ,SKU is char(10), so to replace it with 4 byte value, the designers may chose another column with int data type.


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Wednesday, December 04, 2013 5:44 AM
  • That's a question that could result in a very long thread with the right (or wrong, depending how you look at it) people involved.

    Generally, when you design your tables, you should strive to use natural keys that uniquely identifies the object. Thus, if you have identified that SKU is unique, then is what you should use for your primary key.

    Why would you not? In your example SKU is char(10) and an integer takes up less space and is more efficient. So some people would add a surrogate key which is an integer. Then you would use that id in your OrderDetails table etc. This is alright, as long as you also have a UNIQUE constraint on the SKU column, so that you don't mistakenly add the same product twice.

    Sometimes it is difficult to find something that uniquely identifies an entity, or what you find is unpractical to use. In this case, you can't do but to use only the surrogate key and hope that duplicates can be avoided or sorted out manually.

    People who are unexperienced with database design tend to design their tables with an IDENTITY column as the primary key and all other columns nullable. They have no understanding about data integrity and the risks such a design come with. IDENTITY itself is just one mechanism to generate these surrogate keys. It's good when you expect many concurrent inserts, but else you can just as well roll your own. Or on SQL 2012, use sequences instead. Sequences give you same benefits as IDENTITY, but is overall cleaner.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, December 04, 2013 8:21 AM
  • It is also possible that a designer wanted using a surrogate keys ... Read Tony article

    http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/05/11/427.aspx


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Wednesday, December 04, 2013 11:06 AM
  • I would design it like follows:

    CREATE TABLE Product (
    ProductID   int IDENTITY(1,1) PRIMARY KEY,
    SKU         char(17) NOT NULL UNIQUE   -- Size from Joe Celko's book
    );

    Starting with SQL Server 2012 SEQUENCE object is available instead of IDENTITY.

    The biggest advantages of SURROGATE PRIMARY KEYs are fast in JOINs and meaningless numbers.

    PRIMARY KEY is the wrong name for SURROGATEs. They should be called LINKING KEY or SURROGATE KEY.

    In my mind SKU is still the "PRIMARY" KEY even though I declared it the UNIQUE KEY.

    Ideal design which would be less confusing (this does not work, just a wish):

    CREATE TABLE Product (
    ProductID   int IDENTITY(1,1) SURROGATE KEY,
    SKU         char(17) PRIMARY KEY   -- Size following Joe Celko's book
    );


    Kalman Toth Database & OLAP Architect IPAD SELECT Query Video Tutorial 3.5 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012



    Wednesday, December 04, 2013 12:38 PM
  • In my mind SKU is still the "PRIMARY" KEY even though I declared it the UNIQUE KEY.

    I am not sure that I agree with adding SURROGATE KEY to the syntax, but I agree with the sentiment that SKU being the primary key. Thankfully, you can do whatever you like. You can still use the surrogate for your FK constraints; SQL Server do not require them to go against the primary key. All you need is a non-filtered unique indexes.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, December 04, 2013 12:53 PM
  • Rule #1:  Never key a table on a field the user can change.

    Rule #2:  NEVER key a table on a field the user can change.

    Rule #3:  See Rule #1 and #2

    Using an auto generated key vs a surrogate key (like SKU) is a different issue.  If the SKU can NEVER be changed, then it is a candidate key.  However, if the user can change the SKU, that means  you need to do coding to rekey every linked table from the old key to the new key if the user changes it.

    In addition, a number uses less storage and is faster to seek than a char(10). 

    This is a good discussion:

    http://sqlmag.com/database-administration/sql-design-how-choose-primary-key

    Wednesday, December 04, 2013 1:21 PM
  • Kalman,

    Taking Erlands post as a point it would be good if you change thread type as Discussion.As almost all the post seems to have a valid point

    Thanks


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    Wednesday, December 04, 2013 1:27 PM
  • From my experience, the main reason people ad ID fields, is simply because they are programmers who do not understand how databases work, or, even if they do, they use the database as efficient storage, rather than the master of the data, as they prefer to keep control in the code.

    This is true especially at large companies where because of politics, blue dollars, or sheer laziness, the programmers do not ask experienced DB designers what to do. Or, even when they do, neither cares to do it correctly, and they just do it to get the job done. In my guestimation, this accounts for 90%-95% of Id usage.

    So, what is happening? We have a programmer who needs to store data in the database because that is the company standard, it is faster than flat files, and if there are performance issues, he can dump the problem on someone else. He just wants an easy way to call up a record, so he gives them easy numbers, in every TABLE. This also accounts for why there are so many "flags", as opposed to status fields or appropriate data logic.

    --

    Another reason Id is used is for speed, as mentioned in this thread already. While it can add speed, it is usually negligable, and most of the people who use it wouldn't understand or appreciate it even it it was noticeable, because the application takes even longer. Nonetheless, they know the mantra, "ids are faster" so they use them.

    --

    Personally, i avoid Ids when there is a natural key. Not only is it redundant, it forces programmers to think before they add or query data. They actually have to understand that if it isn't UNIQUE, we're not going to magically add a new Id so everything is ok, and that the record may not be INSERTed, so they have to check the returns. Overall, this causes better data and understanding of the data.

    --

    If you want to know if you should add an Id COLUMN. The answer is no. If you cannot continue without ADDing it, than ADD it. What about performance? My rule is, don't compromise until you need to. Otherwise, you'll end up compromising on things you don't need to, and the aim to be optimal results in nothing but confusion and me getting grumpy again. :)




    Sunday, December 08, 2013 12:26 AM
  • Brian,

    Would you make Email varchar(70) a PRIMARY KEY on the Email table?  

    I wouldn't. Even forgetting the performance disaster it would create, do you want the Email varchar(70) used as a FOREIGN KEY in related tables? 

    As far as sloppy developer attitude of placing a SURROGATE PK INT key as the first column of a new table and not thinking about the real PK (NATURAL KEY), I agree, it happens frequently.

    In the case of the Email table I would use SURROGATE INT (or BIGINT) PRIMARY KEY and I would make the Email varchar(70) column a UNIQUE KEY.


    Kalman Toth Database & OLAP Architect IPAD SELECT Query Video Tutorial 3.5 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012



    Sunday, December 08, 2013 12:34 AM
  • Email varchar(70)?

    1) Email addresses can and do change. Most likely, email is an attribute.
    2) 70 characters is nowhere near enough for an email address. Both the domain and the address can be larger than that.

    However, i'm sure you mean to ask about a column that is quite large. If:

    1) The size was large.
    2) The COLUMN is in a TABLE with many records.
    3) The TABLE is searched a lot.

    Then i would entertain the idea of using a surrogate. However, i has not yet come into such a case. I'm not saying it doesn't exist, just that i have not seen it and it would be hard to make a judgement on what is to me a hypothetical case.

    --

    On re-reading, that sounds like it has quite a belligerent tone. I mean it not against you, Kalman, but i gird myself for a battle when someone wants to put a surrogate key in place. Otherwise, i find some people won't think about it and will fill up the DB with all sorts of useless COLUMNs.

    Thursday, December 12, 2013 1:16 AM
  • Hi Brian,

    In the example of AdventureWorks2012 Production.Product table:

    ProductID INT surrogate PK

    ProductNumber NATURAL KEY with unique index (I would use UNIQUE KEY)

    Name NATURAL KEY with unique index (I would use UNIQUE KEY)

    I assume the ProductNumber - which constructed by company staff - can be PK. I still find INT SURROGATE a cleaner & better solution.

    In addition SURROGATE is excellent for preventing composite FOREIGN KEYs.


    Kalman Toth Database & OLAP Architect IPAD SELECT Query Video Tutorial 3.5 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012



    Thursday, December 12, 2013 12:42 PM
  • I hear ya. I disagree, but i hear ya.

    And the redundant naming (TABLE name inside the COLUMNs)... it's so insane.

    Thursday, December 12, 2013 10:25 PM
  • Hi sniff_bits,

    I'm currently studying Database Systems as a subject at college so I'd try to answer according to my understanding.

    This method of adding a PK into a table increases its importance.
    It can be linked much more easily with other tables. It also helps in the process of Normalization or if there are any anomalies, you can point them out easily.

    Thanks


    • Edited by bilalShaw Thursday, December 12, 2013 10:38 PM grammer
    Thursday, December 12, 2013 10:37 PM
  • billal, interesting. I disagree.

    >This method of adding a PK into a table increases its importance.

    Why would a surrogate key be of any greater importance than a natural key?

    >It can be linked much more easily with other tables.

    What is easier about it? A key is a key.

    >It also helps in the process of Normalization

    ADDing redundant COLUMNs goes against normalization.

    >or if there are any anomalies, you can point them out easily.

    How?

    Monday, December 16, 2013 10:55 PM
  • Because identity columns are magic! :)
    Tuesday, December 17, 2013 2:15 PM
  • Because identity columns are magic! :)

    Yes! Even grand daddy RDBMS is using it.

    DB2 Manual: "Identity columns

    An identity column contains a unique numeric value for each
    row in the table. DB2® can automatically generate sequential numeric values for
    this column as rows are inserted into the table. Thus, identity columns are
    ideal for primary key values, such as employee numbers or product numbers."

    LINK: http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.apsg%2Fsrc%2Ftpc%2Fdb2z_identitycols.htm

    So many topics related to the simple INT IDENTITY(1,1) PRIMARY KEY declaration that makes it difficult to present it to SQL beginners:

    INT is a fast linking (JOIN) key

    INT is a dumb number

    IDENTITY is not required: you can program max +1 or use SEQUENTIAL OBJECT (SQL Server 2012 and on)

    PRIMARY KEY default is CLUSTERED index but that is not a requirement

    CI builds nicely as a tree when using INT sequentially increasing

    How to move the CI to a different column(s)

    Having a UNIQUE KEY frequently forgotten when declaring INT IDENTITY PRIMARY KEY as the first column in the table; most casual database designers are relieved: "Wow...I have a PRIMARY KEY...I am safe". Not true, you are not safe until you declare the UNIQUE KEY(s) (NATURAL KEY).


    Kalman Toth Database & OLAP Architect SELECT Video Tutorials 4 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012




    Thursday, March 27, 2014 11:29 AM