Best practice reference table in where clause

Answered Best practice reference table in where clause

  • Tuesday, January 29, 2013 6:30 PM
     
     

    IF I have the following tables:

    CREATE TABLE Company (CompanyID int IDENTITY (0, 1), CompanyName varchar(128))

    CREATE TABLE Relationship (RelationshipID int IDENTITY (0, 1), CompanyID int, RelationshipTypeID int)

    CREATE TABLE RelationshipType (RelationshipTypeID int IDENTITY (0, 1), LongName varchar(32), RelationshipTypeCategoryID)

    CREATE TABLE RelationshipTypeCategory (RelationshipTypeCategoryID in INDENTITY (0, 1), LongName varchar(32))

    The foreign keys are implied.

    Now I write a query (the actual values are immaterial)

    SELECT

    FROM Company c

         INNER JOIN Relationship rel

              ON c.CompanyID = rel.CompanyID

         INNER JOIN RelationshipType rt

              ON rel.RelationshipTypeID = rt.RelationshipTypeID

         INNER JOIN RelationshipTypeCategory rtc

              ON rt.RelationshipTypeCategoryID = rtc.RelationshipTypeCategoryID

    which where clause would you use?

    WHERE rtc.RelationshipTypeCategoryID = 1

                       or

    WHERE rtc.LongName = 'Membership'

    I like the fact the first option prevents silly errors like misspelling but since I work with scripts to load this kind of reference data from dev to qa to production I worry that this surrogate key may not always mean what I think it means and it is also harder to visually decipher (although you can put a comment in with the natural key value).

    I like the second option because it is highly readable but I worry that the business could (but almost certainly won't) require a change to the natural key value.

    As a dba I can require either method from the programmers when they write stored procedures and such but there is an interesting wrinkle in that we are moving to Entity Framework and the dratted ORM makes my opinion on the matter moot.

    But given just the t-sql aspect, which is better?

All Replies

  • Tuesday, January 29, 2013 6:35 PM
    Moderator
     
     

    >WHERE rtc.LongName = 'Membership'

    I assume the application has  'Membership'.  Therefore the preferred choice. Surrogate PRIMARY KEY should not be exposed to the end user.

    App would have to make an extra query to find out:  rtc.RelationshipTypeCategoryID = 1?

    > business could (but almost certainly won't) require a change to the natural key value.

    That's OK. The application should be programmed to pick up the change automatically. No hard-wiring. App has to be data-driven.


    Kalman Toth SQL 2008 GRAND SLAM
    Paperback: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012



  • Tuesday, January 29, 2013 6:41 PM
     
     

    According to the T-SQL is concerned, its always better to use Key value like Primary Key or Foreign Key, as they will be having Indices on them by default, and searching using integer key is faster than a string search. When you try to use Keys like Primary or Foreign, optimizer tries to get better performance than a string search.

    And as per the business is concerned, the type tables i.e. master tables wont change regularly, they want to keep same key, value pair. Only transaction tables will be changing, which will use key value from type tables.

    Example: Department table value will be consistent, Changing of department name and department id is very rare, where as Employee table may change, an employ can transfer from one department to another, in this case we will update only Department number in Emplyee table.

    So. First is better as per my knowledge


    Thanks & Regards Prasad DVR

  • Thursday, January 31, 2013 2:46 AM
     
     

    >> If I have the following tables:<<

    No, you do not. Tables have keys; keys are never, by definition, a count of the physical insertion attempts to one table on one disk of one machine like you have. But non-SQL programers who grew up with linked lists and network databases like usign IDENTITY to slowly build these fake assembly language pointers!

    ISO-11179 rules would require that table names are plural or collective names for the sets they model, or a precise name of the relationship. You say you are  a DBA, but missed basics!

    The columns are also wrong. I had to laugh when I read “type_category_id” because I had used it in one of my books as a bad example that soooo awful that nobody would really do it; I wanted to get laugh, but you really did it!! Why not be completely absurd and have a “relationship_type_kind_category_id_value”? In data modeling those affixes are called attribute properties and you only get one of them on an attribute. This is so basic!

    The only reason for having a look up table for a nominal scale is that it is very large or very volatile. When the nominal scale is small and constant, then SQL programmers use “CHECK ( x IN (..))” in a column constraint.

    I know it is skeleton, but the industry standard for a company identifier is the DUNS. Companies are categorized using the North American Industry Classification System (NAICS). The USPS uses a CHAR(35) column for company names on envelopes. But you claim to need 128! With that lack of data quality, you will get some.

    >> The foreign keys are implied. <<

    No, SQL requires that they be declared. Basic Netiquette also requires that you post enough constraints that we can test code. Here is another skeleton, with ANSI/ISO Standards and basic SQL techniques.
     
    CREATE TABLE Companies
    (company_duns CHAR(9) NOT NULL PRIMARY KEY,
     company_name VARCHAR(35) NOT NULL,
     naics CHAR(6) NOT NULL
       REFERENCES North_American_Industry_Classification_System
    (naics));

    CREATE TABLE North_American_Industry_Classification_System
    (naics CHAR(6) NOT NULL PRIMARY KEY,
     naics_description VARCHAR (128) NOT NULL);

    >> which WHERE clause would you use? <<
     
    Nether; I am an experienced SQL programmer who would not write DDL anything like this mess.

    >> I like the fact the first option prevents silly errors like misspelling but since I work with scripts to load this kind of reference data from dev to qa to production I worry that this surrogate key may not always mean what I think it means and it is also harder to visually decipher (although you can put a comment in with the natural key value). <<

    You never read Codd! This is not a surrogate key; it is a physical locator used to create a fake Poitier chain that is costing you data integrity, and 1-2 or more orders of magnitude in performance.

    >> .. But given just the t-sql aspect, which is better? <<

    Flush it. I fear that your migration will make this even worse. OO and RDBMS do not mix.


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

  • Thursday, January 31, 2013 1:52 PM
     
     

    CELKO, I have programs that I want to separate into 2 major categories: Rebate and Reseller.  Within those categories I want to divide them into types:  Rebate has Fuel, Tires, Software.  What tables would you create to describe this?


    • Edited by britbitter Thursday, January 31, 2013 2:38 PM
    •  
  • Friday, February 01, 2013 4:40 PM
     
     

    >>ISO-11179 rules would require that table names are plural or collective names for the sets they model, or a precise name of the relationship. You say you are  a DBA, but missed basics! <<

    When I first began working with RDBMS I was taught that tables are actually relationships and should (mostly) be singular nouns.  As to ISO\IEC 11179 I am guessing that you're talking about section 5 on naming conventions.  I've read it 3 times through and have not seen where specific naming schemes are referenced except in the appendix (which specifically uses the example of singular noun names). As far as I can tell it describes what sorts of rules should be included in a naming convention but not what those rules need be. Where can I find this prescription in the document?

    >>The columns are also wrong. I had to laugh when I read “type_category_id” because I had used it in one of my books as a bad example that soooo awful that nobody would really do it; I wanted to get laugh, but you really did it!! Why not be completely absurd and have a “relationship_type_kind_category_id_value”? In data modeling those affixes are called attribute properties and you only get one of them on an attribute. This is so basic! <<

    Setting aside for a moment the obnoxious tone of voice you use (yes, I understand you are exasperated by my incompetence).  That particularly overly abstracted scheme was created by my predecessor so your assumption that because I "have" these tables that I must have designed and implemented them is unwarranted.  With that said the new database schema  I am developing has a couple of instances where I need to further refine a type with a category.  This design is meant to facilitate the use of MS Entity Framework (which is a ORM made for OO and RDBMS that is trying to bridge the impedance mismatch between OO and RDBMS so your insistence that your dogma of "OO and RDBMS do not mix" should be taken as gospel due to your authority on the matter will have to wait a while longer to be borne out by evidence). 

    >>The only reason for having a look up table for a nominal scale is that it is very large or very volatile. When the nominal scale is small and constant, then SQL programmers use “CHECK ( x IN (..))” in a column constraint. <<

    I hadn't thought of that but it seems to me that a foreign key lookup table is more flexible in terms of changes to the allowable values.  The list may be constant but should you plan on it?  The lookup table can be expanded or shrunk through a UI with DML and doesn't require a schema change.

    >>I know it is skeleton, but the industry standard for a company identifier is the DUNS. Companies are categorized using the North American Industry Classification System (NAICS). The USPS uses a CHAR(35) column for company names on envelopes. But you claim to need 128! With that lack of data quality, you will get some. <<

    So if a company name is 40 characters then if the post office only reads the first 35 then the company itself should conform to post office limitations?  You seem to have a slavish devotion to standards that may or may not be appropriate for the task at hand.  At any rate the amount of time and money it would take for us to determine the DUNS and the NAICS category is rather large and to attempt it is frankly absurd.  We have had enough trouble having to go back to each customer, find and verify their DOT number (which we actually have evolved a need for) that attempting to do the same for some piece of information that I'll warrant most of the small businesses that are our customers wouldn't have the slightest clue about is ridiculous.

    >>No, SQL requires that they be declared. Basic Netiquette also requires that you post enough constraints that we can test code. Here is another skeleton, with ANSI/ISO Standards and basic SQL techniques. <<

    This seems to be fair comment but I would point out that if you understand the question I am asking (which is not some sort of puzzle but just a question about opinion or preference) with the information I gave then you're just being an ass for your own enjoyment.

    >>You never read Codd! This is not a surrogate key; it is a physical locator used to create a fake Poitier chain that is costing you data integrity, and 1-2 or more orders of magnitude in performance.<<

    I have read Codd and Date (although I certainly bow to your superior knowledge).  I can see that maybe you want to help me with the design decisions I am making with regard to how I'm using machine generated keys and I see what you mean and appreciate your pointing it out (although this is not what I was asking about).  Out of curiosity, if there are other attributes associated with the lookup I'm using how would I go about organizing that (categories and types)?

    I am well aware of my limitations in the field.  I was trained in music theory for crying out loud.  But when people are looking for guidance or even if they are transparently looking to have their opinion validated insulting them is not just rude and obnoxious it is counter productive.  When I ask for the time please don't tell me how to build a clock.  Also you can be tough without giving a dick-fu demonstration.   So, thanks for your "help", but if it means this sort of treatment you needn't bother in future.


    • Edited by britbitter Friday, February 01, 2013 4:41 PM
    •  
  • Friday, February 01, 2013 4:43 PM
     
     
    But if the join is on the integer key doesn't that obviate the necessity for using it in the where clause?
  • Friday, February 01, 2013 5:00 PM
    Moderator
     
     

    >working with RDBMS I was taught that tables are actually relationships and should (mostly) be singular nouns.

    Affirmative. Table names should be singular like in AdventureWorks2012.

    >The lookup table can be expanded or shrunk through a UI with DML and doesn't require a schema change.

    Yes lookup tables are accepted in RDBMS when the codes are dynamic and/or associated with additional information such as state code and state name. If there are too many lookup tables, consolidate them into LookupHeader - LookupDetail parent-child table structure.

    >When the nominal scale is small and constant, then SQL programmers use “CHECK ( x IN (..))” in a column constraint.

    Agreed.  Like "CHECK (Gender in ('M', 'F') )"

    US state and Canadian province codes can go into a CHECK constraint because they are static and below hundred (ballpark). You can also store them in a lookup table with additional columns.



    Kalman Toth SQL 2008 GRAND SLAM
    Paperback/Kindle: SQL Server 2012 Pro







  • Friday, February 01, 2013 6:11 PM
     
     Answered

    Don't let Joe get to you too much.  He posts this kind of response a lot.  You can go to http://social.msdn.microsoft.com/Profile/--celko--/activity where you can see links to his recent posts.  If you click on of few of them, you will see that you have lots of company (including me) when your posts are criticized by Joe.

    To be fair to Joe, asking for DDL and data is often a good thing.  If you are asking a question where you want a query that gives a certain result, giving us DDL, data, and the result you want from your query often helps us quickly give you a query that does what you want.  But, IMO, Joe is unnecessarily offensive in the manner which he asks for that information.  In any case, your question is different, it's more of a coding style question than a request for a particular working query, and I thought you gave us perfectly fine information.

    Joe likes standards more than I do.  I believe in standards in most cases, but I am also willing to ignore them when I believe that ignoring them makes my systems more maintainable and/or understandable.  I don't think I've ever seen a production database that was 100% ISO-11179 compliant, certainly none of mine ever were.

    My answer to your question would be "it depends". 

    If the user is keying in the information, I would definitely go with WHERE rtc.LongName = 'Membership'.  Yes you can get potential spelling errors, but if the user types Mebmership when the user meant Membership, no rows will be returned and the user will likely figure out what happened.  But if you use rtc.RelationshipTypeCategoryID = 1 and the user accidentally types rtc.RelationshipTypeCategoryID = 2 instead, now you get back a result set with the wrong rows, and that is often more difficult for the user to notice and correct.  Another advantage is that you do not have to train any new users of the system that when the user wants Membership, they must type 1.

    If on the other hand, the application reads the RelationshipTypeCategory table and presents the user with a dropdown list of LongName's, then when the user selects Membership, the application can know (because it read the RelationshipTypeCategory table) that that is RelationshipTypeCategoryID = 1.  In this case, since you aren't dependent on either having the value of 1 coded in the application or the user knowing that Membership = 1, I would go with using WHERE RelationshipTypeCategoryID = 1.  The reason for that is that RelationshipTypeCategoryID is in your ON clause of your join.  This might help the query optimizer product a more efficient plan.

    Tom

    • Marked As Answer by britbitter Friday, February 01, 2013 8:16 PM
    •  
  • Saturday, February 02, 2013 12:04 AM
    Moderator
     
     

    Hi Tom,

    >Joe likes standards more than I do. I believe in standards in most cases

    Standards are immaterial for us mostly. We live in the microcosm of SQL Server. If Microsoft implemented some standards, we can use it, otherwise not. There are a few choices we can make such as CURRENT_TIMESTAMP (ANSI) vs. getdate() .

    If we pick CURRENT_TIMESTAMP, so goes the wisdom, it is easier to migrate to ORACLE. Really? How about all the other (ten? hundred?) SUBSTANTIAL differences?

    Joe for some reason does not like INT IDENTITY(1,1) SURROGATE PRIMARY KEY. I was unable to figure out what does he propose as replacement?


    Kalman Toth SQL 2008 GRAND SLAM
    Paperback/Kindle: SQL Server 2012 Pro





  • Saturday, February 02, 2013 10:59 AM
     
     Proposed

    Standards are immaterial for us mostly. We live in the microcosm of SQL Server. If Microsoft implemented some standards, we can use it, otherwise not. There are a few choices we can make such as CURRENT_TIMESTAMP (ANSI) vs. getdate() .

    Not forgetting the fact that if you use datetime2, it's preferrable to use sysdatetime() which has 1 ms precision, as opposed to the 3.33 ms precision of getdate() and CURRENT_TIMESTAMP.

    Joe for some reason does not like INT IDENTITY(1,1) SURROGATE PRIMARY KEY. I was unable to figure out what does he propose as replacement?

    Sequences, I assume, since they are part of ANSI SQL.

    Joe's argument about IDENTITY beging a physical property blabla is of completely bogus.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
  • Saturday, February 02, 2013 6:16 PM
     
     

    I asked s similar question a while ago. You might find some of the comments there useful.

    Is it a good idea to mix surrogate keys and natural keys in queries?
    http://social.msdn.microsoft.com/Forums/en-US/databasedesign/thread/6463c2d6-7e2c-4f14-bf92-3051058fc1d4


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers.
    Thanks!
    Aalam | Blog (http://aalamrangi.wordpress.com)

    • Edited by Aalam Rangi Saturday, February 02, 2013 6:20 PM
    •