none
Normalization

    Question

  • Which one of the following is more normalized and why ? Which one would perform better ? Which one is better as a structure?

    Table 1:

    CustomerId    EventDate     EventType

    1                    July 1           Opened

    1                    July 1          Closed

    Table 2:

    CustomerId   EventDate OpenedFlag  ClosedFlag

    1                   July 1       Y                   Y

    Thursday, July 18, 2013 7:36 PM

Answers

  • >> Which one of the following is more normalized and why? Which one would perform better? Which one is better as a structure? <<

    Read about First Nor4mal Form (1NF). Each column has to be a scalar value. Your first table has “attribute splitting”, a common design flaw where one attribute is in one or more columns. You split the starting and ending dates into (start-or-end date, meta-data flag) pairs! 

    Your second table uses flags, which is Assembly language and not RDBMS. It also prevents an event from existing for more than one day. 

    CREATE TABLE Events
    (customer_id CHAR(10) NOT NULL,
     event_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
     PRIMARY KEY (customer_id, event_start_date),
     event_end_date DATE,
     CHECK (event_start_date <= event_end_date)
    );

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Friday, July 19, 2013 10:59 PM

All replies

  • The second one is more normalized.  You arent repeating data unneccesarily.  It would also perform better (though you probably wouldn't be able to tell much of a difference in a table this size until you had millions of records).  As far as being a better structure, it really depends on what the information is going to be used for, how it would be accessed, would there be any other information needed, etc.  For the limited information contained here I like the second example best as you have less repetative information, and though you can over do normalization, Usually less is better.  In a large database you would also encounter disk space issues with the first example due to the repetativeness of the information.

    Thanks, JLinker

    • Proposed as answer by jlinker Thursday, July 18, 2013 7:44 PM
    Thursday, July 18, 2013 7:41 PM
  • In your example, I assume "Opened" and "Closed" are mutually exclusive.  Is everything which exists implied as "Opened"?  If those are true, then you only need a "Closed" flag.  Closed = N then "Opened", Closed=Y then "Closed". 

    However, I assume you will have more than 2 status.  In which case, I would create a status table and use an ID to the Event table.

    CREATE TABLE [dbo].[EventStatus](
    	[StatusID] [int] IDENTITY(1,1) NOT NULL,
    	[Description] [varchar](50) NOT NULL,
     CONSTRAINT [PK_EventStatus] PRIMARY KEY CLUSTERED 
    (
    	[StatusID] ASC
    )
    )
    CREATE TABLE [dbo].[Event](
    	[EventID] [int] IDENTITY(1,1) NOT NULL,
    	[EventDate] [datetime] NOT NULL,
    	[StatusID] [int] NOT NULL,
     CONSTRAINT [PK_Event] PRIMARY KEY CLUSTERED 
    (
    	[EventID] ASC
    )
    )
    ALTER TABLE [dbo].[Event]  WITH CHECK ADD  CONSTRAINT [FK_Event_EventStatus] FOREIGN KEY([StatusID])
    REFERENCES [dbo].[EventStatus] ([StatusID])
    ALTER TABLE [dbo].[Event] CHECK CONSTRAINT [FK_Event_EventStatus]

    Thursday, July 18, 2013 7:53 PM
  • Thanks for the reply. Here you are more of creating the Table 1 type of structure. What do you think of Table 2 ? Is n't table 2 more normalized and better design ? What do you suggest ?
    Also the table consists of millions of records.

    • Edited by zenithsql Thursday, July 18, 2013 8:06 PM added
    Thursday, July 18, 2013 8:04 PM
  • Question, can the event for July 1st have more than one status at the same time?  Can it be opened and closed? or will it always have only one status?  If it will only have one status, then Tom's answer would be the most normalized.  It is like your second example in that there would only be one record for July 1st event, and the status one be one column on that table, and you would update that column as the status changes.  Your first example implies that there would be two records for the event, one for each status.  Tom's answer would be the more normalized way to go.

    Thanks, JLinker

    Thursday, July 18, 2013 8:08 PM
  • I am not a purist but I think having two columns representing "something of the same" is duplication and therefore not normalized. Attributes/columns like open-close should be a single column indicator. If you have more possible values that would be a code. Its up to you to enforce integrity with your possible code values in a lookup table.

    Hope this is Helpfull. J Alverson Microsoft BI Solutions Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Friday, July 19, 2013 10:09 PM
  • >> Which one of the following is more normalized and why? Which one would perform better? Which one is better as a structure? <<

    Read about First Nor4mal Form (1NF). Each column has to be a scalar value. Your first table has “attribute splitting”, a common design flaw where one attribute is in one or more columns. You split the starting and ending dates into (start-or-end date, meta-data flag) pairs! 

    Your second table uses flags, which is Assembly language and not RDBMS. It also prevents an event from existing for more than one day. 

    CREATE TABLE Events
    (customer_id CHAR(10) NOT NULL,
     event_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
     PRIMARY KEY (customer_id, event_start_date),
     event_end_date DATE,
     CHECK (event_start_date <= event_end_date)
    );

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Friday, July 19, 2013 10:59 PM