none
SQL Data Warehouse Business Key vs. Surrogate Key

    Question

  • Hi all, i'm fairly new to data warehousing and i have a few questions.  I'm trying to import data from a horrible highly normalised database into an Education DWH (to answer questions like grade improvement over time, avg grade over time, avg number of subjects taken in year etc. etc.)  The primary keys in this DB are Numeric (god knows why).  Obviously i don't want to use Numeric keys for my DWH as PrimaryKeys that is stupid.  How do i maintain a relationship between the tables without using the Business Keys? i.e. Lookups maybe?  I'm not aware of how to use lookups in this situation.  I created a RowGenerator in SSIS which will give me a unique ID across the rows, but this doesn't work for 1 to many relationships as you can have many subjects per student etc.  StudentID = 1 SubjectID = 1,2,3,4 etc.  you've lost the relationship.  So i'm thinking of having the relationships defined by the Business Key, and having a surrogate key to uniquely identify each row in DWH (Dim or Fact).  Is this best practise?  Advice would be greatly appreciated.  (there doesn't seem to be a definitive answer on net)
    Thursday, April 10, 2014 4:38 PM

Answers

  • Typically, you do the following

    dimension_table(business-key, surrogate-key, <other columns>)

    FACT (surrogate-key, <other columns>)

    Thanks

    Sunil


    Sunil Agarwal, SQL Server PM

    • Marked as answer by jhowe1 Thursday, May 22, 2014 7:10 AM
    Wednesday, May 21, 2014 11:42 PM

All replies

  • Take a look at the excellent design of AdventureWorksDW2012. Here is an example:

    USE [AdventureWorksDW2012]
    GO
    
    CREATE TABLE [dbo].[DimCurrency](
    	[CurrencyKey] [int] IDENTITY(1,1) PRIMARY KEY,
    	[CurrencyAlternateKey] [nchar](3) NOT NULL UNIQUE,
    	[CurrencyName] [nvarchar](50) NOT NULL);
    
    /*......
    CurrencyKey	CurrencyAlternateKey	CurrencyName
    92	CHF	Swiss Franc
    93	BDT	Taka
    94	SIT	Tolar
    95	TTD	Trinidad and Tobago Dollar
    96	TND	Tunisian Dinar
    97	TRL	Turkish Lira
    98	GBP	United Kingdom Pound
    99	UYU	Uruguayan Peso
    100	USD	US Dollar
    .......
    */
    
    CREATE TABLE [dbo].[DimProduct](
    	[ProductKey] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    	[ProductAlternateKey] [nvarchar](25) NULL UNIQUE,
    	[ProductSubcategoryKey] [int] NULL,
    	[WeightUnitMeasureCode] [nchar](3) NULL,
    	[SizeUnitMeasureCode] [nchar](3) NULL,
    	[EnglishProductName] [nvarchar](50) NOT NULL,
    	[SpanishProductName] [nvarchar](50) NOT NULL,
    	[FrenchProductName] [nvarchar](50) NOT NULL,
    	[StandardCost] [money] NULL,
    	[FinishedGoodsFlag] [bit] NOT NULL,
    	[Color] [nvarchar](15) NOT NULL,
    	[SafetyStockLevel] [smallint] NULL,
    	[ReorderPoint] [smallint] NULL,
    	[ListPrice] [money] NULL,
    	[Size] [nvarchar](50) NULL,
    	[SizeRange] [nvarchar](50) NULL,
    	[Weight] [float] NULL,
    	[DaysToManufacture] [int] NULL,
    	[ProductLine] [nchar](2) NULL,
    	[DealerPrice] [money] NULL,
    	[Class] [nchar](2) NULL,
    	[Style] [nchar](2) NULL,
    	[ModelName] [nvarchar](50) NULL,
    	[LargePhoto] [varbinary](max) NULL,
    	[EnglishDescription] [nvarchar](400) NULL,
    	[FrenchDescription] [nvarchar](400) NULL,
    	[ChineseDescription] [nvarchar](400) NULL,
    	[ArabicDescription] [nvarchar](400) NULL,
    	[HebrewDescription] [nvarchar](400) NULL,
    	[ThaiDescription] [nvarchar](400) NULL,
    	[GermanDescription] [nvarchar](400) NULL,
    	[JapaneseDescription] [nvarchar](400) NULL,
    	[TurkishDescription] [nvarchar](400) NULL,
    	[StartDate] [datetime] NULL,
    	[EndDate] [datetime] NULL,
    	[Status] [nvarchar](7) NULL)
    GO
    /*
    ProductKey	ProductAlternateKey	EnglishProductName
    1	AR-5381	Adjustable Race
    2	BA-8327	Bearing Ball
    3	BE-2349	BB Ball Bearing
    4	BE-2908	Headset Ball Bearings
    5	BL-2036	Blade
    ....
    */
    CREATE TABLE [dbo].[FactInternetSales](
    	[ProductKey] [int] NOT NULL REFERENCES DimProduct,
    	[OrderDateKey] [int] NOT NULL,
    	[DueDateKey] [int] NOT NULL,
    	[ShipDateKey] [int] NOT NULL,
    	[CustomerKey] [int] NOT NULL,
    	[PromotionKey] [int] NOT NULL,
    	[CurrencyKey] [int] NOT NULL REFERENCES DimCurrency,
    	[SalesTerritoryKey] [int] NOT NULL,
    	[SalesOrderNumber] [nvarchar](20) NOT NULL,
    	[SalesOrderLineNumber] [tinyint] NOT NULL,
    	[RevisionNumber] [tinyint] NOT NULL,
    	[OrderQuantity] [smallint] NOT NULL,
    	[UnitPrice] [money] NOT NULL,
    	[ExtendedAmount] [money] NOT NULL,
    	[UnitPriceDiscountPct] [float] NOT NULL,
    	[DiscountAmount] [float] NOT NULL,
    	[ProductStandardCost] [money] NOT NULL,
    	[TotalProductCost] [money] NOT NULL,
    	[SalesAmount] [money] NOT NULL,
    	[TaxAmt] [money] NOT NULL,
    	[Freight] [money] NOT NULL,
    	[CarrierTrackingNumber] [nvarchar](25) NULL,
    	[CustomerPONumber] [nvarchar](25) NULL,
    	[OrderDate] [datetime] NULL,
    	[DueDate] [datetime] NULL,
    	[ShipDate] [datetime] NULL,
     PRIMARY KEY CLUSTERED 
    (
    	[SalesOrderNumber] ASC,
    	[SalesOrderLineNumber] ASC
    ))
    GO
    
    


    Kalman Toth Database & OLAP Architect Free T-SQL Scripts
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012

    Thursday, April 10, 2014 5:34 PM
  • hi thanks for your response.  The thing i don't understand is in this example the relationship is between product key --> Fact and CurrencyKey --> Fact. If you were analysing this in a pivot table there would be no relationship between currency and product?  Let me list my example.  DimStudent, DimSubject, FactGrade.  DimStudent and DimSubject share a business key in business DB which is 16 digit numeric (this is primary and foreign key in business DB).  I have given each dimension a unique int surrogate key.  Do i not have to define the relationship between the business key in the two dimensions?  or store the business key in the fact table?  Because if i have student ID = 1 who is taking ModulesID 1,3,7,8 there is no relationship defined for that.  Thanks for your help.

    helllpppp

    • Edited by jhowe1 Friday, April 11, 2014 3:14 PM
    Friday, April 11, 2014 8:51 AM
  • Hi jhowe1,

    Based on my understanding within your listed example, there is a one-to-many relationship between the 2 dimensions, you just need to store their surrogate keys in the fact tables to define the relationship. If that could not meet your analysis requirement, please give details on which can't achieve.

    thanks,
    Jerry

    Monday, April 14, 2014 2:05 AM
  • hi jerry what i'm confused about is the surrogate keys are just identity primary keys.  The actual relationships are stored between the business keys.  So how is creating relationships just between surrogate keys in the DWH sufficient?

    Monday, April 14, 2014 9:03 AM
  • Typically, you do the following

    dimension_table(business-key, surrogate-key, <other columns>)

    FACT (surrogate-key, <other columns>)

    Thanks

    Sunil


    Sunil Agarwal, SQL Server PM

    • Marked as answer by jhowe1 Thursday, May 22, 2014 7:10 AM
    Wednesday, May 21, 2014 11:42 PM
  • Yep thanks I get it now!
    Thursday, May 22, 2014 7:10 AM