Overview

Microsoft has introduced many great features with the SQL Server 2016. Even though some of the features are introduced into SQL Server engine, those are still important for data warehouse development as well. Temporal table feature is among one of them. Temporal tables are just like normal database tables in SQL Server except its built-in support for history versioning of your data. It has two datetime2 data type columns to indicate the row version validity. Once you enable the system versioning temporal feature, it will create another table called “History Table” to store history versions of your data.

Slowly Changing Dimensions

There are various usages of temporal tables. Data auditing, point in time analysis, slowly changing dimension are few of them. The whole focus in this article is how we can use temporal to address the slowly changing dimension scenarios.

Typically dimension attributes are less frequently changing over the time. Product price in product dimension, customer location in customer dimension are few examples for them. It is essential to keep track of changes to the dimension attributes in the data warehouse. This type of dimensions is referred to as slowly changing dimension (SCD). There are pre-defined SCD types in order to handle and cater this SCD situation. Usually, it is handled by the ETL process.  In SCD type 4, it maintains a separate table to keep the history and this temporal behavior matches with SCD type 4.  

Let’s see how we can use the temporal feature to address the SCD problem.

DEMO

Data Model

In the example demo, we have 4 tables, 3 dimension tables and 1 fact table.

1.      DimEmployee

DimCalender


DimSalesTerritory

FactSales


In the demo model, sales territory table connect with DimEmployee table using SalesTerritoryKey. So, this model we can consider as a Snow-flake data model.

Enabling Temporal for Existing Tables

We can enable the temporal feature for a new table as well as for an existing table. In this example, we're going to show you how to enable a system-versioning temporal feature for an existing data warehouse dimension table.

In order to enable the temporal feature, execute the below script

/* ENABLE SYSTEM-VERSIONING TEMPORAL FEATURE FOR A EXISTING TABLE */
 
ALTER TABLE dbo.DimEmployee
ADD
    ValidFrom DATETIME2 (2) GENERATED ALWAYS AS ROW START
    CONSTRAINT DF_ValidFrom DEFAULT DATEADD(second, -1, SYSUTCDATETIME())  
    ,ValidTo DATETIME2 (2) GENERATED ALWAYS AS ROW END
    CONSTRAINT DF_ValidTo DEFAULT '9999.12.31 23:59:59.99'
    , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);
     
ALTER TABLE dbo.DimEmployee
  SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.DimEmployee_History));

Now we have added “ValidFrom” and “ValidTo” columns for our existing DimEmployee table and have created another table to keep the employee history records.

We will select employee Linda Mitchell (employeeID 282) to continue with the demo. So, let’s see how Linda has performed on her sales.

SELECT E.EmployeeKey, E.FirstName , E.LastName , E.SalesTerritoryKey, ST.SalesTerritoryRegion
, SUM(F.SalesAmount) AS TotalSales
FROM dbo.DimEmployee E
INNER JOIN dbo.FactSales F ON E.EmployeeKey = F.EmployeeKey
INNER JOIN dbo.DimSalesTerritory ST ON E.SalesTerritoryKey = ST.SalesTerritoryKey
WHERE E.EmployeeKey = 282
GROUP BY E.EmployeeKey, E.FirstName , E.LastName , E.SalesTerritoryKey, ST.SalesTerritoryRegion

In the above image, you can see Linda has performed total sales of 11786000.00 and her sales territory region is Southwest.

Simulate the ETL Process

Let’s assume she has moved into Northeast region as her new sales territory. This is a typical and very common SCD scenario where the dimension attributes are changing. 

/* Change Linda's Sales Territory*/
UPDATE dbo.DimEmployee
SET SalesTerritoryKey = 2
where EmployeeKey = 282

After updating the sales territory field for Linda, now you can see the older version has moved into the history table.

Load the dimensions successfully, then in the ETL process load the fact tables.

In this demo, we will insert a record for Linda with sales amount of 10000.00 into sales fact table.

INSERT INTO dbo.FactSales
VALUES (282, 20170319, 10000.00, '2017-04-03 07:50:55.10')

Now let’s run the first query again to see the sales amount details for Linda.

Now you can see the latest sales territory has updated to Northeast and 10000.00 sales are being added to existing sales amount. The issue in this result is that Linda’s older sales data which belongs to Southeast also still shows as Northeast, which is technically depicted as a wrong idea. We need to adjust our query in order to see the precise results in the report.

SELECT E.FirstName , E.LastName , E.SalesTerritoryKey , ST.SalesTerritoryRegion , E.ValidFrom, E.ValidTo
SUM(F.SalesAmount) AS SalesAmount
FROM dbo.FactSales F
INNER JOIN dbo.DimEmployee FOR SYSTEM_TIME ALL  AS E
ON E.EmployeeKey = F.EmployeeKey AND F.Date BETWEEN E.ValidFrom AND E.ValidTo
INNER JOIN dbo.DimSalesTerritory ST ON E.SalesTerritoryKey = ST.SalesTerritoryKey
WHERE E.EmployeeKey = 282
GROUP BY E.FirstName , E.LastName , E.SalesTerritoryKey , ST.SalesTerritoryRegion , E.ValidFrom, E.ValidTo

Dig into Modified Query

Typically, events or transactions in organizations are stored in Fact Tables. Therefore, most of the time date key involved in Facts. If you focus on the adjusted query what it does is it joins the transaction date in Fact Table with the system versioned period valid columns in the dimension tables. So, as you can understand this method will only work if your data is loading to the warehouse as in the same date as the transaction date-time is (Ex: Sales date in Fact Sales) otherwise it will give an inaccurate result.

Considerations

  • In case if you want to modify the history table you can do that by setting the history version off.
  • In performance point of view, we can obtain better performance by adding column store index to the history table. Especially if the dimension attributes changing rapidly.

References

Temporal tables: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables

Temporal tables performance : https://docs.microsoft.com/en-us/sql/relational-databases/tables/memory-optimized-system-versioned-temporal-tables-performance