none
Techniques for verifying data load

    Domanda

  • What are some common techniques that are used to verify the all projected data is loaded into the data warehouse with no data loss?

    John Schroeder

    giovedì 3 maggio 2018 19:43

Tutte le risposte

  • Hi,

    There are many ways to perform reconciliation of data between source and target tables to avoid potential data loss. Below 2 are most common:

    1. Compare data using Left Joins.

    2. Compare Data using Except clause.

    Hope this helps.

    Thanks,

    Alambir

    Please vote this as helpful or Mark this as answered if this helps.


    Thanks, Alambir Please vote this as helpful or Mark as answered if this helps!

    giovedì 3 maggio 2018 20:04
  • I mean between the source databases existing on multiple servers and the final data warehouse destination.  there is no way to do a left join between a table in one of the source databases and the datawarehouse table.

    John Schroeder

    giovedì 3 maggio 2018 20:31
  • Hi,

    Yes, it is possible to compare data exist on different servers. Two approaches:

    1. If data volume is small, Linked server can be created in one of the server pointing to another server and then compare data using either Left Join or Except operators.

    2. If data volume is very high, create staging table in one of the server, Copy table data from one server to another using Import/Export wizard or using SSIS to automate this process and generate reconciliation report on daily/weekly basis.

    Hope this helps.

    Thanks,

    Alambir

    Please vote this as helpful or Mark this as answered if this helps.


    Thanks, Alambir Please vote this as helpful or Mark as answered if this helps!



    giovedì 3 maggio 2018 20:46
  • If the tables are very large, you might send a summary query to DW and Source System, requesting an aggregation by something that won't return too many rows. You might need to cater for data changing on the source system, so a low and high watermark can be useful. For example

    select City, count(*), sum(amount), sum(discount)

    -- sum(convert(bigint,checksum(*))) can be useful

    from MyTable

    where TranDate between convert(date,getdate()-7) and convert(date,getdate())

    Rather than GETDATE(), you might have highwater marks in your DW to use, but by converting to DATE above, it will ensure that both queries are looking for the same date ranges and not taking the hot  data from the source system. On a non transaction table, you might ignore the TranDate concept completely as the table won't be so large.

    If you have 100 cities in the system, you will only get 100 rows returned from DW and source system for the query above. These 100 rows should match exactly.

    Of course, this reconciliation might let through a false positive. But that's unlikely and it can be executed with a very low cost to DW, source system and the network.

    Hope that helps,


    Richard

    lunedì 7 maggio 2018 06:50