How to load the fact table with 100 million records

  • Question

  • Hi
    I have created a package which loads the dimension tables and the fact table. I have 8 dimensions and one fact table. Out of these 8 dimensions, 4 have been created on top of a table which has more than 100 million records.

    Now loading the fact table has been difficult for me. I have built the package so that every time it starts it first truncates the fact table and then reloads it from scratch.

    As the fact table is loaded by joining these dimension tables, it has been really difficult to load. It takes hours and hours and still doesn't finish.

    How do I do this? Can anyone let me know, please?

    Thursday, August 11, 2011 5:46 PM

Answers

  • Why do you truncate the fact table? Is this a business requirement? If not, then in your ETL package you can use merge joins to filter the new rows from the ones already in the fact table, and then feed only the new data through Lookup transformations.

    Although it's debatable, Lookups generally have better performance than join statements.
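
    For illustration, here is a minimal T-SQL sketch of that idea outside SSIS (the table and column names, stg_Sales, FactSales, DimCustomer and so on, are just placeholders for your own schema): insert only the staging rows that are not already in the fact table instead of truncating and reloading everything.

    -- Sketch only: stg_Sales is the staging source, FactSales the fact table,
    -- and SalesOrderID/SalesOrderLine are assumed to identify a fact row uniquely.
    INSERT INTO dbo.FactSales (SalesOrderID, SalesOrderLine, CustomerKey, ProductKey, DateKey, SalesAmount)
    SELECT s.SalesOrderID, s.SalesOrderLine, c.CustomerKey, p.ProductKey, d.DateKey, s.SalesAmount
    FROM dbo.stg_Sales AS s
    JOIN dbo.DimCustomer AS c ON c.CustomerBK = s.CustomerID   -- surrogate key lookups
    JOIN dbo.DimProduct  AS p ON p.ProductBK  = s.ProductID
    JOIN dbo.DimDate     AS d ON d.FullDate   = s.OrderDate
    WHERE NOT EXISTS (                                         -- skip rows already loaded
        SELECT 1
        FROM dbo.FactSales AS f
        WHERE f.SalesOrderID   = s.SalesOrderID
          AND f.SalesOrderLine = s.SalesOrderLine
    );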


    http://dailyitsolutions.blogspot.com/
    • Marked as answer by K.Kalyan Wednesday, September 14, 2011 6:37 PM
    Thursday, August 11, 2011 6:32 PM

All replies

  • You will probably be better off asking this in the SSIS forum, which you can find here: http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/threads

    Having said that, you probably need to provide more information. You don't mention what the source of the fact data is. Is it a single table in another database, a flat file, multiple flat files being loaded into one table, or something else? That can make a big difference. Also, are you using the Enterprise edition of SQL Server? If so, you should probably look at partitioning so that you can load and query in parallel.
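
    If partitioning is an option, a rough sketch (the table, column, and boundary values below are hypothetical, just to show the shape of a fact table partitioned by a date key) would be:

    -- Sketch only: hypothetical fact table partitioned by an integer DateKey (YYYYMMDD).
    CREATE PARTITION FUNCTION pfFactByMonth (int)
    AS RANGE RIGHT FOR VALUES (20110101, 20110201, 20110301, 20110401);

    CREATE PARTITION SCHEME psFactByMonth
    AS PARTITION pfFactByMonth ALL TO ([PRIMARY]);

    CREATE TABLE dbo.FactSales
    (
        DateKey      int    NOT NULL,
        CustomerKey  int    NOT NULL,
        ProductKey   int    NOT NULL,
        SalesAmount  money  NOT NULL
    ) ON psFactByMonth (DateKey);   -- rows are routed to partitions by DateKey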

    Still, you probably want to revisit how this all works. Do you really need to truncate the table and reload it all again, or is there some form of marker on the source data that can identify what has changed and/or is new, so you only need to load that? In SSIS you can use cached lookups to look up keys during the insert, though you need to identify what the bottleneck is (i.e. if you use cached lookups in SSIS but the source database can't deliver the data fast enough, then you are tuning the wrong thing).
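
    As an example of the "marker" idea, one common pattern is a high-water mark on a modified-date column in the source; the names below (etl_Watermark, src.Sales, ModifiedDate) are assumptions for illustration only.

    -- Sketch only: pull just the source rows changed since the last successful load.
    DECLARE @LastLoaded datetime2 =
        (SELECT LastLoadedDate FROM dbo.etl_Watermark WHERE TableName = 'FactSales');

    SELECT s.SalesOrderID, s.CustomerID, s.ProductID, s.OrderDate, s.SalesAmount
    FROM src.Sales AS s
    WHERE s.ModifiedDate > @LastLoaded;   -- only new/changed rows since the watermark

    -- After a successful load, advance the watermark.
    UPDATE dbo.etl_Watermark
    SET LastLoadedDate = SYSDATETIME()
    WHERE TableName = 'FactSales';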

     


    http://bi-logger.blogspot.com/
    Thursday, August 11, 2011 6:30 PM