Datawarehouseing and continuous loads (online/real time data)


  • Hi,

    We have a large datawarehouse with fact-tables (marts) and dimensions which is accessed through database views with Business Objects.
    We load our DW hourly but only update our datamarts every night.
    Now we are getting requests for more online data in our marts (not just every night) and we are having some trouble coming up with a good solution.

    Is there any datawarehousing best practices for more online data and not just one batch every night - even while users are possibly running queries?

    Our marts contain upwards of 500 million rows each and are partitioned. When we load our facts we first do a delete from the fact on all the affected rows and then we re-load affected and new rows to the table.

    Is there any guidelines or best practices for this?
    Any tips on how to achieve more "online" data even with users (possibly) running queries?

    We have some ideas but none of them are really solid.


    • Bearbeitet JonathanS101 Montag, 16. April 2018 14:04 Spelling
    Montag, 16. April 2018 14:03

Alle Antworten

  • You need to change your process to merge changes, instead of drop and recreate.

    Montag, 16. April 2018 14:34
  • if you have partitions, then you can write a logic to process only the incremental partitions each time. Also for the new data, you can create new partitions on the fly and process them. Once its complete you can merge them onto existing partition if you want. That way you can reduce processing to a minimum and it would be faster

    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    Montag, 16. April 2018 16:02
  • What your users are asking is an ODS (Operational Data Store) not a Data Warehouse. Have a look Here. It should help.

    Please mark as answer if this post helped you

    Montag, 16. April 2018 16:27