none
Seeking opinions on this partitioning strategy

    Frage

  • I'm currently implementing a data warehouse and am in the early stages of designing my partitioning strategy. I'd like to explain my current thinking and solicit feedback from folks that have experience of implementing something similar.

    I am implementing this for an online retailer. I have a fact table called FactInventorySnapshot that contains a daily count (i.e. a snapshot) of inventory (aka stock) in our fulfilment centre. It is possible (for reasons I won't go into - because for this discussion its not important) that we will need to update the data for a given day.

    The fact table has a column called [DateId] which will contain smart date keys (e.g. 20130823, 20130824, 20130825, etc...). The data goes back to 1st January 2008 so I'll be looking at ~2000 days (6 * 365) on the first time we load.

    I am considering implementing a partition per day (i.e. for every value in [DateId]). This should mean that if I need to update the data for a given [DateId] I can simply get rid of the existing partition for that day, insert the new data for that [DateId] into a staging table and switch in that table as a new partition. Bear in mind this may be a [DateId] in the past.

    The aim here is to make it easy to load data into the table. Ordinarily I'll only be loading data for a day for which we haven't yet loaded any data, so I think that's simply a case of INSERTing into a staging table and SWITCHing that staging table in as a new partition. Dropping existing partitions and recreating them bothers me tho - I'm not yet even sure if this is possible.

    Any comments on that? Am I heading in the right direction or is my proposal for a partition-per-day totally stupid? If you need to query anything please reply indicating so and I'll be on it immediately.

    thanks
    Jamie



    ObjectStorageHelper<T> – A WinRT utility for Windows 8 | http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson

    Dienstag, 27. August 2013 09:20

Antworten

  • ...

     Dropping existing partitions and recreating them bothers me tho - I'm not yet even sure if this is possible.

    Any comments on that? Am I heading in the right direction or is my proposal for a partition-per-day totally stupid? If you need to query anything please reply indicating so and I'll be on it immediately.

    I don't see any conceptual issue with that.

    Depending on how long you keep your data, you will end up having very many partitions - which in the end can cause Problems depending on the memory you have available. If you only update the let's say "most current 14 days" ever, you could change the size of the partitions of older data to monthly - yearly for example.

    I am not sure what you mean by “dropping partitions bothers me” – why do you think you will have to?

    Maybe you mean that the schema can’t be changed easily after implementation. There you are right. It’s not simple, but it’s possible by merging partition (borders)…

    Hope that helps


    Andreas Wolter | Microsoft Certified Master SQL Server
    Blog: www.insidesql.org/blogs/andreaswolter
    Web: www.andreas-wolter.com

    Dienstag, 27. August 2013 10:30
  • I am considering implementing a partition per day (i.e. for every value in [DateId]). This should mean that if I need to update the data for a given [DateId] I can simply get rid of the existing partition for that day, insert the new data for that [DateId] into a staging table and switch in that table as a new partition. Bear in mind this may be a [DateId] in the past.

    The aim here is to make it easy to load data into the table. Ordinarily I'll only be loading data for a day for which we haven't yet loaded any data, so I think that's simply a case of INSERTing into a staging table and SWITCHing that staging table in as a new partition. Dropping existing partitions and recreating them bothers me tho - I'm not yet even sure if this is possible.

    Any comments on that? Am I heading in the right direction or is my proposal for a partition-per-day totally stupid? If you need to query anything please reply indicating so and I'll be on it immediately.

    I think partitioning by day here is a viable strategy as that will allow you to load, replace or remove an entire daily partition at once using SWITCH from/to a stating table. 

    Removing existing partitions with MERGE and creating new ones with SPLIT is no big deal as long as you plan such as to avoid data movement with a SPLIT/MERGE of only empty partitions.  With an incremental date sliding window scenario, one typically uses a RANGE RIGHT function with the first boundary being the oldest retained date and the last boundary being a future date with no data yet.  Run a daily sliding window partition maintenance while the future date partition is still empty to remove the oldest partition and create a new one for future data.  The partition maintenance script:

    • truncates a similarly partitioned staging table
    • switches out the oldest date (partition 2)
    • merges the oldest date partition (with partition 1, which is always empty)
    • truncates the stating table again
    • splits the last partition with the day after the existing future date boundary

    The daily load script:

    • truncates a similarly partitioned staging table
    • loads staging table with data for new day
    • switches in the staging table partition

    The script to reload an existing day:

    • truncates similarly partitioned staging tables 1 and 2
    • loads staging table 1 with day to be replaced
    • switches out the partition to staging table 2
    • switches in staging table 1
    • truncates staging table 2

    Be aware that you are limited to 1,000 partitions per table prior to SQL 2008 SP2 and SQL 2008 R2 SP1 unless you enable 15,000 partition support using sp_db_increased_partitions introduced in those service packs.  SQL 2012 has 15,000 partition support out-of-the-box with no special configuration needed.


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Dienstag, 27. August 2013 11:04
  • Msg 2812, Level 16, State 62, Line 1
    Could not find stored procedure 'sp_dp_increased_partitions'.

    Looks like a typo in the proc name.  Try: sp_db_increased_partitions


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Dienstag, 27. August 2013 11:19
  • got it

    btw

    there is a Special article for that 15.00 Partition Feature with some hints on performance:

    http://technet.microsoft.com/en-us/library/gg981694.aspx


    Andreas Wolter | Microsoft Certified Master SQL Server
    Blog: www.insidesql.org/blogs/andreaswolter
    Web: www.andreas-wolter.com

    Dienstag, 27. August 2013 11:31

Alle Antworten

  • ...

     Dropping existing partitions and recreating them bothers me tho - I'm not yet even sure if this is possible.

    Any comments on that? Am I heading in the right direction or is my proposal for a partition-per-day totally stupid? If you need to query anything please reply indicating so and I'll be on it immediately.

    I don't see any conceptual issue with that.

    Depending on how long you keep your data, you will end up having very many partitions - which in the end can cause Problems depending on the memory you have available. If you only update the let's say "most current 14 days" ever, you could change the size of the partitions of older data to monthly - yearly for example.

    I am not sure what you mean by “dropping partitions bothers me” – why do you think you will have to?

    Maybe you mean that the schema can’t be changed easily after implementation. There you are right. It’s not simple, but it’s possible by merging partition (borders)…

    Hope that helps


    Andreas Wolter | Microsoft Certified Master SQL Server
    Blog: www.insidesql.org/blogs/andreaswolter
    Web: www.andreas-wolter.com

    Dienstag, 27. August 2013 10:30
  • I am considering implementing a partition per day (i.e. for every value in [DateId]). This should mean that if I need to update the data for a given [DateId] I can simply get rid of the existing partition for that day, insert the new data for that [DateId] into a staging table and switch in that table as a new partition. Bear in mind this may be a [DateId] in the past.

    The aim here is to make it easy to load data into the table. Ordinarily I'll only be loading data for a day for which we haven't yet loaded any data, so I think that's simply a case of INSERTing into a staging table and SWITCHing that staging table in as a new partition. Dropping existing partitions and recreating them bothers me tho - I'm not yet even sure if this is possible.

    Any comments on that? Am I heading in the right direction or is my proposal for a partition-per-day totally stupid? If you need to query anything please reply indicating so and I'll be on it immediately.

    I think partitioning by day here is a viable strategy as that will allow you to load, replace or remove an entire daily partition at once using SWITCH from/to a stating table. 

    Removing existing partitions with MERGE and creating new ones with SPLIT is no big deal as long as you plan such as to avoid data movement with a SPLIT/MERGE of only empty partitions.  With an incremental date sliding window scenario, one typically uses a RANGE RIGHT function with the first boundary being the oldest retained date and the last boundary being a future date with no data yet.  Run a daily sliding window partition maintenance while the future date partition is still empty to remove the oldest partition and create a new one for future data.  The partition maintenance script:

    • truncates a similarly partitioned staging table
    • switches out the oldest date (partition 2)
    • merges the oldest date partition (with partition 1, which is always empty)
    • truncates the stating table again
    • splits the last partition with the day after the existing future date boundary

    The daily load script:

    • truncates a similarly partitioned staging table
    • loads staging table with data for new day
    • switches in the staging table partition

    The script to reload an existing day:

    • truncates similarly partitioned staging tables 1 and 2
    • loads staging table 1 with day to be replaced
    • switches out the partition to staging table 2
    • switches in staging table 1
    • truncates staging table 2

    Be aware that you are limited to 1,000 partitions per table prior to SQL 2008 SP2 and SQL 2008 R2 SP1 unless you enable 15,000 partition support using sp_db_increased_partitions introduced in those service packs.  SQL 2012 has 15,000 partition support out-of-the-box with no special configuration needed.


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Dienstag, 27. August 2013 11:04

  • I am not sure what you mean by “dropping partitions bothers me” – why do you think you will have to?

    Maybe you mean that the schema can’t be changed easily after implementation. There you are right. It’s not simple, but it’s possible by merging partition (borders)…

    Hi Andreas,

    Thanks for the reply. Let me clarify what I mean by that (I didn't explain it very well).

    Its quite possible that if I am loading today (20130827) that I might also have to amend the data for (say) 20130823. I would rather not issue any UPDATEs so what I would like to do instead is blow away all the data that already exists for 20130823 and INSERT it all again. I'd like to do this by switching out the partition that contains the data for 20130823 so that it exists as a single staging table, delete all the data from it (or DROP the table and recreate it) and INSERT all the data (that has already been recalculated elsewhere). 

    Not sure if I'm explaining this as fully as I need to, let me know if not. Conceptually I want to get rid of the data that represents the snapshot for a given day, and load in the new (already calculated) snapshot for that day.

    TIA
    Jamie


    ObjectStorageHelper<T> – A WinRT utility for Windows 8 | http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson

    Dienstag, 27. August 2013 11:07
  • Be aware that you are limited to 1,000 partitions per table prior to SQL 2008 SP2 and SQL 2008 R2 SP1 unless you enable 15,000 partition support using sp_db_increased_partitions introduced in those service packs.  SQL 2012 has 15,000 partition support out-of-the-box with no special configuration needed.

    Thanks Dan. I'll address the rest of your reply later but I am at the moment attempting to enable 15000 partitions as you suggest and its failing for reasons that are completely baffling to me. If I issue: select @@version i see this:

    Microsoft SQL Server 2008 R2 (SP2) - 10.50.4000.0 (X64)   Jun 28 2012 08:36:30   Copyright (c) Microsoft Corporation  Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) 

    Which seems to be adequate for 15000 partitions. Yet when I execute EXEC sp_dp_increased_partitions I see:

    Msg 2812, Level 16, State 62, Line 1
    Could not find stored procedure 'sp_dp_increased_partitions'.

    I'm totally bemused by this as I have enterprise edition, I have 2008R2 SP2. Have you ever seen this before?

    Regards
    Jamie


    ObjectStorageHelper<T> – A WinRT utility for Windows 8 | http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson

    Dienstag, 27. August 2013 11:14

  • ...

    Its quite possible that if I am loading today (20130827) that I might also have to amend the data for (say) 20130823. I would rather not issue any UPDATEs so what I would like to do instead is blow away all the data that already exists for 20130823 and INSERT it all again. I'd like to do this by switching out the partition that contains the data for 20130823 so that it exists as a single staging table, delete all the data from it (or DROP the table and recreate it) and INSERT all the data (that has already been recalculated elsewhere). 

    Not sure if I'm explaining this as fully as I need to, let me know if not. Conceptually I want to get rid of the data that represents the snapshot for a given day, and load in the new (already calculated) snapshot for that day.

    ...

    Hello Jamie

    makes perfect sense to me.

    If that staging-table is located on the same filegroup, that switichg in of the "old day's" data is just a metadata Operation only.

    For a good overview I also recommend this whitepaper by Kimberley Tripp/Randall: http://technet.microsoft.com/en-us/library/ms345146%28v=sql.90%29.aspx

    it's from 2005, but the Basic concepts still apply. We now have support for up to 15.000 partitions as main improvement - as Dan Guzman just said.


    Andreas Wolter | Microsoft Certified Master SQL Server
    Blog: www.insidesql.org/blogs/andreaswolter
    Web: www.andreas-wolter.com

    Dienstag, 27. August 2013 11:17
  • Msg 2812, Level 16, State 62, Line 1
    Could not find stored procedure 'sp_dp_increased_partitions'.

    Looks like a typo in the proc name.  Try: sp_db_increased_partitions


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Dienstag, 27. August 2013 11:19
  • For a good overview I also recommend this whitepaper by Kimberley Tripp/Randall: http://technet.microsoft.com/en-us/library/ms345146%28v=sql.90%29.aspx

    it's from 2005, but the Basic concepts still apply. We now have support for up to 15.000 partitions as main improvement - as Dan Guzman just said.

    yep, have currently got that open in the next tab to this one, been pouring over it all morning.



    ObjectStorageHelper<T> – A WinRT utility for Windows 8 | http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson


    Dienstag, 27. August 2013 11:20
  • Msg 2812, Level 16, State 62, Line 1
    Could not find stored procedure 'sp_dp_increased_partitions'.

    Looks like a typo in the proc name.  Try: sp_db_increased_partitions


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Ha! How stoopid do i feel!! :)   (Mind you, I copied/pasted it from somewhere, can't remember where!). Thanks Dan.

    ObjectStorageHelper<T> – A WinRT utility for Windows 8 | http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson

    Dienstag, 27. August 2013 11:22
  • Ha! How stoopid do i feel!! :)   (Mind you, I copied/pasted it from somewhere, can't remember where!). Thanks Dan.

    Copied from here: http://technet.microsoft.com/en-us/library/hh204563.aspx

    Should have taken the time to read the community additions where someone already pointed out the typo!


    ObjectStorageHelper<T> – A WinRT utility for Windows 8 | http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson

    Dienstag, 27. August 2013 11:25
  • got it

    btw

    there is a Special article for that 15.00 Partition Feature with some hints on performance:

    http://technet.microsoft.com/en-us/library/gg981694.aspx


    Andreas Wolter | Microsoft Certified Master SQL Server
    Blog: www.insidesql.org/blogs/andreaswolter
    Web: www.andreas-wolter.com

    Dienstag, 27. August 2013 11:31

  • The daily load script:

    • truncates a similarly partitioned staging table
    • loads staging table with data for new day
    • switches in the staging table partition

    The script to reload an existing day:

    • truncates similarly partitioned staging tables 1 and 2
    • loads staging table 1 with day to be replaced
    • switches out the partition to staging table 2
    • switches in staging table 1
    • truncates staging table 2

    Hi Dan,

    This is really really useful, thank you. Can I just clarify something? You use the term "similarly partitioned" here, what exactly does that mean? I'm assuming it means "same partition function, same partition scheme" but if that is the case I would have thought "identically partitioned" would be a better description.

    Can you clarify your meaning here?

    thanks
    Jamie


    ObjectStorageHelper<T> – A WinRT utility for Windows 8 | http://sqlblog.com/blogs/jamie_thomson/ | @jamiet | About me
    Jamie Thomson

    Dienstag, 27. August 2013 14:26