DPM 2010 not co-locating data correctly

  • Question

  • Hello

     

    I have 4 protection groups. All have long-term protection to tape enabled, and all have the same recovery goals: goal 1 (daily backup retained for 1 week) and goal 2 (weekly backup retained for 8 weeks).

    As I understand it, data is co-located based on similar retention guidelines, so my daily backups should all be allocated to the same tapes and my weekly backups to a different tape. Daily and weekly backups should never mix on a tape, since their retention settings differ.

    Yet that mixing is exactly what is happening: I have both daily (1-week retention) and weekly (8-week retention) backups being written to the same LTO5 tapes. As I understand all the documentation, this should not happen. Does anyone have an idea? My settings are below:

     

    Data Colocation Enabled

    TapeWritePeriodRatio: 1 

    ExpiryToleranceRange: 100 

    Goal 1:

    Daily backup to tape retained for 1 week

    Goal 2:

    Weekly backup retained for 8 weeks. 

     

    I have 8 LTO5 tapes in an HP library. Both my daily and weekly backups are being mixed onto the same tapes despite having different retention settings and despite other free tapes being available. Why? This is important because it affects when a tape is marked as available again.
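    For anyone wanting to check their own values: the two tuning settings above can be read straight from the registry on the DPM server. A quick sketch - note that the key path below is my understanding of where DPM 2010 keeps these values, so verify it on your own server:

    # Sketch: read the tape co-location tuning values on the DPM server.
    # NOTE: the key path is an assumption for DPM 2010 -- verify on your system.
    $key = 'HKLM:\SOFTWARE\Microsoft\Microsoft Data Protection Manager\1.0\Colocation'
    Get-ItemProperty -Path $key | Select-Object TapeWritePeriodRatio, ExpiryToleranceRange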

     

     

     


    • Edited by ZackinMA Tuesday, October 18, 2011 1:39 PM
    Tuesday, October 18, 2011 1:34 PM

Answers

  • I have an answer after contacting Microsoft support... It reads as follows:

    "Summary from GES:  The logic used in DPM to collocate data on tapes does not directly account for retention ranges.  By default, it indirectly accounts for retention through narrowing the allowed expiration dates via the TapeExpiryTolerance value. Your  value is set to one, the filter is essentially switched off.  On the flip side, since there isn’t a way to directly link retention ranges, lowering the value shrinks the window of opportunity for not only unwanted daily jobs being collocated on weekly tapes but, also limits the period of time weekly jobs can be collocated before requiring additional tapes. As a short term stop gap GES has recommend setting the TapeExpiryTolerance to 0.5 to get the best balance we can achieve without a code change. .  The daily jobs could still get collocated on a weekly tape with this setting, but it would be four weeks into the eight week retention period before it happened.  The downside is that after four weeks, the weekly tapes will expire regardless of the presence of any of the daily jobs.  The same shorter expiry period would be true for daily jobs with a one week retention – they would expire in 3.5 days.

    Overall, GES believes this setting will result in more efficient use of tapes."
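    To make GES's arithmetic concrete, here is my reading of the math. The assumption (mine, not something support confirmed) is that the collocation window is simply the retention range multiplied by the TapeExpiryTolerance value:

    # My reading of the tolerance math above -- an assumption, not DPM's code.
    # Window during which other jobs can still be collocated on a tape:
    #   retention range * TapeExpiryTolerance
    $tolerance = 0.5
    $weeklyRetentionDays = 8 * 7    # 8-week retention
    $dailyRetentionDays  = 7        # 1-week retention
    "Weekly tape window: {0} days" -f ($weeklyRetentionDays * $tolerance)   # 28 days = the four weeks GES mentions
    "Daily tape window:  {0} days" -f ($dailyRetentionDays  * $tolerance)   # 3.5 days, matching their figure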

    In other words, this is a design flaw in Microsoft's code that needs to be corrected. The behavior isn't isolated to just me, and it isn't anything I did wrong - the problem is in the product's code and it will impact everyone! At this point they have no information on a patch or fix... Wonderful.

     

     

    • Marked as answer by ZackinMA Thursday, December 1, 2011 8:31 PM
    Thursday, December 1, 2011 8:24 PM

All replies

  • Hi,

    Was this a one-time occurrence (i.e., only one tape has mismatched recovery points on it), or has this occurred multiple times?

    See if the output of this DPM PowerShell script (longterm.txt) brings anything to light.

     

    # Run from the DPM Management Shell on the DPM server. Writes longterm.txt
    # containing each protection group's long-term (tape) recovery goals, then
    # lists which groups have matching goals and can share (co-locate) tapes.
    cls
    $ConfirmPreference = 'None'

    # Read the DPM version from the running msdpm service process.
    $dpmversion = (Get-Process | Where-Object { $_.Name -eq "msdpm" }).FileVersion
    Write-Host "DPM Version - " $dpmversion "`nCollecting Long Term protection information. Please wait..." -ForegroundColor Yellow
    $dpmserver = (& hostname)
    Out-File longterm.txt

    # Collect only the protection groups that use long-term tape protection.
    $pg = @(Get-ProtectionGroup $dpmserver | Where-Object { $_.ProtectionMethod -like "*Long-term using tape*" })
    Write-Host "We have" $pg.Count "groups with tape protection"

    # Dump each group's long-term policy schedules and tape labels.
    foreach ($longterm in $pg)
    {
        "-----------------------------------------------------------`n" | Out-File longterm.txt -Append
        "" | Out-File longterm.txt -Append
        "Protection Group " + $longterm.FriendlyName | Out-File longterm.txt -Append

        # DPM 2007 (2.x) and DPM 2010 (3.x) take different Get-PolicySchedule parameters.
        switch ($dpmversion.Substring(0,1))
        {
            2       { $policySchedule = @(Get-PolicySchedule -ProtectionGroup $longterm -LongTerm) }
            3       { $policySchedule = @(Get-PolicySchedule -ProtectionGroup $longterm -LongTerm tape) }
            default { Write-Host "NOT TESTED ON THIS DPM VERSION. Exiting script" -ForegroundColor Red; exit }
        }

        $tb = (Get-TapeBackupOption $longterm).LabelInfo
        $label = @($tb.Label)
        $count = $policySchedule.Count - 1
        while ($count -ne -1)
        {
            if ([string]::IsNullOrEmpty($label[$count]))
            {
                "Default Label Name" | Out-File longterm.txt -Append
            }
            else
            {
                "Tape Label: " + $label[$count] | Out-File longterm.txt -Append
            }
            $policySchedule[$count] | Format-List * | Out-File longterm.txt -Append
            $count--
        }
    }

    # Compare every pair of groups' retention policies; groups with identical
    # policies are candidates for sharing the same tapes.
    if ($pg.Count -gt 1)
    {
        $pgcount = 0
        while ($pgcount -ne ($pg.Count - 1))
        {
            $collocation = @($pg[$pgcount].FriendlyName)
            Write-Host $pgcount -BackgroundColor Green
            (Get-TapeBackupOption $pg[$pgcount]).RetentionPolicy | Out-File policyretention.txt
            Write-Host "policyretention.txt" -ForegroundColor Green
            Get-Content policyretention.txt

            $pgcountinnerloop = 0
            while ($pgcountinnerloop -ne $pg.Count)
            {
                Write-Host $pgcountinnerloop -BackgroundColor Yellow
                # Skip comparing a group with itself.
                if ($pgcount -eq $pgcountinnerloop) { $pgcountinnerloop++ }
                (Get-TapeBackupOption $pg[$pgcountinnerloop]).RetentionPolicy | Out-File policyretention1.txt
                Write-Host "policyretention1.txt" -ForegroundColor Green
                Get-Content policyretention1.txt

                # No output from Compare-Object means the two retention policies match.
                $compare = Compare-Object -ReferenceObject $(Get-Content policyretention.txt) -DifferenceObject $(Get-Content policyretention1.txt)
                if ($null -eq $compare)
                {
                    if ($pgcountinnerloop -lt $pgcount)
                    {
                        # Pairing already reported on an earlier pass.
                        Break
                    }
                    else
                    {
                        $collocation = $collocation + $pg[$pgcountinnerloop].FriendlyName
                        $collocation
                        Write-Host "done"
                    }
                }
                $pgcountinnerloop++
            }
            if ($collocation.Count -gt 1)
            {
                "-----------------------------------------------------------" | Out-File longterm.txt -Append
                "Protection Groups that can share the same tape based on recovery goals:" | Out-File longterm.txt -Append
                "NOTE: Encrypted and non-encrypted protection cannot share same tape despite output" | Out-File longterm.txt -Append
                " " | Out-File longterm.txt -Append
                Write-Host $collocation
                foreach ($collocation1 in $collocation)
                {
                    $collocation1 | Out-File longterm.txt -Append
                }
            }
            $pgcount++
        }
    }

    "-----------------------------------------------------------" | Out-File longterm.txt -Append
    $dir = Get-Item longterm.txt
    Write-Host "`nDONE`n`nOutput file created:" $dir.FullName -ForegroundColor Yellow
    Remove-Item policyretention*.txt
    notepad longterm.txt
     

     


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Tuesday, October 18, 2011 10:58 PM
    Moderator
  • Since I just started backing up with DPM to this new tape drive, it has so far only occurred on one tape, but it has occurred multiple times on that tape. The tape is now full, so I'm waiting to see what happens in the coming week after my next weekly backup on Saturday.

     

    I have pasted the output of your script below, but I don't think I see anything that explains this behavior. I have also included a screenshot at the bottom that shows the behavior: two backups with different retention periods (1 week vs. 8 weeks) on the same tape, as evidenced by the disparity between their expiration dates relative to when each backup was taken.

     

    Again, my understanding is that this weekly backup tape should only contain data from backups with a similar retention policy - namely my weekly backups that expire after 8 weeks.

     

     

    -----------------------------------------------------------

     

     

    Protection Group VS3

    Tape Label: DailyBackup

     

     

    Frequency        : Daily

    Interval         : 1

    WeekDay          : Su

    WeekDays         : {Su, Mo, Tu, We...}

    RelativeWeekDay  : None

    RelativeInterval : None

    ScheduleTime     : 10/14/2011 1:00:00 AM

    Generation       : Father

    Vault            : Offsite

    JobType          : OffsiteFatherArchive

    JobTypeString    : Tape backup 

     

     

     

    Tape Label: WeeklyBackup

     

     

    Frequency        : Weekly

    Interval         : 1

    WeekDay          : Sa

    WeekDays         : {Sa}

    RelativeWeekDay  : None

    RelativeInterval : None

    ScheduleTime     : 10/14/2011 1:00:00 AM

    Generation       : Grandfather

    Vault            : Offsite

    JobType          : OffsiteGrandFatherArchive

    JobTypeString    : Tape backup 

     

     

     

    -----------------------------------------------------------

     

     

    Protection Group VS2

    Tape Label: DailyBackup

     

     

    Frequency        : Daily

    Interval         : 1

    WeekDay          : Su

    WeekDays         : {Su, Mo, Tu, We...}

    RelativeWeekDay  : None

    RelativeInterval : None

    ScheduleTime     : 10/14/2011 12:00:00 AM

    Generation       : Father

    Vault            : Offsite

    JobType          : OffsiteFatherArchive

    JobTypeString    : Tape backup 

     

     

     

    Tape Label: WeeklyBackup

     

     

    Frequency        : Weekly

    Interval         : 1

    WeekDay          : Sa

    WeekDays         : {Sa}

    RelativeWeekDay  : None

    RelativeInterval : None

    ScheduleTime     : 10/14/2011 12:00:00 AM

    Generation       : Grandfather

    Vault            : Offsite

    JobType          : OffsiteGrandFatherArchive

    JobTypeString    : Tape backup 

     

     

     

    -----------------------------------------------------------

     

     

    Protection Group DomainControllers

    Tape Label: DailyBackup

     

     

    Frequency        : Daily

    Interval         : 1

    WeekDay          : Su

    WeekDays         : {Su, Mo, Tu, We...}

    RelativeWeekDay  : None

    RelativeInterval : None

    ScheduleTime     : 10/14/2011 10:00:00 PM

    Generation       : Father

    Vault            : Offsite

    JobType          : OffsiteFatherArchive

    JobTypeString    : Tape backup 

     

     

     

    Tape Label: WeeklyBackup

     

     

    Frequency        : Weekly

    Interval         : 1

    WeekDay          : Sa

    WeekDays         : {Sa}

    RelativeWeekDay  : None

    RelativeInterval : None

    ScheduleTime     : 10/14/2011 10:00:00 PM

    Generation       : Grandfather

    Vault            : Offsite

    JobType          : OffsiteGrandFatherArchive

    JobTypeString    : Tape backup 

     

     

     

    -----------------------------------------------------------

     

     

    Protection Group ShoreServer

    Tape Label: DailyBackup

     

     

    Frequency        : Daily

    Interval         : 1

    WeekDay          : Su

    WeekDays         : {Su, Mo, Tu, We...}

    RelativeWeekDay  : None

    RelativeInterval : None

    ScheduleTime     : 10/14/2011 11:00:00 PM

    Generation       : Father

    Vault            : Offsite

    JobType          : OffsiteFatherArchive

    JobTypeString    : Tape backup 

     

     

     

    Tape Label: WeeklyBackup

     

     

    Frequency        : Weekly

    Interval         : 1

    WeekDay          : Sa

    WeekDays         : {Sa}

    RelativeWeekDay  : None

    RelativeInterval : None

    ScheduleTime     : 10/14/2011 11:00:00 PM

    Generation       : Grandfather

    Vault            : Offsite

    JobType          : OffsiteGrandFatherArchive

    JobTypeString    : Tape backup 

     

     

     

    -----------------------------------------------------------

    Protection Groups that can share the same tape based on recovery goals:

    NOTE: Encrypted and non-encrypted protection cannot share same tape despite output

     

    VS3

    VS2

    DomainControllers

    ShoreServer

    -----------------------------------------------------------

    Wednesday, October 19, 2011 4:15 PM
  • Hi,

    Yes, I have seen one other customer experience this; let's see what happens going forward.


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Thursday, October 20, 2011 12:23 AM
    Moderator
  • I don't mind the wait-and-see approach, but why should I believe the problem will be self-correcting? Issues don't fix themselves.
    Thursday, October 20, 2011 2:01 PM
  • Well, the wait-and-see approach didn't bear fruit. I still have backup jobs with different retention periods being saved to the same tapes. I now have 3 tapes with mixed retention and expiration dates.
    Monday, October 24, 2011 12:47 PM
  • OK,

    Well, I don't have any magic bullet; something in the SQL DB is not correct. Please modify each PG that is not co-locating correctly: uncheck the long-term tape option and save the PG settings, then re-enable long-term protection and see what happens.


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Monday, October 24, 2011 10:36 PM
    Moderator
  • Well, I can give that a try, but since it's happening on all 4 groups I don't think that's it. I have unlimited web support since I have Software Assurance on this product, so I think I'll open a ticket and let Microsoft figure this one out. I'll post their findings.

    Thanks;

    Tuesday, October 25, 2011 2:44 PM
  • In working with Microsoft, I was asked to "justify" requesting that they fix their product; they asked me to make a business case. I suspect this isn't going to be an easy fix and that they are looking for a way to avoid having to fix their product.

    I submitted the following as my business case. I will continue to post updates here as they develop. 

     

    Nathan

     

    I’m just a bit surprised by this. When there is an obvious flaw in a product, I would think Microsoft would want to fix it. I shouldn’t have to make a case and beg and plead for Microsoft to get its product to function properly and as advertised. Consider this in making the business case: in the last 3 years alone we have spent nearly a quarter million dollars on licensing and SA agreements with Microsoft. On 2/1/2012 I’ll be writing Microsoft another check for 77 grand. A customer like that is one you should want to keep happy. Aside from that, here is the actual impact on my business.

    The data being incorrectly co-located on tapes means that tape expiration isn’t functioning correctly. Tapes aren’t expiring and becoming available again when we need them to. This means we have to set retention goals that are shorter than we need, or we would have to drastically increase the number of tapes in our library. Our goal is a retention time of 6 months for our weekly backups and 30 days for our dailies. However, with weekly backups being incorrectly co-located on daily backup tapes, our daily backup tapes won’t expire for 6 months either. We would have to purchase roughly 180 tapes to back up our entire data set according to our retention goals. That is not feasible.

    This is also preventing us from upgrading our infrastructure. Because the DPM server can’t meet our retention guidelines, I am not able to back up our file server infrastructure with it. That has forced me to keep an older Windows 2003 R2 server in production so I can continue to use our old tape library and Backup Exec 10d to protect it. I have a Windows Server 2008 R2 file server in production (using DFS replication), but I can’t retire the 2003 file server until I can back up the 2008 server. So this flaw is forcing me to keep two servers in production that I want to retire (the old file server and the Backup Exec server), plus a tape library I want to retire.

    If this flaw isn’t fixed, then I will be forced to invest in a product that will function properly and meet our retention goals. This means additional expenditure on a solution like Backup Exec, which will cost in excess of ten thousand dollars for the necessary licensing – an expenditure I shouldn’t have to make.

    Lastly, if this isn’t fixed I will be writing the higher-ups at Microsoft seeking financial compensation. I shouldn’t have to invest in another backup solution because Microsoft doesn’t want to fix its product so that it functions properly and as advertised. If I have to go the Backup Exec route, then it’s only fair that Microsoft pay for an investment I otherwise wouldn’t be required to make.

    Friday, December 30, 2011 3:13 PM
  • This seems like quite a big deal.  Have you received any additional information from Microsoft on this issue?
    Tuesday, January 17, 2012 9:15 PM
  • The latest word:

     

    Hi Zack,

     

    This issue has been accepted by triage, and I was supposed to receive a private fix for testing today. However, the fix is still awaiting sign-off by the testing team, and it looks like it will now be available on Monday. As soon as I get it, I will forward it to you to test.

     

    Thanks, Nathan.

    Sunday, January 22, 2012 3:49 PM
  • Microsoft has given me a "private fix". However, this fix is coded specifically for my current retention goals. I have decided not to use it because it could have unintended consequences if I change my retention goals in the future. I don't call it a fix - I call it a band-aid on an open wound.

     

    Based on the "private fix" and how it works it seems that this also wouldn't entirely correct my problem either. Based on Microsoft's reponse it would still be possible under specific circumstances to have data incorrectly co-located. So that was another reason i have chosen not to install it. 

     

    Microsoft's latest response is below:

     

    The example you gave is a possibility in some scenarios, since the solution does not strictly enforce exact retention ranges. However, with the settings you mentioned in your earlier mail, that co-location scenario would not be possible. Here is how the logic flow would look in the example you gave.

     

    1 – Empty tape.

    2 – A daily job (with a retention of 1 month) runs and is the first dataset stored on the tape.

    3 – A weekly job (with a retention of 6 months) runs, and the co-location logic checks the following:

        Is the media’s expiry date in the future?
            Yes, it is one month from now.

        A ‘tolerance range’ is established based on the TapeExpiryTolerance setting. With a setting of 1.0, the range will be +/- 1 month from the existing media’s expiry date.

        Is the expiry of the scrutinized dataset (the weekly job’s expiry) within the tolerance range?
            No, the expiry is more than 1 month in the future, so the dataset will not be co-located.

     

    If the scenario were reversed, a daily job could be co-located on a weekly job’s tape once the expiry of the first dataset is within a month of the daily job’s run date. In this scenario, the co-location would not cause an early expiration of the media, since the new check enforces the ‘tolerance range’ in a way that ensures the new dataset’s expiration is beyond the date of the first dataset on the tape.

     

    I’m sorry if this is still confusing.  It’s somewhat difficult to keep track of how all of the settings interact and even more difficult to explain.  If this still does not make sense please just give me a call so we can talk through it.
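    Putting their description into code form helped me follow it. Here is my paraphrase of the check - my interpretation only, not DPM's actual code, and the shape of the window (plus/minus the tolerance times the retention of the data already on the tape) is inferred from their example:

    # My paraphrase of the co-location check described above -- not DPM's code.
    # Assumed: the tolerance window is +/- (TapeExpiryTolerance * retention of
    # the first dataset on the tape), centered on the media expiry date.
    function Test-CanCollocate {
        param(
            [datetime]$MediaExpiry,          # expiry of the first dataset on the tape
            [timespan]$MediaRetention,       # retention range of that dataset
            [datetime]$NewDatasetExpiry,     # expiry of the dataset being scheduled
            [double]  $TapeExpiryTolerance   # e.g. 1.0 (default) or 0.5
        )
        if ($MediaExpiry -le (Get-Date)) { return $false }   # media already expired
        $window = [timespan]::FromDays($MediaRetention.TotalDays * $TapeExpiryTolerance)
        # Co-locate only if the new dataset expires within the tolerance window.
        return ($NewDatasetExpiry -ge ($MediaExpiry - $window)) -and
               ($NewDatasetExpiry -le ($MediaExpiry + $window))
    }

    # Their example: a daily (1-month retention) is on the tape; a weekly job
    # (6-month retention) is checked with a tolerance of 1.0 -> not co-located.
    $now = Get-Date
    Test-CanCollocate -MediaExpiry $now.AddMonths(1) `
                      -MediaRetention (New-TimeSpan -Days 30) `
                      -NewDatasetExpiry $now.AddMonths(6) `
                      -TapeExpiryTolerance 1.0    # False: 6 months is outside +/- 1 month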

     

     

    Thursday, January 26, 2012 2:45 PM
  • Hi Zack,

    I think you may be confused as to the spirit of the co-location feature and the consequences of the bug that is now fixed in the private fix.

    First off, the private fix is universal for all customers, not "customized" for your specific recovery goals; we simply used your specific goals to illustrate the behavior of the fix.

    The tape co-location feature was introduced to allow tapes to be more fully utilized, so more recovery points can be written to a tape before it is marked offsite-ready. Ideally, you would like to fill tapes to capacity to minimize the number of tapes used; that is the spirit of the feature. We do document that for the feature to work, the recovery goals between protection groups must match. However, the documentation does not say that different recovery goals cannot co-exist on the same tape, even though it's easy to glean that from the wording - to be honest, I believed the same thing until we investigated this behavior more closely, so live and learn.... 8-)

    Now, the bug that was fixed: under certain circumstances, we would write a shorter-range recovery point to a longer-range tape that normally would not be marked offsite-ready until later, and that could cause the tape to be marked offsite-ready prematurely, thus defeating the co-location goal of maximizing tape usage.

    The new fix ensures that this will never happen and that the tape can be used for the maximum time allowed, based on the two configurable parameters TapeWritePeriodRatio and ExpiryToleranceRange.

    I would urge you to use the private fix and validate that it helps maximize tape usage.

         


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Thursday, January 26, 2012 11:24 PM
    Moderator
  • Okay. I understand your explanation. Now I would just like to explain why this does not completely solve the problem I reported. In fact, it could make the issue even worse!

    The issue I reported was that data with different retention settings being co-located was causing tape consumption to skyrocket. My daily backups (with a shorter retention time) were being co-located with weekly jobs, so tapes with mixed data won’t expire until months after the daily backups on them have expired. That tape space can’t be reused, and I would have to greatly increase my library size.

    The issue you fixed wasn’t even the issue I reported. It was discovered after the fact, once the behavior I actually reported began to be investigated.

    If I have two long-term retention goals, daily and weekly, the backups should NOT be co-located. That is what I originally reported – and it is what your documentation suggested would happen with my TapeWritePeriodRatio and ExpiryToleranceRange settings. It is also what everyone at Microsoft believed should happen, which is why this got escalated as far as it did within MS. Here is why they should not be co-located.

    A tape is only expired and marked available again when the retention of the latest recovery point on it has been met. So if a blank tape gets a daily backup and a weekly backup co-located on it, the tape won’t expire and become available again until the weekly backup expires months down the road - well after the daily backup’s retention has lapsed. The end result is that the space consumed by the expired daily backups remains unusable for months after their expiration, because a weekly backup was co-located on that tape.
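    A quick back-of-the-envelope illustration with my goals (my own arithmetic, not DPM output):

    # The tape frees up only when the longest-lived recovery point on it expires.
    $written    = Get-Date
    $dailyRP    = $written.AddDays(7)     # daily backup, 1-week retention
    $weeklyRP   = $written.AddDays(56)    # weekly backup, 8-week retention
    $tapeFreeOn = if ($dailyRP -gt $weeklyRP) { $dailyRP } else { $weeklyRP }
    "Tape reusable on: $tapeFreeOn"       # 8 weeks out; the daily's space is dead for 7 of them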

    So tape consumption goes through the roof! Now ALL OF MY TAPES will only expire after the longest retention timeframe I have set. I would likely have to quadruple the number of tapes in my library to accommodate this model because of the tremendous amount of wasted space. That is not practical, realistic, ideal, or cost effective.

    To use your own words: “Ideally, you would like to fill tapes to capacity to minimize the number of tapes used.” That is correct. But this fix doesn’t accomplish that goal of minimizing the number of tapes required - and therefore isn’t really a fix at all.

    Granted, having a longer-term retention tape incorrectly marked as expired when a shorter-term retention job is the last one written to it is a real problem – and one I’m glad you fixed. I hope you push that fix out to the public ASAP. However, it doesn’t fix the problem I originally reported.

     


    • Edited by ZackinMA Wednesday, February 1, 2012 4:29 PM
    Wednesday, February 1, 2012 4:22 PM
  • Hi Zack,

    The fix will address all your concerns except the belief that recovery points from different goals should never be co-located; that can still occur. The main difference is that, with the fix in place, such a recovery point will not affect when a tape is available for re-use, nor cause a tape to be marked offsite-ready prematurely based on that dataset’s expiry date.

    The next RP written cannot be co-located if its expiry date is sooner than the original (first dataset’s) expiry date. This prevents expiring a tape prematurely: the first recovery point will expire before the last one written. Also, if a shorter-range RP is added to a longer-term backup tape, it will be toward the end of that tape’s expiry range, when we would normally not add any more longer-term recovery points to the tape, so it should not lead to the increased tape usage you were concerned about.

    A longer-retention-range job will never get co-located with a shorter one if the initial job (i.e., a daily with a shorter retention) is the first one written to a tape, so that prevents the problem of tapes not expiring when expected. The sketch below illustrates the idea.
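    To illustrate, the fix can be thought of as adding one guard on top of the tolerance-window check sketched earlier in this thread. This is a rough paraphrase only, not the actual DPM code:

    # Rough paraphrase of the post-fix decision -- not the actual DPM code.
    # New guard: a recovery point may only be co-located if it expires no sooner
    # than the first dataset already on the tape, so the tape is never freed early.
    function Test-CanCollocateFixed {
        param(
            [datetime]$MediaExpiry,         # expiry of the first dataset on the tape
            [timespan]$MediaRetention,      # retention range of that dataset
            [datetime]$NewDatasetExpiry,    # expiry of the dataset being scheduled
            [double]  $TapeExpiryTolerance
        )
        $window = [timespan]::FromDays($MediaRetention.TotalDays * $TapeExpiryTolerance)
        # Original check: the new expiry falls inside the tolerance window.
        $inWindow = ($NewDatasetExpiry -ge ($MediaExpiry - $window)) -and
                    ($NewDatasetExpiry -le ($MediaExpiry + $window))
        # Added guard: never accept a dataset that expires before the first one,
        # so a short-retention job cannot drag the tape's availability forward.
        return $inWindow -and ($NewDatasetExpiry -ge $MediaExpiry)
    }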

    Again – please install the fix and monitor the tapes and provide feedback if you believe the new behavior causes you any problems.  I think you will agree once implemented that it is a good fix and will address the tape usage concerns.

     


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, February 1, 2012 8:54 PM
    Moderator