Hanging jobs will not abort, which is why the other VMs cannot be backed up

  • Question

  • Hi

    We are using DPM 2012 in a Hyper-V cluster environment with Windows Server 2012 servers.

    DPM backs up all of our VMs. About 50 VMs should be protected by DPM.

    For a few days now, the backups have been taking much longer than before. Our DPM is configured to allow at most one job per cluster node and per Cluster Shared Volume (CSV). When I check our backups the next morning, every day there are 5 different VMs that have been backing up for 8 to 9 hours! Usually DPM needs only a few minutes to back up one VM. When I cancel these jobs manually, the other jobs are processed very quickly, and I can resume the aborted jobs, which then also finish very quickly.

    Is there a way to cancel jobs that take longer than, e.g., 1 hour to back up? Maybe with PowerShell? That would help me work around the issue for the moment.

    Thank you for your help!

    Friday, March 28, 2014 7:32 AM

Answers

  • Hi Siddharth

    Sorry for my late reply. We worked for a long time to find a solution to this issue, and now it seems we finally have one.

    After contacting Microsoft Support about unstable and, in the last few weeks, incomplete backups, we took the following steps:

    We installed the following patches/hotfixes on all Hyper-V nodes:

    KB2916993, KB2929869, KB2913695, KB2878635

    After installing these patches, we got a new error message in DPM, something like "DPM Bitmap Error".

    Then we had to do the following:

    To get to a clean state, delete the DpmFilter* files from “System Volume Information” using the following steps.

    1) Open an administrative command prompt on every cluster node and run:

           C:\>fltmc unload DpmFilter

    2) From one of the cluster nodes, run (psexec is a tool from www.sysinternals.com that runs cmd under the SYSTEM account):

           C:\>psexec -s cmd

    3) In this new command prompt (running as SYSTEM), run:

           C:\>CD C:\ClusterStorage

           C:\ClusterStorage>For /d %i in (volume*) do del /s "%i\System Volume Information\DpmFilter*"

    Note: The * covers DPMFilterBitmap{Guid}, DPMFilterStatus, DPMFilterLog, and DPMFilterTrace*.

    4) Verify that the files are deleted, then close the SYSTEM prompt:

           C:\ClusterStorage>dir /s /b DpmFilter*

           C:\ClusterStorage>exit

    5) Then load the filter again on all cluster nodes:

           C:\>fltmc load DpmFilter
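The verification in step 4 could also be scripted. The following is only a minimal sketch in Python, not part of the procedure Microsoft Support provided: the helper name `find_dpmfilter_files` is hypothetical, and the demo runs against a throwaway directory tree that merely mimics the `Volume*\System Volume Information` layout.

```python
import fnmatch
import os
import tempfile


def find_dpmfilter_files(root):
    """List files named DpmFilter* (case-insensitive) anywhere under root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Lowercase both sides so DPMFilterLog and DpmFilterLog both match.
            if fnmatch.fnmatchcase(name.lower(), "dpmfilter*"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Demo on a temporary tree (hypothetical data, not a real CSV).
with tempfile.TemporaryDirectory() as root:
    svi = os.path.join(root, "Volume1", "System Volume Information")
    os.makedirs(svi)
    for name in ("DPMFilterStatus", "DPMFilterLog", "other.dat"):
        open(os.path.join(svi, name), "w").close()
    leftovers = [os.path.basename(p) for p in find_dpmfilter_files(root)]

print(leftovers)  # ['DPMFilterLog', 'DPMFilterStatus']
```

On a real node the root would be C:\ClusterStorage, and the script would have to run under the SYSTEM account (as in step 2). Without that access `os.walk` silently skips folders it cannot read, so an empty result would not prove the files are gone.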

    Now the backups work perfectly, with no more errors or hanging jobs!

    Thank you anyway for your help!

    Regards,

    Zak61

    • Marked as answer by zak61 Tuesday, June 17, 2014 9:29 AM
    Tuesday, June 17, 2014 9:29 AM

All replies

  • Hi Zak,

    It looks like you have per-node and per-CSV serialization enabled for the backups. This limits the number of backups per node and per CSV to one at a time. This limitation existed in clusters based on Windows 2008 R2 and Windows 2008 R2 SP1.
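To illustrate why a single hung job can hold up many others under this setting, here is a toy model in Python. It is an assumption-laden sketch, not DPM's actual scheduler: the function `runnable_jobs` and the job records (VM, node, and CSV names) are hypothetical.

```python
def runnable_jobs(pending, running):
    """Return the pending jobs that may start when at most one job
    is allowed per cluster node and per CSV at a time."""
    busy_nodes = {job["node"] for job in running}
    busy_csvs = {job["csv"] for job in running}
    startable = []
    for job in pending:
        if job["node"] in busy_nodes or job["csv"] in busy_csvs:
            continue  # blocked by a job on the same node or the same CSV
        startable.append(job)
        busy_nodes.add(job["node"])
        busy_csvs.add(job["csv"])
    return startable


# One hung job occupies node N1 and Volume1 (hypothetical data).
running = [{"vm": "VM01", "node": "N1", "csv": "Volume1"}]
pending = [
    {"vm": "VM02", "node": "N1", "csv": "Volume2"},  # same node as VM01
    {"vm": "VM03", "node": "N2", "csv": "Volume1"},  # same CSV as VM01
    {"vm": "VM04", "node": "N2", "csv": "Volume2"},  # free node and CSV
    {"vm": "VM05", "node": "N2", "csv": "Volume3"},  # N2 is taken by VM04
]

print([job["vm"] for job in runnable_jobs(pending, running)])  # ['VM04']
```

In this model every VM that shares a node or a CSV with the hung job stays queued until that job ends, which matches the observation that the queue drains quickly once the hanging jobs are cancelled.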

    Serialization is not required on a Windows 2012 CSV cluster.

    http://technet.microsoft.com/en-us/library/dn296605.aspx

    Can you please disable the serialization and then check if that resolves the issue?

    Monday, April 7, 2014 6:50 PM
  • Hi Siddharth

    Thank you for your reply!

    In the beginning we used parallel backups, but we changed to serialization because of performance and other issues. We will not go back from serialization.

    I only need to stop hanging backups that stay in the "backing up" stage for more than 3 hours per VM. If a single VM has been backing up for longer than these 3 hours, that backup task should stop so that the other backups can continue.
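The selection rule described here can be sketched as follows. This is only an illustration in Python under stated assumptions: the function `find_overdue_jobs` and the job records are hypothetical, and both fetching the real job list from DPM and cancelling the selected jobs are left out.

```python
from datetime import datetime, timedelta


def find_overdue_jobs(jobs, now, limit=timedelta(hours=3)):
    """Return the in-progress jobs that have been running longer than limit."""
    return [
        job for job in jobs
        if job["status"] == "InProgress" and now - job["start"] > limit
    ]


# Hypothetical sample data: one hung job, one fresh job, one finished job.
now = datetime(2014, 4, 9, 8, 0)
jobs = [
    {"vm": "VM01", "status": "InProgress", "start": datetime(2014, 4, 8, 23, 0)},
    {"vm": "VM02", "status": "InProgress", "start": datetime(2014, 4, 9, 7, 50)},
    {"vm": "VM03", "status": "Succeeded", "start": datetime(2014, 4, 8, 22, 0)},
]

overdue = find_overdue_jobs(jobs, now)
print([job["vm"] for job in overdue])  # ['VM01']
```

A scheduled task could evaluate such a rule every few minutes and cancel only the jobs it returns, so that the remaining backups keep running.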

    Thank you!

    Wednesday, April 9, 2014 3:16 PM
  • Hi Zak,

    I don't think there is any such script available. Some PowerShell guru might be able to write one for you.

    BTW, these jobs that keep running for more than three hours: are they express full backup jobs or consistency checks?

    Regards,

    Siddharth Jha

    • Proposed as answer by Siddharth Jha Tuesday, April 22, 2014 3:47 PM
    • Unproposed as answer by zak61 Tuesday, June 17, 2014 9:30 AM
    Friday, April 11, 2014 4:28 PM