VSS provider is in a bad state error in DPM 2010

  • Question

  • I have a cluster with 2 Hyper-V nodes (2008 R2 Enterprise 6.1.7600 build 7600), CSV (HP MSA 2312i), and DPM 2010 backing up using Child Partition Snapshot.

    The cluster has 9 VMs. If all VMs are on Hyper-V node 1, or all are on Hyper-V node 2, everything works perfectly!

    Now, for some VMs, if I move the VM from node 1 to node 2, the backup fails with this error:

    "The VSS application writer or the VSS provider is in a bad state. Either it was already in a bad state or it entered a bad state during the current operation. (ID 30111 Details: VssError:The writer experienced a non-transient error. If the backup process is retried, the error is likely to reoccur. (0x800423F4))"
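The hex code at the end of that message can be mapped to its symbolic name. Below is a minimal sketch, assuming the writer-failure constants from the Windows SDK header vswriter.h; the function name `describe_vss_error` is made up for illustration and is not part of DPM or VSS.

```python
# Writer-failure HRESULTs as named in the Windows SDK header vswriter.h.
VSS_WRITER_ERRORS = {
    0x800423F0: "VSS_E_WRITERERROR_INCONSISTENTSNAPSHOT",
    0x800423F1: "VSS_E_WRITERERROR_OUTOFRESOURCES",
    0x800423F2: "VSS_E_WRITERERROR_TIMEOUT",
    0x800423F3: "VSS_E_WRITERERROR_RETRYABLE",
    0x800423F4: "VSS_E_WRITERERROR_NONRETRYABLE",
}


def describe_vss_error(code_text):
    """Return the symbolic name for a hex code such as '0x800423F4'."""
    code = int(code_text, 16)  # base 16 accepts the '0x' prefix
    return VSS_WRITER_ERRORS.get(code, "unknown (not a writer-failure code)")


if __name__ == "__main__":
    # Hypothetical usage with the code reported by DPM in this thread.
    print(describe_vss_error("0x800423F4"))
```

The name for 0x800423F4 matches the "non-transient error" wording in the DPM message: the writer itself reports that a retry is unlikely to help.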

    If I move that same VM back to node 1, the backup works fine. This happens regardless of the CSV owner; even if the owner is node 2, the backup fails when the VM is on node 2.

    For other VMs, backup works fine regardless of the node they are on.

    Saturday, September 17, 2011 3:16 PM

Answers

  • Thanks everyone.

    This issue has been resolved by a complete rebuild of cluster node #2

    No DPM updates were applied, and 2008 R2 SP1 was not installed.

    • Marked as answer by Mike002 Monday, September 26, 2011 8:20 PM
    Monday, September 26, 2011 8:20 PM

All replies

  • Hi Mike002

    Are you using VSS Hardware Providers for your MSA 2312i? If not, you must use the serialized backup function in DPM.

    http://robertanddpm.blogspot.com/2010/07/enabling-serialized-backup-of-hyper-v.html 

     


    Best Regards

    Robert Hedblom

    MVP DPM


    Check out my DPM blog @ http://robertanddpm.blogspot.com

    Sunday, September 18, 2011 6:41 PM
    Moderator
  • Thanks Robert,

    I am not using any hardware provider.

    I actually used to have the error you referred to in your link:

    "Failed to prepare a Cluster Shared Volume (CSV) for backup as another backup using the same CSV is in progress"

    and you're right, that was resolved by enabling serialization.

    But the error I am getting now is different, and what makes it very strange is that the backup for some VMs works fine on either node, but other VMs' backups work when on node #1 and fail when moved to node #2.

    Both nodes are identical, and I've applied the recommended patches to both, updated the Integration Services, and restarted the VMs.

    Not sure what else I should check...

     

    Sunday, September 18, 2011 7:23 PM
  • Mike,

    One command to check out as an elevated admin on the node when it happens again is

    vssadmin list writers

    My guess is that if VSS is in a bad state, you will see the following:

    Writer name: 'Microsoft Hyper-V VSS Writer'
       Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
       Writer Instance Id: {3b6290fe-bbe8-499e-a601-db817a85576c}
       State: [1] Stable
       Last error: Time Out
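Checking every writer by eye gets tedious on a node with many writers. Here is a minimal sketch, assuming the English-language output format shown above and assuming that a healthy writer reports "Last error: No error"; the helper names `parse_writers` and `failed_writers` are made up for illustration.

```python
import re


def parse_writers(text):
    """Parse `vssadmin list writers` output into a list of dicts."""
    writers = []
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Writer name: '(.+)'$", line)
        if match:
            # A "Writer name" line starts a new writer record.
            writers.append({"Writer name": match.group(1)})
        elif writers and ":" in line:
            # Remaining "Key: value" lines belong to the current writer.
            key, _, value = line.partition(":")
            writers[-1][key.strip()] = value.strip()
    return writers


def failed_writers(text):
    """Return the names of writers whose last error is not 'No error'."""
    return [
        w["Writer name"]
        for w in parse_writers(text)
        if w.get("Last error", "No error") != "No error"
    ]


if __name__ == "__main__":
    # Sample text modeled on the output quoted in this thread.
    sample = """
    Writer name: 'Microsoft Hyper-V VSS Writer'
       Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
       State: [1] Stable
       Last error: Time Out
    """
    print(failed_writers(sample))
```

On a live node, the text could come from running `vssadmin list writers` in an elevated prompt and passing the captured output to `failed_writers`.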

    Normally a reboot of the node will fix the issue, but it will come back. Make sure you are running the newest version of DPM:
    http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=20953

    Try the following fixes
    http://technet.microsoft.com/en-us/library/ff634205.aspx

    Check for the following updates
    http://support.microsoft.com/kb/2494016
    http://support.microsoft.com/kb/2494162
    http://support.microsoft.com/kb/2520235
    http://support.microsoft.com/kb/2531907 

    Good Luck

    Joey

     



    • Edited by Joey Troy Tuesday, September 20, 2011 7:58 PM
    Tuesday, September 20, 2011 7:57 PM
  • Hello Joey,

    I've checked the updates, but none of them seem to be related to the VSS writer.

    This is what I get right after the backup fails:

     

    Writer name: 'Microsoft Hyper-V VSS Writer'
       Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
       Writer Instance Id: {6f61d84e-8b1f-4334-b42b-bda60613d718}
       State: [1] Stable
       Last error: Non-retryable error

    Tuesday, September 20, 2011 11:30 PM
  • Mike,

    If you run a 

    vssadmin list providers

    On the nodes, what do you get?

     

    Joey

    Thursday, September 22, 2011 4:27 PM
  • Same from both nodes:

    U:\>vssadmin list providers
    vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
    (C) Copyright 2001-2005 Microsoft Corp.

    Provider name: 'Microsoft Software Shadow Copy provider 1.0'
       Provider type: System
       Provider Id: {b5946137-7b9f-4925-af80-51abd60b20d5}
       Version: 1.0.0.7


    U:\>
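Output like this answers the earlier question about hardware providers: only the system software provider is listed. Here is a minimal sketch of that check, assuming the English-language output format shown above and assuming that a hardware provider would appear with "Provider type: Hardware"; the helper names are made up for illustration.

```python
import re

PROVIDER_FIELDS = ("Provider type:", "Provider Id:", "Version:")


def parse_providers(text):
    """Parse `vssadmin list providers` output into a list of dicts."""
    providers = []
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Provider name: '(.+)'$", line)
        if match:
            # A "Provider name" line starts a new provider record.
            providers.append({"Provider name": match.group(1)})
        elif providers and line.startswith(PROVIDER_FIELDS):
            key, _, value = line.partition(":")
            providers[-1][key.strip()] = value.strip()
    return providers


def has_hardware_provider(text):
    """True if any listed provider reports the (assumed) type 'Hardware'."""
    return any(
        p.get("Provider type") == "Hardware" for p in parse_providers(text)
    )


if __name__ == "__main__":
    # Sample text copied from the output quoted in this thread.
    sample = """
    Provider name: 'Microsoft Software Shadow Copy provider 1.0'
       Provider type: System
       Provider Id: {b5946137-7b9f-4925-af80-51abd60b20d5}
       Version: 1.0.0.7
    """
    print(has_hardware_provider(sample))
```

Comparing the parsed lists from both nodes is a quick way to confirm that they really do report the same providers.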

    Thursday, September 22, 2011 4:59 PM
  • Mike,

    You may want to verify the following key on the DPM Server

    HKLM\Software\Microsoft\Microsoft Data Protection Manager\2.0\Configuration\MaxAllowedParallelBackups\Microsoft Hyper-V

    Set it from 3 to 1 so it's not backing up so many machines at once. With three VMs being backed up at the same time, the load could be causing problems with the VSS writer. You will want to reboot the node having the problem and make sure both nodes show the 'Microsoft Hyper-V VSS Writer' in a clean state before starting backups.
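The current setting can also be read from a script on the DPM server. Below is a minimal sketch, assuming that the last path segment ("Microsoft Hyper-V") is a value name under the MaxAllowedParallelBackups key rather than a subkey; the post does not say which. The function names are made up for illustration, and `winreg` is only available in Python on Windows.

```python
# Path exactly as given in the post above.
DPM_VALUE_PATH = (
    r"HKLM\Software\Microsoft\Microsoft Data Protection Manager"
    r"\2.0\Configuration\MaxAllowedParallelBackups\Microsoft Hyper-V"
)


def split_value_path(path):
    """Split 'HIVE\\key\\...\\value' into (hive, subkey, value_name)."""
    hive, _, rest = path.partition("\\")
    subkey, _, value_name = rest.rpartition("\\")
    return hive, subkey, value_name


def read_parallel_backup_limit(path=DPM_VALUE_PATH):
    """Read the value on the DPM server (Windows only)."""
    import winreg  # imported here so the helper above works on any OS

    hive, subkey, value_name = split_value_path(path)
    if hive != "HKLM":
        raise ValueError("this sketch only handles HKLM paths")
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
        value, _value_type = winreg.QueryValueEx(key, value_name)
    return value


if __name__ == "__main__":
    hive, subkey, value_name = split_value_path(DPM_VALUE_PATH)
    print(hive, "|", subkey, "|", value_name)
```

If the assumption is wrong and "Microsoft Hyper-V" is a subkey, `QueryValueEx` raises `FileNotFoundError`, which is a safe way to find out.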

    Joey

     



    • Edited by Joey Troy Thursday, September 22, 2011 5:29 PM
    Thursday, September 22, 2011 5:28 PM
  • Joey,

    As mentioned earlier, I have serialization enabled.

    I actually set this key value to 1 a long time ago :)

     

    Thursday, September 22, 2011 5:35 PM
  • I would recommend turning off serialization by deleting the DataSourceGroups.XML file located in %PROGRAMFILES%\Microsoft DPM\DPM\Config. Then see how setting

    HKLM\Software\Microsoft\Microsoft Data Protection Manager\2.0\Configuration\MaxAllowedParallelBackups\Microsoft Hyper-V

    to 1 affects the backup.

    Thursday, September 22, 2011 5:49 PM
  • I don't have it per LUN, just did it with the reg key change.

    The issue is not with serialization. I set up a test VM in a separate PG running by itself and got the same error.

    Even when backup was done in parallel (maxAllowed value = 3), it worked well as long as all VMs are on the same cluster node.

    Again, the issue is with specific VMs; backup works on one node and fails when moved to the other node.

    Thursday, September 22, 2011 6:27 PM
  • I should have asked earlier: have you updated your DPM server and your agents to the current version?

    http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=23224

    Also, is there a reason you have not installed SP1 for Server 2008 R2?



    • Edited by Joey Troy Thursday, September 22, 2011 7:07 PM
    Thursday, September 22, 2011 7:02 PM
  • Tried to install the latest updates for DPM, but got an error that Hotfix 2223201 is a requirement.

    Unfortunately, I only found the x86 version of it. The link provided for 64-bit is not working:

    http://vkbexternal.partners.extranet.microsoft.com/VKBWebService/ViewContent.aspx?scid=KB;EN-US;2223201

    Any idea?

     

    Sunday, September 25, 2011 1:31 PM
  • Hi

    By the look of it, you must now request the fix: http://support.microsoft.com/kb/2223201#top


    Best Regards

    Robert Hedblom

    MVP DPM


    Check out my DPM blog @ http://robertanddpm.blogspot.com

    Monday, September 26, 2011 6:48 AM
    Moderator