Backup of Hyper-V 2012 CSV Intermittently Fails with Error 0x80042301

    Question

  • We have a 5-node cluster.  All nodes are running fully patched versions of Windows Server 2012 Datacenter (including hotfixes KB2813630 and KB2796995).  Storage is EqualLogic running firmware 6.0.2.  All nodes have EqualLogic HIT 4.5 installed and we are using the hardware provider.  We have two 3TB thin-provisioned CSVs set up.  One is not in use.  The other currently contains the first 14 VMs that have been moved from our existing stand-alone Windows Server 2008 R2 SP1 Hyper-V servers.  Only 5 of the 14 VMs are being backed up.  Protection was stopped and started for the move and the required consistency check was performed after the move.

    The DPM server is a physical server running SCDPM 2012 SP1 RU2.  All Hyper-V servers have had their agent updated after RU2.  The SCDPM server only has a single protection group set up for all Hyper-V servers (legacy 2008 R2 servers and 2012 cluster).  All backups are succeeding on the legacy servers, which are running the same EqualLogic HIT version and are storing their VMs on the same SAN.

    Overnight, some backups will fail and others will succeed.  When I retry the failed ones the next day, they will sometimes fail again, even if I resume backups one VM at a time.  I can see the hardware snapshots being created on the SAN.  The SAN doesn't report any errors.  SCDPM fails and reports the following:

    Type:    Recovery point
    Status:    Failed
    Description:    The VSS application writer or the VSS provider is in a bad state. Either it was already in a bad state or it entered a bad state during the current operation. (ID 30111 Details: VssError:A function call was made when the object was in an incorrect state for that function (0x80042301))
    End time:    4/23/2013 3:37:09 PM
    Start time:    4/23/2013 3:34:44 PM
    Time elapsed:    00:02:25
    Data transferred:    0 MB
    Cluster node:    xxxxx.xxxx.xxx
    Recovery Point Type:    Express Full
    Source details:    \Backup Using Child Partition Snapshot\vm1
    Protection group:    Hyper-V VMs - Daily

    It leaves the Microsoft Hyper-V VSS Writer in a failed state with a Timed Out error.  All other VSS writers are fine.  I am also intermittently seeing the following in the Application log on some nodes, only when backups fail:


    Event: 12363

    Source: VSS

    An expected hidden volume arrival did not complete because this LUN was not detected.

     LUN ID            {350f0b61-0244-4708-abab-a413fb710e7b}
     
     Version            0x0000000000000001
     Device Type        0x0000000000000000
     Device TypeModifier    0x0000000000000000
     Command Queueing    0x0000000000000001
     Bus Type        0x0000000000000009
     Vendor Id        EQLOGIC
     Product Id        100E-00
     Product Revision        6.0
     Serial Number        6090A0881074D4686E17059B9F4365CA
     
     Storage Identifiers
     Version        16
     Identifier Count    2

        Identifier        0
        CodeSet        "VDSStorageIdCodeSetBinary" (1)
        Type        "VDSStorageIdTypeFCPHName" (3)
        Byte Count    16

        60 90 A0 88  10 74 D4 68   6E 17 05 9B  9F 43 65 CA    `....t.hn....Ce.

        Identifier        1
        CodeSet        "VDSStorageIdCodeSetBinary" (1)
        Type        "VDSStorageIdTypeVendorSpecific" (0)
        Byte Count    16

        01 00 00 00  1F BF 0E 6A   00 00 00 3F  00 00 10 54    .......j...?...T
     
     

    Operation:
       Exposing Volumes
       Locating shadow-copy LUNs
       PostSnapshot Event
       Executing Asynchronous Operation

    Context:
       Execution Context: Provider
       Provider Name: Dell EqualLogic VSS HW Provider
       Provider Version: 4.5.0
       Provider ID: {d4689bdf-7b60-4f6e-9afb-2d13c01b12ea}
       Current State: DoSnapshotSet

    Event: 8194

    Source: VSS

    Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied.
    . This is often caused by incorrect security settings in either the writer or requestor process.

    Operation:
       Gathering Writer Data

    Context:
       Writer Class Id: {e8132975-6f93-4464-a53e-1050253ae220}
       Writer Name: System Writer
       Writer Instance ID: {d70791b2-f0fe-416e-bbea-e631878ee313}

    Tuesday, April 23, 2013 10:11 PM

All replies

  • The error "A function call was made when the object was in an incorrect state" and the VSS Writer timing out seem to indicate we are having problems accessing the CSV during backups.   The following registry settings allow you to make adjustments to how DPM performs retries to claim the CSV in order to get reliable backups.

    CsvMaxRetryAttempt - Adjusts the maximum number of times (default is 1) the DPM agent will attempt to claim the CSV volume. The value 0xC8 = 200 times.
    CsvAttemptWaitTime - Adjusts the amount of time in milliseconds to wait between retry attempts.  The value 0x2bf20 = 180,000 ms = 3 minutes.

    To change the values for these registry settings follow the steps below.

    1) Copy the following into Notepad, then save the file as csvretry.reg

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Agent\CSV]
    "CsvMaxRetryAttempt"=dword:000000C8
    "CsvAttemptWaitTime"=dword:0002bf20

    2) Copy the csvretry.reg file to each node in the cluster.

    3) Log on to each node in the cluster as an administrator, then right-click
    the csvretry.reg file and select "Open with", then the "Registry Editor" option, to
    import the registry settings.
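
    For anyone who prefers to generate the file rather than paste it, here is a minimal Python sketch that builds the same .reg text from decimal values. The key path and value names are copied from the steps above; the helper names (dword, build_reg) are hypothetical, and the script only prints the text, so saving it as csvretry.reg is still up to you.

```python
# Sketch: build the csvretry.reg text from decimal settings.
# KEY_PATH and the value names come from the post above.
# dword() and build_reg() are hypothetical helper names.

KEY_PATH = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft"
    r"\Microsoft Data Protection Manager\Agent\CSV"
)


def dword(value: int) -> str:
    """Format an integer as a .reg DWORD (8 hex digits)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("DWORD out of range")
    return f"dword:{value:08x}"


def build_reg(max_retries: int, wait_ms: int) -> str:
    """Return the full .reg file text with CRLF line endings."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        f'"CsvMaxRetryAttempt"={dword(max_retries)}',
        f'"CsvAttemptWaitTime"={dword(wait_ms)}',
        "",
    ]
    return "\r\n".join(lines)


# 200 retries, 3 minutes between attempts (expressed in milliseconds).
reg_text = build_reg(max_retries=200, wait_ms=3 * 60 * 1000)
print(reg_text)
```

    Writing the values in decimal and letting the script produce the hex makes it easier to see that 0xC8 is 200 retries and 0x2bf20 is 180,000 ms.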


    -------------------- Regards, Michael V [MSFT] - This posting is provided "AS IS" with no warranties, and confers no rights.

    Tuesday, April 23, 2013 10:55 PM
    Moderator
  • The values

    "CsvMaxRetryAttempt"=dword:000000C8
    "CsvAttemptWaitTime"=dword:0002bf20

    were already set on all five nodes.

    Wednesday, April 24, 2013 3:25 AM
  • I am anxiously awaiting an answer. I have almost the exact same setup with the same problem. I went so far as to completely wipe my DPM server and reload it from scratch in hopes of a fix.

    daves

    Thursday, April 25, 2013 6:37 PM
  • I did some testing today with disabling the EqualLogic hardware VSS provider and so far it seems to be working.  That is not a solution.  However, if my overnight backups succeed in one pass, it points me in Dell's direction rather than Microsoft's.  Are you using EqualLogic as well, Daves?  This morning, I emailed a contact at Dell who is a System Center/virtualization/EqualLogic specialist and has been very helpful in the past.  I asked him if this is a known issue and if there is any additional configuration required to support DPM using hardware snapshots of CSVs on Server 2012 with EqualLogic storage.  We'll see if he has any insights when he replies.  Barring that, I am going to open a case with entry-level Dell support and see where that takes me.
    Thursday, April 25, 2013 7:07 PM
  • We are using an EqualLogic SAN with the latest 4.5 HIT. With ASM installed, we have tried both logging into the PS group and not.  We tried disabling the hardware VSS provider and just going with the software provider. That made for a nice morning the next day, as it caused our cluster to crash.

    Almost all of our VM backups show critical with the following:

    The VSS application writer or the VSS provider is in a bad state. Either it was already in a bad state or it entered a bad state during the current operation. (ID 30111 Details: VssError:A function call was made when the object was in an incorrect state for that function (0x80042301))
     

    Sometimes I can clear this message by manually creating a recovery point or doing a consistency check, one-by-one.
     


    daves

    Friday, April 26, 2013 1:07 PM
  • Using the software provider didn't crash my cluster, but it has been randomly crashing some VMs (not just ones that I am backing up) since I turned it on.  I am about to switch back to the hardware provider today.  My contact at Dell hasn't responded to my email yet, so I will probably start a support case with Dell today or tomorrow and see if they have any solutions.  I just hope that I don't end up in a situation where Dell is pointing the finger at Microsoft and Microsoft is pointing back at Dell.
    Monday, April 29, 2013 2:39 PM
  • Just another comment from someone else having what looks to be the same issue watching for an answer.

    Windows 2012 Hyper-V Cluster. DPM 2012 SP1. Dell EqualLogic SAN on firmware 6.0.2. HV Hosts running Dell HIT 4.5

    Direct DPM backup of VMs works. Backup of the VMs via cluster Child Partitions fails:

    The VSS application writer or the VSS provider is in a bad state. Either it was already in a bad state or it entered a bad state during the current operation. (ID 30111 Details: VssError:A function call was made when the object was in an incorrect state for that function (0x80042301))


    Steve Ware

    Monday, April 29, 2013 3:55 PM
  • Steve, what exactly do you mean by direct backup of VMs?

    I am trying some things on my end today (somewhat at random).  Are either of you integrating SCDPM with SCVMM as per Protecting Hyper-V machines?  Did you create a new protection group from scratch for your cluster, or did you re-use an existing one (perhaps one that was already protecting stand-alone hosts or hosts running a different version of Windows)?

    Monday, April 29, 2013 4:18 PM
  • Not using SCVMM at all. My current workaround is simply to treat the clustered VM as a standalone machine from DPM's point of view.

    I used the same Protection Group I had been using to try to do full VM backups through the cluster. However, I installed the DPM agent on the VM (which I think is needed for granular recovery anyway when backing up through the cluster), and under Modify Protection Group, rather than selecting

    DomainName\Clustername\SCVMM VMName Resources\HyperV

    I selected things from

    DomainName\VMName\files/etc I want backed up


    Steve Ware

    Monday, April 29, 2013 4:26 PM
  • We had our cluster crash again over the weekend. When the backup job started, it ran OK for a couple of minutes, then several servers went into a consistency check. This time we had all servers (4 hosts) logged in via HIT to the PS group with all the services logged in. We deliberately did not have the DPM server logged in to HIT. Does DPM need direct access to the CSV in the iSCSI initiator? How are you guys running it? I would hate to install the DPM client on all the VMs. I am about to call it quits on backing up the whole group and just make an individual job for each VM server. So infuriating.

    daves

    Monday, April 29, 2013 6:30 PM
  • I think we have our backup situation worked out now. It is not the ideal solution, but it will work. We took our single CSV hosting multiple VMs and broke it into 4 separate CSVs. We then applied the serial backup registry mod and XML file. We now have several backup jobs instead of one, with each job set to back up one node in each CSV. Last night I was able to back up without errors.

    daves

    Tuesday, April 30, 2013 3:26 PM
  • I don't want to burst your bubble, but the first night I switched to software VSS backups it worked perfectly.  Then the next night it all fell apart.  Hopefully you have more success.
    Tuesday, April 30, 2013 3:54 PM
  • I would like to jump in as well.

    I have a 6-node Hyper-V cluster.  All servers are running 2012 Standard, they are connected to an EqualLogic array running v6.0.2, and all servers are running HIT 4.5.0.6492. Currently I have 25 VMs running on my cluster.

    I am running DPM 2012 Build 4.1.3408.0.

    Some nights most of the machines seem to back up fine; other nights I am left with almost half that did not back up correctly.  If they do not back up correctly, I can often run the job again and have them work.  Other times they won't work until I move the VMs off that host, reboot it and then move them back.

    As for errors, a number of times after running "vssadmin list writers" I have seen the Microsoft Hyper-V VSS Writer with a state of [10] Failed and a Timed out error.  I have also seen [7], but I don't have that up right now, so I'll have to grab the details of that one again.
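
    To make this check repeatable across nodes, here is a rough Python sketch that scans saved "vssadmin list writers" output and reports any writer that is not in the [1] Stable state. The sample text is illustrative only and assumes the usual "Writer name / State / Last error" layout; the function name failed_writers is hypothetical.

```python
import re

# Illustrative sample only; it assumes the usual layout of
# "vssadmin list writers" output (name, state, last error per writer).
SAMPLE = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Hyper-V VSS Writer'
   State: [10] Failed
   Last error: Timed out
"""


def failed_writers(output: str) -> list:
    """Return (name, state, last_error) for every writer not in state [1]."""
    results = []
    name = None
    state = None
    for raw in output.splitlines():
        line = raw.strip()
        m = re.match(r"Writer name: '(.+)'", line)
        if m:
            # A new writer block starts; forget the previous state.
            name, state = m.group(1), None
            continue
        m = re.match(r"State: \[(\d+)\] (.+)", line)
        if m:
            state = (int(m.group(1)), m.group(2))
            continue
        m = re.match(r"Last error: (.+)", line)
        if m and name and state and state[0] != 1:
            results.append((name, f"[{state[0]}] {state[1]}", m.group(1)))
    return results


for writer in failed_writers(SAMPLE):
    print(writer)
```

    On a real host you would feed it the captured command output instead of SAMPLE, for example text saved from an elevated prompt.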

    Watching on the EqualLogic I see some different results.  There are times I can see the snapshot created, set online and logged into.  I see no data transferred by DPM.  Then I see logout requests received from the initiator.  The snapshot is then deleted.  After that I see another login request, which fails since the snapshot has been deleted.

    I have also seen it not create the snapshot at all, and the job just eventually fails in DPM.

    I'm not sure what is going on, so any ideas that anyone else finds would be great.  If there is any more information that would be helpful please let me know.

    I have previously seen machines get into a state where backups would fail. When I checked in the Failover Cluster Manager interface, the machine would be listed as Running (Locked), and when I checked in the Hyper-V console, it would say the machine was being backed up.  When looking in DPM, there were no jobs running that were backing the machine up.  I would then have to power down the VM and reboot the host.  I couldn't figure out how to unlock the machine, so I couldn't migrate it etc.  Once I rebooted, I could back up the machine.

    I don't see the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Agent\CSV; the CSV key does not exist.  So I'm not sure if I need to create it and then the values below it.

    As I try more or learn anything I'll update this with more information.

    Thanks for any help.

    Thursday, May 2, 2013 8:22 PM
  • Eric, you may need to create that registry key.  I don't know for sure as I am setting it via GPP so it is created automatically for me.

    I have temporarily suspended my backups.  We added a few more VMs and now VMs will sometimes go offline during the backup (even ones that we don't backup at all) which was unacceptable.  I have opened a support case with Dell to see if they can figure it out (Dell support = Free, Microsoft support = pay unless they admit that it is a bug on their end).  So far Dell support has just asked me to provide logs, so we'll see where it goes from there.


    • Edited by gregg79 Friday, May 3, 2013 3:52 PM Added extra detail
    Friday, May 3, 2013 3:51 PM
  • I've been seeing the same errors when backing up using DPM 2012 SP1. 

    My environment:

    • 3-node Hyper-V cluster - Windows Server 2012 with EqualLogic HIT 4.5.0
    • 3 CSV volumes for my VMs
    • 23 virtual machines (Linux, Windows 2003, Windows 2008 R2, Windows 2012)
    • 1 Dell EqualLogic PS4000 array
    • DPM 2012 SP1

    My initial attempt to back up my virtual machines resulted in one of my Hyper-V nodes getting hung up. It appears that it ran out of memory due to a memory leak. After installing the hotfix from Microsoft (KB 2813630) that addresses known issues with CSV backup, I had more success with my backups. However, I'm still getting Event 12363 "An expected hidden volume arrival did not complete because this LUN was not detected." from time to time. There also still seems to be a memory leak on the Hyper-V node that is holding the CSV volume.

    I'm also seeing the following errors:

    Hyper-V-VMMS - Event ID: 19050

    'vm-name' failed to perform the operation. The virtual machine is not in a valid state to perform the operation. 

    Hyper-V-VMMS - Event ID: 16010

    The operation failed.

    VSS - Event ID: 8194 

    Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied.
    . This is often caused by incorrect security settings in either the writer or requestor process. 

    Operation:
       Gathering Writer Data

    Context:
       Writer Class Id: {e8132975-6f93-4464-a53e-1050253ae220}
       Writer Name: System Writer
       Writer Instance ID: {830e49ca-131e-499f-b35b-73a6b4b0ded4}

    FilterManager - Event ID 3

    Filter Manager failed to attach to volume '\Device\HarddiskVolume65'.  This volume will be unavailable for filtering until a reboot.  The final status was 0xC03A001C.

    volsnap - Event ID: 27

    The shadow copies of volume \\?\Volume{04ef607f-b7f6-11e2-93fa-de1da79be6cb} were aborted during detection because a critical control file could not be opened.

    It seems to me that my problems are a combination of the Dell VSS provider which is causing Event 12363 and Microsoft bugs which are causing memory leaks.

    See http://social.technet.microsoft.com/Forums/en-US/dpmhypervbackup/thread/604409df-ada1-47d1-bdfb-3f938cde0b59

    http://up2v.nl/2013/03/12/storage-issues-on-windows-server-2012-hyper-v-microsoft-struggling-to-fix/

    Friday, May 10, 2013 9:33 PM
  • As part of the testing Dell is having me do, they had me disable the EqualLogic hardware VSS provider using the command:

    "C:\Program Files\EqualLogic\bin\eqlvss" /unregserver (it can be undone via "C:\Program Files\EqualLogic\bin\eqlvss" /regserver)

    Since doing that on Sunday, I haven't had a single SCDPM backup failure.  I also changed the max allowed parallel backups from 3 to 1, but I just switched it back to 3 so we will see how it goes tonight.  Obviously this isn't a fix, but it may work as a band-aid for everyone in the short term.  If I get anything new from Dell I'll make sure to post it.

    Wednesday, May 15, 2013 6:36 PM
  • I can confirm that disabling the EQL VSS provider will resolve some issues with CSV backup.


    This posting is provided "AS IS" with no warranties.

    Thursday, May 16, 2013 1:06 PM
  • Same here.

    daves

    Thursday, May 16, 2013 1:21 PM
  • Hi Guys,

    I'm having exactly the same issue.

    I'd like to find a solution that allows me to still use the hardware VSS provider. If you disable it, won't your backups run in serial and send your SAN into redirected access mode? The backups will work, but your guest servers will take a BIG performance hit.

    Anyone found this?

    Cheers,

    Jon

    Thursday, May 16, 2013 2:17 PM
  • Windows 2012 CSV doesn't do redirected mode anymore
    Thursday, May 16, 2013 2:19 PM
  • Ah, OK... surely using software VSS would still cause performance issues and also make backups take a lot longer?

    Do you think this is an issue with the Dell HW VSS provider, or a possible configuration error?

    Thursday, May 16, 2013 2:24 PM
  • Marcus is right, the penalty for using the software VSS provider in 2012 is substantially reduced (no redirected IO mode and parallel backups are supported).  However, I agree with you Jon.  I still want to use the hardware provider.  I'm not sure what the penalty for using software vs hardware is anymore, but I still want to use hardware.  In fact, I just emailed my case manager at Dell and told him the same thing.

    As for the cause, it could be a bug or a misconfiguration (or both).  If it is a misconfiguration, it must not be something they have documented as all of us are having the issue and no one has found a solution at the present time.

    Thursday, May 16, 2013 2:28 PM
  • squeee! Fixed it!

    Go into Dell ASM on your hosts, Settings, MPIO, Uncheck 'Use MPIO for snapshots'

    Done. Hardware vss provider working. :D

    Thursday, May 16, 2013 2:37 PM
  • Thanks Jon!  That is definitely progress, and I have forwarded that info on to Dell.  I would still say that it isn't fully fixed until you don't have to uncheck that checkbox, but it is definitely a huge step in the right direction.  I won't be able to test it until this evening, but I will let you know if I can reproduce your results.
    Thursday, May 16, 2013 3:12 PM
  • "squeee! Fixed it! Go into Dell ASM on your hosts, Settings, MPIO, Uncheck 'Use MPIO for snapshots'. Done. Hardware vss provider working. :D"

    I will give it a try tomorrow.

    This posting is provided "AS IS" with no warranties.

    Thursday, May 16, 2013 4:03 PM
  • Did this work for anyone besides Jon?  It didn't fix it for me.  Now my backups start, do nothing, and eventually time out.
    Friday, May 24, 2013 12:20 PM
  • Disabling MPIO for snapshots didn't fix the problem for me.

    I have since disabled the EqualLogic hardware VSS provider. After rebooting the cluster, my memory leaks have gone away and my backups are happening without any failures. Waiting for a fix so I can use the EqualLogic hardware VSS provider.

    Friday, May 24, 2013 1:38 PM
  • Disabling MPIO for snapshots had no effect for me either.

    It still looks random to me which VM will be backed up and which one will not.


    This posting is provided "AS IS" with no warranties.

    Monday, May 27, 2013 6:15 PM
  • I just tried a different approach and it looks promising, or at least it seems to get closer to the core of the issue.

    I did a repair install of the HIT on all my hosts and reconfigured group access - I might have messed something up during my testing ...

    • the hardware vss provider is enabled
    • a domain account / local admin on the hosts is used as a service account
    • snapshots are stored in a shared directory, MPIO is used for snapshots, etc.
    • On the DPM server I changed the registry key 'MaxAllowedParallelBackups' from 3 to 1
    Windows Registry Editor Version 5.00
    
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\2.0\Configuration\MaxAllowedParallelBackups]
    "Microsoft Hyper-V"=dword:00000001

    ---

    Now one host can create only one hardware snapshot at a time. The backup is working ... for now

    There are several downsides to this setup.

    • Since the location of a VM in a cluster is unpredictable, you might have all VMs from a protection group queued up on one host waiting to be backed up.
    • If a queued VM is migrated to another host in the cluster, the backup will fail (even with VMM integration setup)
    • Backups will take more time to finish, since one host can only back up one VM at a time

    ---

    From what I have observed and found in my EQL and host logs, it looks like there is a problem when one host is accessing multiple snapshots at the same time.

    Multiple snapshots on the same CSV from different hosts don't seem to cause any problems.


    This posting is provided "AS IS" with no warranties.

    Tuesday, May 28, 2013 2:41 PM
  • I have almost the exact same scenario as gregg79 and I'm seeing the same errors/issues. Please post when you find the solution.
    Thursday, May 30, 2013 3:30 PM
  • "I have almost the exact same scenario as gregg79 and I'm seeing the same errors/issues. Please post when you find the solution." +1

    I opened a support case with Dell/EqualLogic for the error: Event: 12363 / Source: VSS

    I opened a support case with Microsoft for the error: Event: 8194 / Source: VSS. I think it's a DCOM security problem...

    EqualLogic also asked me to disable the EqualLogic hardware VSS provider using the command:

    "C:\Program Files\EqualLogic\bin\eqlvss" /unregserver
    No backup failures since...


    Frédéric OGUER






    • Edited by F.OGUER Monday, June 3, 2013 3:55 PM dcom and not scom...
    Friday, May 31, 2013 11:29 AM
  • I can confirm that disabling the EqualLogic hardware provider works. However, as JonathonMoore, gregg79, and rpanna have stated, I would like to get the EQL HW provider working. I would say at this point the thread has shifted to "How do I get the EqualLogic Hardware Provider to work with Server 2012, DPM 2012, Failover-Cluster, and CSV?"
    Monday, June 3, 2013 3:05 PM
  • Hello,

    the issue might not be on the EQL side alone.

    I have got two 2012 clusters in my lab, one with EQL HIT and the other one without. Both clusters are having problems with DPM backup. The only workaround I have found so far is to set MaxAllowedParallelBackups to 1. This seems to get the backup running halfway stable - at least I have had a 'success rate' of 85% completed backups over the weekend. Still annoying, but I haven't found a better solution yet.

    You might also have a look at this thread:

    http://social.technet.microsoft.com/Forums/de-DE/dpmhypervbackup/thread/604409df-ada1-47d1-bdfb-3f938cde0b59


    This posting is provided "AS IS" with no warranties.

    Tuesday, June 4, 2013 5:56 AM
  • Hi Dark Grant,

    I'm in a French environment; and you? German or English?

    "the issue might not be on the EQL side alone." -> I agree, so I called Dell/EqualLogic and Microsoft Support last week.

    My EqualLogic support case is at level 2 for the moment.

    I'm waiting for Microsoft to call me back...

    Regarding your thread, I have never gotten the STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021).

    For me, http://support.microsoft.com/kb/2838669 contains every fix:

    Csvflt.sys 6.2.9200.20682
    Clussvc.exe 6.2.9200.20686
    Csvfs.sys 6.2.9200.20686
    Fssagent.dll 6.2.9200.20682
    Kernelbase.dll  6.2.9200.20682
    NTFS.sys 6.2.9200.20684
    Rdbss.sys  6.2.9200.20685
    Srv2.sys  6.2.9200.20685
    Kernelbase.dll  6.2.9200.20685
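
    If you want to check a node against this list, a small Python sketch like the one below compares dotted version strings numerically (a plain string comparison gets "9999" vs "20686" wrong). The minimum versions are copied from the list above; the example_node inventory is made up for illustration, and because Kernelbase.dll appears twice in the list, keeping the higher value is an assumption.

```python
# Minimum file versions copied from the KB2838669 list above.
# Kernelbase.dll is listed twice; keeping the higher value is an assumption.
REQUIRED = {
    "Csvflt.sys": "6.2.9200.20682",
    "Clussvc.exe": "6.2.9200.20686",
    "Csvfs.sys": "6.2.9200.20686",
    "Fssagent.dll": "6.2.9200.20682",
    "Kernelbase.dll": "6.2.9200.20685",
    "NTFS.sys": "6.2.9200.20684",
    "Rdbss.sys": "6.2.9200.20685",
    "Srv2.sys": "6.2.9200.20685",
}


def parse_version(text):
    """Turn '6.2.9200.20686' into (6, 2, 9200, 20686) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def outdated(installed, required=REQUIRED):
    """Return the sorted names of files that are missing or below the minimum."""
    stale = []
    for name, minimum in required.items():
        have = installed.get(name)
        if have is None or parse_version(have) < parse_version(minimum):
            stale.append(name)
    return sorted(stale)


# Hypothetical inventory for one node (not taken from this thread):
# one file is older than required and one file is missing.
example_node = {**REQUIRED, "Clussvc.exe": "6.2.9200.16384"}
del example_node["NTFS.sys"]

print(outdated(example_node))
```

    How you collect the installed versions on each node is left open here; the sketch only covers the comparison step.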

    I did not disable ODX.

    I installed DPM 2012 SP1 UR2: http://blogs.technet.com/b/dpm/archive/2013/04/11/update-rollup-2-for-system-center-2012-service-pack-1-dpm-updates.aspx

    I'm using a 10Gb network.

    I disabled the EqualLogic hardware provider on only 1 node, and backup works on that one!

    Apart from the 8194 error, it is perhaps only an EQL error...

    I'm using firmware 6.0.4. It includes a fix concerning multi-host access and snapshots: "A snapshot of a volume with multi-host access enabled displayed the following error if multiple hosts attempted to simultaneously access the snapshot: Initiator cannot access this target because an iSCSI session from another initiator already exists and multihost access is not enabled for this target. [Tracking #: 635738]". But it's not related...

    In your lab:

    • Can you upgrade the EqualLogic firmware?
    • What's your DPM agent version?

    Frédéric OGUER

    Tuesday, June 4, 2013 9:36 AM
  • Microsoft called me back about the 8194 error.

    It's a Windows 8 bug in the System Writer with DHCP, WINS and CSV???

    They are investigating...

    If you try to configure Snapshot on your hypervisor, you will see only your System drive.


    Frédéric OGUER

    Tuesday, June 4, 2013 1:20 PM
  • Hello Frederic,

    I updated my EQL to 6.0.4 yesterday. Still got the same problems with the EQL VSS provider.

    Disabling ODX didn't help either.

    I have got RU2 for SC 2012 SP1 and KB2838669 installed. I double-checked the file versions yesterday.

    I have got DPM agent version 4.1.3408.0 (DPMRA.exe)

    For now I will stick with software VSS. I just uninstalled all HIT/ASM components, except the PowerShell module, MPIO and SMI

    ----

    btw: I am from Germany


    This posting is provided "AS IS" with no warranties.

    Wednesday, June 5, 2013 1:28 PM
  • I have a case open with Dell EqualLogic Support and a case open with Microsoft Product Support. My Dell support representative is telling me that all of the virtual machines need to have the v4.5 HIT Kit installed. There are no disks that are directly connected to the guest OS using an iSCSI initiator. Has anyone else heard this from Dell support? The Dell documentation seems to indicate this is not the case.

    Regards,

    Kevin

    -- Dell EqualLogic Documentation: --

    According to the PDF documents you sent me; the HIT KIT is only required for guest virtual machines running from host servers running Server 2008 R2.

    The following is from the “HIT_Install_User_Guide_V4.5.pdf” you sent me earlier.

    Installing HIT in Each VM in a Windows Server 2008 R2 CSV Configuration

    For Smart Copies of Virtual Machines to work correctly in a Windows Server 2008 R2 configuration using Cluster Shared Volumes (CSVs), the Host Integration Tools must be installed in each VM. This applies to all such configurations, including Enterprise, Datacenter, Core or any other Microsoft Server release configuration, but only for Windows Server 2008 R2. This does not apply to CSVs with Windows Server 2012.

    Wednesday, June 5, 2013 2:37 PM
  • Hi Kevin,

    Yes, "This does not apply to CSVs with Windows Server 2012". You can read the HIT 4.5 release notes if you need more information :

    Microsoft Windows 8 and Server 2012 Support

    HIT/ME now supports Microsoft Windows 8 and Windows Server 2012. With Windows Server 2012, Smart Copies of the cluster shared volume (CSV) are now application consistent, not just file system consistent. You no longer need to install ASM/ME in the Hyper-V VMs; you only need to install it on the cluster nodes.

    VMs and volumes are now manageable from every cluster node (the icons are blue, not gray). You do not have to change the coordination node or move volumes. Therefore, the Move CSV action no longer applies to volumes in Windows Server 2012.

    The following restrictions apply:

    • You cannot create replica Smart Copies on either Windows 8 or Windows Server 2012.
    • To perform a selective restore operation on Windows Server 2012, you must start the operation on the cluster node that owns the VM (but you do not need to move volumes).


    Frédéric OGUER



    • Edited by F.OGUER Wednesday, June 5, 2013 3:02 PM mistake
    Wednesday, June 5, 2013 2:58 PM
  • Hello Kevin,

    I found a thread on the Dell forum recommending the same.

    I will not even think about installing integration components in every VM. There are enough other vendors waiting for new customers.


    This posting is provided "AS IS" with no warranties.

    Wednesday, June 5, 2013 3:03 PM
  • Hi Dark,

    Can you send us the link to the Dell forum thread?

    I read something about the XML for VSS on 2003 and 2008 not being the same...

    Do you back up 2003 VMs?

    Yesterday I got a 5142 cluster error: « ERROR_TIMEOUT(1460) » on the CSV!

    This morning:

    • I disabled the TRIM feature: fsutil behavior set disabledeletenotify 1
    • I disabled the ODX feature by modifying the key: HKLM\System\CurrentControlSet\Control\FileSystem\FilterSupportedFeaturesMode.
    • I unregistered the hardware VSS provider on all hosts
    • I rebooted all machines


    Frédéric OGUER

    Thursday, June 6, 2013 8:40 AM
  • Hello,

    I just got some promising results from my tests.

    Since I removed the ASM component from my servers, I haven't had any backup or cluster errors.

    ---

    • Cluster one
    • -- first running completely without EQL tools and giving me cluster errors 5120 and 5217
    • -- had no errors since I installed PS, MPIO and SMI from the HIT
    • -- 18 hours / 23 VMs / >200 recovery points without errors. DPM is set to back up the VMs every hour between 8:00 and 22:00 - just to see how far I can push it and when it will break the cluster

    ---

    • Cluster two
    • -- first running with a full EQL HIT installation and giving me issues with failed backups on the DPM - no errors in cluster manager
    • -- had no errors since I removed the ASM component from the HIT
    • -- 24 hours / 60 VMs / >100 recovery points without errors using a regular backup schedule

    ---

    The only HIT components I have installed are:

    • PowerShell Tools
    • MPIO DSM
    • SMP

    ---

    The next thing will be:

    • activating ODX again - I disabled it yesterday
    • increasing 'MaxAllowedParallelBackups' - I have set it to 1 at the moment
    • setting up a new cluster to make sure the positive trend is not a side effect of my testing and that the results are reproducible on a fresh installation

    ---

    I will keep you updated on the progress of my testing


    This posting is provided "AS IS" with no warranties.

    Thursday, June 6, 2013 8:51 AM
  • Some news:

    Setting 'MaxAllowedParallelBackups' back to the default value of 3 immediately results in cluster errors 5120 and 5217 during the following backup cycle. Turning it back to 1 resolves the cluster errors.

    ODX doesn't seem to play a role here. I reactivated ODX and the backups are still running fine.
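
    A hedged sketch of how 'MaxAllowedParallelBackups' can be set from PowerShell on the DPM server. The key path (including the "2.0" component) and the value name "Microsoft Hyper-V" are assumptions taken from Microsoft's DPM CSV guidance as I remember it, not from this thread, so verify them against your DPM version first.

    ```powershell
    # ASSUMPTION: key path and value name as recalled from Microsoft's DPM CSV guidance.
    # Run elevated on the DPM server.
    $key = 'HKLM:\SOFTWARE\Microsoft\Microsoft Data Protection Manager\2.0\Configuration\MaxAllowedParallelBackups'

    # Create the key if it does not exist yet.
    if (-not (Test-Path $key)) {
        New-Item -Path $key -Force | Out-Null
    }

    # 1 = serialize Hyper-V backups; 3 is the default mentioned above.
    New-ItemProperty -Path $key -Name 'Microsoft Hyper-V' -PropertyType DWord -Value 1 -Force | Out-Null

    # Show the result.
    Get-ItemProperty -Path $key
    ```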


    This posting is provided "AS IS" with no warranties.

    Thursday, June 6, 2013 11:03 AM
  • Some news from support:

    Event: 12363 Source: VSS : An expected hidden volume arrival did not complete because this LUN was not detected.

    It's a communication problem between the different services when the account used by ASM is different from the local System account.
    The account that is used has to be added to the Microsoft VSS provider.
    The same error exists on Windows 2003 with ASM 4.0: ASM could not launch SmartCopy because VSS could not be used with the modified account in the ASM configuration.
    For the moment the only "workaround" is to change the registry to add the account and grant it writer and requester rights.
    Dell/EqualLogic support is setting up a DPM environment so they can pass as much information as possible back to the developers.

    Event: 8194 Source: VSS : IVssWriterCallback interface.  hr = 0x80070005, Access is denied

    We reproduced the error this morning and sent all the logs to Microsoft Support (application log, iDNA traces of VSS and Cryptsvc, ProcMon). No news for the moment, but I think it's a DCOM security problem on non-English operating systems...

    Event: 5120/5142/5217 Source : Cluster

    No news for the moment...


    Frédéric OGUER

    Thursday, June 6, 2013 1:34 PM
  • Some news from support:

    Event: 12363 Source: VSS : An expected hidden volume arrival did not complete because this LUN was not detected.

    No news for the moment, but we detected another problem on the PSM4110 passive controller... it will be corrected in firmware 6.0.5, planned for the 15th.

    Event: 8194 Source: VSS : IVssWriterCallback interface.  hr = 0x80070005, Access is denied

    This problem appears only in a cluster; the event is visible on nodes other than the one that initiated the call to VSS.
    During a VSS call, the Cluster service sends requests to all nodes through the GUM (Global Update Manager). Because the "System Writer" is hosted by the Cryptographic Services service (cryptsvc), which runs in the "Network Service" context instead of "System", the returning COM calls are denied access because of the different impersonations on the other cluster nodes.
    The problem will not be fixed as it has no functional impact.
    The events can be ignored.

    Event: 5120/5142/5217 Source : Cluster

    No news for the moment...
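
    To see on which nodes these events show up, something like the following can be used. It is only a sketch: it assumes the FailoverClusters module is available, that it is run on a cluster node with rights to read the remote event logs, and it filters on log name and event ID only, so check the ProviderName column to rule out unrelated events that share an ID.

    ```powershell
    # Sketch: collect the VSS and cluster events discussed above from every node.
    Import-Module FailoverClusters

    $since = (Get-Date).AddDays(-1)   # look back 24 hours

    foreach ($node in Get-ClusterNode) {
        # VSS events 8194 / 12363 are written to the Application log.
        Get-WinEvent -ComputerName $node.Name -ErrorAction SilentlyContinue -FilterHashtable @{
            LogName = 'Application'; Id = 8194, 12363; StartTime = $since
        } | Select-Object MachineName, TimeCreated, Id, ProviderName

        # Cluster events 5120 / 5142 / 5217 are written to the System log.
        Get-WinEvent -ComputerName $node.Name -ErrorAction SilentlyContinue -FilterHashtable @{
            LogName = 'System'; Id = 5120, 5142, 5217; StartTime = $since
        } | Select-Object MachineName, TimeCreated, Id, ProviderName
    }
    ```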


    Frédéric OGUER

    Thursday, June 13, 2013 12:35 PM
  • New Hotfix :

    http://support.microsoft.com/kb/2848344

    Csvflt.sys 6.2.9200.20712
    Clussvc.exe 6.2.9200.20712
    Csvfs.sys 6.2.9200.20712
    Fssagent.dll 6.2.9200.20712
    Kernelbase.dll 6.2.9200.20712
    NTFS.sys 6.2.9200.20712
    Rdbss.sys 6.2.9200.20712
    Srv2.sys 6.2.9200.20712
    Witness.dll 6.2.9200.20712

    Frédéric OGUER

    Monday, June 17, 2013 2:46 PM
  • New Equallogic Firmware 6.0.5

    An issue that may have caused a passive controller to reboot spontaneously has been corrected to resolve the temporary effect on array redundancy. [Tracking #: 749774]

    In rare occasions when using OffloadDataTransfers (ODX) with Windows 2012 initiators, a specific WriteUsingToken command could have generated an inappropriate response at the target that may have resulted in a controller failover. (See T10 specifications re: WriteUsingToken command) [Tracking #: 762035]


    Frédéric OGUER

    Tuesday, June 18, 2013 9:57 AM
  • DPM has run for several days without issue with 'MaxAllowedParallelBackups' set to 1. (Without issue means that DPM backups are successful. There are still VSS EQL HW Provider errors in the application event logs.) I have noticed that DPM works better if there is some available memory on the host. During the last round of tests I distributed the VMs across all of the hosts to allow for at least 5GB available memory on each host server.

    Kevin

    Thursday, June 20, 2013 10:18 PM
  • Hi

    My environment is:

    4 Hyper-V cluster servers, Server 2012 Datacenter, without the hotfix from 6/14/2013

    2 EqualLogic PS6010, firmware 6.02

    HIT Kit 4.5

    DPM 2012 SP1 Rollup 2

    I have the same problems. The new hotfix (http://support.microsoft.com/kb/2848344) crashed my servers, so I removed it again. I found out that if you move the failed machine, the replica comes back to a successful state! And guys, check all your recovery points. About two weeks ago I wanted to restore a machine, and although the replica had reported everything as OK for over three months, no recovery points had been created!!

    Now I have enabled the option (Run a daily consistency check ...) for all protection groups!

    And now I have a lot more replication errors. But by moving some machines I can get all replicas created successfully. I think we have timeout problems here.

    Another problem is the backup itself. In Hyper-V Manager you can see whether a backup is running, and some machines cannot stop this process. This causes an additional problem: you need to stop that machine, move all the other machines off this cluster server, and restart the server. Then it works again! You cannot see this in Failover Cluster Manager.

    All my SQL, Exchange and SharePoint backups are running fine. Those are virtual machines. The problems are only with the VMs with the HIT Kit from Dell.

    I am waiting for a hotfix from Dell and/or MS, and I am in contact with a Dell engineer.

    René


    Roendi



    • Edited by Roendi Monday, June 24, 2013 7:47 AM Change
    Monday, June 24, 2013 7:43 AM
  • Hello,

    I applied EQL FW 6.0.5 and KB2848344 on Friday.

    KB2848344 has resolved the issue with cluster error 5120 "STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)" on my system.

    I still get cluster error 5217 when I set 'MaxAllowedParallelBackups' to anything other than 1. But it doesn't seem to have a negative impact on my environment.

    ---

    KB2848344 has resolved some issues on the MS side. The EQL VSS provider still causes failed backups on my lab cluster.

    I will stick with the Hyper-V VSS provider for now.


    This posting is provided "AS IS" with no warranties.

    Monday, June 24, 2013 8:36 AM
  • Hi

    My MS consultant gave me several additional registry keys. I tried them and as of today it works!
     
    If anyone wants to try this, here are the changes:
     
     
    1. Increase the timeout period:

     - Under "HKLM\Software\Microsoft\Microsoft Data Protection Manager\Agent"

     - Add a DWORD value named "ConnectionNoActivityTimeoutForNonCCJobs"

     - Set it to 7200 decimal.

     - Under "HKLM\Software\Microsoft\Microsoft Data Protection Manager\Agent"

     - Add a DWORD value named "ConnectionNoActivityTimeout"

     - Set it to 7200 decimal.

    2.   Registry changes to increase the paged pool memory

     - HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Memory Management

     - Add the following registry values:
     
     - Value name: PoolUsageMaximum

     - Data type: REG_DWORD

     - Radix: Decimal

     - Value data: 60


     - Setting the value at 60 informs the Memory Manager to start the trimming process at 60 percent of PagedPoolMax rather than the default setting of 80 percent. If a threshold of 60 percent is not enough to handle spikes in activity, reduce this setting to 50 percent or 40 percent.


     - Value name: PagedPoolSize

     - Data type: REG_DWORD

     - Radix: Hex

     - Value data: 0xFFFFFFFF


     - Setting PagedPoolSize to 0xFFFFFFFF allocates the maximum paged pool in lieu of other resources to the computer.


    3. Restart the DPMRA service
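
    Steps 1 and 3 can be scripted roughly as follows. This is a sketch, assuming an elevated PowerShell session on the machine where the DPM agent runs. Step 2 is deliberately left out: the Memory Management values are system-wide and, as far as I know, only take effect after a reboot, so I would set those by hand.

    ```powershell
    # Sketch of steps 1 and 3 above; run elevated where the DPM agent (DPMRA) is installed.
    $agentKey = 'HKLM:\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Agent'

    # Create the Agent key if it does not exist yet.
    if (-not (Test-Path $agentKey)) {
        New-Item -Path $agentKey -Force | Out-Null
    }

    # Step 1: both timeout values are DWORDs set to 7200 (decimal).
    foreach ($name in 'ConnectionNoActivityTimeoutForNonCCJobs', 'ConnectionNoActivityTimeout') {
        New-ItemProperty -Path $agentKey -Name $name -PropertyType DWord -Value 7200 -Force | Out-Null
    }

    # Step 3: restart the agent so it picks up the new values.
    Restart-Service -Name DPMRA

    # Show what was written.
    Get-ItemProperty -Path $agentKey |
        Select-Object ConnectionNoActivityTimeoutForNonCCJobs, ConnectionNoActivityTimeout
    ```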


    Maybe it helps

    Best Regards
     
    Röndi

     

     


    • Edited by Roendi Tuesday, June 25, 2013 8:58 AM Change
    Tuesday, June 25, 2013 8:52 AM
  • I thought I'd mention that I am experiencing the same intermittent errors as well:

    The VSS application writer or the VSS provider is in a bad state. Either it was already in a bad state or it entered a bad state during the current operation. (ID 30111 Details: VssError:A function call was made when the object was in an incorrect state
    for that function
     (0x80042301))

    Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied.

    . This is often caused by incorrect security settings in either the writer or requestor process.

    Operation:

       Gathering Writer Data

    Context:

       Writer Class Id: {e8132975-6f93-4464-a53e-1050253ae220}

       Writer Name: System Writer

       Writer Instance ID: {ad25ea3e-ce36-4a0a-9500-9f19f989fef3}
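
    A quick way to check the provider and writer state on a host after such an error is the built-in vssadmin tool. A minimal sketch, assuming an elevated prompt and an English-language OS (the filter strings match the English vssadmin output):

    ```powershell
    # List the registered VSS providers (the hardware provider should appear here if it is installed).
    vssadmin list providers

    # List the writers and keep only the name, state and last error lines.
    vssadmin list writers | Select-String -Pattern 'Writer name', 'State', 'Last error'
    ```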

    My setup is three 2-node Hyper-V clusters running Server 2012 Core with 2 Dell PS4100 SANs, and approximately 20 VMs. The SANs are running 6.0.5, HIT is 4.5 on the Hosts. All of the Hosts are fully patched (and fresh installs as well...these clusters have all been recently migrated from 2008 R2). I've even upgraded all firmware and drivers on the hosts, and the switches to the latest versions.

    I am having a hell of a time getting backups to work with the EqualLogic HW provider in DPM 2012 SP1 UR2. In fact on one of my clusters I can't even get any replicas of any of my VMs created at all! Going through this thread it seems like there are a myriad of fixes that work sometimes, but before I go and apply a bunch of registry entries to my DPM server, or my cluster hosts, it seems like the general consensus is that the only real fix at the moment is to rely on the Hyper-V Software provider by disabling the EqualLogic provider?

    Tuesday, June 25, 2013 4:02 PM
  • it seems like the general consensus is that the only real fix at the moment is to rely on the Hyper-V Software provider by disabling the EqualLogic provider?

    No.  You'll enter a whole new world of pain relying on the MS VSS writer.

    http://up2v.nl/2013/06/19/another-update-of-another-update-improves-cluster-resiliency-in-windows-server-2012/

    Incidentally, I have an HP 3PAR StoreServ 7200 and using the VSS writer from 3PAR I get the same errors detailed in this thread, so the problem isn't specific to Dell EqualLogic.  It seems neither the storage vendors nor Microsoft have worked out how to robustly use VSS with Windows 2012 CSV.

    Until they get it sorted, using agents in the guests is the only surefire way to consistently back up my VMs.



    • Edited by slinkoff Tuesday, June 25, 2013 9:03 PM spelling
    Tuesday, June 25, 2013 9:01 PM

  • I am having a hell of a time getting backups to work with the EqualLogic HW provider in DPM 2012 SP1 UR2. In fact on one of my clusters I can't even get any replicas of any of my VMs created at all! Going through this thread it seems like there are a myriad of fixes that work sometimes, but before I go and apply a bunch of registry entries to my DPM server, or my cluster hosts, it seems like the general consensus is that the only real fix at the moment is to rely on the Hyper-V Software provider by disabling the EqualLogic provider?

    Hello,

    From my personal experience, it's best to uninstall the EQL VSS provider. I have been through various combinations on my clusters and my final solution for now is:

    Only the following HIT components are installed:

    • PowerShell module
    • DSM
    • SMI Provider

    MPIO is setup via registry (it's explained at the bottom of the site):

    http://en.community.dell.com/techcenter/storage/w/wiki/2678.dell-equallogic-hit-kit-auto-install-script.aspx

    MS Hotfixes installed:

    • KB2838669
    • KB2848344

    On my DPM I have set 'MaxAllowedParallelBackups' to 5

    ---

    One - my small two-node cluster - occasionally reports cluster error 5217. But this one may be related to my testing. I have set up DPM to back up all 23 VMs every hour.

    My 4-node cluster hasn't had any errors since I applied KB2848344 and uninstalled the EQL VSS provider on Friday.

    ---

    Since my backup runs reliably at the moment, I will drop this topic and wait for a new release of EQL HIT.


    This posting is provided "AS IS" with no warranties.

    Wednesday, June 26, 2013 6:35 AM
  • Hi,

    New hotfix: http://support.microsoft.com/kb/2870270 (replaces KB2848344?)

    Csvflt.sys 6.2.9200.20712
    Clussvc.exe 6.2.9200.20712
    Csvfs.sys 6.2.9200.20712
    Fssagent.dll 6.2.9200.20712
    Kernelbase.dll 6.2.9200.20712
    Ntfs.sys 6.2.9200.20736
    Rdbss.sys 6.2.9200.20712
    Srv2.sys 6.2.9200.20712
    Witness.dll 6.2.9200.20712
    Kernelbase.dll 6.2.9200.20712

    http://support.microsoft.com/kb/2869923

    Clussvc.exe 6.2.9200.20767
    Csvfs.sys 6.2.9200.20767
    Vhdmp.sys 6.2.9200.20767

    Frédéric OGUER

    Wednesday, July 17, 2013 8:56 AM
  • If (like me) you installed KB2848344, you need KB2870270 and then KB2869923.

    File            KB2796995       KB2813630       KB2838669       KB2848344       KB2870270       KB2869923
    Csvflt.sys      6.2.9200.20596  6.2.9200.20626  6.2.9200.20682  6.2.9200.20712  6.2.9200.20712  -
    Clussvc.exe     -               6.2.9200.20623  6.2.9200.20686  6.2.9200.20712  6.2.9200.20712  6.2.9200.20767
    Csvfs.sys       -               -               6.2.9200.20686  6.2.9200.20712  6.2.9200.20712  6.2.9200.20767
    Fssagent.dll    -               -               6.2.9200.20682  6.2.9200.20712  6.2.9200.20712  -
    Kernelbase.dll  6.2.9200.20596  -               6.2.9200.20682  6.2.9200.20712  6.2.9200.20712  -
    Ntfs.sys        -               6.2.9200.20623  6.2.9200.20684  6.2.9200.20712  6.2.9200.20736  -
    Rdbss.sys       -               -               6.2.9200.20685  6.2.9200.20712  6.2.9200.20712  -
    Srv2.sys        -               -               6.2.9200.20685  6.2.9200.20712  6.2.9200.20712  -
    Witness.dll     -               -               6.2.9200.20685  6.2.9200.20712  6.2.9200.20712  -
    Kernelbase.dll  -               -               -               -               6.2.9200.20712  -
    Vhdmp.sys       -               -               -               -               -               6.2.9200.20767
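
    To compare a node against this list, a small sketch like the one below can help. The three directories it searches are my assumption about where these binaries usually live; adjust them if a file is not found. It only reads information and changes nothing.

    ```powershell
    # Which of the hotfixes from the list are installed on this node?
    Get-HotFix -Id KB2796995, KB2813630, KB2838669, KB2848344, KB2870270, KB2869923 -ErrorAction SilentlyContinue |
        Select-Object HotFixID, InstalledOn

    # File versions of the binaries named in the list.
    $files = 'Csvflt.sys', 'Clussvc.exe', 'Csvfs.sys', 'Fssagent.dll', 'Kernelbase.dll',
             'Ntfs.sys', 'Rdbss.sys', 'Srv2.sys', 'Witness.dll', 'Vhdmp.sys'

    # ASSUMPTION: typical locations; not taken from the KB articles.
    $dirs = "$env:windir\System32", "$env:windir\System32\drivers", "$env:windir\Cluster"

    foreach ($file in $files) {
        foreach ($dir in $dirs) {
            $path = Join-Path $dir $file
            if (Test-Path $path) {
                [pscustomobject]@{
                    File    = $file
                    Path    = $path
                    Version = (Get-Item $path).VersionInfo.ProductVersion
                }
            }
        }
    }
    ```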


    Frédéric OGUER

    Wednesday, July 17, 2013 9:19 AM
  • None of these fixes worked for me (even this latest one) until I disabled ODX (which I didn't want to do).  I turned it off and all backups consistently pass now, so it seems to be the definite cause in my environment.

    Now I have to see who will take responsibility for this and get it fixed.  I have a 3PAR 7200 which supports ODX (and it was working nicely) so is it HP's problem with their ODX implementation or Microsoft with theirs?!

    Monday, July 22, 2013 3:13 PM
  • I vote for Microsoft, because I don't think this is the last patch...

    I will turn on ODX and TRIM this week ...

         

    Frédéric OGUER

    Monday, July 22, 2013 3:18 PM

  • I am having a hell of a time getting backups to work with the EqualLogic HW provider in DPM 2012 SP1 UR2. In fact on one of my clusters I can't even get any replicas of any of my VMs created at all! Going through this thread it seems like there are a myriad of fixes that work sometimes, but before I go and apply a bunch of registry entries to my DPM server, or my cluster hosts, it seems like the general consensus is that the only real fix at the moment is to rely on the Hyper-V Software provider by disabling the EqualLogic provider?

    Hello,

    From my personal experience, it's best to uninstall the EQL VSS provider. I have been through various combinations on my clusters and my final solution for now is:

    Only the following HIT components are installed:

    • PowerShell module
    • DSM
    • SMI Provider

    MPIO is setup via registry (it's explained at the bottom of the site):

    http://en.community.dell.com/techcenter/storage/w/wiki/2678.dell-equallogic-hit-kit-auto-install-script.aspx

    MS Hotfixes installed:

    • KB2838669
    • KB2848344

    On my DPM I have set 'MaxAllowedParallelBackups' to 5

    ---

    One - my small two-node cluster - occasionally reports cluster error 5217. But this one may be related to my testing. I have set up DPM to back up all 23 VMs every hour.

    My 4-node cluster hasn't had any errors since I applied KB2848344 and uninstalled the EQL VSS provider on Friday.

    ---

    Since my backup runs reliably at the moment, I will drop this topic and wait for a new release of EQL HIT.


    This posting is provided "AS IS" with no warranties.


    I wanted to mention that after removing the Hardware VSS provider on my hosts my backups have been completing successfully for the past three weeks without issue. I used the method suggested by

    Given this issue seems to be hardware independent (affecting Dell, and HP products with different NICs and HW providers) I would agree that it's an issue on the Microsoft side.

    Monday, July 22, 2013 3:27 PM
  • Hi,

    Yesterday, I turned on ODX :

    set-itemproperty hklm:\system\currentcontrolset\control\filesystem -name "FilterSupportedFeaturesMode" -value 0

    Last night I got a 5142 error on the CSV (ERROR_TIMEOUT(1460)), which crashed my servers...


    Frédéric OGUER

    Tuesday, July 23, 2013 9:14 AM
  • So it appears enabling ODX on clustered Hyper-V servers when using DPM 2012 SP1 to backup guest VMs with the software VSS writer causes timeout errors on the storage, resulting in VM crashes.

    This is happening on Dell and HP SAN hardware which supports ODX and with the very latest MS hotfixes.

    I'm opening a support case, I want a resolution so I can get ODX back, it was great for live migrations.

    Tuesday, July 23, 2013 9:49 AM
  • So it appears enabling ODX on clustered Hyper-V servers when using DPM 2012 SP1 to backup guest VMs with the software VSS writer causes timeout errors on the storage, resulting in VM crashes.

    This is happening on Dell and HP SAN hardware which supports ODX and with the very latest MS hotfixes.

    I'm opening a support case, I want a resolution so I can get ODX back, it was great for live migrations.

    My support case is 113060610494265...

    I'm using Dell blades and an EqualLogic PS SAN which supports ODX.


    Frédéric OGUER

    Tuesday, July 23, 2013 9:54 AM
  • I installed hotfixes KB2870270 and KB2869923 on the 18th last week and haven't had a single cluster error since. Still using the MS software provider at the moment.

    DELL just released the HIT 4.6 EPA. I will give it a try tomorrow on one of my clusters.


    This posting is provided >AS IS< with no warranties.

    Tuesday, July 23, 2013 3:13 PM
  • Is ODX enabled or disabled?  Do you care about ODX?
    Tuesday, July 23, 2013 3:17 PM
  • I have ODX enabled ... but right now I don't care about it. Doing some performance metering and other testing with and without ODX is not very high on my agenda.


    This posting is provided >AS IS< with no warranties.

    Tuesday, July 23, 2013 3:46 PM
  • I have ODX enabled ... but right now I don't care about it. Doing some performance metering and other testing with and without ODX is not very high on my agenda.


    This posting is provided >AS IS< with no warranties.


    Do you use CSV cache ?

    Frédéric OGUER

    Tuesday, July 23, 2013 4:02 PM
  • I don't use it.

    Almost everything is deployed out-of-the-box, except for jumbo frames and protocol bindings on the iSCSI network.


    This posting is provided >AS IS< with no warranties.

    Tuesday, July 23, 2013 6:11 PM
  • Looks like I still have no luck with the EQL VSS Provider.

    I installed the HIT 4.6 EPA yesterday evening.

    .. failed backups, one backup job stuck for almost 6 hours now, "IVssWriterCallback" error 8194 spamming my event log every 30 sec ...

    Back to Microsoft Software Provider again


    This posting is provided >AS IS< with no warranties.

    Wednesday, July 24, 2013 1:36 PM
  • Same here. HIT 4.6 does not resolve the issue. The two clusters to which I've added the EQL VSS provider now have repeatedly failing backups. The third cluster, with the software provider, has no errors at all.
    Wednesday, July 24, 2013 2:42 PM
  • Same here. HIT 4.6 does not resolve the issue. Our cluster has the same problem again.


    Roendi

    Friday, August 2, 2013 6:04 AM
  • This error is normal:

    Event: 8194 Source: VSS : IVssWriterCallback interface.  hr = 0x80070005, Access is denied

    This problem appears only in a cluster; the event is visible on nodes other than the one that initiated the call to VSS.
    During a VSS call, the Cluster service sends requests to all nodes through the GUM (Global Update Manager). Because the "System Writer" is hosted by the Cryptographic Services service (cryptsvc), which runs in the "Network Service" context instead of "System", the returning COM calls are denied access because of the different impersonations on the other cluster nodes.
    The problem will not be fixed as it has no functional impact.
    The events can be ignored.


    Frédéric OGUER

    Friday, August 9, 2013 7:58 AM
  • Time to jump in on this thread, as we have a similar setup as some others here :
    1* Svr2012 with DPM 2012 SP1 (4.1.3408.0)
    4* Svr2012 nodes in a Failover Cluster with Hit Kit 4.5 (ASM 4.5.0.6492)
    1* EQL PS4110X
    All servers connected by 4* 10Gb on Juniper 2* EX4550
    We use ODX and the EQL hardware provider

    We regularly have error 8194, which doesn't seem to harm the cluster.
    I finally pinpointed how to avoid error 12393, which I thought I would share with you.
    We don't use the DPM schedule for backups, since it keeps making errors that trigger
    multiple consistency checks at a time.  When this occurs, error 12393 happens on at least
    one of the nodes.  And after a 12393 error, we need to reboot every node to get the
    cluster stable again.
    Instead we use a PowerShell script, which runs all VM backups one by one, four times
    a day.  This works well; occasionally we have an error in a backup, which gets solved
    the next time DPM runs.  This solution is not how it should be, but it works for us
    for now.
    The biggest problem we still have is when we have to reboot our nodes (Patch Tuesday
    or other important updates).  After rebooting the four nodes we need to do a consistency
    check by hand on every VM so that DPM does not do it by itself, because when DPM does it,
    it does multiple VMs at a time, which triggers a 12393 error.  When I do the consistency checks one by one after
    rebooting the nodes everything stays fine.  But this is rather time consuming (29 VMs in total, ~4.2 TB),
    so I hope a decent solution will come sooner rather than later.
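
    For readers who want to try the same approach, a minimal sketch of such a one-by-one backup loop follows. It is not our production script. It assumes the DPM Management Shell on the DPM server; 'DPM01' and 'Hyper-V Cluster' are hypothetical names, and the HasCompleted/Status job properties are used the way I know them from Microsoft's sample scripts.

    ```powershell
    # Sketch: create recovery points for all datasources of one protection group, strictly one at a time.
    # Run in the DPM Management Shell. Server and group names below are placeholders.
    $dpmServer = 'DPM01'             # hypothetical DPM server name
    $groupName = 'Hyper-V Cluster'   # hypothetical protection group name

    $group = Get-ProtectionGroup -DPMServerName $dpmServer |
        Where-Object { $_.FriendlyName -eq $groupName }

    foreach ($ds in (Get-Datasource -ProtectionGroup $group)) {
        Write-Host "Starting recovery point for $($ds.Name)"
        $job = New-RecoveryPoint -Datasource $ds -Disk -BackupType ExpressFull

        # Wait for this job to finish before the next VM is started.
        while (-not $job.HasCompleted) {
            Start-Sleep -Seconds 30
        }
        Write-Host "$($ds.Name): $($job.Status)"
    }
    ```

    Scheduled four times a day with Task Scheduler, this keeps at most one VM backup running at any moment.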

    So please keep posting your findings, as it will help a lot of people when a good solution becomes available.

    Regards, Bert


    • Edited by Bert Oris Sunday, August 11, 2013 12:08 PM
    Sunday, August 11, 2013 12:08 PM
  • Translated by Google (Part 1/3):

    Hello,

    Here is a summary of the failures of our Hyper-V 2012 cluster.
    When we set up backups with DPM 2012 and the VSS hardware provider for EqualLogic, the backups did not work correctly.
    We opened a case with Microsoft (REG: 113053110480759) on 31/05/2013 for VSS error 8194: IVssWriterCallback interface. hr = 0x80070005, Access is denied
    Microsoft concluded that this error (8194) was normal and should be ignored; our backup problem did not come from there.

    Another case was opened with EqualLogic (Case # 877369257) on 03/06/2013 for VSS error 12363: An expected hidden volume arrival did not complete because this LUN was not detected.
    EqualLogic asked us to disable the hardware provider and use the software provider to see whether the problem really came from EqualLogic.

    At that time we had problems with our cluster, with several periods of downtime, and on 06/06/2013 we opened a case with Microsoft for the cluster errors 5120/5142/5217.

    The configuration of our Hyper-V nodes was as follows:


    We immediately disabled the ODX and TRIM/UNMAP functions.
    CSV volumes that were not covered by backups had no problems.

    After several weeks of testing and the installation of numerous fixes, the problem is still there.
    Regarding CSV, here is a list of the Microsoft KBs with the affected files:

    File            KB2796995       KB2813630       KB2838669       KB2848344       KB2870270       KB2869923
    Csvflt.sys      6.2.9200.20596  6.2.9200.20626  6.2.9200.20682  6.2.9200.20712  6.2.9200.20712  -
    Clussvc.exe     -               6.2.9200.20623  6.2.9200.20686  6.2.9200.20712  6.2.9200.20712  6.2.9200.20767
    Csvfs.sys       -               -               6.2.9200.20686  6.2.9200.20712  6.2.9200.20712  6.2.9200.20767
    Fssagent.dll    -               -               6.2.9200.20682  6.2.9200.20712  6.2.9200.20712  -
    Kernelbase.dll  6.2.9200.20596  -               6.2.9200.20682  6.2.9200.20712  6.2.9200.20712  -
    Ntfs.sys        -               6.2.9200.20623  6.2.9200.20684  6.2.9200.20712  6.2.9200.20736  -
    Rdbss.sys       -               -               6.2.9200.20685  6.2.9200.20712  6.2.9200.20712  -
    Srv2.sys        -               -               6.2.9200.20685  6.2.9200.20712  6.2.9200.20712  -
    Witness.dll     -               -               6.2.9200.20685  6.2.9200.20712  6.2.9200.20712  -
    Kernelbase.dll  -               -               -               -               6.2.9200.20712  -
    Vhdmp.sys       -               -               -               -               -               6.2.9200.20767



    On 01/08/2013, the Microsoft developer suggested a network problem:
    "Since the pattern is that the IO is failing with timeout over SMB, I would suggest looking at the network capabilities of this cluster. There were three instances with 5120 IO timeout. Two of them were caused when CSVFS was in redirected IO mode for snapshot operations. Once it was in Direct IO mode (all metadata ops will still go over SMB). Out of those three, one was very quick to recover to the Active state. One resulted in the 5142 event as explained above. And another took ~2 minutes, all of which is still not good because SMB scale-out clients' IO will fail if this cluster is used as a scale-out file server."

    We first checked the storage network by:
    - Updating the M8024-k switches
    - Updating the network cards and enabling Broadcom iSCSI offload (replacing NDIS; see the EqualLogic white paper of July 2013)

    Our configuration is now as follows:



    Frédéric OGUER

    Monday, August 12, 2013 12:12 PM
  • Translated by Google (Part 2/3):

    On 05/08/2013, Microsoft support leaned more towards a problem with redirected I/O on the CSV.
    On 06/08/2013, Microsoft support asked us to run a network trace between the nodes and advised us to enable QoS.

    This QoS must be managed at the "Hyper-V Extensible Switch". But our cluster traffic (CSV + heartbeat) and Live Migration traffic did not pass through the switch!
    We modified our configuration to send all traffic through the "Hyper-V Extensible Switch".

    At this point we realized that the MAC addresses of the network cards were not correct!
    A case was opened with Dell (No. 880538152) for a problem with the Flex addresses.
    The conclusion was that the drivers had to be completely uninstalled and reinstalled, which also means losing the configuration completely.

    On one of the nodes (HYPERV4), uninstalling the drivers went wrong and they could not be reinstalled.
    We were forced to reinstall the node completely.

    On another node (HYPERV3), uninstalling the drivers went well but the MAC addresses were still wrong.
    Another call to Dell support. This time the MAC addresses were also wrong in the BIOS... The Flex addresses were remapped through the CMC.
    We finally created the team with the technician.

    We then had to redo the configuration completely. When we tried to recreate the team shortly afterwards, we were told that it already existed!


    It was impossible to remove the phantom network card.
    It seems necessary to remove the Hyper-V switch before installing or updating the Broadcom drivers.
    Our problem is not an isolated case: http://mikefrobbins.com/2011/04/21/enabling-jumbo-frames-for-iscsi-on-server-core/
    Warning: There is an issue with the Broadcom drivers version 14.4.8.4 that could cause the network cards to become inoperable if a virtual switch already exists on your Hyper-V host server and it is running the core install of Windows Server 2008 R2. I have only experienced this issue on Dell PowerEdge R710 servers. I have run the same process on Dell PowerEdge 2950 servers with the same network cards and drivers without issue. If you have a Dell PE R710, consider removing the virtual switches before installing this driver or be prepared to reload the Hyper-V host server if you experience this problem.

    We were forced to reinstall this node completely.

    Our configuration is as follows:



    Frédéric OGUER

    Monday, August 12, 2013 12:15 PM
  • Translated by Google (Part 3/3):

    But, our mistake, we forgot the "MinimumBandwidthMode Weight" parameter when creating the Hyper-V switch. Without it the switch cannot manage bandwidth, and we were forced to redo the entire configuration with this parameter!

    Once the setup was complete, we realized that SR-IOV was no longer available. This seems logical, since this technology bypasses the Hyper-V switch:


    We rebuilt our configuration with SR-IOV (-EnableIov $true) and without bandwidth management (MinimumBandwidthMode Weight).

    We tried to change the MTU to 9000 on the cluster and Live Migration networks, but the Hyper-V switch did not pass the packets.
    So we went back to an MTU of 1500.

    We also realized that with the Dell Kace agent and OpenManage 7.3 installed, cluster communication is blocked!

    Optimizing the cluster network (CSV + heartbeat) therefore seems impossible!

    We decided to avoid redirected I/O so as to load that network as little as possible, and for this we reinstalled the EqualLogic VSS hardware provider (HIT 4.6 EPA).
    We limited the bandwidth of the agents in DPM, capped the number of simultaneous backups at one, and increased the timeout of the DPM agents.


    Restarting the VMMS service makes the writer operational again.
    Restart the DPMRA service (DPM agent) to take the change into account.

    But some backups still do not work, with the same message as at the beginning!
    We called EqualLogic support again. Phone meeting with Stéphane F. Our problem is not isolated; other similar cases have been under investigation since 21/06/2013.

    We are due to review the situation again on 01/09/2013.

    Meanwhile, we must choose between the hardware provider with backup errors and the software provider with downtime...

    Sincerely,

    Frédéric OGUER
    SID - IT Manager


    Frédéric OGUER

    Monday, August 12, 2013 12:21 PM
  • The final script used :

    # DELETE ALL
    Remove-VMNetworkAdapter –ManagementOS –Name "MANAGEMENT" 
    Remove-VMNetworkAdapter –ManagementOS –Name "LAN" 
    Remove-VMNetworkAdapter –ManagementOS –Name "CSV"
    Remove-VMNetworkAdapter –ManagementOS –Name "LM" 
    Remove-VMNetworkAdapter –ManagementOS –Name "HEARTBEAT" 
    Get-VMNetworkAdapter –ManagementOS
    
    Remove-VMSwitch "20GbE switch" -force
    Get-VMSwitch
    
    Remove-NetLbfoTeam "2x10GbE Team" -confirm:$false
    Remove-NetLbfoTeam "20GbE switch" -confirm:$false
    get-NetLbfoTeam
    
    #NIC MTU 
    Get-NetAdapterAdvancedProperty -Name NIC1,NIC2 -DisplayName "Jumbo Packet" | Set-NetAdapterAdvancedProperty -RegistryValue "9014"
    
    #Teaming creation
    New-NetLbfoTeam "2x10GbE Team" –TeamMembers NIC1,NIC2 -TeamingMode Lacp -LoadBalancingAlgorithm TransportPorts –TeamNicName "2x10GbE" -confirm:$false
    # !!! wait while it installs the multiplexor drivers and the adapter
    Sleep 60
    Get-NetLbfoTeam
    
    #VMswitch with SR-IOV !! –MinimumBandwidthMode Weight
    New-VMSwitch "20GbE switch" -NetAdapterName "2x10GbE" -EnableIov $true -AllowManagementOS $false
    #Set-VMSwitch "20GbE switch" -DefaultFlowMinimumBandwidthWeight 30
    
    #VMNetworkAdapter
    # the networks will be ordered in reverse of creation order: LAN, LM, CSV, HEARTBEAT
    #Add-VMNetworkAdapter –ManagementOS –Name "HEARTBEAT" –SwitchName "20GbE switch"
    Add-VMNetworkAdapter –ManagementOS –Name "CLUSTER" –SwitchName "20GbE switch"
    Add-VMNetworkAdapter –ManagementOS –Name "LM" –SwitchName "20GbE switch"
    Add-VMNetworkAdapter –ManagementOS –Name "LAN" –SwitchName "20GbE switch"
    
    #VLAN on VMNetworkadapter
    Set-VMNetworkAdapterVlan  –ManagementOS –VMNetworkAdapterName "LAN" -Access -VlanId 11
    Set-VMNetworkAdapterVlan  –ManagementOS –VMNetworkAdapterName "CLUSTER" -Access -VlanId 12
    Set-VMNetworkAdapterVlan  –ManagementOS –VMNetworkAdapterName "LM" -Access -VlanId 13
    #Set-VMNetworkAdapterVlan  –ManagementOS –VMNetworkAdapterName "HEARTBEAT" -Access -VlanId 14
    
    
    #BandwidthWeight: 30(default)+20+30+10+10 = 100
    #Set-VMNetworkAdapter –ManagementOS –Name "LAN" –MinimumBandwidthWeight 10
    #Set-VMNetworkAdapter –ManagementOS –Name "CLUSTER" –MinimumBandwidthWeight 20
    #Set-VMNetworkAdapter –ManagementOS –Name "LM" –MinimumBandwidthWeight 30
    #Set-VMNetworkAdapter –ManagementOS –Name "HEARTBEAT" –MinimumBandwidthWeight 10
    
    #IeeePriority
    Set-VMNetworkAdapter –ManagementOS –Name "LAN" –IeeePriorityTag On
    Set-VMNetworkAdapter –ManagementOS –Name "CLUSTER" –IeeePriorityTag On
    Set-VMNetworkAdapter –ManagementOS –Name "LM" –IeeePriorityTag On
    #Set-VMNetworkAdapter –ManagementOS –Name "HEARTBEAT" –IeeePriorityTag On
    
    New-NetQosPolicy "LM" -LiveMigration -Priority 5
    New-NetQosPolicy "CSV" -SMB –Priority 3
    New-NetQosPolicy "HEARTBEAT" -IPDstPort 3343 –Priority 6
    
    ##IP configuration
    $IP=36
    #MANAGEMENT
    Get-NetAdapter -name *LAN* | New-NetIPAddress -IPAddress 192.168.0.$IP -PrefixLength 24 -DefaultGateway 192.168.0.1
    Get-NetAdapter -name *LAN* | Set-DnsClientServerAddress -ServerAddresses 192.168.0.5,192.168.0.6,192.168.3.3
    Get-NetAdapter -name *LAN* | Set-DnsClient -ConnectionSpecificSuffix "home.sid.tm.fr"
    
    #PowerShell 3.0 does not have any new cmdlet for configuring WINS server settings. 
    $WINS = Get-WmiObject win32_networkadapterconfiguration | Where IPAddress -eq 192.168.0.$IP
    $WINS.SetWINSServer("192.168.0.5","192.168.0.6")
    $WINS.SetTcpipNetbios("2")
    
    #CLUSTER
    Get-NetAdapter -name *CLUSTER* | New-NetIPAddress -IPAddress 192.168.12.$IP -PrefixLength 24 
    Get-NetAdapter -name *CLUSTER* | Set-DnsClient -RegisterThisConnectionsAddress $false
    
    #PowerShell 3.0 does not have any new cmdlet for configuring WINS server settings. 
    $WINS = Get-WmiObject win32_networkadapterconfiguration | Where IPAddress -eq 192.168.12.$IP
    $WINS.SetTcpipNetbios("2")
    
    #Disable IPv6, Link-Layer Topology Discovery Mapper I/O Driver, and Link-Layer Topology Discovery Responder
    Disable-NetAdapterBinding -Name "vEthernet (CLUSTER)" -ComponentID ms_tcpip6
    Disable-NetAdapterBinding -Name "vEthernet (CLUSTER)" -ComponentID ms_lltdio
    Disable-NetAdapterBinding -Name "vEthernet (CLUSTER)" -ComponentID ms_rspndr
    
    #LM
    Get-NetAdapter -name *LM* | New-NetIPAddress -IPAddress 192.168.13.$IP -PrefixLength 24 
    Get-NetAdapter -name *LM* | Set-DnsClient -RegisterThisConnectionsAddress $false
    
    #PowerShell 3.0 does not have any new cmdlet for configuring WINS server settings. 
    $WINS = Get-WmiObject win32_networkadapterconfiguration | Where IPAddress -eq 192.168.13.$IP
    $WINS.SetTcpipNetbios("2")
    
    #Disable IPv6, Link-Layer Topology Discovery Mapper I/O Driver, and Link-Layer Topology Discovery Responder
    Disable-NetAdapterBinding -Name "vEthernet (LM)" -ComponentID ms_tcpip6
    Disable-NetAdapterBinding -Name "vEthernet (LM)" -ComponentID ms_lltdio
    Disable-NetAdapterBinding -Name "vEthernet (LM)" -ComponentID ms_rspndr
    
    #HEARTBEAT
    #Get-NetAdapter -name *HEARTBEAT* | New-NetIPAddress -IPAddress 192.168.14.$IP -PrefixLength 24 
    #Get-NetAdapter -name *HEARTBEAT* | Set-DnsClient -RegisterThisConnectionsAddress $false
    
    #PowerShell 3.0 does not have any new cmdlet for configuring WINS server settings. 
    #$WINS = Get-WmiObject win32_networkadapterconfiguration | Where IPAddress -eq 192.168.14.$IP
    #$WINS.SetTcpipNetbios("2")
    
    #Disable IPv6, Link-Layer Topology Discovery Mapper I/O Driver, and Link-Layer Topology Discovery Responder
    #Disable-NetAdapterBinding -Name "vEthernet (HEARTBEAT)" -ComponentID ms_tcpip6
    #Disable-NetAdapterBinding -Name "vEthernet (HEARTBEAT)" -ComponentID ms_lltdio
    #Disable-NetAdapterBinding -Name "vEthernet (HEARTBEAT)" -ComponentID ms_rspndr
    
    #VMSwitch MTU
    $RegKey ="HKLM:\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}"
     Get-ChildItem -Path $RegKey -ErrorAction SilentlyContinue| % {
     $path = $_.PSPath
     Get-Itemproperty $path | where {$_.driverdesc -eq "Hyper-V Virtual Ethernet Adapter" -and $_.Characteristics -eq "41"} | % {
     Set-ItemProperty $path -Name "*JumboPacket" -Value "9014"
     }
    }
    
    #MTU
    #Get-NetAdapterAdvancedProperty -Name "vEthernet (LM)", "vEthernet (CLUSTER)" -DisplayName "Paquet Jumbo" | Set-NetAdapterAdvancedProperty -RegistryValue "9014"
    
    
    
    
    
    
    
    


    Frédéric OGUER

    Monday, August 12, 2013 12:21 PM
  • sounds like a lot of work for no gain!

    I've given up on using DPM for child partition snapshot backups.  I'm relying on native SAN snapshots to backup VMs.  A more manual restore process but it's fine for the rare occasion I need to restore an entire VM.   I'm using DPM just for agent backups of SQL, Exchange and files.

    This issue was taking too much troubleshooting time and the loss of performance from turning off ODX as a workaround was not acceptable.

    Wednesday, August 14, 2013 9:49 AM
  • Hi

    I received this message from an Enterprise Technical Sr LVL3 Consultant on the Dell EqualLogic team:

    This is an ongoing issue we are investigating at this time. We are currently aware of the issue and  we are working with Microsoft to address the problem. Unfortunately at this time there is no workaround that I know of.

    So I think we can be glad to hear this, and we need to wait.

    I will post more information when it is available.


    Roendi


    • Edited by Roendi Monday, August 19, 2013 10:47 AM
    Monday, August 19, 2013 7:24 AM
  • Hi,

    I've installed on every node of my cluster:

    -KB2838669

    -KB2870270

    - DPM 2012 RU2

    - HIT 4.6

    Like Dave Grant, I have the "IVssWriterCallback" error 8194 spamming my event log every 30 seconds...


    Little French

    Monday, August 19, 2013 11:00 AM
  • Translated by Google:

    Symptom:
    On your three-node Hyper-V 2012 cluster, you get event ID 8194 from source VSS indicating an error:

    Log Name:      Application
    Source:        VSS
    Date:          06/06/2013 12:44:26
    Event ID:      8194
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      Computer
    Description:
    Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied.
    . This is often caused by incorrect security settings in either the writer or requestor process. 
    
    Operation :
       Gathering Writer Data
    
    Context :
       Writer Class ID: {e8132975-6f93-4464-a53e-1050253ae220}
       Writer Name: System Writer
       Writer Instance ID: {4e342a4b-5cc5-4a42-ab69-fdc843778325}

    This event appears on nodes other than the one that initiated the VSS call.

    Cause:
    This is a known problem with no functional impact.
    It appears only in a cluster.
    During a VSS call, the Cluster service sends requests to all nodes through the GUM (Global Update Manager). Because the System Writer is hosted by the Cryptographic Services service (cryptsvc), which runs in the NetworkService context instead of System, the returning COM calls hit an Access Denied error because of the different impersonations on the other cluster nodes.

    Resolution:

    The problem will not be fixed because it has no functional impact.

    The events can be ignored.
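
    To see how often the event is logged on each node, a sketch along these lines could be used. It assumes the events sit in the Application log under the VSS provider, as in the excerpt above; the function name and the node names in the usage line are hypothetical:

    ```powershell
    # Hypothetical helper: count VSS events with a given ID per node over the last N hours.
    function Get-VssEventCount {
        param(
            [string[]]$ComputerName = @($env:COMPUTERNAME),
            [int]$EventId = 8194,
            [int]$Hours = 24
        )
        foreach ($node in $ComputerName) {
            $filter = @{
                LogName      = 'Application'
                ProviderName = 'VSS'
                Id           = $EventId
                StartTime    = (Get-Date).AddHours(-$Hours)
            }
            # SilentlyContinue: Get-WinEvent raises an error when nothing matches.
            $events = @(Get-WinEvent -ComputerName $node -FilterHashtable $filter -ErrorAction SilentlyContinue)
            [pscustomobject]@{ Node = $node; EventId = $EventId; Count = $events.Count }
        }
    }

    # Usage (node names are placeholders):
    # Get-VssEventCount -ComputerName 'NODE1','NODE2','NODE3'
    ```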


    Frédéric OGUER


    • Edited by F.OGUER Monday, August 19, 2013 11:17 AM mistake
    Monday, August 19, 2013 11:16 AM
  • Event 8194 is no problem, but event 12363 is.

    Any news when there will be a solution for that one ?

    Monday, August 19, 2013 11:19 AM
  • Hi Bert,

    I got the same answer as Roendi:

    "This is an ongoing issue we are investigating at this time. We are currently aware of the issue and we are working with Microsoft to address the problem. Unfortunately at this time there is no workaround that I know of."

    I am due to call EqualLogic back on September 1.

    Best regards,


    Frédéric OGUER

    Monday, August 19, 2013 11:25 AM
  • funny, I had a similar reply from Microsoft:

    Hope you are aware that this is a known issue and we hardly have any workaround available for this, so we have only option collect traces and analyze it. There is no point in changing any configuration as this is already a known issue.

    This will surely be credited against the BUG so will not be a charged incident.

    Hope this helps

    Monday, August 19, 2013 11:33 AM
  • We have a working solution using PowerShell.  The only catch is that after every reboot of the cluster nodes we have to run a consistency check manually (also using PS).  If we do that, we avoid error 12363 and our cluster and DPM work as they should (using ASM and ODX).

    After a cluster node reboot, we still have to stop the PowerShell backup schedules, start the consistency check script, and start the backup schedules again when the consistency check has finished.  We could automate this with PS as well, but since it happens only once or twice a month, we do it manually for now.

    If someone wants the scripts and procedures, feel welcome to reply.

    A solution from Microsoft/Dell still would be very welcome.

    Regards, Bert

    Monday, August 19, 2013 11:49 AM
  • Consistency check uses VSS.

    Sometimes I get error 12363 during a consistency check.

    It's random, like the backups...

    I will install DPM agents inside the VMs.


    Frédéric OGUER

    Monday, August 19, 2013 12:04 PM
  • We also thought the errors were random, but I think we found a pattern in when they occur.  When two actions run on one node (2 backups, 2 consistency checks, or a backup and a consistency check) we always get a 12363.  When two actions run on different nodes, we rarely get the 12363 error.  To be on the safe side we make sure only one action (backup or consistency check) is running at a time.  Then we never get the 12363 error.

    We also considered installing the agent in every VM, but we're not looking forward to doing this in a production environment.  Therefore we chose to go the PowerShell route.

    Monday, August 19, 2013 12:15 PM
  • I will try your route without PowerShell:

      • I disabled Auto-CC (consistency check) on my PG (protection group)
      • I scheduled a daily CC on my PG after the backup
      • On the DPM server I changed the per-node 'MaxAllowedParallelBackups' registry value from 3 to 1
      Windows Registry Editor Version 5.00
      
      [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\2.0\Configuration\MaxAllowedParallelBackups]
      "Microsoft Hyper-V"=dword:00000001

    More information : http://blogs.technet.com/b/dpm/archive/2011/06/06/how-to-use-and-troubleshoot-the-auto-heal-features-in-dpm-2010.aspx
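
    The same registry change can be sketched in PowerShell instead of a .reg file. The key path and value name are taken from the fragment above; run it elevated on the DPM server:

    ```powershell
    # Key path and value name copied from the .reg fragment above.
    $key = 'HKLM:\SOFTWARE\Microsoft\Microsoft Data Protection Manager\2.0\Configuration\MaxAllowedParallelBackups'

    # Create the key if it does not exist yet.
    if (-not (Test-Path -Path $key)) {
        New-Item -Path $key -Force | Out-Null
    }

    # Limit Hyper-V backups to one at a time per node (DWORD 1).
    New-ItemProperty -Path $key -Name 'Microsoft Hyper-V' -PropertyType DWord -Value 1 -Force | Out-Null

    # Read the value back to confirm.
    Get-ItemProperty -Path $key -Name 'Microsoft Hyper-V'
    ```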


    Frédéric OGUER


    • Edited by F.OGUER Monday, August 19, 2013 1:17 PM correction
    Monday, August 19, 2013 1:12 PM
  • Will you post the Powershell scripts. I'm interested in seeing what you are doing.

    Thanks.

    Tuesday, August 20, 2013 8:45 PM
  • Hello everyone.

    I was in the same situation as all of you (Dell Hyper-V 2012 cluster with CSVs stored on a Dell EqualLogic PS6100), having issues with DPM 2012 SP1.

    After crawling the Internet (Dell forum, DPM forum, ...) I finally dropped my VSS hardware provider (installed through the Dell HIT kit) AND disabled the ODX feature on my 3 nodes.

    Since then I can protect VMs without issue.

    Separately, a new update rollup for DPM is out. I have not tested it yet. I've posted here:

    http://blogs.technet.com/b/dpm/archive/2013/07/30/important-update-on-dpm-2012-sp1-update-rollup-3-issues-and-workarounds.aspx?PageIndex=2#comments

    Wednesday, August 21, 2013 9:02 AM
  • yeah, disabling ODX also fixes my issue when using the software VSS, too bad that we want the performance improvement on this cluster.  We shouldn't have to disable a much trumpeted feature of 2012 and new SANs just to get a backup to work.

    Rollup3 makes no difference by the way.

    Wednesday, August 21, 2013 9:32 AM
  • I'll try to explain how we work with our PS scripts to avoid the 12363 error. As said before we still use ODX and the Hit Kit 4.5 on our 4 node cluster with one EQL PS4110X.

    First we have to make sure DPM isn't doing any backups by itself.  So we modified the protection group and set the retention range to 31 days, but run only one backup a week, on Saturday morning at 6 AM. The first two scripts alter this schedule.  On Friday night we move the DPM backup to Sunday, and on Saturday night we move it back to Saturday.  This way DPM never runs a backup on its own.  If there is an easier way, please let me know.

    Change_Backup_Schedule_Day_Sa-Su.ps1

    Import-Module DataProtectionManager
    $pg = Get-DPMProtectionGroup -dpmservername "DPMServerName"
    
    $SC0 = Get-DPMPolicySchedule $pg[0] -shortterm
    $mpg = Get-DPMModifiableProtectionGroup -protectiongroup $pg[0]
    Set-DPMPolicySchedule $mpg -Schedule $sc0[0] -TimesOfDay 06:00 -DaysOfWeek Su
    Set-DPMProtectionGroup $mpg
    
    $SC1 = Get-DPMPolicySchedule $pg[1] -shortterm
    $mpg = Get-DPMModifiableProtectionGroup -protectiongroup $pg[1]
    Set-DPMPolicySchedule $mpg -Schedule $sc1[0] -TimesOfDay 06:00 -DaysOfWeek Su
    Set-DPMProtectionGroup $mpg

    Change_Backup_Schedule_Day_Su-Sa.ps1

    Import-Module DataProtectionManager
    $pg = Get-DPMProtectionGroup -dpmservername "DPMServerName"
    
    $SC0 = Get-DPMPolicySchedule $pg[0] -shortterm
    $mpg = Get-DPMModifiableProtectionGroup -protectiongroup $pg[0]
    Set-DPMPolicySchedule $mpg -Schedule $sc0[0] -TimesOfDay 06:00 -DaysOfWeek Sa
    Set-DPMProtectionGroup $mpg
    
    $SC1 = Get-DPMPolicySchedule $pg[1] -shortterm
    $mpg = Get-DPMModifiableProtectionGroup -protectiongroup $pg[1]
    Set-DPMPolicySchedule $mpg -Schedule $sc1[0] -TimesOfDay 06:00 -DaysOfWeek Sa
    Set-DPMProtectionGroup $mpg

     

    Next is the script that actually runs the backups.  For this example we trimmed the script to one PG and just a couple of VMs.  The actual script contains two PGs and 29 VMs.  We schedule this script four times a day (3:30 AM, 10:30 AM, 2:00 PM and 5:15 PM).  The schedule might seem odd, but it avoids conflicts with other running tasks.

    New-DPMRecoveryPoint_PG1.ps1

    param([string] $dpmname, [string] $pgname, [string] $backupoption) 
    
    # Fall back to defaults when the script is called without arguments
    if(!$dpmname)
    {    $dpmname = "DPMServerName"}
    if(!$pgname)
    {    $pgname =  "ProtectionGroupName"}
    if(!$backupoption)
    {    $backupoption =  "Expressfull"}   # note: not passed to New-RecoveryPoint below
    
    trap{"Error in execution... $_";break}
    &{
      $clipg = Get-ProtectionGroup $dpmname | where { $_.FriendlyName -eq $pgname}
     
      if($null -eq $clipg)
      {  Throw "No PG found" }
     
      $backupds = @(Get-Datasource $clipg)
     
      foreach ($ds in $backupds[0]) # change to 1, 2 etc for each VS in the ProtectionGroup
      {  
        $j = New-RecoveryPoint -Datasource $ds -Disk
        $jobtype = $j.jobtype
        Write-Output "$jobtype job has been triggered..." 
      } 
     
      # Poll instead of spinning in an empty loop, which would pin a CPU core
      while (! $j.hasCompleted )
      { 
        Start-Sleep -Seconds 30
      }
      Write-Host "$jobtype job finished with status $($j.Status)"
    }
    
    Start-Sleep -s 15
    
    # From here on copy as many as needed
    
    trap{"Error in execution... $_";break}
    &{
      $clipg = Get-ProtectionGroup $dpmname | where { $_.FriendlyName -eq $pgname}
     
      if($null -eq $clipg)
      {  Throw "No PG found" }
     
      $backupds = @(Get-Datasource $clipg)
     
      foreach ($ds in $backupds[1]) # change to 1, 2 etc for each VS in the ProtectionGroup
      {  
        $j = New-RecoveryPoint -Datasource $ds -Disk
        $jobtype = $j.jobtype
        Write-Output "$jobtype job has been triggered..." 
      } 
     
      # Poll instead of spinning in an empty loop, which would pin a CPU core
      while (! $j.hasCompleted )
      { 
        Start-Sleep -Seconds 30
      }
      Write-Host "$jobtype job finished with status $($j.Status)"
    }
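
    Rather than copying the block once per VM, the same one-job-at-a-time pattern could be sketched as a loop. The helper name Invoke-JobSerially is hypothetical; the only assumption about the job object is that it exposes HasCompleted and Status, as the script above already relies on:

    ```powershell
    # Hypothetical helper: start one job per item and wait for it before starting the next.
    function Invoke-JobSerially {
        param(
            [Parameter(Mandatory)] [object[]]$Item,
            [Parameter(Mandatory)] [scriptblock]$StartJob,   # receives one item, returns a job object
            [int]$PollSeconds = 30,
            [int]$PauseSeconds = 15
        )
        foreach ($i in $Item) {
            $job = & $StartJob $i
            # Poll until the job reports completion.
            while (-not $job.HasCompleted) { Start-Sleep -Seconds $PollSeconds }
            [pscustomobject]@{ Item = $i; Status = $job.Status }
            # Short pause between jobs, as in the script above.
            Start-Sleep -Seconds $PauseSeconds
        }
    }

    # Usage with the cmdlets from the script above (run in the DPM Management Shell):
    # $pg = Get-ProtectionGroup $dpmname | where { $_.FriendlyName -eq $pgname }
    # Invoke-JobSerially -Item @(Get-Datasource $pg) -StartJob { param($ds) New-RecoveryPoint -Datasource $ds -Disk }
    ```

    Passing the start action as a script block keeps the waiting logic in one place, so the same helper could also drive the consistency check.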

     

    Sometimes (a couple of times a week) DPM fails to back up one VM.  Then we get an email saying:
    Computer: MyVM.HyperCluster.domain.com
    Description: Last 1 recovery points not created.
    DPM encountered a retryable VSS error.
    When this happens there is an 8194 error, but that can be ignored.  The next backup will be OK again.

    Once or twice a month we have to reboot the nodes after updates.  We don't use Cluster-Aware Updating yet, but do the updates and reboots manually.  After rebooting every node (and pausing and resuming the cluster role per node) we disable the "New-DPMRecoveryPoint_PG1.ps1" schedule on the DPM server. Then we have to run a consistency check on every node to avoid the 12363 error. Here's the script we use (once again, this is a shorter version than in production):

    ConsistencyCheck_PG1.ps1

    param([string] $dpmname, [string] $pgname, [string] $dsname, [string] $isheavyweight) 
    
    if(!$dpmname)
    {    $dpmname = "DPMServerName"}
    if(!$pgname)
    {    $pgname =  "ProtectionGroupName"}
    if(!$dsname)
    {    $dsname =  "Hyper-V Name"} # e.g. "Backup Using Saved State\VSName" or "Backup Using Child Partition Snapshot\VSName"
    if(!$isheavyweight)
    {    $isheavyweight = "true"}
    
    write-host "Start consistency check on $dsname " 
    
        trap{"Error in execution... $_";break} 
        &{ 
            write-host "Getting protection group $pgname in $dpmname..." 
            $clipg = DataProtectionManager\Get-DPMProtectionGroup -DPMServerName $dpmname | where { $_.FriendlyName -eq $pgname } 
    
             if($clipg -eq $abc) 
              { 
                  Throw "No PG found" 
              } 
    
            write-host "Getting $dsname from PG $pgname..." 
            $ds = DataProtectionManager\Get-DPMDatasource $clipg | where { $_.name -eq $dsname } 
    
            if($ds -eq $abc) 
             { 
                  Throw "No Data Source found" 
             } 
    
            if( $isheavyweight -ne "true") 
            { 
                write-host "Starting light weight consistency check..." 
                $j = DataProtectionManager\Start-DPMDatasourceConsistencyCheck -Datasource $ds 
                $jobtype = $j.jobtype 
                if(("Validation") -notcontains $jobtype) 
                    { 
                        Throw "Shadow Copy job not triggered" 
                    } 
                while (! $j.hascompleted ){ write-host "Waiting for $jobtype job to complete..."; start-sleep 30} 
                if($j.Status -ne "Succeeded") {write-host "Job $jobtype failed..." } 
                Write-host "$jobtype job completed..." 
            } 
            else 
            { 
                write-host "Starting Heavy weight consistency check..." 
                $j = DataProtectionManager\Start-DPMDatasourceConsistencyCheck -Datasource $ds -HeavyWeight 
                $jobtype = $j.jobtype 
                if(("Validation") -notcontains $jobtype) 
                    { 
                        Throw "Shadow Copy job not triggered" 
                    } 
                while (! $j.hascompleted ){ write-host "Waiting for $jobtype job to complete..."; start-sleep 30}
                if($j.Status -ne "Succeeded") {write-host "Job $jobtype failed..." } 
                Write-host "$jobtype job completed..." 
            } 
    
        }
    
        
    Start-Sleep -s 15
    
    # From here on copy as many as needed. Use the same or other ProtectionGroups.
    
    
    $dpmname = "DPMServerName"
    $pgname =  "ProtectionGroupName2"
    $dsname =  "Hyper-V Name2" # e.g. "Backup Using Saved State\VSName" or "Backup Using Child Partition Snapshot\VSName"
    $isheavyweight = "true"
    
    
    write-host "Start consistency check on $dsname " 
    
        
        trap{"Error in execution... $_";break} 
        &{ 
            write-host "Getting protection group $pgname in $dpmname..." 
            $clipg = DataProtectionManager\Get-DPMProtectionGroup -DPMServerName $dpmname | where { $_.FriendlyName -eq $pgname } 
    
             if($clipg -eq $abc) 
              { 
                  Throw "No PG found" 
              } 
    
            write-host "Getting $dsname from PG $pgname..." 
            $ds = DataProtectionManager\Get-DPMDatasource $clipg | where { $_.name -eq $dsname } 
    
            if($ds -eq $abc) 
             { 
                  Throw "No Data Source found" 
             } 
    
            if( $isheavyweight -ne "true") 
            { 
                write-host "Starting light weight consistency check..." 
                $j = DataProtectionManager\Start-DPMDatasourceConsistencyCheck -Datasource $ds 
                $jobtype = $j.jobtype 
                if(("Validation") -notcontains $jobtype) 
                    { 
                        Throw "Shadow Copy job not triggered" 
                    } 
                while (! $j.hascompleted ){ write-host "Waiting for $jobtype job to complete..."; start-sleep 30} 
                if($j.Status -ne "Succeeded") {write-host "Job $jobtype failed..." } 
                Write-host "$jobtype job completed..." 
            } 
            else 
            { 
                write-host "Starting Heavy weight consistency check..." 
                $j = DataProtectionManager\Start-DPMDatasourceConsistencyCheck -Datasource $ds -HeavyWeight 
                $jobtype = $j.jobtype 
                if(("Validation") -notcontains $jobtype) 
                    { 
                        Throw "Shadow Copy job not triggered" 
                    } 
                while (! $j.hascompleted ){ write-host "Waiting for $jobtype job to complete..."; start-sleep 30}
                if($j.Status -ne "Succeeded") {write-host "Job $jobtype failed..." } 
                Write-host "$jobtype job completed..." 
            } 
    
        }

    When this is completed (usually the next morning) we start the "New-DPMRecoveryPoint_PG1.ps1" schedule again.

    With these scripts we are able to keep using DPM without the 12363 error.  I still don't know exactly what the 12363 error does, but I notice that after this error the iSCSI traffic is only using one node (the LUN owner) and that performance drops awfully.  I think redirected IO doesn't exist anymore, but the behavior we get very much looks like it.  The only way I found to cure this is to reboot every node again and do a consistency check again on every VM.

    When reading this reply, you will definitely notice that we are still novice PS users.  So if you see anything in our scripts that could be done more elegantly, please reply.

    To end this reply I can't keep myself from saying that I think it is a shame that we have to come up with this bunch of workarounds to get a fairly simple cluster backup running.  That MS and Dell are leaving us in the dark doesn't help me building confidence in both companies.  That said, I hope our workaround can help some of you.

    Best Regards, Bert Oris



    • Edited by Bert Oris Wednesday, August 21, 2013 11:55 AM
    Wednesday, August 21, 2013 11:50 AM
  • [Bert],

    I'm impressed by the scripts you have designed. But in my view it is too big a workaround.

    I can't afford to change the PGs' retention range.

    In the end, wouldn't we all be satisfied if we could simply let DPM do its job, without having to care about the CSV situation?

    As [Slinkoff] mentioned, ODX is a cool feature when you move data around your SAN, and the Windows 2012 Cluster service (and CSV 2.0) no longer requires us to serialize VM backups in order to protect them.

    It should be straightforward!

    Wednesday, August 21, 2013 4:37 PM
  • Since installing HIT 4.6 + DPM 2012 RU2 + KB2838669 & KB2870270,

    the DPM backup of Hyper-V 2012 CSV always fails with error 0x80042301 on a lot of VMs (0 bytes transferred).

    Backing up 20 VMs now takes 6 hours instead of 1 hour before (I use the VSS hardware provider).

    On every node there are a lot of new VSS events with ID 4003: the Hyper-V VSS writer receives a freeze event and waits for an abort or thaw event.

    The "IVssWriterCallback" error 8194 is spamming my node event logs every 30 seconds (I know we can ignore this, but it's frustrating).

    However, manually starting a replica works and seems to give good performance.

    I'll try changing MaxAllowedParallelBackups to 1 and will see tomorrow.


    Little French

    Tuesday, August 27, 2013 12:16 PM
  • Last week of August : "Dell and Microsoft teams were trying to check if the MPIO driver was not involved. My colleague [...] just informed me that the problem occurs even without MPIO. The research problem therefore continues."

    Frédéric OGUER

    Wednesday, September 4, 2013 1:07 PM
  • I've linked my Dell support case to the case opened by my compatriot F. Oguer (thanks to Frédéric for all the information in this thread). The incident is at level four, so we will probably have to wait for new EqualLogic firmware, a new HIT, a new KB, or a Microsoft rollup...


    Little French

    Tuesday, September 10, 2013 11:40 AM
  • Hi everybody,

    No news from DELL.

    More information about my configuration (translated by Google)

    MTU 9000 on the VMswitch:
    On the question of the MTU on the virtual switch, I changed the VMswitch MTU via the registry.
    Here is the PowerShell script used:

    #Change the MTU on the virtual switch
    $RegKey ="HKLM:\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}"
    Get-ChildItem -Path $RegKey -ErrorAction SilentlyContinue| % {
    $path = $_.PSPath
    Get-Itemproperty $path | where {$_.driverdesc -eq "Hyper-V Virtual Ethernet Adapter" -and $_.Characteristics -eq "41"} | % {
    Set-ItemProperty $path -Name "*JumboPacket" -Value "9014"}}
                                                                                           

    Ping works.
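
    One way to confirm that jumbo frames pass end to end is a ping with the Don't Fragment flag set and a payload sized to the MTU. A minimal sketch, assuming IPv4 and a 9000-byte MTU; the function names and the peer address are hypothetical, and this is not necessarily the ping test used here:

    ```powershell
    # Largest ICMP payload that fits in one unfragmented IPv4 packet:
    # MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
    function Get-PingPayloadSize {
        param([int]$Mtu = 9000)
        $Mtu - 28
    }

    # Hypothetical wrapper around ping.exe: -f sets Don't Fragment, -l sets the payload size.
    function Test-JumboPath {
        param(
            [Parameter(Mandatory)] [string]$Peer,
            [int]$Mtu = 9000
        )
        ping.exe -f -l (Get-PingPayloadSize -Mtu $Mtu) -n 2 $Peer
    }

    # Usage (placeholder address on the LM network):
    # Test-JumboPath -Peer '192.168.13.37'
    ```

    The registry value 9014 used above is larger than 9000 because the adapter setting also counts the 14-byte Ethernet header.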


    Dell KACE and OpenManage malfunction
    The Dell KACE agent was reinstalled without problems.

    The problem seems to come from OpenManage 7.3.

    Broadcom 17.6 drivers
    We updated the Broadcom drivers (17.4 to 17.6) and firmware (7.4 to 7.6).
    According to the list of fixes, VMQueue and SR-IOV are not supported with version 17.4:
    Enhancements:
    ===============
    - Added Support for VMqueue NetXtremeII 1G and 10G devices.
    - Added SR-IOV Support features for 57712 and 578xx

    SR-IOV and VMSwitch
    SR-IOV is incompatible with Windows NIC Teaming.
    (Source: http://technet.microsoft.com/en-us/library/hh997031.aspx)
    Incompatibilities. The NIC Teaming feature is compatible with all networking capabilities in Windows Server 2012, with three exceptions:
    SR-IOV
    Remote Direct Memory Access (RDMA)
    TCP Chimney
    Our current configuration is not supported! I plan to reinstall my nodes with bandwidth management (-MinimumBandwidthMode Weight).
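
    To see what the host itself reports, a short sketch like the following could be run on each node (it assumes the Hyper-V and NetLbfo modules that ship with Windows Server 2012):

    ```powershell
    # Show whether SR-IOV is enabled on each virtual switch and, if it is not usable, why.
    Get-VMSwitch | Format-List Name, IovEnabled, IovSupport, IovSupportReasons

    # Show the NIC teams the switches may be bound to.
    Get-NetLbfoTeam | Format-List Name, Members, TeamingMode
    ```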


    Frédéric OGUER

    Thursday, September 19, 2013 9:15 AM
  • Hi Frédéric,

    We upgraded to the Broadcom 17.6 drivers and 7.6 firmware.

    No change.

    Backups with the VSS hardware provider are very slow.

    Backups with the VSS software provider sometimes fail and hang (the CSV volume is momentarily lost on a node).

    I would like to open a case with Microsoft.

    Have you opened one? Could you give me your Microsoft case number for reference?

    Thanks a lot


    Little French

    Tuesday, October 1, 2013 9:31 AM
  • Fonznip,

    Have you installed:

    kb 2870270

    kb 2869923

    Send me a email (foguer-at-sid.tm.fr) or call me (direct line : 01 45 17 43 32) , it'll be more easy in French !

    I'm at Créteil, near Sèvres...

    Microsoft Case Number for Error 5142 : CSV TIMEOUT (open since 06/06/2013)...

    113060610494265

    Regarding "Backup with VSS Hardware are very slow": how slow is very slow?

    Yesterday I created a replica of 2 TB in 4 hours. That is 500 GB/hour, or about 140 MB/s (disk write speed...).

    Regards


    Frédéric OGUER



    • Edited by F.OGUER Tuesday, October 1, 2013 11:14 AM insert image
    Tuesday, October 1, 2013 10:43 AM
  • New fix for DPM (UR 3.6):

    http://www.microsoft.com/en-sa/download/details.aspx?id=40318

    Issue #1: DPM has express full technology where DPM tracks the changes via DPM filter driver and the changed block information are tracked as bitmap and is stored in bitmap files.  In some scenarios, DPM bitmap files are becoming very big leading to higher CSV volume consumption.  This issue is fixed in DPM filter and effects only VM protection scenarios.  This fix is done on the DPM filter driver running on the production server.


    Frédéric OGUER

    Tuesday, October 1, 2013 3:14 PM
  • I saw this as well, I plan on attempting to install this tonight and will update the thread unless anyone else has done it already.
    Tuesday, October 1, 2013 6:09 PM
  • I installed it last night and it did not resolve our issue; we are still getting the 12363 event.
    Wednesday, October 2, 2013 2:32 PM
  • It did not resolve our issue, but backups look faster (x2)...

    Frédéric OGUER

    Wednesday, October 2, 2013 3:07 PM
  • Wow, this DPM hotfix really saved my day. I had constantly growing CSV volumes, and when I listed all files the total didn't add up to the disk usage I could see. Our SAN guy could only see 50% disk usage while on my side I had less than a few gigabytes left. After applying this hotfix for System Center 2012 SP1 Data Protection Manager (KB2886362), I have recovered terabytes of disk space.

    Regarding your other issues, I had lots of problems before the summer's hotfixes, but since then I haven't run into any problems. My setup is a 12-node cluster and a 3PAR 7200 array. We are using the software VSS provider with serialization (I haven't seen any hardware VSS provider from HP that officially supports Hyper-V 2012, only one that supports it using HP Recovery Manager).

    /Regards Jorgen

    Thursday, October 3, 2013 5:43 AM
  • Has anyone gotten any updates from Dell or Microsoft on their open tickets?

    I have had success by creating smaller protection groups and keeping all the VMs defined in a protection group running on a single node in the cluster.  It also helps to stagger the times at which the protection groups run.  The issue appears to occur when VMs within a protection group are spread across multiple nodes in the Hyper-V cluster.  Running that way causes event ID 12363, which Microsoft/Dell need to fix, but I haven't gotten a word out of either of them on whether they are working on it.
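
    To plan protection groups per node, it helps to list which VMs each node currently owns. A minimal sketch, assuming the FailoverClusters module on a cluster node; the function name Group-VmByOwnerNode is hypothetical:

    ```powershell
    # Hypothetical helper: group clustered VM roles by the node that currently owns them.
    function Group-VmByOwnerNode {
        param([Parameter(Mandatory)] [object[]]$ClusterGroup)
        $ClusterGroup |
            Where-Object { "$($_.GroupType)" -eq 'VirtualMachine' } |
            Group-Object -Property { "$($_.OwnerNode)" } |
            ForEach-Object {
                [pscustomobject]@{
                    Node = $_.Name
                    VMs  = @($_.Group | ForEach-Object { $_.Name })
                }
            }
    }

    # Usage on a cluster node:
    # Group-VmByOwnerNode -ClusterGroup (Get-ClusterGroup)
    ```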

    Thursday, October 10, 2013 9:36 PM
  • I was told earlier this week by my current Dell case manager that Dell has been able to show the problem to Microsoft, Microsoft has acknowledged the problem, and that Microsoft told Dell that the problem affects other vendors besides Dell.  Dell and Microsoft are apparently running tests/diagnostics every day to try and figure out how to fix it.  I don't know if that means that they are close to a solution or not, but at least it is something.
    Thursday, October 10, 2013 10:06 PM
  • I also had a similar problem, with a 3Par 7400 and HPs backup software and a hardware VSS provider.
    A backup of the CSV volumes would take down random nodes in the cluster after anywhere from 1 to 5 backups.

    Turns out this was a jumbo frames issue. It was causing excessive pause frames on the FlexFabric and the nodes were crashing. We could have disabled the flow control, but chose to leave Jumbo frames off and have no problems now.


    • Edited by DFarlam Monday, November 11, 2013 10:32 PM my previous entry was nonsense
    Friday, October 11, 2013 6:20 AM
  • I also installed the latest DPM update, but noticed no differences at all.  Even if there is still no solution for the 12363 error, I was looking forward to the speed advantage, but I couldn't see any difference.

    We also worked out that making one PG per node does indeed avoid error 12363.  But in the end this wasn't flexible enough for us, as VMs sometimes need to move to another node to balance the load across our cluster.  Therefore we chose to use some PowerShell scripts; you can find them above.

    A last question, a bit off topic.  When we run a consistency check (which we need to do after a node reboot), it takes around 9 hours for 4.3 TB.  This is only 139 MB/s on average; is this a normal figure?  We are using a PS4110X and a PS4110E.
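
    For anyone who wants to reproduce that average, a small sketch of the arithmetic (assuming binary units, i.e. 1 TB = 1024 x 1024 MB, which is what makes 4.3 TB in 9 hours come out at 139 MB/s):

```python
def throughput_mb_per_s(size_tb, hours):
    """Average throughput in MB/s for size_tb terabytes moved in the given hours.

    Assumes binary units: 1 TB = 1024 * 1024 MB.
    """
    size_mb = size_tb * 1024 * 1024
    seconds = hours * 3600
    return size_mb / seconds


rate = throughput_mb_per_s(4.3, 9)
print(f"{rate:.0f} MB/s")  # prints "139 MB/s"
```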

    Friday, October 11, 2013 7:14 AM
  • "Update Rollup 4 for System Center 2012 Service Pack 1" is out now but no fixes for DPM...

    http://support.microsoft.com/kb/2879276/en-us


    Frédéric OGUER

    Wednesday, October 23, 2013 7:41 AM
  • Has anyone tried the DPM 2012 R2 and Server 2012 R2 combination to see if the problems are fixed?
    Wednesday, October 23, 2013 9:37 PM
  • I tried yesterday and my backup is now broken. I restored back to Rollup 3 and lost most of my backup volumes.

    And I am not alone with this problem!

    http://social.technet.microsoft.com/Forums/en-US/245e0473-a882-488c-addf-598267145187/an-unexpected-error-occurred-during-the-installation-id-4387-dpm2012-r2-upgrade?forum=dpmsetup


    Roendi


    • Edited by Roendi Friday, October 25, 2013 7:49 AM
    Friday, October 25, 2013 7:05 AM
  • Hi guys,

    Some good news: it works much better with 2012 R2, but!

    The upgrade was very hard for me! I will post all my problems at the link above.

    And 2012 R2 still supports Server 2003! You need to know that.

    But none of my servers have problems syncing the volumes and services any more.

    I still get a lot of errors, though, the same as before:

    Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied.

    Best Regards

    Röndif


    Roendi

    Monday, October 28, 2013 7:50 AM
  • Hi guys,

    My backup server crashed last night and now the problems are back. So DPM 2012 R2 does not resolve the problem.


    Roendi

    Tuesday, October 29, 2013 7:21 AM
  • Hi,

    We changed from Broadcom to Intel network cards: no change.

    We tried DPM 2012 R2 / Hyper-V 2012 R2 on an 8-node cluster: exactly the same problem.

    Warning with HIT 4.6: when a VM has only an IDE controller, the VM is shown as Stopped! You need to add a SCSI controller to get a correct snapshot.

    We tried firmware 7.0 EPA: crashes with ODX, but very good distribution across the disks.

    For me, the best workaround for backup is CSV serialization (it works with DPM 2012 / DPM 2012 R2):

    http://technet.microsoft.com/en-us/library/ff634192.aspx

    Best regards,

    Frédéric OGUER


    Frédéric OGUER

    Friday, November 15, 2013 9:01 AM
  • I can’t believe this problem (DPM-CSV-Error 0x80042301 -> ODX) has already been known for several months and MS support does not seem to be aware of it.  I still have an open case for this (November 2013, already the second one) with no progress whatsoever, so I was glad to find this post. It is obvious that this problem is not specific to one hardware vendor, as people using different hardware (Dell, IBM, HP …) have the exact same problem, mostly with an ODX-capable SAN. We ourselves are using the latest HP blades (BL460c Gen8) and a 3PAR 7200 as SAN.  All servers (hosts, VMs, DPM, ...) are running Windows 2012 and are fully patched, including hotfixes, firmware, drivers, etc.  Upgrading to DPM 2012 R2 did not solve the problem either … but disabling ODX did the magic trick :-). And yes, it has a (small) negative impact on SAN performance, but rather this than unreliable backups. I hope/expect Microsoft will come with a fix soon; otherwise we will consider upgrading the Hyper-V hosts to Windows 2012 R2 (which seems not to have this problem). But HP is not 100% ready for this (2012 R2 drivers).  Greetings from Belgium -  Ivan Henderix

     


    Ivan HENDERIX Ivan Henderix Field System Engineer KPD Services NV/SA






    • Proposed as answer by Danny Vincken Wednesday, November 20, 2013 2:31 PM
    • Edited by Ivanhoe007 Wednesday, November 20, 2013 3:35 PM
    Wednesday, November 20, 2013 2:26 PM
  • Don't bother upgrading just on the assumption that it is fixed in 2012 R2: we are running a Windows 2012 R2 Hyper-V failover cluster and it has the same issues with ODX. Disabling ODX also fixes it on Windows 2012 R2.

    There are of course other/better reasons to upgrade to 2012 R2. I just wanted to give you a heads-up that the issue is not (yet) fixed in R2 either.

    Wednesday, November 27, 2013 5:16 PM
  • I just heard from a colleague that Veeam doesn't have this problem in a similar setup.  We will check it out next week, and I'll post again when I know more.

    Regards, Bert Oris

    Friday, December 6, 2013 6:37 PM
  • We were using Veeam when we first set up our infrastructure; however, we had similar issues and are still waiting for a fix to get this working.

    We have several VMs in production and cannot simply assume that reinstalling and reconfiguring Veeam will work this time. We haven't tried disabling ODX, as we still need this feature to be active.

    Hopefully, a fix will be released soon.

    - Luc

    Thursday, December 12, 2013 5:21 PM
  • New cumulative hotfix (it does not solve this problem, but it replaces 2870270, 2869923 and 2908415):

    http://support.microsoft.com/kb/2878635


    Frédéric OGUER

    Thursday, December 26, 2013 9:25 AM
  • Thanks, we will try this hotfix out in a week or two.
    Has anyone tried this one already?


    Ivan HENDERIX Ivan Henderix Field System Engineer KPD Services NV/SA

    Monday, January 13, 2014 12:46 PM
  • As part of the testing Dell is having me do, they had me disable the EqualLogic hardware VSS provider using the command:

    "C:\Program Files\EqualLogic\bin\eqlvss" /unregserver (it can be undone via "C:\Program Files\EqualLogic\bin\eqlvss" /regserver)

    Since doing that on Sunday, I haven't had a single SCDPM backup failure.  I also changed the max allowed parallel backups from 3 to 1, but I just switched it back to 3 so we will see how it goes tonight.  Obviously this isn't a fix, but it may work as a band-aid for everyone in the short term.  If I get anything new from Dell I'll make sure to post it.

    Had same issue with an EqualLogic and after disabling the hardware VSS DPM started to backup Hyper-V child partitions correctly. It's quite frustrating that after a whole year Microsoft and Dell haven't come back with a solution. However, I must say the Server 2012 software provider is fairly fast, I'm a bit impressed. Even so using the hardware one would be desirable.
    Tuesday, January 14, 2014 5:30 PM
  • Upgrading to 2012 R2 in DPM and Server 2012 R2 for our Fail-Over clusters hasn't made a difference.

    Both 2012 R2 clusters I've built had to have built-in ODX disabled via PowerShell, and the Dell HIT toolkit has had to be installed without the ODX provider in order for DPM 2012 R2 to take proper backups. I still get the odd VSS failure in DPM, but at least backups are working without taking down the VMs or offlining the CSV (which happened a few times using the built-in 2012 R2 ODX provider).

    The clusters were also built completely from scratch with the latest patches, NIC firmware, NIC drivers, and BIOS for our Dell servers. We use the MS iSCSI software provider, and our EqualLogic SAN runs 7.0.5 firmware.

    Wednesday, January 29, 2014 7:17 PM
  • Not surprised on this, Dell's HIT kit has not been updated to support 2012 R2 yet so I wouldn't expect it to work without making special changes.

    On another note, I had a call today with the Dell Equallogic engineering team, they informed me that they are actively working with Microsoft on this issue and that it is indeed a Microsoft issue. The quickest way for this to gain traction and have Microsoft resolve this is for everyone to open a ticket and/or if you have a ticket open to notify Dell about your Microsoft ticket so that they can track them and put more heat on Microsoft to resolve this issue. The more tickets Microsoft gets about this issue the more resources they will devote to resolving it.

    Wednesday, January 29, 2014 11:04 PM
  • Upgrading to 2012 R2 in DPM and Server 2012 R2 for our Fail-Over clusters hasn't made a difference.

    Both 2012 R2 clusters I've built had to have built-in ODX disabled via PowerShell, and the Dell HIT toolkit has had to be installed without the ODX provider in order for DPM 2012 R2 to take proper backups. I still get the odd VSS failure in DPM, but at least backups are working without taking down the VMs or offlining the CSV (which happened a few times using the built-in 2012 R2 ODX provider).

    The clusters were also built completely from scratch with the latest patches, NIC firmware, NIC drivers, and BIOS for our Dell servers. We use the MS iSCSI software provider, and our EqualLogic SAN runs 7.0.5 firmware.


    7.0.2 is in beta and will be released next week...


    Frédéric OGUER

    Tuesday, February 11, 2014 4:34 PM
  • Not surprised on this, Dell's HIT kit has not been updated to support 2012 R2 yet so I wouldn't expect it to work without making special changes.

    On another note, I had a call today with the Dell Equallogic engineering team, they informed me that they are actively working with Microsoft on this issue and that it is indeed a Microsoft issue. The quickest way for this to gain traction and have Microsoft resolve this is for everyone to open a ticket and/or if you have a ticket open to notify Dell about your Microsoft ticket so that they can track them and put more heat on Microsoft to resolve this issue. The more tickets Microsoft gets about this issue the more resources they will devote to resolving it.


    I opened a case in May! Microsoft closed it today...

    It's not a DPM bug but a Hyper-V VSS bug.


    Frédéric OGUER

    Tuesday, February 11, 2014 4:35 PM
  • Hello everyone,
    The timeout error on CSV seems to be normal, by design.

    I have a great backup server with a 2 x 10 Gbit/s network.
    The only bottleneck is the EqualLogic disks.

    There is no way (with software VSS) to manage priorities between volumes and snapshots.

    Production volumes see high latencies, exceeding 40 ms. At that point the CSV_TIMEOUT / 5120 errors appear. This was reduced with the latest CSV patches (the replica size is smaller). The only workaround I found is to enable bandwidth throttling on the DPM agents so that network throughput does not exceed what the disk arrays can deliver...

    Support for Windows Server 2012 R2 is expected with HIT 4.7, announced for March.

    On firmware 7.0.1 there is a bug with snapshot removal. It has also been identified with Veeam.

    http://forums.veeam.com/microsoft-hyper-v-f25/equallogic-on-latest-firmware-volume-snapshots-not-deleting-t20088.html

    Frédéric OGUER

    Tuesday, February 11, 2014 4:41 PM
  • Upgrading to 2012 R2 in DPM and Server 2012 R2 for our Fail-Over clusters hasn't made a difference.

    Both 2012 R2 clusters I've built had to have built-in ODX disabled via PowerShell, and the Dell HIT toolkit has had to be installed without the ODX provider in order for DPM 2012 R2 to take proper backups. I still get the odd VSS failure in DPM, but at least backups are working without taking down the VMs or offlining the CSV (which happened a few times using the built-in 2012 R2 ODX provider).

    The clusters were also built completely from scratch with the latest patches, NIC firmware, NIC drivers, and BIOS for our Dell servers. We use the MS iSCSI software provider, and our EqualLogic SAN runs 7.0.5 firmware.

    Not really sure what you mean by "ODX provider". Dell's HIT kit only has the VSS provider, while ODX is an OS function. Did you guys install the VSS provider? Also, how many backups per CSV/server do you have DPM configured for (it's a regkey on the DPM server)? We haven't seen any issues since switching to the software provider (since Dell's hardware provider (HIT kit v4.6) doesn't currently support VSS functions in 2012/2012 R2) and installing the latest hotfixes - http://support.microsoft.com/kb/2920151/en-us.
    Thursday, March 20, 2014 3:52 PM
  • 3 new hotfixes for 2012:

    http://support.microsoft.com/kb/2901896

    http://support.microsoft.com/kb/2929078

    http://support.microsoft.com/kb/2929869

    File versions (Csvfs.sys, Csvvbus.sys, Csvflt.sys, Volsnap.sys):

    KB 2901896: 6.2.9200.20927, 6.2.9200.20927

    KB 2929078: 6.2.9200.20931, 6.2.9200.20931

    KB 2929869: ???, ???, 6.2.9200.20930
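
    If you want to check a node's driver files against the versions listed above, compare the dotted versions numerically rather than as strings. A minimal sketch (the "installed" values are made up for illustration; how you read the real file version from the node is left out):

```python
def parse_version(text):
    """Turn a dotted file version such as '6.2.9200.20927' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, required):
    """True if the installed version is the required version or newer."""
    return parse_version(installed) >= parse_version(required)


required = "6.2.9200.20927"   # version listed for KB 2901896 above
installed = "6.2.9200.9999"   # hypothetical value read from a node

# A plain string comparison would wrongly treat '9999' as newer than '20927'.
print(is_at_least(installed, required))  # prints "False"
```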


    Frédéric OGUER

    Monday, March 24, 2014 2:22 PM
  • So we are all still left with the same two options as at the beginning:

    a)  Use software VSS and hope that it will not crash the OS inside the VMs (which has happened to us)

    b) Use hardware VSS and manually fix broken backups.

    We have had this issue since the beginning of this thread, and after trying an OS reinstall, software VSS and patching with the various proposed hotfixes, it is still present after almost a year! Right now we use option b) with manual backup fixes, but this is not an error-free solution either: on three occasions since January we had to restart every node in the cluster because of errors like 1230, 1205, 1069, 5120 ..., all triggered by DPM backup activity (twice during scheduled backups and once during a manual fix). We have a three-node 2012 cluster + DPM 2012 SP1.

    Has anyone tried HIT 4.7 EPA + W2012R2 + DPM2012R2?

    -

    Ognjen

    Monday, April 28, 2014 12:19 PM
  • Hi,

    Yes, we tried HIT 4.7 EPA + W2012R2 (new install with the April update) + DPM2012R2.

    Some news:

    We tested ODX: it worked without problems! Yes, we can reactivate it!

    HIT 4.7 is available. The following issue has been fixed since Dell EqualLogic Host Integration Tools for Microsoft (HIT/Microsoft) version 4.7 Early Production Access (EPA), released February 2014:

    • Previously, Hyper-V vNICs on a Windows 2012 R2 host could cause the included and excluded subnets to not display properly in Auto-Snapshot Manager’s MPIO Settings view. The issue has been resolved.
    • Selective restore of Hyper-V objects was disabled in HIT/Microsoft Version 4.7 EPA and has been re-enabled in the final production version of HIT/Microsoft Version 4.7.
    • Previously, two problems caused the EqlReqService to intermittently crash. These problems have been fixed.

    The v7.0.4 firmware is available. It patches the OpenSSL (“Heartbleed”) vulnerability. This bug affected OpenSSL versions 1.0.1 (including 1.0.1f) and 1.0.2-beta1 releases.

    DPM 2012 SP1 UR6 is available : http://support.microsoft.com/kb/2958098

    DPM 2012 R2 UR2 was available : http://support.microsoft.com/kb/2958098

    We tested it for 2 bugs but, for us, this one is not corrected: SetBackupComplete called prematurely causes SetBackupSucceeded to be called, and 0x80042301 in VSS.

    The only workaround is the XML serialization (or setting up one protection group per host, each with a different backup schedule).

    But there is a new problem: is anyone else getting CSV 5120 errors during BITS transfers with SCVMM?



    Frédéric OGUER




    • Edited by F.OGUER Tuesday, May 6, 2014 9:30 AM
    Tuesday, May 6, 2014 9:23 AM
  • a)  Use software VSS and hope that it will not crash OS inside VM's (which had happened to us)

    Did you upgrade the integration services? Do you have enough disk space inside the VM? Do you get an error message?

    We have had no crashes inside VMs with software VSS.


    Frédéric OGUER

    Tuesday, May 6, 2014 9:26 AM
  • Hi

    My backup problem persists. Every week I have different problems. At the moment I want to change this value, but in my registry I have a REG_BINARY value there. Is this a proper value or a misconfiguration?

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\2.0\Configuration\MaxAllowedParallelBackups] "Microsoft Hyper-V"=dword:00000001

    On my Hyper-V 2012 hosts, Configuration is a REG_BINARY value.

    My DPM is 2012 R2, fully patched.

    Thanks for any Feedback


    Roendi


    • Edited by Roendi Thursday, July 24, 2014 6:24 AM
    Thursday, July 24, 2014 6:23 AM
  • hi,

    it has been a while since I reported anything, since we kept using our PS script to run DPM instead of the DPM schedules.

    A case with Dell and Microsoft helped us solve our case, thanks to the perseverance of both support engineers.

    In the end all we did was stop the EqlASMAgent [EqualLogic Auto-Snapshot Manager Agent] service on all four cluster nodes.

    We are running three VM-backups at a time using the hardware VSS writer without any problems for over a week now.

    Best Regards, Bert

    Friday, August 15, 2014 9:26 AM
  • Hi all

    Today I received this from Dell ;-)

    Once the hardware VSS is correctly configured and works with  ASM/ME  then it can be really considered an MS issue and customers should open a case with Microsoft.

    There have been a number of issues uncovered by MS with their DPM application which has recently released DPM service pack 3 that has resolved a number of their problems with Hyper-V and DPM using hardware VSS providers.

    The Dell case 00915355 was closed on 30.6.14. The fix comes with DPM 2012 R2 Rollup 2.

    We have now migrated to Hyper-V 2012 R2. Backup works fine now.

    Best Regards

    Röndi


     


    Roendi

    Thursday, August 21, 2014 9:04 AM
  • Roendi,

    I'm about to install a brand new DPM R2 RU 3. Currently, my cluster is W2k12 (not R2). Did you solve this problem only when you migrated your cluster to R2?

    Thompson


    ThompsonCosta

    Wednesday, September 3, 2014 3:57 PM
  • Hi,

    We resolved our issues with DPM 2012R2, and host upgrade to 2012 R2 along with latest HIT tools from Dell.

    • Proposed as answer by Mitrasman123 Friday, August 14, 2015 1:48 PM
    Tuesday, July 21, 2015 5:47 PM
  • I recently stumbled upon the same issue with a new SCDPM deployment at a customer. And I have just found the cause for our scenario!

    Although we can create snapshots/checkpoints with Hyper-V, an online VM backup with DPM fails. It appears the VSS writer in the guest OS is somehow stalled. Apparently it is caused by the Trend Micro virus scanner on the VMs (guest OS). Disabling Trend Micro did not work, but uninstalling it solved the problem.

    The fact is, the customer had recently deployed Trend Micro on several VMs, but there are no exclusions configured yet. I still have to find out which exclusions we have to configure, but I am on the right track for now.

    Just wanted to inform you in case you run into this issue.


    Boudewijn Plomp | ITON Consultancy


    Tuesday, September 15, 2015 8:41 AM