DPM 2012 SP1 Beta - Causing Server 2012 Hyper-V Cluster hang / ISCSI problems

  • Question

  • Hi All,

    First of all, I know it's a beta and these are the perils of being an early adopter, but I've got a serious problem.

    I've upgraded our production Hyper-V cluster to Server 2012. The setup is a 4-node cluster running CSVs on an iSCSI SAN with MPIO over dual gigabit Ethernet networks. The SAN storage is provided by Open-E DSS7 and replicated to another server in a different building.

    After the upgrade, everything about the cluster seemed stable and worked as expected, with live migrations etc. all working. I then turned my attention to backups and discovered that Server 2012 wasn't supported by DPM. Fortunately there is a beta of DPM 2012 SP1 that adds support for Server 2012; unfortunately there is no upgrade path from the beta to the RTM of SP1. Not wanting to upgrade our production DPM server to a beta, I installed a copy of DPM 2012 SP1 beta on a VM as a stopgap for VM-level backups of certain machines that couldn't be backed up in other ways. I realise that running the backup server on the same cluster / SAN as the machines being backed up is an odd thing to do, but it at least provides snapshots, SAN replication provides resilience, and, as I say, this is a stopgap.

    Then I started noticing problems. The first symptom was that when starting or rebooting VMs, other VMs would sometimes hang for perhaps 30 seconds to 2 minutes; people would start complaining that SharePoint had gone unresponsive etc. However, they would come back to life in a minute or two. On a couple of occasions we came in in the morning to find a number of VMs off or paused (backups ran overnight). Both of these problems occurred only when the DPM server was turned on. I thought the issue might be general load on the SAN, with both the backup server and the machines being backed up living on the same CSV / hardware. I moved the DPM server to a different iSCSI box and applied aggressive throttling (200Mbps) to try to reduce load, but the problem continues.

    The event logs on the Hyper-V cluster suggest I/O timeouts to the SAN at the times of the backups. Lots of event IDs 1069, 1205, 1146, 1230 (various cluster resources failed). The interesting one, I think, is 5120: Cluster Shared Volume 'Volume5' ('VOLUME NAME') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.
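
    To see how often each volume is hit, the 5120 messages can be tallied once the System log has been exported as plain text. The following is a minimal Python sketch under that assumption; the function names and the regular expression are made up for this example and simply follow the wording of the message quoted above.

```python
import re
from collections import Counter

# Pattern modelled on the event 5120 message text quoted in this thread.
CSV_PAUSE_RE = re.compile(
    r"Cluster Shared Volume '(?P<volume>[^']+)' \('(?P<label>[^']*)'\) "
    r"is no longer available on this node because of "
    r"'(?P<status>[A-Z_]+)\((?P<code>[0-9a-fA-F]+)\)'"
)


def parse_csv_pause(message):
    """Return volume, label, status and code from a 5120 message, or None."""
    match = CSV_PAUSE_RE.search(message)
    return match.groupdict() if match else None


def count_by_volume_and_status(messages):
    """Count matching messages per (volume, status) pair; other text is skipped."""
    counts = Counter()
    for message in messages:
        parsed = parse_csv_pause(message)
        if parsed:
            counts[(parsed["volume"], parsed["status"])] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Cluster Shared Volume 'Volume5' ('VOLUME NAME') is no longer available "
        "on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'."
    )
    print(parse_csv_pause(sample))
```

    Grouping by status as well as volume keeps auto-pause errors apart from any other status names that show up in the same message format.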

    Is anyone else using SP1 beta to successfully back up a 2012 Hyper-V cluster?

    Is anyone seeing the same problem?

    Is it likely that this is a problem with SP1 beta, and will it be fixed at RTM?

    Any suggestions for a stopgap solution?

    I think I might try setting up a test physical DPM server to check that the issue isn't in some way related to the fact that the DPM server sits on the same cluster it's backing up. I'm also happy to consider that the problem could lie elsewhere, i.e. with the SAN storage (this was upgraded from v6 to v7 at the same time as the 2012 upgrade), but as soon as I tell the vendor that the problem relates to running a beta of DPM they will be pointing fingers at that.

    Thanks,

    Tim

    Thursday, 22 November 2012 13:02

All replies

  • Hi,

    Other customers running Windows 2012 Hyper-V clusters and DPM 2012 SP1 have reported similar problems. However, it has been determined that any backup product using shadow copies results in the same errors, so this isn't a DPM SP1 issue.  The Windows 2012 cluster CSV team is investigating this problem; hopefully they will find the cause and have a fix available before SP1 is released.


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread. Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.

    Friday, 23 November 2012 04:08
  • Hi Mike,

    Thanks for the information. It's interesting to hear others are having problems; I had searched for quite a while but couldn't find anyone reporting exactly the same issue in the forums. 

    Is there any way I can be notified when a fix is available? Is this likely to come out in a Patch Tuesday update, or will it be a hotfix? Often the Patch Tuesday updates are fairly vague, just stating "Various performance and stability fixes", which makes it tricky to know if a specific issue has been addressed yet.

    I have moved the DPM server to a standalone Hyper-V host with local storage, so it is now completely independent of the systems it's backing up. I'm going to try running it overnight tonight to see if this helps.

    Thanks,

    Tim

    Monday, 26 November 2012 11:56
  • Hi,

    The Windows team's investigation is still ongoing, so until they identify the problem and code up a fix I cannot say how it will be made available. I'm keeping my eye on this issue so I can update this post with the final outcome.


    Regards, Mike J. [MSFT]

    Monday, 26 November 2012 14:10
  • That would be most appreciated Mike, thanks very much.
    • Proposed as answer by cciuleanu Wednesday, 6 February 2013 12:29
    Monday, 26 November 2012 14:21
  • I'm encountering the same issue on my side, except that I'm using clustered Storage Spaces. I'm also noticing that VMs stop responding when the backups reach a certain point in the backup process. I tried backup serialization in DPM; it worked one day and failed after that. At first I thought it wasn't using the CSV VSS provider, but after reviewing the DPM logs it looks OK on that side. I saw that for some reason Windows tries to cache the VHDX in memory while the backup is running; I was able to see this using RAMMap from Sysinternals. This caused problems for larger VMs (150GB+). When that was happening, the hosts became more and more unresponsive and the throughput dropped dramatically when using Veeam (~1.5MB/s). In one instance, I saw one of the hosts BSOD (not sure yet if this is related). We have a case open with Microsoft for this as well.

    Needless to say that this is a major pain in our migration from VMware to Hyper-V!

    Wednesday, 28 November 2012 18:58
  • I'm replying to this thread because

    a) It's one of only two threads on the whole internet that mention 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR'
    b) I'm getting the same symptoms.

    I can confirm that one of my nodes in my Hyper-V 2012 cluster recently experienced the following event:

    Log Name: System
    Source: Microsoft-Windows-FailoverClustering
    Event ID: 5120
    Logged: 02/12/2012 18:01:30

    Details: Cluster Shared Volume 'Volume2' ('ClusterStorage Volume 2') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    I can also confirm that I am using DPM 2012 SP1 Beta to back up this cluster.  I have been running this environment for quite some time now, and I can confirm that I've received 15 of these kinds of events (14 of which I was completely oblivious to).  What prompted me to do research this time is that I discovered that 3 of my virtual machines were in a paused state and were not available.  My other node (two node cluster) has had 5 of these events.

    As this is already in the hands of Microsoft I won't log a call but will follow this thread.  If there is any further information I can provide please ask.

    Oh yes, my primary storage is Fiber Channel, so it's not an iSCSI problem.
    Monday, 3 December 2012 08:11
  • I also have this issue. I am running a 2012 cluster on SMB 3.0 storage (a Microsoft scale-out file server cluster). My 2012 virtual servers usually shut down or go into a paused state. My 2008 R2 servers usually hang for an extended period of time and then become available again. A resolution to this issue would be much appreciated.

    Monday, 3 December 2012 15:56
  • I've found a workaround for this I thought I'd share. It's a little convoluted but if like me not having your servers backed up was giving you sleepless nights, it might be worth it.

    Server 2012 introduces Hyper-V Replica allowing you to push an offline copy of your VMs to a remote server / site for DR purposes. This works from cluster to standalone. It's pretty simple to set up. You need a server with Hyper-V role installed to host the replicas. 

    Once your replicas are set up you can use DPM to back up the replicas. The replica VMs are normally turned off anyway, so if backups do cause brief disk glitches it isn't going to interrupt any important services. My guess is this is a cluster-related issue anyhow, so having the replicas on a standalone machine removes that issue. 

    HV Replica does allow hourly snapshots of the replicas, but it seems that it's not possible to change their frequency, so this isn't an efficient way of providing a decent retention time. For some reason, when I tried it, DPM would only see the replicas as available for backup if snapshots were turned off.

    I've only set this up today, so can't comment on the long term reliability, but so far so good.

    Tim


    • Edited by TimBoothby Friday, 7 December 2012 12:38 Typo
    Friday, 7 December 2012 12:36
  • Hi Tim,

    Sadly, backing up the replicas is not supported.

    Reference: http://blogs.technet.com/b/dpm/archive/2012/08/27/important-note-on-dpm-2012-and-the-windows-server-2012-hyper-v-replica-role.aspx


    my blog is at http://flemmingriis.com , let me know if you found the post or blog helpful or if it leaves room for improvement

    Sunday, 9 December 2012 16:13
  • Hi Flemming,

    Thanks for the warning. The backups appear to be running successfully, but I've not attempted to restore one. Reading that blog article, it sounds like the replication process could be modifying the VHDs at the same time as DPM is backing them up, so presumably they could be in a messed-up state.

    Back to the original plan of crossing fingers and waiting then :-(

    Tim

    Tuesday, 11 December 2012 17:24
  • Hi all,

    System Center DPM SP1 RTM has now been released, as per this link:

    http://social.technet.microsoft.com/Forums/en-US/dataprotectionmanager/thread/35a8d34e-1891-47c4-b1f8-64f4511c14ee/

    Does anyone know if the problem described in this post is fixed?


    MCSE, MCTS, VCP, AIS, MCITP

    Monday, 24 December 2012 23:07
  • Hi,

    Some Windows 2012 code defects have been identified and the Windows team is hard at work getting them fixed, tested and eventually released. There is no ETA at this time, but I do know they are a high priority, so we just need to let the team finish the fixes and get them posted.

    Thanks in advance for your continued patience.


    Regards, Mike J. [MSFT]

    Tuesday, 25 December 2012 00:31
  • Thank you Mike,

    We are looking forward to the stable version. We have already deployed Hyper-V clusters on Windows 2012 for 2 clients, and at the moment they perform only direct guest backups. We need to configure hypervisor-level backup before the projects can be accepted :)


    MCSE, MCTS, VCP, AIS, MCITP

    Tuesday, 25 December 2012 19:27
  • The problem is not fixed. I have deployed DPM 2012 with SP1 and this still causes the issue seen here.

    It sounds like this needs a Windows patch that Microsoft is working on. I have also deployed 2012 in production and am only doing guest-based backups now, because host-based backups bring down most of the VMs on the cluster.

    Thursday, 27 December 2012 03:04
  • Glad Mike pointed this thread out to me before I did my cluster migration.  I'll be watching for updates. 
    Friday, 28 December 2012 15:16
  • I'm having the same issue on one of my 2 clusters.

    Cluster1, 3x HP DL360 G7's connected to a Compellent SAN via 4GB FC & McData 4700 switches

    Cluster2, 2x Dell R620's directly connected to a Dell MD3620f via 8GB FC

    Both clusters run the same exact Windows Server 2012 Datacenter, imaged the same way at roughly the same time.  Cluster1 does not have any problems with backups, but cluster2 experiences these I/O problems after 8-24 hours of backups.  I've found that running one backup at a time manually doesn't seem to cause issues, but if I allow all of the backups to run on schedule and in parallel then Cluster2 will crash hard.

    Both clusters are being backed up by the same DPM 2012 SP1 (RTM) server.  The only real difference between them is the hardware, and the lack of a SAN Switch in Cluster2.  


    Lync/Asterisk blog: www.andrewparisio.com


    Thursday, 3 January 2013 22:24
  • KB2791729 is available from support if you open a case; it has helped me with CSVs residing on iSCSI. I haven't seen any issues on FC. (Edit: so much for no problems on FC.) Next post is FC :)

    my blog is at http://flemmingriis.com , let me know if you found the post or blog helpful or if it leaves room for improvement


    Thursday, 3 January 2013 22:54
  • I'm having the same issue, also on 2 clusters:

    cluster1 4x HP ML330 G6, 2x 8 Gbit FC Switch, HP P2000 G3

    cluster2 (testing) 2x HP ML110 G6, directly connected via 4 Gbit FC to HP P2000 G3

    Sometimes a LUN disappears or becomes inaccessible (and I have to switch maintenance mode on and off for that LUN); sometimes VMs on the affected Hyper-V host pause.

    Both clusters have problems with backup, and I see these events:

    Cluster Shared Volume 'Volume6' ('HyperV Data 6') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.


    Thursday, 3 January 2013 23:07
  • Just checking in, is there any ETA available yet for this fix?
    Thursday, 10 January 2013 16:52
  • Hi,

    I believe they are doing final validation of the fixes, so hopefully it will be released soon.  The fixes are from the Windows group, so I don't have much insight on their release schedule / plans.  When I learn more, I'll update the post again.


    Regards, Mike J. [MSFT]

    Thursday, 10 January 2013 17:42
  • If we call into PSS, is there a fix available, or will they defer to just deploying guest agents until the patch is released?
    Sunday, 13 January 2013 03:32
  • As of 11 January 2013, the fix is not yet available to any customers except a few who are helping to test it.


    Regards, Mike J. [MSFT]

    Sunday, 13 January 2013 04:14
  • I am experiencing the same issue with a Hyper-V Server 2012 cluster and DPM 2012 SP1 (with rollup 1) running in a Windows 2012 VM on the same cluster. The cluster storage is on an iSCSI SAN; however, backups are stored on a separate iSCSI NAS. This configuration worked perfectly well when the same hardware was running Hyper-V 2008 R2 SP1 and DPM 2012 on a Windows 2008 R2 guest.

    Please make the fix available as soon as possible.

    Mark

    Monday, 14 January 2013 03:48
  • Hi All,

    A Windows 2012 fix that should resolve two major issues customers were reporting has just been released.

    Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
    http://support.microsoft.com/kb/2799728

    If you continue to see problems protecting Windows 2012 Hyper-V guests after installing the above hotfix, please open a support case for further investigation.


    Regards, Mike J. [MSFT]

    • Proposed as answer by Wagner Polachini Tuesday, 15 January 2013 10:54
    • Unproposed as answer by TimBoothby Tuesday, 15 January 2013 16:28
    Monday, 14 January 2013 18:26
  • Excellent news Mike. I shall try it in the morning and report back.

    Thanks for being on the ball with this.

    Monday, 14 January 2013 21:04
  • Thanks Mike!
    Monday, 14 January 2013 21:23
  • Thanks Mike, will try this afternoon and confirm
    Monday, 14 January 2013 21:28
  • Just updated both nodes, rebooted and started running a backup of all VMs on the cluster.

    Still receiving the following on both production LUNs

    "Cluster Shared Volume 'LUN2' ('LUN2') is no longer available on this node because of 'STATUS_IO_TIMEOUT(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished."

    Tuesday, 15 January 2013 00:03
  • I have installed the hotfix on the cluster nodes and the DPM backup appears to be working well so far. However, the issue can take a few hours to appear, so I will report back later. RichL_PLA, do you have any VDS/VSS providers from the SAN manufacturer installed on the hosts? If so, these may be causing a conflict unless they are Windows 2012 certified.

    Tuesday, 15 January 2013 00:39
  • My first error came about 7-10 minutes after the backup started.

    No hardware VSS providers installed

    I also made sure to disable ODX per MS:

    After you install the hotfix, CSV volumes do not enter paused states as frequently. Additionally, a cluster’s ability to recover from expected paused states that occur when a CSV failover does not occur is improved.

    To avoid CSV failovers, you may have to make additional changes to the computer after you install the hotfix. For example, you may be experiencing the issue described in this article because of the lack of hardware support for Offloaded Data Transfer (ODX). This causes delays when the operating system queries for the hardware support during I/O requests.

    In this situation, disable ODX by changing the FilterSupportedFeaturesMode value for the storage device that does not support ODX to 1.
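
    For anyone who wants to script the ODX change, below is a minimal Python sketch that only renders a .reg fragment as text; it does not touch the registry. It assumes the system-wide key from Microsoft's general ODX guidance (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem), which is not quoted in this thread. The KB text above refers to the value for a specific storage device, and that per-device location is not guessed here. The function name odx_reg_fragment is made up for this example.

```python
# Assumption: system-wide ODX switch location taken from Microsoft's general
# ODX guidance, not from this thread. 1 disables ODX, 0 enables it.
ODX_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"
ODX_VALUE_NAME = "FilterSupportedFeaturesMode"


def odx_reg_fragment(disable=True):
    """Return .reg file text that sets FilterSupportedFeaturesMode to 1 or 0."""
    value = 1 if disable else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{ODX_KEY}]",
        f'"{ODX_VALUE_NAME}"=dword:{value:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Review the text before importing it on any host.
    print(odx_reg_fragment(disable=True))
```

    Generating the text rather than writing the value directly leaves room to review the fragment and test it on one node first.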

    Tuesday, 15 January 2013 00:42
  • You could also try serializing the backups, assuming that you are using DPM: http://technet.microsoft.com/en-us/library/ff634192.aspx

    So far no issues on the DPM backup I am running at the moment.

    Regards

    Tuesday, 15 January 2013 01:07
  • As I understand it with CSV2, serialized backup is no longer necessary
    Tuesday, 15 January 2013 01:12
  • (In reply to Mike J.'s hotfix announcement for KB2799728 above.)

    Worked for me. Thank you very much.

    Wagner M. Polachini - IT Infrastructure Analyst

    Tuesday, 15 January 2013 10:55
  • Yes, serialization of backups is not required on Hyper-V 2012; however, it reduces storage I/O, since only one VM is being backed up at a time. The DPM backups have been successful, with no errors since applying the hotfix to the hosts.
    Tuesday, 15 January 2013 12:04
  • I installed the hotfix and still got STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021) on 4 out of 52 VM backups. No VMs were left in a paused state. I am using serialization and no hardware VSS providers on Fibre Channel storage.
    Tuesday, 15 January 2013 13:55
  • No serialization here, and it happened on 3 of my 21 VM backups.
    Tuesday, 15 January 2013 14:00
  • In addition, after applying this hotfix there seemed to be a severe memory leak on the node that owns the storage. Memory usage ballooned from a day-to-day 42GB to 128GB (99% utilization on the host).

    I stopped the backup and the memory usage dropped immediately. I'm running it again now (just a resume on a single VM) and the memory usage is slowly creeping back up, from 43GB to 55GB and still climbing.

    Tuesday, 15 January 2013 15:14
  • Thanks Rich - you've put your finger on it. I've done the update and turned off ODX. I've done a number of backups successfully - there seemed to be some brief glitches in the availability of some of the VMs, but nothing crashed. 

    Then all the VMs on one of the nodes started flashing critical messages, shutting down, rebooting, migrating to other hosts etc. Looking into it, the host seemed to be out of memory, even with most of the guests offline. As with Rich, this node was the storage owner. Cancelling the in-progress backups immediately freed up the RAM.

    I agree with Rich's diagnosis - severe memory leak.



    • Edited by TimBoothby Tuesday, 15 January 2013 16:32
    Tuesday, 15 January 2013 16:23
  • Glad to hear second validation here as well.

    I opened a case with PSS and am awaiting a call back. I'll report my findings as soon as I know something.

    Tuesday, 15 January 2013 16:25
  • You can tell where my backups began
    Tuesday, 15 January 2013 17:00
  • Here is the free memory on the storage owner node. I cleared the node of all guests prior to running this test. Task Manager and Performance Monitor don't show any particular process eating all the RAM - but something clearly is.

    Tuesday, 15 January 2013 18:03
  • FYI, I'm working with the perf team and ran RAMMap; there seems to be a memory leak of sorts in the volume shadow copy process for the VM being backed up.

    Tuesday, 15 January 2013 20:43
  • We have seen this, too.  Within our monitoring software, SQL Sentry, we noted that the memory ballooning is tied to the file cache - which I'm guessing is related to the shadow copy/vds/vss stuff.  We have just installed the released hotfix and we're working to see if the stability issues are resolved, which were a much bigger deal for us...and have left me sleep deprived.

    I've pasted a screenshot below from SQL Sentry Performance Advisor.  It shows the memory peaks with each VM being backed up on that host.

    One other thing - I had the host exhaust its memory when the pagefile was set to 4GB, but have since changed it to allow WS2012 to do whatever (system managed).  Not sure if that has helped, but can't say it has hurt either.  The host has 96GB and in VMM I had reserved 6GB...but when DPM kicked off, it just mowed right over all of it.  Sigh.
    • Edited by MarkLarma Tuesday, 15 January 2013 23:20
    Tuesday, 15 January 2013 21:07
  • I am experiencing the same issue with low memory on the hosts, especially when a VM with a large virtual disk is being backed up by DPM. The hotfix has stopped the errors appearing on the cluster or CSV disks. However, the VMM agent service now crashes, due to low memory, on the host that has the VM being backed up, and when I tried to restart it while the backup was still running the server rebooted with bug check 0x000009e. The host was also the CSV owner at the time.

    Can anyone confirm if the hotfix also needs to be applied to the DPM server (which is running on Windows 2012)?

    Wednesday, 16 January 2013 00:20
  • Hi All,

    I've been monitoring this closely and have been told that the memory leak issue is actively under investigation by the Windows team. At this time I don't have any more information.  Thanks for sharing your experience, it helps prioritize the efforts to find a solution.


    Regards, Mike J. [MSFT]

    Wednesday, 16 January 2013 00:45
  • Hi everybody,

    I had a case opened with Microsoft for the problem where most (if not all) of the RAM gets consumed during a VM backup with DPM. I saw the same behaviour as you guys: the VHDX being backed up was cached. In the case of large VMs, that's problematic to say the least ;-). I've been told to use DynCache to control the size of the FS cache. I have yet to try it; I will do so today in conjunction with the hotfix KB2799728.

    Mathieu

    Wednesday, 16 January 2013 18:20
  • I received a response from my case lead this evening that the product and development team are working on a resolution. Nothing more than that for the time being.
    Thursday, 17 January 2013 05:15
  • After installing KB2799728, I got this console error (on every server where I applied the KB). I can now manage my clusters only remotely, from a server without KB2799728.

    I can also confirm the memory leak when a backup runs.


    Thursday, 17 January 2013 11:15
  • This is a known issue with KB2750149. If you uninstall it, you will not receive that error when opening Failover Cluster Manager.
    Thursday, 17 January 2013 12:19
  • Would the following registry items do anything to help?

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
    "LowMemoryThreshold"=dword:00001800

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization]
    "MemoryReserve"=dword:00001800

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization\AdditionalMemoryReserve]
    "FailOverClusteringMemReserve"=dword:00001000
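
    The dword values above are hexadecimal, which makes them easy to misread. A small Python sketch (the helper name decode_dwords is made up for this example) converts them to decimal. If the values are interpreted as megabytes, which is an assumption the post does not state, 0x1800 would be 6144 (6 GB) and 0x1000 would be 4096 (4 GB).

```python
import re

# Matches .reg lines of the form "Name"=dword:0000abcd
DWORD_RE = re.compile(r'^\s*"(?P<name>[^"]+)"=dword:(?P<hex>[0-9a-fA-F]{1,8})\s*$')


def decode_dwords(reg_text):
    """Map each dword value name in a .reg fragment to its decimal value."""
    values = {}
    for line in reg_text.splitlines():
        match = DWORD_RE.match(line)
        if match:
            values[match.group("name")] = int(match.group("hex"), 16)
    return values


# The three values from the post above, with the key headers omitted.
REG_TEXT = '''
"LowMemoryThreshold"=dword:00001800
"MemoryReserve"=dword:00001800
"FailOverClusteringMemReserve"=dword:00001000
'''

if __name__ == "__main__":
    for name, value in decode_dwords(REG_TEXT).items():
        print(name, value)
```

    The helper keys results by value name only, so it is suited to a quick check of a short fragment like this one rather than to whole registry exports where names repeat under different keys.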

    Thursday, 17 January 2013 20:34
  • To answer my own question... it made no difference.  Come on, MS... where's the hotfix or a workaround that we don't have to call in for?
    Thursday, 17 January 2013 21:58
  • I think this is crap!

    MS put out a new product that cannot be backed up. I have tried several backup programs, all with the same result: the CSV reports 0 KB free and all VMs are stopped in a critical state. 

    Everyone told me that using DPM should be safe. It is an MS product, so it's a sure thing it will work. BUT IT IS THE SAME PROBLEM!

    PUBLISH A WORKING PATCH NOW! 

    Monday, 21 January 2013 08:48
  • It's not a DPM issue but an NTFS driver issue. All backup solutions must be affected, because the fix has to be made in a file system driver.
    Monday, 21 January 2013 08:55
  • Same problem here. Fully patched 2012 cluster with DPM 2012 SP1 RTM + Rollup 1.
    Monday, 21 January 2013 09:39
  • Hello Everyone,

    I continue to monitor this thread and I know how frustrating this problem is, but I can assure you that the Windows team is really hard at work fixing and testing fixes for the issues that have been uncovered.   It appears that some customers are effected more than others due to scale and various workloads placed on the cluster, windows component interoperability timing etc.  Another fix is in the works and will hopefully be released soon, however as I have said in the past, I'm not privy to the Windows team's time tables.  Thanks again for sharing your experiences, as that does sometimes help prioritize internal processes, although this one is already a very high priority. 


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread. Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.

    Monday, 21 January 2013 23:28
  • I just tried the KB that has been released. It did not resolve anything.

    Br

    Patrik

    Tuesday, 22 January 2013 09:45
  • I have tried several backup programs, all with the same result: the CSV reports 0 KB free and all VMs are stopped in a critical state.
    Everyone told me that using DPM should be safe. It is an MS product, so it's a sure thing it will work. BUT IT IS THE SAME PROBLEM!

    It's not a DPM issue but an NTFS driver issue. Every backup solution is bound to be affected, because the fix has to go into a file system driver.

    Yes, I know that. But at first MS told us that this problem only occurred when using other backup software.

    Is it possible to run 2012 VMs in a 2008 R2 Hyper-V cluster? I'm starting to think about reinstalling my Hyper-V servers and configuring a new 2008 R2 cluster.

    Br

    Patrik

     
    Tuesday, 22 January 2013 11:58
  • Is it possible to run 2012 VMs in a 2008 R2 Hyper-V cluster? I'm starting to think about reinstalling my Hyper-V servers and configuring a new 2008 R2 cluster.

    Br

    Patrik

    Yes, this is fine. I have two clusters, one Windows 2012 with 39 VMs and one Windows 2008 R2 with 309 VMs, quite a few of which are Windows 2012. Don't bother installing the Hyper-V integration components on top of them though, because the built-in components are already a newer version than what Windows 2008 R2 will install.

    Looking forward to a fix for this, and for another issue I won't go into, before I can migrate my main cluster to Windows 2012.

    Tuesday, 22 January 2013 12:10
  • Is it possible to run 2012 VMs in a 2008 R2 Hyper-V cluster? I'm starting to think about reinstalling my Hyper-V servers and configuring a new 2008 R2 cluster.
    If your VMs are based on VHD (not VHDX), it's easy to migrate back to 2008 R2. Even if the VM configuration is unreadable, just create a new VM and attach the necessary VHD files to it.

    • Edited by AndricoRus Tuesday, 22 January 2013 12:11
    Tuesday, 22 January 2013 12:10
  • After changing the ODX setting I get these errors in the log:

    Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    and

    Software snapshot creation on Cluster Shared Volume(s) ('\\?\Volume{846c5e2e-a28a-4610-8006-21cee18f6a27}\') with snapshot set id '{486ff065-0f08-4a7f-b3aa-4f5f5c565581}' failed with error 'HrError(0x0000139f)(5023)'. Please check the state of the CSV resources and the system events of the resource owner nodes.
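    For anyone wanting to check their own nodes for the first of these messages, here is a rough PowerShell sketch. It assumes the "no longer available on this node" message is logged in the System log by the Microsoft-Windows-FailoverClustering provider as event ID 5120; that ID is an assumption and is not stated anywhere in this thread. Get-CsvPauseInfo and Get-CsvPauseEvents are hypothetical helper names.

```powershell
# Hypothetical helper: pull the volume name, resource name and status code
# out of a CSV "no longer available" message like the one quoted above.
function Get-CsvPauseInfo {
    param([Parameter(Mandatory)][string]$Message)

    $pattern = "Cluster Shared Volume '([^']+)' \('([^']+)'\) is no longer available on this node because of '(\w+)\((\w+)\)'"
    if ($Message -match $pattern) {
        [pscustomobject]@{
            Volume   = $Matches[1]
            Resource = $Matches[2]
            Status   = $Matches[3]
            Code     = $Matches[4]
        }
    }
}

# Hypothetical wrapper: scan one node's System log for the last N hours.
# Assumption: event ID 5120 from Microsoft-Windows-FailoverClustering.
function Get-CsvPauseEvents {
    param(
        [string]$ComputerName = $env:COMPUTERNAME,
        [int]$Hours = 24
    )

    $filter = @{
        LogName      = 'System'
        ProviderName = 'Microsoft-Windows-FailoverClustering'
        Id           = 5120
        StartTime    = (Get-Date).AddHours(-$Hours)
    }
    Get-WinEvent -ComputerName $ComputerName -FilterHashtable $filter -ErrorAction SilentlyContinue |
        ForEach-Object {
            $info = Get-CsvPauseInfo -Message $_.Message
            if ($info) {
                # Attach the event time so pauses can be lined up with the backup window.
                $info | Add-Member -NotePropertyName TimeCreated -NotePropertyValue $_.TimeCreated -PassThru
            }
        }
}
```

    Running Get-CsvPauseEvents -Hours 12 on each node after a backup window should show which volumes paused and when, which makes it easier to compare notes in this thread.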

    Br
    Patrik

    Tuesday, 22 January 2013 12:57
  • Same issue here.

    The fix did solve the problem of VMs going into a paused or stopped state.

    But the memory issue is still there when running backups of big VMs (400+ GB) using DPM.

    It eats up all the memory of the host that owns the CSV containing the VM.

    Cheers,

    Ramon

    Wednesday, 23 January 2013 10:58
  • I just received an update from my case owner that the product team is still working on this, with no ETA on delivery just yet.

    I was also notified of a temporary workaround that isn't very scalable beyond smaller environments; however, I am testing it now. The recommendation in the interim is to move all VMs to a single node in the cluster, make sure that node is also the owner of all CSVs, and then perform the backup.

    I'm validating this now.

    Wednesday, 23 January 2013 14:42
  • Interesting workaround. That sounds wrong to me though as it's the node that is the CSV owner that has the memory leak. When I tried something like this all the VMs ended up showing critical memory alerts then shutting down and moving to other nodes in a not very graceful manner.

    The workaround I've been using on a 4 node cluster is to remove all VMs from one node, make this the owner of the CSVs and then backup. All the RAM will get sucked out of this sacrificial node but the backups are working and the VMs aren't affected. 
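    The "sacrificial node" preparation described above could be scripted roughly as follows. This is only a sketch under assumptions: the node names are placeholders, Get-EvacuationPlan and Invoke-SacrificialNodePrep are hypothetical helper names, and it relies on the standard FailoverClusters cmdlets (Get-ClusterNode, Get-ClusterGroup, Move-ClusterVirtualMachineRole, Get-ClusterSharedVolume, Move-ClusterSharedVolume) behaving as usual on a Server 2012 cluster.

```powershell
# Hypothetical helper: decide where each VM on the backup node should go,
# spreading them round-robin over the remaining nodes. Pure logic, no cluster calls.
function Get-EvacuationPlan {
    param(
        [Parameter(Mandatory)][string]$BackupNode,
        [Parameter(Mandatory)][string[]]$AllNodes,
        [string[]]$VmsOnBackupNode = @()
    )

    $targets = @($AllNodes | Where-Object { $_ -ne $BackupNode })
    if ($targets.Count -eq 0) { throw 'Need at least one other node to take the VMs.' }

    $i = 0
    foreach ($vm in $VmsOnBackupNode) {
        [pscustomobject]@{ VM = $vm; Target = $targets[$i % $targets.Count] }
        $i++
    }
}

# Sketch of the actual moves; needs the FailoverClusters module on a cluster node.
function Invoke-SacrificialNodePrep {
    param([Parameter(Mandatory)][string]$BackupNode)

    Import-Module FailoverClusters

    $nodes = @(Get-ClusterNode | Where-Object { $_.State -eq 'Up' } | ForEach-Object { $_.Name })
    $vms = @(Get-ClusterGroup |
        Where-Object { $_.GroupType -eq 'VirtualMachine' -and $_.OwnerNode.Name -eq $BackupNode } |
        ForEach-Object { $_.Name })

    # 1. Live-migrate every VM off the node that will take the memory hit.
    $plan = @(Get-EvacuationPlan -BackupNode $BackupNode -AllNodes $nodes -VmsOnBackupNode $vms)
    foreach ($move in $plan) {
        Move-ClusterVirtualMachineRole -Name $move.VM -Node $move.Target -MigrationType Live
    }

    # 2. Make that node the owner (coordinator) of every CSV before the backup starts.
    Get-ClusterSharedVolume | Move-ClusterSharedVolume -Node $BackupNode
}
```

    Calling Invoke-SacrificialNodePrep -BackupNode 'HV-NODE4' (a made-up name) just before the backup window would empty that node and hand it the CSVs. The round-robin spread is a simplification; it ignores how much free memory the other nodes actually have.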

    Tim

    Wednesday, 23 January 2013 14:55
  • I too am skeptical.
    Wednesday, 23 January 2013 15:01
  • After changing the ODX setting I get these errors in the log:

    Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    and

    Software snapshot creation on Cluster Shared Volume(s) ('\\?\Volume{846c5e2e-a28a-4610-8006-21cee18f6a27}\') with snapshot set id '{486ff065-0f08-4a7f-b3aa-4f5f5c565581}' failed with error 'HrError(0x0000139f)(5023)'. Please check the state of the CSV resources and the system events of the resource owner nodes.

    Br
    Patrik

    Anyone else getting these kinds of errors after applying KB and changing ODX setting? 

    Br
    Patrik

    Thursday, 24 January 2013 12:25
  • http://support.microsoft.com/kb/2803748/en-us

    Thursday, 24 January 2013 12:37
  • Anyone else getting these kinds of errors after applying KB and changing ODX setting? 

    Br
    Patrik

    Hi Patrik,

    I've checked a couple of nodes and no, I'm not seeing those errors.

    Tim

    Thursday, 24 January 2013 14:48
  • RichL_PLA,

    Does the workaround of moving all the VMs and CSVs to one node work for DPM backups? I am too afraid to try, as we haven't had any host-based backups since migrating the cluster to Hyper-V 2012, so the safer option at the moment is not to run the backup.

    Thursday, 24 January 2013 14:55
  • I too am having the memory leak issue, to the point that it crashes the host and all the VMs on that host go into a saved-critical state and jump ship. Very frustrating. I have applied KB2799728 and am now waiting on whatever the latest fix to this fiasco will be.

    • Edited by Seth H. _ Thursday, 24 January 2013 21:01
    Thursday, 24 January 2013 21:00
  • Not meaning to sound pedantic, but if KB2799728 causes a memory leak, shouldn't the hotfix be removed? Surely a guaranteed memory leak is a bigger issue than random virtual machines potentially going into a paused state during backup?

    I suppose a valid question is, are there people deploying this hotfix NOT getting the memory leak?

    Friday, 25 January 2013 09:44
  • Not meaning to sound pedantic, but if KB2799728 causes a memory leak, shouldn't the hotfix be removed? Surely a guaranteed memory leak is a bigger issue than random virtual machines potentially going into a paused state during backup?

    I suppose a valid question is, are there people deploying this hotfix NOT getting the memory leak?

    Doesn't appear to have resolved the CSV crash problem for me, seeing the issue with both DPM and Veeam. Memory leak wise we're also suffering so -2 for us!
    Friday, 25 January 2013 10:48
  • The more I think about this, the more I just don't understand how this never showed up in testing. The three bugs I've seen within the last month have been pretty big: this one, the Failover Cluster Manager console crashing after the .NET 4.5 update (there's a patch now, thankfully), and another I saw posted regarding updating to the new DPM SP1, where creating another protection group blows up the MMC... I mean, jeez. I expect to see some bugs, but this level of sloppiness is not a good thing from MS. I think we all expect better.

    On that note...I would say to the dev team, thanks for working hard on this and we look forward to the hotfix.  To the folks testing...please stay focused, hire more testers, etc.

    Friday, 25 January 2013 18:46
  • We also have a case open with German Premier Support. On our side the issue is really easy to reproduce. We have tried many different settings and scenarios. At the moment the only stable configuration for us is:

    - disable ODX  ( Set-ItemProperty hklm:\system\currentcontrolset\control\filesystem -Name "FilterSupportedFeaturesMode" -Value 1 )

    - uninstall hotfix KB2799728 to avoid the memory leak

    - enable per-host and per-LUN serialization in DPM ( http://technet.microsoft.com/en-us/library/hh757922.aspx )

    These settings have been really stable for our cluster during backups. We are currently in the middle of migrating from a 2008 R2 cluster, and the issue emerged gradually as we filled the new cluster with VMs, so I think it is heavily related to load and IOPS on the CSV volumes and the storage network.
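    To see quickly whether a node is already in this state, something along these lines may help. It is a sketch, not a supported procedure: Get-OdxState and Get-BackupWorkaroundState are hypothetical names, and it assumes that a FilterSupportedFeaturesMode value of 1 means ODX is disabled while 0 or a missing value means it is enabled. Get-HotFix is the standard cmdlet for listing installed updates.

```powershell
# Hypothetical helper: translate the FilterSupportedFeaturesMode value into words.
# Assumption: 1 = ODX disabled (the value set by the command above); 0 or missing = enabled.
function Get-OdxState {
    param($Value)

    if ($null -eq $Value) { return 'enabled (value not set)' }
    if ([int]$Value -eq 1) { return 'disabled' }
    return 'enabled'
}

# Hypothetical wrapper: report the ODX state and whether KB2799728 is still installed.
function Get-BackupWorkaroundState {
    $key    = 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem'
    $prop   = Get-ItemProperty -Path $key -Name FilterSupportedFeaturesMode -ErrorAction SilentlyContinue
    $hotfix = Get-HotFix -Id 'KB2799728' -ErrorAction SilentlyContinue

    [pscustomobject]@{
        Computer         = $env:COMPUTERNAME
        Odx              = (Get-OdxState -Value $prop.FilterSupportedFeaturesMode)
        Kb2799728Present = [bool]$hotfix
    }
}
```

    Running Get-BackupWorkaroundState on every node gives one row per host to compare. If the hotfix is still present, it can be removed with wusa.exe /uninstall /kb:2799728 /norestart followed by a reboot in a maintenance window.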

    Friday, 25 January 2013 22:19
  • Is there any more information about this yet? We are just about to start image backups on our new 2012 cluster, but fortunately ran into this thread before starting them in production.

    We are waiting for a fix to arrive, and for someone in this thread to test it and confirm that it works, before we go to production.

    Tuesday, 29 January 2013 07:43
  • Is there any more information about this yet?

    We are waiting for a fix to arrive, and for someone in this thread to test it and confirm that it works, before we go to production.

    PSS told me today that the hotfix is planned for release in approx. 2-3 weeks, once internal testing has been completed...
    Tuesday, 29 January 2013 07:47
  • just have to wait ...  because we have the same issue

    sys_admin

    Tuesday, 29 January 2013 09:58
  • 6-node cluster running 88 VMs with iSCSI storage on HP LeftHand with production workloads!

    Everything was fine until we migrated heavier workloads to the cluster.

    Then.... we experienced the paused VM issue back in December.
    Then.... we applied the patch a few weeks ago and had the CSV IO Timeout issues every other day.
    Then.... we disabled ODX yesterday and now have the memory leak issue.

    Server 2012 Hyper-V 3.0 has become a nightmare to administer with these problems.

    Come on, Microsoft, we need this memory leak fixed!!!



    Wednesday, 30 January 2013 17:13
  • Just uninstall the hotfix and run serialized backups until the new hotfix is available. Backup is much slower, but everything should run stably.
    Thursday, 31 January 2013 10:33
  • This hotfix has ended up being a huge problem for us. We had no memory leak problems before installing the hotfix. We had memory leak problems once the hotfix was installed, and now we're unfortunately still experiencing the memory leaks after uninstalling. BE WARNED! DO NOT INSTALL this hotfix mentioned!!!

    I called PSS and they have nobody available to help and there is no fix. We're basically screwed until we can find a place to migrate all of these VMs to so that we can completely reinstall Server 2012 at this point. This is beyond upsetting given that this is a generally available product at this point.


    Aaron Marks

    Sunday, 3 February 2013 06:47
  • I was wondering if there was more information about the following:

    (1) Has anyone experienced this problem when using guest-based backups (instead of host-based backups)?

    (2) Has anyone experienced this problem when backing up to a destination that is not within the CSV hosting the Hyper-V cluster? In other words, a separate iSCSI target for example (one not part of the cluster).

    Thank you all for this excellent thread.

    Monday, 4 February 2013 14:15
  • I'd just like to add my voice to the many that are experiencing this issue.  We are having the identical problem on our Hyper-V 2012 iSCSI cluster being backed up by DPM 2012 SP1 with update rollup 1.  I installed the hotfix and immediately we started having the memory leak issue on the node that is the owner of the CSV volumes.  

    I have enabled serialized backups in DPM so we'll see how things go tonight but this is a pretty serious problem for us.

    In regards to the question RJMPhD asked, we have not experienced this problem backing up Windows 2012 Hyper-V servers that are not clustered/attached to CSV volumes.  We have two 2012 Hyper-V servers that we run development workloads on and those are standalone with standard iSCSI volumes on the same SAN as the clustered servers we're experiencing problems with.  The standalone non-clustered 2012 servers are able to be backed up without any volumes or VMs going offline.

    Monday, 4 February 2013 21:55
  • In regards to the question RJMPhD asked, we have not experienced this problem backing up Windows 2012 Hyper-V servers that are not clustered/attached to CSV volumes.  We have two 2012 Hyper-V servers that we run development workloads on and those are standalone with standard iSCSI volumes on the same SAN as the clustered servers we're experiencing problems with.  The standalone non-clustered 2012 servers are able to be backed up without any volumes or VMs going offline.

    I should have clarified; I was specifically asking about the case where the Hyper-V instances are running within the CSV, but the destination of the backup is not. Regardless, it sounds like this is a very serious problem and I can only imagine how frustrating it might be.

    Wednesday, 6 February 2013 13:07
  • Hello,

    I think RJMPhD, your question fits into the environment we have. We are running a Hyper-V 2012 cluster with CSV on an iSCSI SAN and using DPM 2012 SP1 for backup. DPM is a separate physical machine with local disks (not on a SAN). We are experiencing all the issues described above. We also experienced the memory leak problem from the hotfix and unfortunately I have to confirm, uninstalling it does not prevent memory leaks. We had to reinstall that node from scratch. 

    Currently we are running as suggested above with serialized DPM backup. Our SAN does not support ODX so no need to disable that.

    BTW, the script mentioned in the MS KB (http://technet.microsoft.com/en-us/library/hh757922.aspx) has a bug. It assumes the CSV name starts with "Volume". Well, ours does not, and I ended up with an empty XML file. MS, if you hear this, please fix the script (look for the line "$filelist = dir $dir\Volume*" and change it to "$filelist = dir $dir\*" ).
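    The effect of that one-character-class change is easy to demonstrate without touching a cluster. The sketch below builds a throwaway folder that stands in for the CSV root and compares the two wildcard patterns; the folder names 'Volume1' and 'SAN-Data' are made up for illustration, and nothing else from the published script is reproduced here.

```powershell
# Build a throwaway folder standing in for the CSV root (all names are hypothetical).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('csv-demo-' + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $dir | Out-Null
foreach ($name in 'Volume1', 'SAN-Data') {
    New-Item -ItemType Directory -Path (Join-Path $dir $name) | Out-Null
}

# Pattern from the published script: only mount points whose name starts with "Volume".
$original = @(Get-ChildItem -Path (Join-Path $dir 'Volume*') | ForEach-Object { $_.Name })

# Pattern suggested above: every mount point, whatever it is called.
$fixed = @(Get-ChildItem -Path (Join-Path $dir '*') | ForEach-Object { $_.Name })

"Volume* finds: $($original -join ', ')"
"*       finds: $($fixed -join ', ')"

# Clean up the demo folder.
Remove-Item -Path $dir -Recurse -Force
```

    A renamed mount point is silently skipped by the first pattern, which would explain the empty XML file. Note that the wider pattern also picks up anything else that happens to live in that folder, so it is worth checking the generated file afterwards.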

    Now a question to others:

    Even though we are now running serialized backups, using a CSV owner node that does not have (and never had) this dreaded hotfix installed, we are still experiencing high memory consumption. During a backup, memory usage on this host climbs to 99-100%. When the backup finishes, memory usage drops by 4-5 GB. The VMs on this host do not seem to experience memory pressure. Is this behavior expected and, if so, can we control the amount of cache the CSV backup uses on the owner node?
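    This does not answer the question of controlling the cache, but to put numbers on the behaviour it may help to sample available memory on the CSV owner while a backup runs. The sketch below assumes the standard '\Memory\Available MBytes' performance counter read through Get-Counter; Get-MemoryDropSummary and Watch-AvailableMemory are hypothetical helper names, and the default interval and sample count are arbitrary.

```powershell
# Hypothetical helper: summarise a series of "available MB" samples.
function Get-MemoryDropSummary {
    param([Parameter(Mandatory)][double[]]$AvailableMB)

    $stats = $AvailableMB | Measure-Object -Minimum -Maximum
    [pscustomobject]@{
        StartMB   = $AvailableMB[0]
        LowestMB  = $stats.Minimum
        EndMB     = $AvailableMB[-1]
        MaxDropMB = $AvailableMB[0] - $stats.Minimum
    }
}

# Hypothetical sampler: read the counter on the CSV owner at a fixed interval,
# then summarise. Defaults cover one hour at 30-second intervals.
function Watch-AvailableMemory {
    param(
        [string]$ComputerName = $env:COMPUTERNAME,
        [int]$IntervalSeconds = 30,
        [int]$Samples = 120
    )

    $counterArgs = @{
        ComputerName   = $ComputerName
        Counter        = '\Memory\Available MBytes'
        SampleInterval = $IntervalSeconds
        MaxSamples     = $Samples
    }
    $readings = @(Get-Counter @counterArgs | ForEach-Object { $_.CounterSamples[0].CookedValue })
    Get-MemoryDropSummary -AvailableMB $readings
}
```

    Starting Watch-AvailableMemory on the owner node just before a backup job gives a start, lowest and end figure that can be compared between nodes with and without the hotfix.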

    Wednesday, 6 February 2013 14:57
  • Hello,

    I think RJMPhD, your question fits into the environment we have. We are running a Hyper-V 2012 cluster with CSV on an iSCSI SAN and using DPM 2012 SP1 for backup. DPM is a separate physical machine with local disks (not on a SAN). We are experiencing all the issues described above. We also experienced the memory leak problem from the hotfix and unfortunately I have to confirm, uninstalling it does not prevent memory leaks. We had to reinstall that node from scratch. 


    Interesting; thank you for the clarification. It sounds like you are using host-based backups; is this true? I wonder about the pros and cons of host-based versus guest-based backups. Specifically, I wonder how live migration interacts with host-based backups: what happens within DPM when a guest migrates away from a particular host? (Perhaps this is too far off topic and should be its own thread.)
    Wednesday, 6 February 2013 15:04
  • Interesting; thank you for the clarification. It sounds like you are using host-based backups --- is this true?. I wonder about the pros/cons of host-based versus guest-based backups. Specifically, I wonder how live migration interacts with host-based backups; what happens within DPM when/if a guest migrates from a particular host? 

    Just a quick answer so we are not cluttering this thread. Yes, we do use host-based backup. For us this is the best option, as our VMs are 99% development/test environments that are on separate domains, have TMGs blocking traffic, etc., and they change quite often. As for live migration: if I try to live migrate manually while the guest is being backed up, it fails, but nothing bad happens and I can retry later. The VMM dynamic optimizer seems to be clever enough to detect this, as I have not yet observed any failures during automatic migrations. All of this is based on fairly short-term observation, so I may not be getting the full picture.
    Wednesday, 6 February 2013 18:56
  • I should have clarified; I was specifically asking about the question where the Hyper-V instances are running within the CSV, but the destination of the backup is not. Regardless, it sounds like this is a very serious problem and I can only imagine how frustrating it might be.

    Hi RJMPhD, sorry for misunderstanding what you were asking. In our environment the DPM server is one of the few standalone physical servers, with a directly attached SAS array where it stores the backups. So in our case yes, the Hyper-V instances are running within the CSV volumes but are being backed up to a destination that is outside of the CSV.

    • Edited by HorusCG Thursday, 7 February 2013 16:31
    Thursday, 7 February 2013 16:31
  • Hi Everyone,

    I have received an update from the Windows folks regarding a fix for the outstanding issues (including the memory leak). They are currently in the final round of thorough testing of the new fixes, which will be released to the public in a new KB once validated. That KB will supersede and replace the original that was released a few weeks back. Assuming testing goes well, I anticipate the fix will be available by the end of the month, if not sooner.

    Again I would like to thank all of you for the participation on this thread sharing your experiences and workaround validation, and look forward to hearing feedback from the new fix once it's available.

    Stay tuned.... I will update the thread with the new KB number when the fix is available for download.


    Regards, Mike J. [MSFT]

    Thursday, 7 February 2013 17:13
  • I have already exported and converted all VMs, reinstalled the Hyper-V cluster with 2008 R2 and reconfigured everything. I just imported the last VMs and am now installing DPM again to run backups. Hopefully it will go a lot better than on 2012.

    I know it is easy to complain, but I think Windows Server 2012 with Hyper-V will be great once they have fixed all the problems. I also want to say that I will never again be first to try out new MS products. I will wait about 6-12 months before trying.

    Br
    Patrik


    • Edited by boje_ Friday, 8 February 2013 11:13
    Friday, 8 February 2013 11:12
  • We have a smaller environment and could copy all VMs onto one host. This doesn't seem to fix the memory leak issue though.

    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Wednesday, 13 February 2013 00:18
  • Hi,

    An updated fix will be made available soon to address some of the issues you face. Stay tuned.


    Regards, Mike J. [MSFT]


    Wednesday, 13 February 2013 14:59
  • Hi,

    Updated fix will be made available soon to address some of the issues you face. stay tuned.


    Regards, Mike J. [MSFT]



    I see you deleted your previous post. I'm assuming we should expect it today?
    Wednesday, 13 February 2013 16:07
  • That would be fantastic as I have a maintenance window tonight!
    Wednesday, 13 February 2013 21:59
  • It has not yet made it to the public KB. Once it's available I'll provide the KB number.

    Regards, Mike J. [MSFT]

    Wednesday, 13 February 2013 22:15
  • Hey Mike,

    I saw this hotfix talking about a handle leak, also related to DPM. Is this the one we are looking for?

    "Assume that you use the WmiPrvSE.exe process for performance data collection on a Windows 8-based or Windows Server 2012-based computer. In this situation, a handle leak may occur in one of the WmiPrvSE.exe instances. Additionally, Microsoft System Center 2012 features that rely on performance data (for example, System Center Virtual Machine Manager (SCVMM), Data Protection Manager (DPM), and System Center Operations Manager (SCOM) may fail." http://support.microsoft.com/kb/2790831/en-us

    Best regards,

    Hans Vredevoort
    MVP Virtual Machine
    @hvredevoort
    www.hyper-v.nu



    Thursday, 14 February 2013 08:24
  • Hi Hans

    I just applied this hotfix to my 2 servers and it did not solve the problem. Hopefully the hotfix Mike is talking about is released very soon.

    Simon


    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Thursday, 14 February 2013 09:05
  • Thanks for trying this out Simon.

    Sure hope the real fix comes very soon!

    Best regards, Hans


    Senior Consultant and Architect Servers and Storage Solutions Nobel

    Thursday, 14 February 2013 11:05
  • I was near the end of my maintenance last night when I saw your post, Hans.  Thank you so much for posting it as I'm sure you're anxious as well - but in reading it, it didn't appear to apply to this issue.  However, I will be looking into it as it nonetheless seems relevant.  Thanks!

    Thursday, 14 February 2013 15:38
  • Hi Guys,

    The memory leak causing the most grief is in the Windows Cluster csvflt.sys and that is now resolved in the new fix to be released.  The fix is code complete and tested, we're just waiting for it to be published and made available to the public.... I'm hoping later today. 


    Regards, Mike J. [MSFT]

    Thursday, 14 February 2013 17:24
  • I really hope this gets released today and doesn't go on into another weekend without a resolution.

    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Friday, 15 February 2013 11:00
  • I really hope this gets released today and doesn't go on into another weekend without a resolution.

    Simon Holman
    Expeed Technology
    http://expeed.com.au


    You got that right...Mike is just being a tease with that hotfix.  :-)
    Friday, 15 February 2013 15:21
  • Is this new fix supposed to supersede the original discussed hotfix? If so does that mean that we can leave the original hotfix installed and install this new one over it? Or are we going to have to uninstall the original hotfix before installing the new hotfix? Just asking so we can have this question out-of-the-way before the new hotfix is released.

    Aaron Marks

    Friday, 15 February 2013 17:42
  • Hi All,

    The holdup in making the fix public is getting the new KB article updated to include the new fixes and published. Yes, the new fix will both supersede and replace the original fix. You can install it over the original fix, or simply install it as the only fix if installing for the first time. I know we have people working hard on getting the KB finished and published, but there is still no solid ETA.


    Regards, Mike J. [MSFT]

    Friday, 15 February 2013 18:00
  • @Mike, we're really struggling in waiting for this fix. I've had a case open for a few weeks now: 113020110184584

    Would it be possible for you to reach out to my case owner and let him know the new KB number for this hotfix so he can get it to me before it is released? I was previously supplied with another "private" hotfix numbered KB2791729 which didn't help the problem at all (possibly made it worse). I'm assuming that this must not be the same hotfix that you're recently mentioning. If you are willing to contact me over email to provide the new hotfix number, please contact me through my contact page on my blog: http://blog.aaronmarks.com/?page_id=50

    Thank you!


    Aaron Marks

    Saturday, 16 February 2013 09:46
    • Proposed as answer by Aaron M Marks Saturday, 16 February 2013 20:41
    Saturday, 16 February 2013 11:24
  • Hi

    The Windows team has just released a V2 of the fix to address the CSV backup issues, and it is available for download today. It addresses the known memory leak issue along with some other issues that were discovered during testing.

    This fix supersedes the original fix and includes all fixes contained in the original.

    Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
    http://support.microsoft.com/kb/2813630

    The Windows team is investigating other issues found during testing and not included in this release.  However they wanted to get this fix published since the memory leak issue is fixed and provide immediate relief. 

    Please make us aware of any issues you face after installing this fix, and again, thanks for your continued patience while we continue our scaled-out testing.



    Regards, Mike J. [MSFT]


    Saturday, 16 February 2013 14:06
  • Thanks Mike, I have applied the hotfix to the hosts and the DPM backup is running well so far without serialization. I will report back once the backups have completed.

    Regards

    Saturday, 16 February 2013 15:06
  • Hi all,

    I have applied the latest fix and all available updates to both Win 2012 Hyper-V hosts. I then backed up 13 VMs successfully, but on the CSV owner host only I see many instances of this error: "Unexpected failure. Error code: 48F@01000003". Keep in mind that I used to see this error before the fix as well. I used Veeam to back up all the VMs.


    MCSE, MCTS, VCP, AIS, MCITP

    Saturday, 16 February 2013 22:02
  • I have installed the patch and so far, so good. I have gotten past the point where it would cause issues previously.

    I'll rest easy once all backups are completed.


    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Saturday, 16 February 2013 22:08
  • Hi all,

    The DPM backups have completed successfully with no errors on the cluster, so the hotfix appears to have resolved the problem.

    Regards

    Sunday, 17 February 2013 08:41
  • I'm seeing the same thing here. All seems to be working happily.

    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Sunday, 17 February 2013 08:44
  • UPDATE: I am now seeing the error

    Details: Cluster Shared Volume 'Volume2' ('ClusterStorage Volume 2') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    I was NOT seeing this error prior to installing the hotfix above.


    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Sunday, 17 February 2013 09:09
  • Hi Simon,

    Are you using DPM for backups? If so, is it running on Windows 2012, and have you installed the hotfix on that server as well? I am not sure if it is required, considering the DPM server does not have CSV volumes; however, I am running DPM 2012 SP1 on Windows 2012 with the hotfix applied and haven't experienced any issues, so I'm just wondering if the patch is also needed on the DPM server.

    Regards

    Sunday, 17 February 2013 09:35
  • Same...

    Cluster Shared Volume 'DataVHD' ('DataVHD') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.


    Aaron Marks

    Sunday, 17 February 2013 10:38
  • I've been experiencing this issue on a two-node cluster as well, and unfortunately KB2813630 hasn't helped. During the backup window today, the various CSV errors detailed above are still occurring.

    In addition, here's the graph of available memory on one of the nodes, which became the CSV owner at around 9am after the backup that started around 4am crashed the other node by running it out of memory. At this point, I cancelled every pending job for VMs over about 75 GB in size, which is about the limit that can be transferred before the node runs out of memory. You can see clearly how each VM is represented as a dip on the graph.

    During this backup, ODX was not disabled; I've now disabled it on both nodes and will carry out some further testing tomorrow.

    Monday, 18 February 2013 01:31
  • Hi GuySmith,

    What backup software are you using, and have you applied the hotfix to the backup server if it is running Windows 2012?

    Regards

    Monday, 18 February 2013 02:24
  • Hi Mark,

    It's DPM 2012 SP1 (4.1.3313.0) running on Windows Server 2008 R2 SP1 so no need to apply the hotfix on that side.

    Monday, 18 February 2013 02:29
  • Installed this update (http://support.microsoft.com/kb/2813630) on a two-node cluster. After backing up with DPM 2012 SP1, no abnormalities were found.

    sys_admin


    Monday, 18 February 2013 05:43
  • Installed this update (http://support.microsoft.com/kb/2813630) on a two-node cluster. After backing up with DPM 2012 SP1, no abnormalities were found.

    sys_admin



    As per numerous posts above, this update has been installed by a number of us (indeed, as you'll see we were waiting for it) and it doesn't seem to resolve the problem.
    Monday, 18 February 2013 08:28
  • I didn't get the STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021) error until AFTER I installed 2813630.

    Before then I just had the memory leak issue.


    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Monday, 18 February 2013 08:52
  • We got the same auto pause error again tonight. I'm wondering at the moment if installing KB2791729 alongside KB2813630 might fix this "CSV_AUTO_PAUSE_ERROR". Has anyone tried this by chance? I don't know if I'm willing to waste any more time troubleshooting this without a contribution from Microsoft.

    Did anyone else open a case with PSS and find how utterly lacking Microsoft's support for MS Partners is these days? I reported the complaint to my Microsoft tPAM, as we've had even worse issues with PSS for DPM. The DPM PSS team is so overloaded that you generally can't get a call back for 48-72 hours, even when they tell you 2 hours.


    Aaron Marks

    Monday, 18 February 2013 09:17
  • I have enabled the CSV serialization that Stefan mentioned above, so we'll see if that resolves the issue.

    Simon Holman
    Expeed Technology
    Australian Web Hosting

    Monday, 18 February 2013 10:17
  • I installed the hotfix Mike posted as well as the one that Hans listed, so I had hopes things would be good....  KB2790831 and KB2813630 (v2)

    I'm also getting a ton of errors.  I tried pushing things last night by running the backups after VMM did an optimization, so that some VMs weren't on the same node as their CSV, and it went to a dark and evil place.  We were able to keep things stable before by keeping the VMs on the same node as their respective CSVs, but this obviously isn't ideal.

    I'm parsing through the logs and seeing VSS getting access denied errors when the backups start (Event ID 8194 - VSS is the source)

     

    Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied. This is often caused by incorrect security settings in either the writer or requestor process.

    Anyone else seeing this sort of thing in the system log of the local node where you see the csv pause errors?

     

     

    Monday, 18 February 2013 16:19
  • MarkLarma, I get the exact same Event ID 8194 error that you do.

    I have 4 CSV volumes on a SAS attached HP san and one volume that resides on a Server 2012 SMB3 share.

    I thought it could have something to do with the SMB3 volume, but even if I move my VHDX files from it to the SAN and remove the SMB volume I get the same error.

     

    Monday, 18 February 2013 18:03
  • MarkLarma, I get the exact same Event ID 8194 error that you do.

    I have 4 CSV volumes on a SAS attached HP san and one volume that resides on a Server 2012 SMB3 share.

    I thought it could have something to do with the SMB3 volume, but even if I move my VHDX files from it to the SAN and remove the SMB volume I get the same error.

     

    I also have a SAS attached SAN.  Our setup is the Intel Modular Server MFS25 with four of the MFS5520VI blades. (Each has dual x5560's, 96GB, 4 Intel Gig NICs, LSI SAS HBA).  The array is the Promise E610sD with two add-on shelves.  Anyway, all firmware is current and aside from DPM this thing is rock solid.

    I perused the logs a bit more and they have several errors in there just littering it up once things start going badly.  I'd be happy to send these to Microsoft if they'd like it.  I'd imagine, ToniKo, that you're seeing the same thing (as are a lot of folks I'm guessing). 

    I will be putting the VMs on the hosts with their storage and hopefully tonight it won't have issues.  I also wish everyone else luck with this issue...and wish Microsoft would make it so we didn't need luck :-)

    Monday, 18 February 2013 18:29
  • Installing KB2813630 on the hosts and the DPM server, did not solve the memory leak problem for me.

     

    I am running a 3 node cluster.

    2 hosts containing all the vm's, 1 node containing the csv's.

     

    This setup was working well for small VMs, as long as I was backing up one VM at a time.

    This way the node holding only the CSVs has enough free memory to fill during a DPM backup.

     

    After installing KB2813630 I tried to back up one VM, which has one 400 GB disk. After 30 minutes the memory usage had gone from 8 GB to 120 GB and I had to abort the backup; otherwise the host containing the CSVs would have crashed, taking all the VMs down.
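
For anyone who wants to put a number on the growth described above, here is a minimal sketch of the arithmetic in Python. The 8 GB, 120 GB and 30 minute figures come from the post; the 128 GB capacity in the usage line is purely a hypothetical value for illustration, and the function names are made up for this example.

```python
def leak_rate_gb_per_min(start_gb, end_gb, minutes):
    """Average memory growth in GB per minute over the observed window."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (end_gb - start_gb) / minutes


def minutes_until_full(current_gb, capacity_gb, rate_gb_per_min):
    """Minutes left until memory use reaches capacity, assuming a constant rate.

    Returns None when memory is not growing.
    """
    if rate_gb_per_min <= 0:
        return None
    return max(capacity_gb - current_gb, 0) / rate_gb_per_min


# Figures from the post: 8 GB -> 120 GB in 30 minutes.
rate = leak_rate_gb_per_min(8, 120, 30)

# Hypothetical node with 128 GB of RAM (an assumption, not from the thread).
remaining = minutes_until_full(120, 128, rate)
print(f"growth: {rate:.2f} GB/min, time left: {remaining:.1f} min")
```

With those inputs the growth works out to roughly 3.7 GB per minute, which is why a single large VM is enough to exhaust the CSV owner node.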

     

    Ramon


    Monday, 18 February 2013 19:04
  • Hello All,

    I would like to comment on this error:

    Cluster Shared Volume 'Volume2' ('ClusterStorage Volume 2') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.


    STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR is generated when the csvfs filter attempts to retrieve the Copy On Write bitmap for a snapshot volume that has already been cleaned up.  This error is most likely to occur on large-scale Hyper-V deployments and is one of the issues we discovered after fixing the other scale-out problems addressed in the V2 fix. Because long-haul testing is still ongoing, we did not want to hold up V2 of the fix that we just released, so the Windows group will release a more comprehensive V3 patch a little later to address this and other issues found during large-scale testing.

    For any customers still experiencing the same symptoms as outlined in KB2813630 after installing the fix, please check binary versions on all nodes.

    File name     File version      File size    Date
    ===========   ==============    =========    ===========
    Csvflt.sys    6.2.9200.20626    205,824      06-Feb-2013
    Clussvc.exe   6.2.9200.20623    7,217,152    07-Feb-2013
    Ntfs.sys      6.2.9200.20623    1,933,544    07-Feb-2013

    If the binaries are correct on all nodes, please open a support case so we can investigate the issue further.
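
A rough sketch of how the version check above could be automated once the file versions have been collected from each node. How you collect them is out of scope here; the `node_versions` dictionary is a hypothetical input, and the function names are made up for this example. Only the three expected versions are taken from the table.

```python
# Expected versions from the table in the post above.
EXPECTED = {
    "Csvflt.sys": "6.2.9200.20626",
    "Clussvc.exe": "6.2.9200.20623",
    "Ntfs.sys": "6.2.9200.20623",
}


def version_tuple(version):
    """Turn '6.2.9200.20626' into (6, 2, 9200, 20626) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def outdated_binaries(node_versions, expected=EXPECTED):
    """Return the file names that are missing or older than the expected version."""
    stale = []
    for name, wanted in expected.items():
        found = node_versions.get(name)
        if found is None or version_tuple(found) < version_tuple(wanted):
            stale.append(name)
    return stale


# Hypothetical node: one old driver, one file not reported at all.
example_node = {"Csvflt.sys": "6.2.9200.16384", "Clussvc.exe": "6.2.9200.20623"}
print(outdated_binaries(example_node))
```

Comparing the versions as integer tuples rather than strings avoids false results when a component has a different number of digits.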


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread. Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.

    Monday, 18 February 2013 19:58
  • Mike,

    Thank you for your honesty and letting us know that MS recognizes and is working on addressing this error message.

    I checked all of those versions on all nodes and found them to be the same as the ones you mentioned. I mentioned my case number above (113020110184584). I don't have much confidence in the support I'm receiving, considering that PSS got back to me with the V2 fix days after you posted it to these forums. Even then, they only got back to me because I requested it. PSS has gone downhill these days, and basically all you hear when speaking to them is how they have to go and talk to premier support. As a Microsoft Partner, how are we supposed to go about getting real support? Years ago I used to get fantastic support from Microsoft. @Mike, do you have the ability to reach out to my case owner (an escalation engineer by the name of Satya) and ask if he can focus on this fix and work together with me on anything that you need?

    -Aaron


    Aaron Marks

    Tuesday, 19 February 2013 06:01
  • Hi,

    I've applied KB2813630 (v2) and this does seem to have resolved the memory leak issue.

    I'm still seeing issues though. As far as I can see, CSV access is disrupted shortly after initiating a backup. The canary for me seems to be the Linux guest machines. I have a CentOS-based MySQL server which went almost completely unresponsive; after 10 minutes of being unable to log on to it I had to hit the reset button. Another CentOS machine crashed and rebooted itself.

    I suspect that the Linux machines are just more sensitive to the interruption to the CSV, and the Windows guest machines are still having issues but are handling it better. Is anyone else brave / foolish enough to be running Linux guests and are you seeing similar issues?

    I have seen event ID 5120 logged against the CSV - Cluster Shared Volume 'VolumeX' is no longer available on this node because of 'STATUS_NETWORK_NAME_DELETED(c00000c9)'. All I/O will temporarily be queued until a path to the volume is reestablished. 

    I guess I now wait for v3 of the patch.

    Tim

    Tuesday, 19 February 2013 15:40
  • I am getting event ID 5120 and 5217 errors on a Hyper-V 2012 cluster with KB2813630-v2 installed on the hosts. They appear on the CSV owner node shortly after the DPM backups start. In this environment the hotfix is not installed on the DPM server virtual machine, although the memory leak issue appears to be fixed.

    However, on a separate Hyper-V 2012 cluster where the hotfix is installed on both the hosts and the DPM server (VM), there are no cluster errors at all. Mike, can you confirm whether the hotfix also needs to be installed on the DPM server to resolve these issues?

    Wednesday, 20 February 2013 14:37
  • Is there any more information from MS about the ETA of the v3 (vFinal?) of this fix?

    Tuesday, 26 February 2013 18:38
  • I have a case open with Microsoft Premier Support and so far haven't heard an ETA for a final fix...
    Wednesday, 27 February 2013 10:15
  • Hi, 

    We have a similar issue to the examples above with cluster shared volumes. The biggest issue is storage timeouts, and when we have the storage issues, Windows event counters stop writing counter info to the local disk on the host.

    One thing that seems to make a difference is flow control: when it is enabled on the switches (or anywhere on a path between the Hyper-V hosts that is used for CSV), it seems to affect Windows 2012 in a really bad way (slow live migrations, CSV timeouts).

    This is just an observation, so I think someone else should do some testing before everyone changes their network configuration :-)

    Friday, 1 March 2013 02:51
  • Hi all,

    I found that the CSV errors (event IDs 5120 and 5217) occurred because the DPM server, which runs on a virtual machine in the cluster, was backing up itself. Once the DPM VM was taken out of the protection group, the errors stopped appearing. The hotfix therefore seems to have resolved all the issues, but it will be interesting to find out what fixes are included in v3 of the hotfix.

    Friday, 1 March 2013 14:59
  • Is there any ETA for a new hotfix/patch?

    I get random problems backing up a file server on our W2012 cluster.

    We have tried CSV on SAS-attached storage, iSCSI storage and an SMB3 share. It seems to back up small VMs (200-500 GB) fine.

    But our file server, with about 11 TB of storage on 3 different VHDX files, can't be backed up now.

    With CSV on DAS and iSCSI, the CSV volumes disappear and the cluster hangs.

    With the SMB3 share, the server that hosts the share dismounts the whole volume. It may have something to do with I/O bottlenecks.

    But it's starting to get frustrating not to have a backup of the server. The only backups we have now are shadow copies and a replicated server within Hyper-V 3. =/

     

    Monday, 4 March 2013 11:13
  • Installing KB2813630 on the hosts and the DPM server, did not solve the memory leak problem for me.

     

    I am running a 3 node cluster.

    2 hosts containing all the vm's, 1 node containing the csv's.

     

    This setup was working well for small VMs, as long as I was backing up one VM at a time.

    This way the node holding only the CSVs has enough free memory to fill during a DPM backup.

     

    After installing KB2813630 I tried to back up one VM, which has one 400 GB disk. After 30 minutes the memory usage had gone from 8 GB to 120 GB and I had to abort the backup; otherwise the host containing the CSVs would have crashed, taking all the VMs down.

     

    Ramon


    It seems, I was also missing an update for the DPM Agent on the hosts. After updating this, the memory leak was solved.
    Thursday, 7 March 2013 08:43
  • Is there any ETA for a new hotfix/patch?

    I get random problems backing up a file server on our W2012 cluster.

    We have tried CSV on SAS-attached storage, iSCSI storage and an SMB3 share. It seems to back up small VMs (200-500 GB) fine.

    But our file server, with about 11 TB of storage on 3 different VHDX files, can't be backed up now.

    With CSV on DAS and iSCSI, the CSV volumes disappear and the cluster hangs.

    With the SMB3 share, the server that hosts the share dismounts the whole volume. It may have something to do with I/O bottlenecks.

    But it's starting to get frustrating not to have a backup of the server. The only backups we have now are shadow copies and a replicated server within Hyper-V 3. =/

     

    Hi there, did you apply KB2813630-v2 - and did it not resolve the issue?

    Thursday, 7 March 2013 22:41
  • Same problem. I have applied this hotfix but unfortunately it didn't solve the issue. I have also opened a support case with MS but so far there is no answer.
    Friday, 8 March 2013 11:34
  • Same problem. Hotfix KB2813630-v2 is applied on all nodes. This happened while the backup was running:

    Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.
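
Since several different status codes show up in these event 5120 messages across the thread, here is a small Python sketch for pulling the volume name and status code out of the message text so they can be counted or compared between nodes. The regular expression is only based on the message wording quoted in this thread; other event IDs (such as 5142) use different wording and will not match. The function name is made up for this example.

```python
import re

# Pattern based on the event 5120 message text quoted in this thread.
CSV_EVENT_RE = re.compile(
    r"Cluster Shared Volume '(?P<volume>[^']*)' \('(?P<resource>[^']*)'\) "
    r"is no longer available on this node because of "
    r"'(?P<status>[A-Z_]+)\((?P<code>[0-9a-fA-F]+)\)'"
)


def parse_csv_event(message):
    """Extract volume, resource, status name and numeric code, or None if no match."""
    match = CSV_EVENT_RE.search(message)
    if match is None:
        return None
    return {
        "volume": match.group("volume"),
        "resource": match.group("resource"),
        "status": match.group("status"),
        "code": int(match.group("code"), 16),  # the code is printed in hex
    }


sample = (
    "Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available "
    "on this node because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. "
    "All I/O will temporarily be queued until a path to the volume is reestablished."
)
parsed = parse_csv_event(sample)
print(parsed["volume"], parsed["status"], hex(parsed["code"]))
```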

    Friday, 8 March 2013 16:15
  • Is there any ETA for a new hotfix/patch?

    I get random problems backing up a file server on our W2012 cluster.

    We have tried CSV on SAS-attached storage, iSCSI storage and an SMB3 share. It seems to back up small VMs (200-500 GB) fine.

    But our file server, with about 11 TB of storage on 3 different VHDX files, can't be backed up now.

    With CSV on DAS and iSCSI, the CSV volumes disappear and the cluster hangs.

    With the SMB3 share, the server that hosts the share dismounts the whole volume. It may have something to do with I/O bottlenecks.

    But it's starting to get frustrating not to have a backup of the server. The only backups we have now are shadow copies and a replicated server within Hyper-V 3. =/

    Hi there, did you apply KB2813630-v2 - and did it not resolve the issue?

    Yeah, I double-checked the files and they are there =/

    Csvflt.sys     6.2.9200.20626   205,824      06-Feb-2013
    Clussvc.exe  6.2.9200.20623   7,217,152   07-Feb-2013
    Ntfs.sys       6.2.9200.20623   1,933,544   07-Feb-2013

    Friday, 8 March 2013 21:11
  • Does this issue also occur with non-clustered Hyper-V 2012 servers? We have been experiencing issues since setting up our Hyper-V servers with local storage, where machines pause during backup operations (Veeam 6.5).

    We were recommended to install KB2791729 which we obtained from Microsoft but we held off installing as it was unreleased and not yet public and were concerned about possible side effects.

    Does this KB2813630-v2 patch replace KB2791729 ?

    Thanks

    Tuesday, 12 March 2013 13:20
  • Add me to the list of having this issue.  I installed the KB2813630 fix when it came out a couple weeks ago, and the last two nights have had my cluster go down.  It worked OK until Sunday night.  I also have an MS case open on this.
    Tuesday, 12 March 2013 13:30
  • We have applied KB2813630-v2 to all our Hyper-V 2012 cluster nodes and we are still getting events 5120 and 5142 when backing up with DPM 2012.
    Tuesday, 12 March 2013 16:04
  • -Jasse-, we get same errors


    sys_admin

    Thursday, 14 March 2013 05:58
  • We have applied KB2813630-v2 to all our Hyper-V 2012 cluster nodes and we are still getting events 5120 and 5142 when backing up with DPM 2012.
    Has there been an update since Tuesday? Are you running CU1 as well?
    Thursday, 14 March 2013 21:46
  • I am also having the same issue. It is not always caused by DPM starting a backup. Today, I just went to the VMM server and tried to connect to console of a VM running on one node. That node started these same signature i/o issues. Hard boot of the node gets it back online. Come on MS. Help us out!
    Friday, 15 March 2013 20:24
  • We are seeing a similar problem here, in a Japanese environment.

    Before applying KB2813630:

    • The CSV disks of the 4 hosts, connected by Fibre Channel SAN, seemed to have no problems during backup.
    • Some guests' data disks disappeared. The system disks had no problems. After rebooting the guest machine, the data disk came back.
    • The virtual DPM server's data disk, which is the target for the backup data, always disappeared, so the backup always failed.

    After applying KB2813630:

    • The CSV disks of the 4 hosts, connected by Fibre Channel SAN, seemed to have no problems during backup.
    • Some guests were backed up successfully.
    • Some guests' state changed to powered off.

    KB2813630 improved the situation, but it is not a complete solution. Another hotfix is needed.

    Monday, 18 March 2013 03:14
  • I've been stalking this thread since it was started.  We have been battling this issue since DPM was in CTP1.  The latest advice we have been given is to disable TRIM which we have done in our lab and production environments (fsutil behavior set disabledeletenotify 1).  We continue to have issues after making this change.  We've had a case open on this for roughly 5 months...what a nightmare.  I'll update as we receive information.

    STATUS_CONNECTION_DISCONNECTED(c000020c)
    STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)
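
If you want to confirm across many nodes that the TRIM setting mentioned above actually took effect, one option is to capture the output of `fsutil behavior query disabledeletenotify` on each node and parse it. The Python sketch below only does the parsing step. It assumes the output contains a line of the form `DisableDeleteNotify = 0` or `DisableDeleteNotify = 1` (newer Windows versions add a prefix and suffix around it, which the pattern tolerates); the function name is made up for this example.

```python
import re


def trim_disabled(fsutil_output):
    """Interpret output of 'fsutil behavior query disabledeletenotify'.

    Returns True if delete notifications (TRIM) are disabled, False if they
    are enabled, and None if the expected line was not found.
    """
    match = re.search(r"DisableDeleteNotify\s*=\s*([01])", fsutil_output)
    if match is None:
        return None
    return match.group(1) == "1"


print(trim_disabled("DisableDeleteNotify = 1"))
```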

    Tuesday, 19 March 2013 16:20
  • I had the same issue and now I got some help from Microsoft. In my case it was a problem with ODX.

    First we tried installing the latest hotfix for ODX (KB2796995) and rebooting the cluster nodes.

    Details regarding the ODX Hotfix :

    As per the research team this issue occurs because the copy engine incorrectly initializes regular copy chunks. Therefore, the copy engine restarts the entire copy process for the file when nonzero bytes are copied through the ODX. When the copy engine restarts, the destination file size is incorrectly set if all the following conditions are true:
    • The copy type is noncached.
    • Nonzero bytes are copied through the ODX.
    • The file size is not aligned to a sector boundary.

    But that did not do the trick for me, so we disabled ODX by changing the "FilterSupportedFeaturesMode" value to 1 (the setting for a storage device that does not support ODX) and rebooted the cluster nodes.

    Location: HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\FilterSupportedFeaturesMode

    Now everything works fine.
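
For reference, a minimal Python sketch that builds the `reg add` command line for the registry change described above, so it can be reviewed or pasted into a script before being run on each node. It deliberately does not execute anything. The key path and value name are the ones from this post, and the value is written as a REG_DWORD; the function name is made up for this example. Run the resulting command from an elevated prompt and reboot the node afterwards, as described above.

```python
FILESYSTEM_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\FileSystem"
VALUE_NAME = "FilterSupportedFeaturesMode"


def odx_reg_command(disable=True):
    """Build the 'reg add' argument list that disables (1) or re-enables (0) ODX."""
    data = "1" if disable else "0"
    return [
        "reg", "add", FILESYSTEM_KEY,
        "/v", VALUE_NAME,
        "/t", "REG_DWORD",
        "/d", data,
        "/f",  # overwrite the existing value without prompting
    ]


command = odx_reg_command()
# Print only; nothing is executed by this sketch.
print(" ".join(f'"{arg}"' if "\\" in arg else arg for arg in command))
```

Setting the value back to 0 with `odx_reg_command(disable=False)` restores the default once a permanent fix is installed.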


    Peo

    Wednesday, 20 March 2013 13:02
  • Certain clusters I have set up experience no errors when the DPM backups run, and others do. They are built identically with similar hardware, and I have also tried disabling ODX, but it hasn't made any difference. The VMs and CSVs remain online; however, events 5120 and 5217 appear on the cluster within the first minute of the DPM backups running. CSV serialisation is not set up, and MaxAllowedParallelBackups is left at the default of 3 on the DPM servers.

    Friday, 22 March 2013 10:04
  • We were seeing the same on one cluster, events 5120 and 5217 being logged on the cluster within a minute or so of DPM backups running.

    Last night however, 5142 was logged repeatedly:

    Cluster Shared Volume 'Volume3' ('Cluster Disk 4') is no longer accessible from this cluster node because of error 'ERROR_TIMEOUT(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.

    A lot of the VMs that were coexisting on the volume died. There are 20 on it at the moment. Some became unresponsive, some blue screened, and some had weird symptoms such as not being able to log in, Hyper-V reporting that it was unable to connect to configuration storage, etc.

    Can confirm the updated versions of Csvflt.sys, Clussvc.exe, Ntfs.sys are on each node (currently 6 node cluster, FC IBM DS3524 storage).

    There's nothing really relevant in the cluster logs, I see this for the same 5 VMs repeatedly:

    000006c0.00001abc::2013/03/22-03:13:58.915 INFO  [RCM [RES] SCVMM VMNAME embedded failure notifciation, code=0 _isEmbeddedFailure=false _embeddedFailureAction=0

    Tuesday, 26 March 2013 21:11
  • How did you determine it was a problem with ODX?

    As another TechNet user so eloquently put it: If Windows 2012 Hyper-V is supposed to be the game changer MS say it is, I don't want to play anymore.

    Tuesday, 26 March 2013 21:27
  • Add me to the list of having this issue.  I installed the KB2813630 fix when it came out a couple weeks ago, and the last two nights have had my cluster go down.  It worked OK until Sunday night.  I also have an MS case open on this.

    The temporary workaround of my case was also disabling ODX.  We still see the event 5120 messages, but haven't had the cluster go down.  Also eagerly waiting for a permanent fix.  If there is good news, I suppose, it's that MS is getting flooded with this problem (per the team lead of the support team), and therefore is a high priority.
    Wednesday, 27 March 2013 18:36
  • That is, if we can actually suppose such a thing :( Did you restart your cluster nodes after disabling ODX? I'll do the same on our clusters, we can't afford for them to keep going down.

    Wednesday, 27 March 2013 21:04
  • I can confirm that disabling ODX is a temporary fix. I still see 5120 events, but for now the VMs remain online. I also restarted my hosts after I applied KB2813630.
    Thursday, 28 March 2013 10:37
  • Microsoft - Any updates when the final fix will be available? Current status? Please keep us informed!

    Friday, 29 March 2013 11:23
  • Last I heard was mid May :-(
    Monday, 1 April 2013 14:08
  • I'm getting the same warning, but with none of the mentioned KBs installed. The backups are completing, and none of the VMs are in a stopped or paused state.

    Cluster Shared Volume 'DISK1' ('DISK1') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    Thursday, 18 April 2013 08:51
  • Also in the same boat with these errors, DPM 2012 SP1 UR2 + Windows 2012, 10 node cluster using CSV.

    We are currently migrating from a 2008 R2 cluster to 2012, so this is quite scary. We have already had to fix 2 VMs which couldn't start.


    • Edited by -DeNMaN- Friday, 3 May 2013 01:02
    Tuesday, 23 April 2013 03:19
  • I'm just wondering: when the failures occur, does that only happen when every host in the cluster is hosting a VM?

    Zarko

    Tuesday, 23 April 2013 06:33
  • Hi all

    I have much the same problem, with one small difference, but first the details:

    2 Node Cluster with Hyper-V-Server 2012
    HP SAN connected over iSCSI
    CSV 2TB MPIO connected

    We run several different private clouds for some project labs on the cluster, all backed up with DPM 2012 SP1.

    When the backup runs we also get the 5120 event, but only the Windows 2012 VMs go off; the Windows 2008 R2 and Linux VMs do not.

    I tried the hotfix mentioned in the thread, with no luck.

    Thanks,

    Tom

    Friday, 3 May 2013 20:32
  • We are seeing this error too. In January we saw it only on iSCSI devices, where the hotfix and disabling ODX worked.

    Now at another cluster with Fibre Channel, we see the error again. Hotfix and ODX disable did not do the trick. 
    The only difference here is, we use dpm sp1, at the other cluster it was another backup solution (using VSS). 

    So I hope we get any news on this. 

    Thanks
    Patrick 

    Tuesday, 7 May 2013 02:03
  • Last I heard was mid May :-(

    Any update from Microsoft on when this patch will be released? Is this still on for mid-May, as per Marcus' post?

    Tuesday, 7 May 2013 10:31
  • I'm also having this problem. The latest hotfix and disabling ODX did not solve anything. This is really frustrating.  I'm using serialized backups with dpm 2012 sp1. 
    Sunday, 12 May 2013 17:21
  • Still have the problem.  Running a production environment on an RTM product, with no ability to back it up without crashing servers and corrupting databases.  No big deal.
    Tuesday, 14 May 2013 17:21
  • I agree with Pete, we were really hoping to see something on Patch Tuesday.
    Tuesday, 14 May 2013 17:29
  • OK, it's been mid-May in Australia for 12 hours now!!! Where's my fix :)

    Marcus Krämer, where did you hear this from? Seems more and more likely this is going to be an SP1 fix.

    Wednesday, 15 May 2013 01:53
  • Hi Paul,

    Have a look at this article. The hotfix was released today and it seems to solve the problems. I've already installed the patch via CAU and have not received any errors so far.

    http://support.microsoft.com/kb/2838669

    Let's hope MS has finally got it right now.

    I'll update you if I receive any errors.


    • Edited by Hummeldum Wednesday, 15 May 2013 10:12 Forgot to paste link ;)
    Wednesday, 15 May 2013 10:12
  • I have been following this blog for a few months now. I opened a case with Microsoft last week after our main cluster failed with the same errors mentioned above. The technician just emailed me with the following fixes that were released today. I plan to install them in the next week or two when we have a maintenance window. Here is the information:

    Virtual machine enters a paused state or goes offline when you try to create a backup of the virtual machine on a CSV volume in Windows Server 2012:

    http://support.microsoft.com/kb/2824600

    Update that improves cluster resiliency in Windows Server 2012 is available

    http://support.microsoft.com/kb/2838669/EN-US

    You cannot add VHD files to Hyper-V virtual machines in Windows Server 2012

    http://support.microsoft.com/kb/2836402/EN-US

    Windows 8 and Windows Server 2012 update rollup: May 2013

    http://support.microsoft.com/kb/2836988



    Wednesday, 15 May 2013 16:37
  • Hi Micheal,

    just for your information:

    So, in my opinion, to solve the problem http://support.microsoft.com/kb/2838669/EN-US is enough.

    Wednesday, 15 May 2013 17:31
  • I was encountering two of the issues described in KB2838669.

    Before this KB, I was getting Failover Clustering timeout errors once a week when my DPM server started its snapshots.
    Yesterday I installed this KB on one of my nodes, and things went wrong: I got Failover Clustering errors 8 times in only 5 hours, starting from the beginning of my DPM snapshots. Worse, all the virtual machines hosted on this node crashed (which was not the case when I had Failover Clustering errors before).

    The weirdest thing? All my DPM snapshots were successful anyway!

    So the result of the KB? I shouldn't have installed it :/

    I'm running my nodes on Windows Server 2012, and my DPM server is running DPM 2012 SP1. The only hotfix I had installed before on my Hyper-V hosts is KB2813630.



    • Edited by tena6ous Thursday, 16 May 2013 07:28
    Thursday, 16 May 2013 07:25
  • I'm experiencing a memory leak that I think is related to this thread, but I would like some feedback on what others are experiencing.  I have a 2 node Hyper-V 2012 cluster (full install) and I'm using DPM 2012 SP1 to back it up.  On the node that owns the CSV, there is an increase in memory that seems to coincide with my backups for time and amount of data transferred.  For large backups like Exchange, this fills up the server's memory and will crash the cluster if left alone.  The memory does not become available after the backups complete.  If I change the owner node on the CSV, the memory clears up immediately and I can even move the CSV back without issue.

    There may also be a small memory leak that is not related to the backup times, but dissipates when I change the CSV owner.

    I've installed all available updates on the two host servers (including those released yesterday) as well as hotfixes KB2813630-v2 and KB2838669.  I've also disabled ODX and serialized the backups.

    I'm not seeing related errors in Failover Cluster Manager, but I'm watching the servers like a hawk and changing the CSV owner node as needed to clear up the memory leak.

    My storage device is an EqualLogic PS6100X with the latest HIT Kit (4.5) installed.

    Is this what others are experiencing?  Any thoughts?

    This thread has been very helpful and I've been following it very closely for the past week or so!  Thank you all for your input! ^_^

    Thursday, 16 May 2013 16:34
  • Hi Paul,

    Have a look at this article. The hotfix was released today and it seems to solve the problems. I've already installed the patch via CAU and have not received any errors so far.

    http://support.microsoft.com/kb/2838669

    Let's hope MS has finally got it right now.

    I'll update you if I receive any errors.


    Hi Hummeldum,

    I tried generating a CAU update preview list and couldn't find the 2838669 update anywhere, and the May CU doesn't seem to include it according to the KB...

    Did you apply the update manually?


    David

    Friday, 17 May 2013 04:08
  • I'm experiencing a memory leak that I think is related to this thread, but I would like some feedback on what others are experiencing.  I have a 2 node Hyper-V 2012 cluster (full install) and I'm using DPM 2012 SP1 to back it up.  On the node that owns the CSV, there is an increase in memory that seems to coincide with my backups for time and amount of data transferred.  For large backups like Exchange, this fills up the server's memory and will crash the cluster if left alone.  The memory does not become available after the backups complete.  If I change the owner node on the CSV, the memory clears up immediately and I can even move the CSV back without issue.

    There may also be a small memory leak that is not related to the backup times, but dissipates when I change the CSV owner.

    I've installed all available updates on the two host servers (including those released yesterday) as well as hotfixes KB2813630-v2 and KB2838669.  I've also disabled ODX and serialized the backups.

    I'm not seeing related errors in Failover Cluster Manager, but I'm watching the servers like a hawk and changing the CSV owner node as needed to clear up the memory leak.

    My storage device is an EqualLogic PS6100X with the latest HIT Kit (4.5) installed.

    Is this what others are experiencing?  Any thoughts?

    This thread has been very helpful and I've been following it very closely for the past week or so!  Thank you all for your input! ^_^

    We have almost the exact same setup and problem.  Windows 2012 cluster, 7 nodes running primarily SQL VM’s.  Dell Blade servers and EqualLogic PS6110XV with HIT Kit 4.5.  We have installed all the hotfixes including KB2838669, and disabled ODX as well.  A backup job triggers the memory leak, but not all the time.   Using rammap we can see the VM’s show up and never release the memory.  We will max out 256GB of memory in hours sometimes.  When I move the CSV to another node the problem follows the CSV.  My only fix is to reboot the node having the problem then move the CSV back.  We have put in 80 hours with MS so far on this.

    Using Veeam instead of DPM.

    This morning in Veeam I disabled the Dell EqualLogic VSS hardware provider and am now using only the MS CSV shadow copy provider.  I am going to see if that helps.



    • Edited by awinstead Friday, 17 May 2013 14:35
    Friday, 17 May 2013 14:34
  • I'm glad to know that I'm not the only one having issues with this!  It sounds like we're seeing slightly different symptoms, but likely from the same problem.  That might be due to the differences in scale as we are a pretty small outfit.

    In my environment, moving the CSV to a new owner frees up the memory immediately.  At first this went pretty smoothly, but now the CSV often goes completely offline briefly and sometimes causes my VMs to crash in the process.  After the move, though, the memory immediately starts growing again.

    For me it's not necessarily tied to backups, but to high disk activity (SQL, Exchange, DFS). If I can keep the VM and CSV on the same node, it seems happy. The fact that I only have two nodes might explain why moving the CSV sorts things out for me.
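
    For anyone who wants to script the owner check and the CSV move described above, here is a minimal sketch. It assumes the FailoverClusters PowerShell module on a cluster node; 'Cluster Disk 2' and 'HV-NODE2' are placeholder names, not anyone's real configuration.

```powershell
# Sketch only: run in an elevated session on a cluster node.
Import-Module FailoverClusters

# Which node currently owns (coordinates) each CSV?
Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State

# Which node is each clustered VM running on?
Get-ClusterGroup |
    Where-Object { $_.GroupType -eq 'VirtualMachine' } |
    Select-Object Name, OwnerNode, State

# Free physical memory on this node, to compare before and after a move
# (counter name as on an English-language OS).
Get-Counter '\Memory\Available MBytes'

# Move one CSV to another node. Both names are placeholders.
Move-ClusterSharedVolume -Name 'Cluster Disk 2' -Node 'HV-NODE2'
```

    Comparing the two owner lists shows at a glance which VMs sit on a different node than the CSV they live on.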

    I haven't tried disabling the hardware VSS provider yet, but I thought about trying that. I'd love a fix that would let me use all the new features, but for now I'd be content with a workaround that would let me sleep through the night without needing to get up and check on my servers!

    In the meantime, I'll be keeping a close eye on this thread.

    Friday, May 17, 2013 20:23
  • We also invested quite some time with Premier Support. We got it stable some time ago.

    Try these settings and see if they also work for you:

    - disable ODX

    - disable TRIM

    - use software VSS

    - disable NIC optimization (offloading, VMDq, etc.)

    - use LUN and host serialization on DPM

    - use DPM 2012 SP1 CU2

    Yeah, you are back in 2005 :-) but for us it worked, and it really sucks if CSVs are not stable.
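
    Before changing anything from the list above, it may help to see where a host currently stands on the NIC and VSS items. This is a read-only sketch, assuming the NetAdapter module that ships with Windows Server 2012; the adapter name in the commented-out line is a placeholder.

```powershell
# Read-only checks; nothing here changes configuration.

# Per-adapter VMQ state.
Get-NetAdapterVmq | Select-Object Name, InterfaceDescription, Enabled

# Checksum offload and large send offload state per adapter.
Get-NetAdapterChecksumOffload
Get-NetAdapterLso

# Installed VSS providers: a SAN vendor's hardware provider is listed
# here next to the Microsoft software provider.
vssadmin list providers

# To turn VMQ off on a single adapter ('Ethernet 1' is a placeholder):
# Disable-NetAdapterVmq -Name 'Ethernet 1'
```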


    stefan

    Saturday, May 18, 2013 19:50
  • Thank you Stefan, that gives me a few things to try that I haven't tried yet.

    I haven't opened a case with Microsoft yet, partly because I'm getting the impression that they have not yet been able to actually solve the problem (as opposed to providing workarounds).  For anyone who has worked with premier support, have they given you any indication on a timeline for these problems to be fixed?

    Also, can anyone confirm if downgrading to 2008 R2 would clear this up?  It would be a pretty big project to rebuild my guests and I'm not sure if that would be better than a stripped down 2012 or not.

    Wednesday, May 22, 2013 18:09
  • I have applied hotfix KB 2838669 along with the latest Windows updates to the hosts and the DPM server (for good measure); however, the same error events 5120 and 5217 still occur on the cluster when the backup first starts.  There are no further issues with memory leaks or virtual machines pausing, but those had already been resolved by KB 2813630 when it was released.
    Thursday, May 23, 2013 12:17
  • Jumping on:

    I manage a Hyper-V cluster: 6 nodes (HP BL460 G8s connected through iSCSI to LeftHand P4800 storage), 64GB each, good validation reports, fully up to date, carrying about 60 VMs.

    I have experienced the 5120/5217 errors, "STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)" and "snapshot set id [..] failed with error 'HrError(0x0000139f)(5023)'", since first working with DPM 2012 SP1 to back up the VMs. Hotfix KB2813630 did not help, and neither does the latest, KB2838669. In fact, this last released hotfix ("resiliency improvement") actually seems to increase the number of errors during backup. Looks like the bug is still here.

    The DPM replicas seem to be OK and the VMs do not crash or enter unwanted (paused, incomplete) states.

    The errors also appear in our test cluster (2 nodes connected using fibre, no iSCSI).

    I'm thinking about opening a support call, unless I know this issue is recognised by Microsoft and it's still being worked on. Does anyone know, or could a Microsoft engineer keep us posted in this thread?

    Thursday, May 23, 2013 21:29
  • Hello all,

    I also have lots of issues with CSVs and with DPM.

    At first I thought the CSV hung because of the DPM agent, so I removed the agent and applied all existing patches. Now the CSV is stable (the FC LUN zoning was wrong and only half of the hosts could reach the LUN directly; the others were redirecting over the cluster network, but nothing pointed that out, not even Test-Cluster, which showed all green for the cluster disk tests). I reinstalled the agent and the issue came back with VM backups, but the CSV no longer pauses.

    I opened a call with Microsoft support, who asked me to apply these patches using the LDR branch (QFE):

    http://support.microsoft.com/kb/2838669/EN-US

    http://support.microsoft.com/kb/2795944/EN-US

    http://support.microsoft.com/kb/2837407/EN-US (?).

    For installing the LDR: http://social.technet.microsoft.com/wiki/contents/articles/3323.how-to-forcibly-install-the-ldr-branch-from-a-particular-hotfix-package.aspx

    I haven't had time to apply the LDR branch yet (it should have been done by the CAU hotfix plug-in, but the installed file version is actually from the GDR branch, not LDR).

    Edit: BTW, this is off topic, but do you also see the VMM service crash when configuring VMM continuous protection in DPM?
    (Set-DPMGlobalProperty -KnownVMMServers vmmserver01.sogeti.local + DPM-VMM Helper Service configuration)

    Guillaume




    • Edited by Guigui38 Friday, May 24, 2013 10:47
    Friday, May 24, 2013 10:39
  • Correction: the cluster was stable for a week, but it just hung...
    Friday, May 24, 2013 11:48
  • Stefan, would you mind telling me what hardware you have?  Specifically what type of SAN?
    Friday, May 24, 2013 20:54
  • It's a bit early to say, but my testing seems to show that my memory problems may be tied to dynamic volumes.  I had major memory leaks every night when my system state backups kicked off (agent installed within VM guest) that corresponded to the amount of data being backed up.  I created fixed size volumes on my EqualLogic SAN and moved the biggest offenders over; so far I've not encountered this memory leak again.

    I do see other, slower memory leaks throughout the day on different VMs.  When I move my two dynamic volumes from one host to the other, the memory frees up immediately.  I do not seem to have this issue with VMs on the fixed volumes.

    After reading Stefan's post, I decided to read up a bit on TRIM.  That's when I got the idea that the problem could be a sort of conflict between TRIM and dynamic volumes.  I can't say for sure, but things are starting to look up for me.  If I can stabilize everything using fixed volumes, I might even be bold enough to try re-enabling ODX and non-serialized backups.

    Here's hoping my luck's changed!

    • Proposed as answer by JeanLouis Wednesday, May 29, 2013 17:20
    • Unproposed as answer by JeanLouis Wednesday, May 29, 2013 17:21
    Wednesday, May 29, 2013 01:43
  • Hi All,

    I've been working with MS support on this issue for roughly 2 months now; unfortunately we still haven't got to the bottom of it.

    This is the latest hotfix for error 5120 V-3 http://support.microsoft.com/kb/2824600?wa=wsignin1.0

    Wednesday, May 29, 2013 17:29
  • I too have been fighting issues with CSV, Windows 2012 Cluster, DPM, merging snapshot trees and general performance problems for several months now. I've worked with PSS on several of the issues and applied the hotfixes as they have come out, and still things aren't working as expected. I did however move all resources to a single node on the cluster and that has at least made me stable. I can now merge in the background, do DPM backups and the performance doesn't go into the toilet. Still waiting on a more robust solution but glad that I can at least sleep at night.
    Wednesday, May 29, 2013 19:39
  • Hello all,

    As I see, a lot of people are following this thread and sharing their own experience. This is good and I hope it helps MS solve this problem.

    I think it will be helpful to clarify that the latest and most up-to-date fix for this problem is http://support.microsoft.com/kb/2838669/EN-US.

    KB2838669:

    • Includes all previously released hotfixes for CSV backup problems (2799728, 2801054, 2796995, 2813630, 2824600)
    • Updates all involved OS files (currently the updated files are Csvflt.sys, Clussvc.exe, Csvfs.sys, Fssagent.dll, Kernelbase.dll, Ntfs.sys, Rdbss.sys and Srv2.sys; more details in the KB article)
    • Is the only KB related to CSV included in "Recommended hotfixes and updates for Windows Server 2012-based Failover Clusters", http://support.microsoft.com/kb/2784261/en-us

     

    In my opinion the first action to solve this problem is to apply KB2838669.
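
    To confirm that the hotfix is actually present on every node, something like the following sketch could be used. It assumes the FailoverClusters module and remote WMI access to the nodes; note that Get-HotFix only reports the exact KB ID asked for, so a node carrying a later superseding update would still show as not installed here.

```powershell
Import-Module FailoverClusters

$kb = 'KB2838669'

foreach ($node in Get-ClusterNode) {
    # Get-HotFix raises an error when the KB is absent, so silence it
    # and treat "nothing returned" as "not installed".
    $hit = Get-HotFix -Id $kb -ComputerName $node.Name -ErrorAction SilentlyContinue

    [pscustomobject]@{
        Node        = $node.Name
        HotFix      = $kb
        Installed   = [bool]$hit
        InstalledOn = if ($hit) { $hit.InstalledOn } else { $null }
    }
}
```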

    If the issue persists I suggest two other steps:

    1. If you installed a VSS HW provider on your hosts, force the DPM agent not to use it. You can set the registry key UseSystemSoftwareProvider as described in http://support.microsoft.com/kb/2462424/en-us
    2. Enable per-node and per-CSV backup serialization, as you did with DPM 2010. The procedure is described in http://technet.microsoft.com/en-us/library/hh757922.aspx

    Try step 1 first and evaluate the results; if it solves the problem, do not serialize your backups. Parallel backup should be faster, and configuring serialized CSV backup is no fun.
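
    A rough sketch of step 1, to be run on each protected Hyper-V host. The registry path below is an assumption based on how KB2462424 is usually quoted (the setting being the presence of the key itself, not a value under it); please check it against the article before relying on it.

```powershell
# ASSUMPTION: key path as commonly quoted from KB2462424; verify first.
$key = 'HKLM:\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Agent\UseSystemSoftwareProvider'

# Show which VSS providers are installed before changing anything.
vssadmin list providers

# Create the key only if it is missing (the DPM agent's parent key is
# expected to exist already on a protected host).
if (-not (Test-Path $key)) {
    New-Item -Path $key | Out-Null
}

# True once the key exists.
Test-Path $key
```

    Removing the key again (Remove-Item on the same path) should return the agent to its default provider selection.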

    One last consideration: this thread started with problems in an iSCSI environment, and many people have Dell EqualLogic, which is also iSCSI storage. Among those who still have this problem, I'd like to know how many have EqualLogic or other iSCSI solutions and how many have an FC solution. Unfortunately I can't count you.

    Alberto

    Thursday, May 30, 2013 10:53
  • Dell EqualLogic, iSCSI storage here. Installed KB2838669 and it did not help. I disabled the VSS HW provider a few days ago and that seems to have helped. Waiting on results from Microsoft so I can share them with Dell.  Still have TRIM and ODX disabled.  I will be enabling those again in a few weeks.

    My backups run much slower since disabling the VSS HW provider, plus I can’t use a proxy, so I don’t use the host resources.  At least things seem to be stable!


    Thursday, May 30, 2013 14:56
  • Hello,

    EqualLogic here (PS6000, FW 6.0.2 and HIT 4.5)

    I started another 'lab exercise' this morning ...

    • uninstalled the EqualLogic HIT from one of my clusters
    • set up MPIO manually
    • deployed a new DPM and spawned 20 test VMs
    • set my DPM to allow 5 parallel Hyper-V backups
    • created 2 protection groups with 10 and 13 VMs ... 3 old VMs were left on the cluster

    ---

    So far it looks good, five VMs can be backed up simultaneously

    The time it takes to complete the backup is acceptable for a 1Gb network

    I will let this setup run over the weekend and see if it is stable.

    ---

    All servers got the latest Windows updates and KB2838669 is installed on the nodes


    This posting is provided "AS IS" with no warranties.

    Thursday, May 30, 2013 15:50
  • Dell MD3620i using iSCSI here

    Thinking of reinstalling the OS on the host computers without Dell MPIO drivers myself.


    • Edited by brock_paul Thursday, May 30, 2013 17:17
    Thursday, May 30, 2013 17:07
  • Hello and good morning,

    After I removed all third-party software yesterday, I still get error 5120 and error 5217 in Failover Cluster Manager.

    At the same time, one of the hosts shows VSS error 12293 in the Application log. The other host in my two-node cluster doesn't show any errors.

    The VSS Error seems to occur at the end of the backup job.


    This posting is provided "AS IS" with no warranties.

    Friday, May 31, 2013 07:42
  • Hello all,

    The cluster errors 5120 and 5217 seem to occur on either fibre channel or iSCSI SAN environments.     We are using IBM DS3950 (FC+iSCSI), DS3524 (FC) and DS3000 (iSCSI) SANs on separate clusters, all experience the same issues when DPM starts the initial snapshot of the VMs.    KB 2838669 has otherwise stabilised the other issues with memory leaks and VMs pausing and CSVs going offline during backups.

    Friday, May 31, 2013 15:47
  • After letting my setup run over the weekend, things have turned out even worse.

    I am getting cluster errors over and over.

    Events 1230, 1146, 1069 show up in Failover Cluster Manager.

    At the same time I get Hyper-V-VMMS Error 20102, 14202, 19050, 14070 and FailoverClustering Error 1205 on my hosts.


    This posting is provided "AS IS" with no warranties.

    Monday, June 3, 2013 09:13
  • Same problem here. Latest hotfix installed on all nodes. ODX disabled. Errors 5217 and 5120. iSCSI storage: Infortrend S12E-R2140-4. No backups are possible with this bug. Any solution?
    Monday, June 3, 2013 16:49
  • Just finished the MS webinar "Meet the New Datacentre: The modern approach to IT". They discussed backups and reliability in the private cloud and I just couldn't resist bringing this topic up. One of the SMEs responded to my questions and said it would be worthwhile looking at Update Rollup 2 for DPM SP1:
    http://blogs.technet.com/b/dpm/archive/2013/04/11/update-rollup-2-for-system-center-2012-service-pack-1-dpm-updates.aspx

    He also mentioned that information on DPM 2012 R2 is being released this week at TechEd North America. http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013

    Tuesday, June 4, 2013 03:44
  • EqualLogic 6100X here, HIT Kit 4.5 (latest).

    As an update from my previous post, switching from thin provisioned volumes to fixed size volumes seems to have resolved my memory leak issue.  I can't say for certain that there is no longer a memory leak at all, but I'm not getting the massive memory leaks during the backups anymore.  So, more sleep for me!

    I had not been seeing the cluster errors recently that others have mentioned, but I did get error 5120s on two of my CSVs last night.  I can't say for sure if this was related to backups or not.  I had some backups/synchronizations happening around the same time, but not for the two volumes that reported 5120.  I've also been seeing the Hyper-V VSS writer reporting a bad state; not sure if that's related.
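
    For reference, the writer state mentioned above can be read on a host with vssadmin. A small sketch, assuming an elevated prompt and the usual vssadmin output layout, where the state and last error follow a few lines after the writer name:

```powershell
# Show the Hyper-V VSS writer entry plus the four lines after it,
# which normally include "State" and "Last error".
vssadmin list writers |
    Select-String -Pattern 'Microsoft Hyper-V VSS Writer' -Context 0,4
```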

    I have disabled ODX and serialized my backups, but I have not disabled hardware VSS or disabled TRIM.

    My systems are stable enough for now (at least making it through the night), but I will continue to monitor them and this thread very closely.

    Tuesday, June 4, 2013 15:22
  • Hello

    I changed MaxAllowedParallelBackups to 1 and now the backup runs, but I still get the error "Unexpected failure. Error code: 48F@01000003" while it runs. Does anybody know what this error code means?
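
    For others looking for this setting: as far as I know it is a registry value on the DPM server. The key path and the value name 'Microsoft Hyper-V' below are assumptions taken from how the TechNet CSV serialization article is usually quoted, so please verify them against that article before changing anything.

```powershell
# ASSUMPTION: path and value name as commonly quoted for DPM; verify first.
# Run on the DPM server, not on the Hyper-V hosts.
$key = 'HKLM:\SOFTWARE\Microsoft\Microsoft Data Protection Manager\2.0\Configuration\MaxAllowedParallelBackups'

# Read the current value, if one has been set.
Get-ItemProperty -Path $key -Name 'Microsoft Hyper-V' -ErrorAction SilentlyContinue

# Allow only one Hyper-V backup at a time.
if (-not (Test-Path $key)) {
    New-Item -Path $key -Force | Out-Null
}
Set-ItemProperty -Path $key -Name 'Microsoft Hyper-V' -Value 1 -Type DWord
```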

    Thanks Felix

    Wednesday, June 12, 2013 11:51
  • Microsoft has released a new bugfix which is reported to solve a lot of problems. It is a fix for KB 2813630 which redirects you to this article (KB 2838669). The hotfix that corresponds to KB 2813630 has been superseded with the hotfix KB 2838669, which contains all fixes that were previously included in KB 2813630. 

    I lost count of all the fixes of fixes but believe this is the third fix for the same problem. 

    http://support.microsoft.com/kb/2838669/en-us

    This article introduces an update for cluster resiliency improvements in Windows Server 2012. After you install this update, the following issues that you experience are resolved:

    Issue 1
    Consider the following scenario:
    • You have the Hyper-V server role installed on a Windows Server 2012-based file server.
    • You have lots of virtual machines on a Server Message Block (SMB) share.
    • Virtual hard disks are attached to an iSCSI controller.
    In this scenario, you cannot access the iSCSI controller.

    Issue 2
    Consider the following scenario:
    • You have a two-node failover cluster that is running Windows Server 2012.
    • The cluster is partitioned.
    • There is a Cluster Shared Volume (CSV) on a cluster node, and a quorum resource on the other cluster node.
    In this scenario, the cluster becomes unavailable.

    Note This issue can be temporarily resolved by restarting the cluster.

    Issue 3
    Assume that you set up an SMB connection between two Windows Server 2012-based computers. The hardware on the computers do not support Offloaded Data Transfer (ODX). In this situation, the SMB session is closed unexpectedly.

    Issue 4
    Consider the following scenario:
    • You have a Windows Server 2012-based failover cluster.
    • You have a virtual machine on a CSV volume on the cluster.
    • You try to create a snapshot for the virtual machine. However, the snapshot creation is detected as stuck. Therefore, the snapshot set is aborted.
    • During the abortion process of the snapshot, the CSV volume is deleted after the snapshot shares are deleted.
    In this scenario, the abortion process is paused automatically because of an error that occurs on the cluster.

    Issue 5
    Assume that you have a Windows Server 2012-based failover cluster. Two specific snapshot state change requests are sent from disk control manager to CSV proxy file system (CSVFS). The requests are present in the same message. In this situation, disk control manager is out-of-sync with CSVFS.

    Issue 6
    Assume that you create a snapshot for a CSV volume on a Windows Server 2012-based failover cluster. When the snapshot creation is still in progress, another snapshot creation is requested on the same CSV volume. In this situation, the snapshot creation fails and all later snapshot creation attempts on the CSV volume fail.

    Thursday, June 13, 2013 13:00
  • I installed this hotfix 2 weeks ago and it did not resolve my issues. I may have something else going on but have discovered that I'm mostly stable if I have all VM's on the same node of the cluster. VSS snapshots are still very intrusive to the cluster and can't be done while the VM's are being used or I get lots of calls about all the VM's on the cluster being slow.
    Thursday, June 13, 2013 14:26
  • Paul, what are your outstanding issues?
    Monday, June 17, 2013 20:27
  • I have several things happening that seem to be related. The CSV has generally poor performance under load. This is especially noticeable when doing a VSS backup using DPM 2012 SP1. When we fire off a VSS job to back up a VM from the cluster, the other VMs become sluggish, and from the VM perspective the disks in the VMs that are not being backed up go to 100% utilization. This is so bad that VMs cannot be backed up during business hours, and services sometimes fail. When VSS backups of the VMs are done off hours, the VSS writer often crashes on the Hyper-V host and leaves the VM backups in an inconsistent state. This seems more stable when I have all the VMs on one host. When I move the VMs to separate hosts on the cluster the problem gets worse.

    Monday, June 17, 2013 21:23
  • Yikes that seems bad. There were some suggestions previously:

    • Disable ODX
    • Use software VSS
    • Disable NIC optimization (offloading, VMDq, etc.)
    • LUN + host serialization in DPM
    • Upgrade to DPM 2012 SP1 CU2

    Which of these are relevant in your scenario? Also, there's a more recent patch available - KB2848344 released a couple of days ago which supersedes the previous patch 2838669.

    On one of our clusters we have KB2790831 installed and ODX disabled, and things are somewhat stable (compared to before when the entire CSV would drop off and all the VMs crash every single time we run a backup). Still not 100% stable, we are looking to deploy KB2848344 + DPM2012SP1CU2 (currently CU1).

    Monday, June 17, 2013 22:53
  • Hi

    I have all these problems and I have tried everything that is suggested in this discussion.

    It is extremely frustrating that Microsoft is not fixing these problems while we read about SP2 and new features.

    Microsoft should focus on getting this up and running, and all attention on SP2 should go to fixing this kind of error instead of new features.

    In my opinion, Server 2012 with a Hyper-V cluster is not even close to being a beta release.

    Right now I see this error in the cluster events:

    Cluster Shared Volume 'Volume1' ('Cluster Disk 2') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    This error on the hosts:

    Error 1:

    Volume Shadow Copy Service error: Error calling a routine on a Shadow Copy Provider {400a2ff4-5eb1-44b0-8a05-1fcac0bcf9ff}. Routine details PreCommitSnapshots({bd31f7f5-0465-45da-b6aa-2c62382e723e}) [hr = 0x8007139f, The group or resource is not in the correct state to perform the requested operation.

    ].

    Operation:

       Executing Asynchronous Operation

    Context:

       Current State: DoSnapshotSet

     

    A VSS writer has rejected an event with error 0x800423f3, The writer experienced a transient error.  If the backup process is retried,

    the error may not reoccur.

    . Changes that the writer made to the writer components while handling the event will not be available to the requester. Check the event log for related events from the application hosting the VSS writer.

    Error 2:

    Operation:

       PostSnapshot Event

    Context:

       Execution Context: Writer

       Writer Class Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}

       Writer Name: Microsoft Hyper-V VSS Writer

       Writer Instance ID: {532c5135-ad6e-4ddf-a503-2cd0051039f0}

       Command Line: C:\Windows\system32\vmms.exe

       Process ID: 2904

    Error 3:

    Software snapshot creation on Cluster Shared Volume(s) ('\\?\Volume{a624accb-c275-48ff-b5dd-9a426dd69586}\') with snapshot set id '{276e7ca1-607b-4d42-831d-bc62e05663ba}' failed with error 'HrError(0x0000139f)(5023)'. Please check the state of the CSV resources and the system events of the resource owner nodes.

     

    Error 4:

    Cluster Shared Volume 'Volume1' ('Cluster Disk 2') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

     

    Error 5:

    Unexpected failure. Error code: 48F@01000003
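
    A side note on reading these codes: the hr value in Error 1 (0x8007139f) and the HrError in Error 3 (0x0000139f, 5023) look like the same underlying Win32 error, once wrapped as an HRESULT and once bare. A small sketch of the decoding, using only standard HRESULT bit layout and .NET's Win32Exception for the message text:

```powershell
# HRESULT from the VSS event (Error 1).
$hr = 0x8007139f

# Bits 16-26 hold the facility; 7 is FACILITY_WIN32.
$facility = ($hr -shr 16) -band 0x7FF

# The low 16 bits hold the original Win32 error code: 0x139f = 5023.
$code = $hr -band 0xFFFF

"Facility: $facility  Win32 code: $code"

# Ask Windows for the message text that belongs to that Win32 code.
([ComponentModel.Win32Exception]$code).Message
```

    This only shows that both events report the same state error; it says nothing about what put the cluster resource into that state.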

    Wednesday, June 19, 2013 11:18
  • STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)

    Looks like you're experiencing another, not well-known, Auto_Pause_Error issue... try installing the hotfixes below:

    - Update that improves cluster resiliency in Windows Server 2012 is available

    - Virtual machine enters a paused state or goes offline when you try to create a backup of the virtual machine on a CSV volume in Windows Server 2012

    - Windows Server 2012 update rollup: May 2013

    The second update is intended to resolve some ODX issues.

    Ensure that you're running backups with host/LUN serialization.

    Try disabling delete notifications (TRIM) on the volumes:
    fsutil behavior set disabledeletenotify 1

    If the problem still persists after installing the hotfixes, try disabling ODX on the hosts manually:
    Set-ItemProperty hklm:\system\currentcontrolset\control\filesystem -Name "FilterSupportedFeaturesMode" -Value 1
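
    Since both settings above are per host, it is easy to miss a node. A read-only sketch that reports them for every cluster node, assuming the FailoverClusters module and PowerShell remoting enabled on the nodes:

```powershell
Import-Module FailoverClusters

$nodes = (Get-ClusterNode).Name

Invoke-Command -ComputerName $nodes -ScriptBlock {
    # ODX counts as disabled when FilterSupportedFeaturesMode is 1.
    $fs  = 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem'
    $odx = (Get-ItemProperty -Path $fs -Name FilterSupportedFeaturesMode -ErrorAction SilentlyContinue).FilterSupportedFeaturesMode

    [pscustomobject]@{
        OdxDisabled  = ($odx -eq 1)
        # Raw fsutil text, e.g. "DisableDeleteNotify = 1" when TRIM is off.
        DeleteNotify = (fsutil behavior query disabledeletenotify) -join ' '
    }
} | Select-Object PSComputerName, OdxDisabled, DeleteNotify
```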

    Wednesday, June 19, 2013 11:27
  • Hi and thanks for the response.

    As I wrote in my post, I have tried everything.

    All servers are patched to the latest level, and all the updates you suggest here are installed. ODX is disabled as you suggested, but still the same problem.

    From time to time the backup runs fine, at a speed of around 1 GBps. Even when these failures happen, the servers and the SAN have nothing to do: CPU utilisation on the hosts is under 2%, memory usage is low and SAN utilisation is under 1%.

    Robert

    Wednesday, June 19, 2013 11:53
  • Robert,

    We both seem to be having similar issues. Sometimes I can get the backups to work when I put all the VM's on a single node of the cluster. The host computers and the SAN don't seem to be working that hard but the VM's all seem to slow way down (their disks show 100% utilization inside the VM) when I'm doing a backup. I'm opening a new case with Microsoft and I'll see where this one goes.

    Wednesday, June 19, 2013 14:47
  • Robert, I too see these same errors as error 3 and 4 appearing on the failover cluster, EventID 5217 and EventID 5120. We have 6-10 node clusters with 30-40 VMs per CSV.

    What is the result of these errors: are backups failing and/or VMs crashing?  Also, some of those patches above are only a few days old - have you applied the most recent?

    Wednesday, June 19, 2013 20:46
  • Today I created a new LUN and CSV from our existing storage unit. I moved the storage of 12 of our VM's to that new CSV and everything seems to be working as expected with no more hanging. I'm not sure what the problem is with the old CSV but I'm going to continue to move off of the CSV that seems to be having problems.
    Thursday, June 20, 2013 02:35
  • Were your old CSVs created in 2008 R2? If so, that's the same for us. Maybe there is something there that is causing the problems?
    Friday, June 21, 2013 06:45
  • For us, we had 2 MD3620i's in geographically diverse locations for DR purposes in our 2008 R2 environment. Our VMs were split between those storage units with the thought that at least we would have half of them up and running in a catastrophe. With 2012 and replication we rolled all of them onto a single LUN and leveraged replication to the remote location. I think this is where our troubles started. When I separated the VMs onto different LUNs in our 2012 environment, things got a lot better. I suspect that we were pushing the limits of the MD3620i and the 4TB LUN when everything was together. With that said, we did accidentally purchase the legacy hardware snapshot option with our storage units. I recently configured the hardware snapshot instead of using the software VSS provider from Microsoft and that has solved our problems for sure. No more slowness, no more problems.
    Monday, June 24, 2013 18:13
  • Can someone else confirm that the problem is solved when you use a hardware VSS provider (e.g. StarWind Hardware VSS), including the latest hotfixes/updates?

    Greetings,

    Edge

    Monday, July 1, 2013 05:50
  • I think a lot of us with this issue, myself included, are using Hardware VSS Providers although there seems to be a mix of both.

    I can say that our CSV volumes containing VHDs contributing to the problem (according to RAMMap output) are on MBR disks and were migrated from Server 2008 R2. Maybe if we get enough information together we can find a correlation.

    Also, is anyone else finding this thread nearly impossible to reply to due to the script running out of control? It happens on multiple browsers/OSes for me.

    Wednesday, July 3, 2013 20:11
  • I have got an Equallogic and two clusters running stable for about 2 weeks.

    I still get some 5120 and 5217 events. After the latest hotfixes the number of errors has dropped to maybe 5-10 per week. I haven't had the time to dig further into it, but I suspect my virtual file server cluster is causing the problems - the VMs were migrated from a 2008 R2 Hyper-V cluster to a 2012 Hyper-V cluster and have been causing trouble since day one. Maybe I'll just dump and recreate the virtual cluster when I have the time.

    My setup is running without the hardware provider at the moment. I am only using the DSM, PowerShell module and SMP from the EQL HIT. DPM is set up to do 5 parallel backups.


    This posting is provided >AS IS< with no warranties.


    • Edited by Dark Grant Thursday, July 11, 2013 05:19 typo
    Thursday, July 11, 2013 05:19
  • Hello all,

    The latest hotfix in the "backup VM on CSV" saga is http://support.microsoft.com/kb/2870270/en-us - Update that improves cloud service provider resiliency in Windows Server 2012. It supersedes KB2848344 and all previously released hotfixes on this issue (KB2838669, KB2813630, KB2790728, etc.).

    I suggest also http://support.microsoft.com/kb/2869923/en-us - Physical Disk resource move during the backup of a Cluster Shared Volume (CSV) may cause resource outage, strictly related to the same topic.

    Anyone who is experiencing 5120 and 5217 should have a look at this post: http://social.technet.microsoft.com/Forums/windowsserver/en-US/223eb499-53cd-4590-980a-4078d0b52bd3/statusclustercsvautopauseerror-not-fixed-with-kb2848344. As you can see, the MSFT guy says:

    Seeing an Event 5120 with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR may be expected and can be safely ignored in most situations.  It basically means that clustering knew of a software snapshot, but the software snapshot was deleted.  So now clustering is resynchronizing it’s state on the view of the snapshots. …

    So in general, you should only be worried if you see lots of 5120’s with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR.  That is a sign that clustering is in need of constantly resync’ing it’s state for the snapshots.
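
    To judge whether you are in the "lots of 5120s" category, counting the events per day on each node may help. A sketch, assuming both events are written to the System log by the Microsoft-Windows-FailoverClustering provider; the 7-day window is an arbitrary choice:

```powershell
$since = (Get-Date).AddDays(-7)

# Pull cluster events 5120 and 5217 from the last week on this node,
# then count them per day and per event ID.
Get-WinEvent -FilterHashtable @{
        LogName      = 'System'
        ProviderName = 'Microsoft-Windows-FailoverClustering'
        Id           = 5120, 5217
        StartTime    = $since
    } -ErrorAction SilentlyContinue |
    Group-Object { $_.TimeCreated.ToString('yyyy-MM-dd') }, Id |
    Sort-Object Name |
    Select-Object Name, Count
```

    Lining the daily counts up against the backup schedule should show whether the events only appear around snapshot creation or all day long.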

    • Proposed as answer by arndawg Thursday, August 8, 2013 12:25
    Tuesday, July 16, 2013 08:31
  • Solved it for me at least. I do sometimes get the 5120 and 5217 events, but apparently that is to be expected.
    Thursday, August 8, 2013 12:25
  • As much as I would like to say that the hardware provider has solved my problems, it has not. I have been keeping up to date with the hotfixes and patches, have a hardware provider in place, have separated the iSCSI traffic onto its own subnet, and lots more. So far my cluster is stable until the DPM backups fire off. When that happens, if I have VMs spread across more than one host, the VMs slow way down and the cluster eventually becomes so unstable that I have been forced to unplug the hosts. I am at a loss as to what the problem really is, but it seems related to backend storage connectivity. I have engaged Microsoft with this problem in the past and they keep saying that everything looks great. It might be time to engage them again or simply rebuild my hosts.
    Monday, August 19, 2013 16:19
  • We continue to have problems as well. Dell and Microsoft currently have a case open, and they have multiple Dell customers with the same problem.

    Dell EqualLogic VSS HW Provider issue:

    - Microsoft requested we perform additional tests with MPIO disabled, as they suspected it was playing a role
    - Testing was completed and results provided
    - The Microsoft engineer plans to start analyzing the results today
    - Next steps will be based on the analysis, but we suspect they will supply us with additional instructions for enabling deeper logging to help debug this issue

    Friday, August 23, 2013 21:00
  • Hey all,

    Is there any update on this issue?

    I am experiencing the same issue.

    Two Cisco UCS chassis with Dell EqualLogic storage, 10 GbE iSCSI attached. Servers running W2K12 with current updates. VMM has UR3 installed. The suggested hotfixes related to Hyper-V and Failover Clustering are installed. Dell HIT Kit 4.6 is installed. DPM is also running with current updates and hotfixes.

    But I still see some VMs going into Saved State or crashing because the CSV goes offline. There are also VSS 8194 errors on the host, but according to my internet research they can safely be ignored (?).

    DPM reports that the backup was successful.

    The VMs work again once they are rebooted.

    Since some of the affected machines are SQL Server 2012 and SharePoint 2013 servers, this issue is affecting production, because I never know exactly when it will occur.

    How can this issue be resolved?

    Thanks to all in advance!

    Wednesday, September 11, 2013 13:28
  • Awinstead,

    Did you get any feedback from our friends at Microsoft? We are certainly still feeling this pain.

    Wednesday, September 11, 2013 16:40
  • We continue to have problems as well. Dell and Microsoft currently have a case open, and they have multiple Dell customers with the same problem.

    Dell EqualLogic VSS HW Provider issue:

    - Microsoft requested we perform additional tests with MPIO disabled, as they suspected it was playing a role
    - Testing was completed and results provided
    - The Microsoft engineer plans to start analyzing the results today
    - Next steps will be based on the analysis, but we suspect they will supply us with additional instructions for enabling deeper logging to help debug this issue

    Did you get any updates from MS/Dell?

    We are still having the same issues.

    Thanks!

    Thursday, October 17, 2013 02:00
  • Has anyone tried Server 2012 R2 and DPM 2012 R2 yet to see if they fix this problem? We have experienced all the issues in this thread with an EqualLogic SAN and a 2012 cluster. High-level Dell and MS support tickets have not yielded any solution so far.
    Tuesday, October 22, 2013 17:28
  • Hello,

    We have been using a Windows Server 2012 Hyper-V failover cluster on a Dell PowerEdge R810 with Dell EQL PS4110X and PS4110X for two months. We had a similar connection timeout with the CSV group when backing up all VMs with BackupExec 2012. The five VMs went offline.

    Friday, October 25, 2013 11:39
  • We just had a cluster explosion over the weekend that looks suspiciously similar. Our cluster hosts are still 2012 R1, but I upgraded our DPM environment to 2012 R2 prior to this happening.

    Since installing all 17 hotfixes, or however many there are by now, things have been somewhat better, but we are still having issues.

    HP EVA4400, software VSS providers, 4-node cluster, fairly lightly loaded.

    Tuesday, November 5, 2013 19:53
  • I have 2 clusters, a 10-node 2012 SP1 cluster on an EVA 4400 and a 6-node 2012 R2 cluster on a 3PAR 7200, both being backed up by DPM 2012 R2, and I get the same errors and reboots on both clusters. So it's not fixed in 2012 R2.

    I'll try the 3PAR hardware VSS providers when I get my hands on them.

    Sunday, November 10, 2013 22:59
  • We have KB2870270 installed on our 2012 cluster hosts and are trying to protect them with DPM 2012 R2 using software VSS. We have left the ODX and TRIM settings at their defaults, as I believe KB2870270 is supposed to fix the issues related to them.

    I added a few VMs to the backup and the initial replica creation went fine, but strangely, after a few hours, even when there are no backup jobs running, we start getting events 5120 (STATUS_IO_TIMEOUT (c00000b5)), 1230, 1146 and 5142 (ERROR_TIMEOUT (1460)), at which point the node loses the CSV.

    Monday, November 25, 2013 21:52
  • I have a Dell PS4000XV and 2x R710s in a failover cluster, and I use DPM 2012 R2 to back up the VMs on the PS4000 serially from the cluster nodes. I recently migrated the cluster from 2008 R2 to 2012 R2 by copying the cluster roles and then detaching/attaching the iSCSI volumes. Once the migration to 2012 R2 was complete, I tried to back up a migrated VM with DPM and found that in the later stage of the VM backup the system hung while it read and wrote what seemed to be the entire VM's worth of data, and then things went back to normal. I tested this about 5-6 more times using both nodes and the same issue occurred consistently. Only once did a VM time out and crash due to the hang, but during every backup, clients timed out connecting to the services on the VMs whose storage was on the node being backed up.

    As a test I have created a new CSV on the PS4000, moved some of the VM storage to it and the backups are completing without issue. I am going to recreate the remaining CSVs and move the VMs.

    Tuesday, November 26, 2013 15:23
  • Hi, I have been reading this thread with interest. I have a 10-node 2012 R2 Hyper-V 3.0 cluster (fully patched)

    With a 2012 R2 Storage Spaces backend, SMB 3.0

    With Fibre-attached SAN storage.

    DPM 2012 R2, fully patched.

    And I still occasionally get these errors on the Storage Spaces cluster:

    Paused State because of '(c0130021)' all I/O will be temporarily be queued.

    This seems crazy, as I am running all the latest agents and versions of code. I will look to raise a case with Premier Support and report back any findings.

    regards

    Mark

    Tuesday, December 10, 2013 13:53
  • This problem seems to have flared up again for me over the last few weeks. I can't put my finger on what has changed; it did seem to resolve itself for a while after the May 2013 hotfix.

    Anyway, there is another hotfix which might be relevant to some people, although it seems to tackle a specific issue where the guest VM crashes at backup time if it has many snapshots.

    http://support.microsoft.com/kb/2908415/en-us


    • Edited by TimBoothby Thursday, December 12, 2013 13:32
    Thursday, December 12, 2013 13:11
  • There is yet another hotfix for CSV backup issues which got sneaked out over Xmas - http://support.microsoft.com/kb/2878635

    This article introduces an update that improves the resiliency of the cloud service provider in Windows Server 2012. This update is dated December 2013.

    This update replaces update 2870270, which is used to improve resiliency. Also, this update includes update 2869923 and update 2908415. Additionally, the update resolves several issues that occur in the following scenario: 
    • You have a Hyper-V failover cluster.
    • The Hyper-V resources are saved in .vhd files on Cluster Shared Volumes File System (CSVFS) volumes.
    • You use a backup solution. For example, you use System Center Data Protection Manager (DPM) in the Hyper-V environment.
    • You try to perform a backup, and a snapshot is taken of the CSVFS volume.
    • The current active node encounters an error, and the cluster fails over to another node.
    • DPM may start a consistency check on the volume unexpectedly.

    I still seem to be having problems so will give this a try.
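
    For anyone trying to keep track of which hotfix replaces which, here is a small Python sketch of the supersedence chain as described in this thread. It is only an illustration: the mapping contains just the KB numbers named in the posts above (the "etc." in the earlier post is left out), and it treats "replaces" and "includes" the same way, which is a simplifying assumption.

```python
# Supersedence as stated in this thread only; not an official Microsoft list.
SUPERSEDES = {
    "KB2878635": ["KB2870270", "KB2869923", "KB2908415"],
    "KB2870270": ["KB2848344", "KB2838669", "KB2813630", "KB2790728"],
}


def covered_by(kb):
    """Return every KB that `kb` replaces or includes, directly or transitively."""
    seen = set()
    stack = [kb]
    while stack:
        for older in SUPERSEDES.get(stack.pop(), []):
            if older not in seen:
                seen.add(older)
                stack.append(older)
    return seen


def still_needed(wanted):
    """Drop KBs from `wanted` that another KB in `wanted` already covers."""
    wanted = set(wanted)
    redundant = set().union(*(covered_by(kb) for kb in wanted))
    return wanted - redundant
```

    Calling still_needed on a list of hotfixes collected from the thread leaves only the ones that are not already rolled up into a newer update, so under these assumptions installing KB2878635 makes the earlier CSV backup hotfixes redundant.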

    Friday, January 3, 2014 15:44
  • Hi

    Be aware that backing up a replica is NOT supported:

    http://blogs.technet.com/b/dpm/archive/2012/08/27/important-note-on-dpm-2012-and-the-windows-server-2012-hyper-v-replica-role.aspx

    The important thing to note about this is that while the DPM agent can be installed on both servers with no issues and you can backup the Primary DPM server as usual with no problems, on the Hyper-V Replica server you can enumerate the virtual machines and “may” even be able to back them up successfully, however backing up or restoring the Hyper-V replica is not supported.

    Meanwhile, the patches described in these articles typically solve any issue:

    http://support.microsoft.com/kb/2784261/EN-US

    http://support.microsoft.com/kb/2878635

    http://support.microsoft.com/kb/2813630

    Hope this helps


    Christian Parmigiani

    Monday, January 27, 2014 08:41
  • This issue has cropped up for me again as well.  Thought I had it sorted middle of last year.

    I noticed that ODX was enabled again on our Hyper-V servers. Disabling it had resolved the issue previously, so I have disabled it again and installed the latest hotfix.

    I am not sure how ODX could have been re-enabled. The only things I can think of are a Windows Update or the installation of the DPM 2012 R2 agent.

    Will need to monitor the backups for a week or so before I am confident that it is resolved again.

    Monday, February 10, 2014 01:40
  • Are you running FEP on your Hyper-V hosts?  We have found that FEP causes Hyper-V issues if the DPMRA.exe agent is not excluded.
    Tuesday, June 24, 2014 20:08
  • Digging up an old thread here, but does anyone know if these issues were fixed with Server 2012 R2 and DPM 2012 R2? I had the same issues with Server 2012 and DPM 2012 R2. I raised a case with Pro Support, installed a load of hotfixes, disabled ODX and removed hardware VSS, but still had the issue. I ended up scrapping the VHD backups, as they were proving a nightmare to manage and resulting in a lot of sleepless nights!
    Monday, January 12, 2015 17:03
  • Hi Tim,

    Did you manage to get any kind of solution for this?

    Regards,

    Pankaj Singh

    Tuesday, June 30, 2015 16:20
  • Hi Mike,

    Did you manage to get any kind of solution for this?

    Regards,

    Pankaj Singh

    Tuesday, June 30, 2015 16:22
  • Hi Mark,

    Did you manage to get any kind of solution for this?

    Regards,
    Pankaj Singh

    Tuesday, June 30, 2015 16:25
  • Hey, colleagues! Any news here?
    I have the same problem, with event IDs 12289, 16010 and 15090.

    Cluster, Win 2012 R2, all updates; DPM 2012 R2, all updates; no ODX; just removed the Dell hardware provider.

    Friday, May 6, 2016 14:20