DPM 2012 SP1 Beta - Causing Server 2012 Hyper-V Cluster hang / ISCSI problems

    Question

  • Hi All,

    First of all, I know it's a beta and these are the perils of being an early adopter, but I've got a serious problem.

    I've upgraded our production Hyper-V cluster to Server 2012. The setup is a 4 node cluster running CSVs on an ISCSI SAN with MPIO via dual gigabit Ethernet networks. The SAN storage is provided by Open-E DSS7 and replicated to another server in a different building.

    After the upgrade, everything about the cluster seemed stable and worked as expected, with live migrations etc. all working. I then turned my attention to backups and discovered that Server 2012 wasn't supported by DPM. Fortunately there is a beta of DPM 2012 SP1 which adds support for Server 2012; unfortunately there is no upgrade path from the beta to RTM of SP1. Not wanting to upgrade our production DPM server to a beta, I installed a copy of DPM 2012 SP1 beta on a VM to provide a stopgap backup solution for VM-level backups of certain machines that couldn't be backed up in other ways. I realise that running the backup server on the same cluster / SAN as the stuff that's being backed up is an odd thing to do, but this at least serves to provide snapshots, SAN replication provides resilience, and like I say, this is a stopgap.

    Then I started noticing problems. The first symptom was that on starting / rebooting VMs, other VMs would sometimes hang for perhaps 30s - 2m, and people would start complaining that SharePoint had gone unresponsive etc. However, they would come back to life in a minute or two. On a couple of occasions we came in the next morning to find a number of VMs off or paused (backups ran overnight). Both of these problems occurred only when the DPM server was turned on. I thought the issue might be general load on the SAN, having both the backup server and the machines being backed up living on the same CSV / hardware. I moved the DPM server to a different iSCSI box and put on aggressive throttling (200Mbps) to try to reduce load, but the problem continues.

    The event logs on the Hyper-V cluster suggest I/O timeouts to the SAN at the times of the backups. There are lots of events with IDs 1069, 1205, 1146 and 1230 (various cluster resources failed). The interesting one, I think, is 5120: Cluster Shared Volume 'Volume5' ('VOLUME NAME') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.
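    For anyone collecting these events from several nodes, here is a minimal Python sketch (not part of the original post) that pulls the volume name and status code out of an event 5120 message. It assumes the message text has exactly the shape quoted above; the names `CsvPause` and `parse_csv_pause` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class CsvPause(NamedTuple):
    volume: str         # cluster resource name, e.g. "Volume5"
    friendly_name: str  # the name shown in parentheses in the message
    status: str         # symbolic status, e.g. STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR
    code: int           # status value parsed from the hexadecimal suffix


# Assumption: the wording matches the event 5120 text quoted in this thread.
_PATTERN = re.compile(
    r"Cluster Shared Volume '(?P<volume>[^']*)' \('(?P<friendly>[^']*)'\) "
    r"is no longer available on this node because of "
    r"'(?P<status>\w+)\((?P<code>[0-9a-fA-F]+)\)'"
)


def parse_csv_pause(message: str) -> Optional[CsvPause]:
    """Return the parsed fields, or None if the message has a different shape."""
    m = _PATTERN.search(message)
    if m is None:
        return None
    return CsvPause(m["volume"], m["friendly"], m["status"], int(m["code"], 16))


if __name__ == "__main__":
    sample = (
        "Cluster Shared Volume 'Volume5' ('VOLUME NAME') is no longer available "
        "on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. "
        "All I/O will temporarily be queued until a path to the volume is reestablished."
    )
    print(parse_csv_pause(sample))
```

    The parser returns `None` rather than raising when the text does not match, so it can be run over a mixed export of cluster events without special handling.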

    Is anyone else using SP1 beta to successfully back up a 2012 Hyper-V cluster?

    Is anyone seeing the same problem?

    Is it likely that this is a problem with the SP1 beta, and if so, will it be fixed at RTM?

    Any suggestions for a stopgap solution?

    I think I might try setting up a test physical DPM server to check the issue isn't in some way related to the fact that the DPM server sits on the same cluster it's backing up. I'm also happy to consider that the problem could lie elsewhere, e.g. with the SAN storage (this was upgraded from v6 to v7 at the same time as the 2012 upgrade), but as soon as I tell the vendor that the problem relates to running a beta of DPM they will be pointing fingers at that.

    Thanks,

    Tim

    Thursday, November 22, 2012 1:02 PM

All replies

  • Hi,

    Other customers running Windows 2012 Hyper-V clusters and DPM 2012 SP1 have reported similar problems; however, it has been determined that any backup product using shadow copies results in the same errors, so this isn't a DPM SP1 issue. The Windows 2012 cluster CSV team is investigating this problem; hopefully they will find the cause and have a fix available before SP1 is released.


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread. Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.

    Friday, November 23, 2012 4:08 AM
  • Hi Mike,

    Thanks for the information. Interesting to hear others are having problems, I had searched for quite a while but couldn't find anyone reporting exactly the same issue in the forums. 

    Is there any way I can be notified when a fix is available? Is this likely to come out in a patch Tuesday update, or will it be a hotfix? Often the patch Tuesday updates are fairly vague - just stating "Various performance and stability fixes" which makes it tricky to know if a specific issue has been addressed yet.

    I have moved the DPM server to a standalone Hyper-V host with local storage, so it is now completely independent of the systems it's backing up. I'm going to try running this overnight tonight to see if it helps.

    Thanks,

    Tim

    Monday, November 26, 2012 11:56 AM
  • Hi,

    The investigation from the Windows team is still ongoing, so until they identify the problem and code up a fix I cannot say how it will be made available. I'm keeping my eyes on this issue so I can update this post with the final outcome.


    Regards, Mike J. [MSFT]

    Monday, November 26, 2012 2:10 PM
  • That would be most appreciated Mike, thanks very much.
    • Proposed as answer by cciuleanu Wednesday, February 06, 2013 12:29 PM
    Monday, November 26, 2012 2:21 PM
  • I'm encountering the same issue on my side, except that I'm using clustered Storage Spaces. I'm also noticing that VMs stop responding once the backups reach a certain point in the backup process. I've tried to use backup serialization in DPM; it worked one day and failed after that. At first I thought it wasn't using the CSV VSS provider, but after reviewing the DPM logs it looks OK on that side. I saw that for some reason Windows tries to cache the VHDX in memory while the backup is running. I was able to see this using RAMMap from Sysinternals. This caused problems for larger VMs (150GB+). When that was happening, the hosts became more and more unresponsive and the throughput dropped dramatically when using Veeam (~1.5MB/s). In one instance, I saw one of the hosts BSOD (not sure yet if this is related). We have a case opened with Microsoft for this as well.
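    To put the throughput figure in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes decimal units (1 GB = 1000 MB) and a constant rate, neither of which is stated in the post, and `transfer_hours` is a made-up helper name.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to copy size_gb at a sustained rate (assumes 1 GB = 1000 MB)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


# A 150 GB virtual disk at the ~1.5 MB/s reported above.
print(f"{transfer_hours(150, 1.5):.1f} hours")
```

    At that rate a single 150GB disk takes longer than a day to copy, which would not fit in an overnight backup window.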

    Needless to say that this is a major pain in our migration from VMware to Hyper-V!

    Wednesday, November 28, 2012 6:58 PM
  • I'm replying to this thread because

    a) It's one of only two threads on the whole internet that mention "STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR"
    b) I'm getting the same symptoms.

    I can confirm that one of my nodes in my Hyper-V 2012 cluster recently experienced the following event:

    Log Name: System
    Source: Microsoft-Windows-FailoverClustering
    Event ID: 5120
    Logged: 02/12/2012 18:01:30

    Details: Cluster Shared Volume 'Volume2' ('ClusterStorage Volume 2') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    I can also confirm that I am using DPM 2012 SP1 Beta to back up this cluster.  I have been running this environment for quite some time now, and I can confirm that I've received 15 of these kinds of events (14 of which I was completely oblivious to).  What prompted me to do research this time is that I discovered that 3 of my virtual machines were in a paused state and were not available.  My other node (two node cluster) has had 5 of these events.

    As this is already in the hands of Microsoft I won't log a call but will follow this thread.  If there is any further information I can provide please ask.

    Oh yes, my primary storage is Fiber Channel, so it's not an iSCSI problem.
    Monday, December 03, 2012 8:11 AM
  • I also have this issue.  I am running a 2012 cluster on SMB3.0 storage (Microsoft scale-out file server cluster). My 2012 virtual servers usually shut down or go into a paused state. My 2008R2 servers usually hang for an extended period of time, and then become available again. Resolution on this issue would be much appreciated.

    Monday, December 03, 2012 3:56 PM
  • I've found a workaround for this that I thought I'd share. It's a little convoluted, but if, like me, not having your servers backed up was giving you sleepless nights, it might be worth it.

    Server 2012 introduces Hyper-V Replica, which allows you to push an offline copy of your VMs to a remote server / site for DR purposes. This works from cluster to standalone. It's pretty simple to set up. You need a server with the Hyper-V role installed to host the replicas.

    Once your replicas are set up you can use DPM to back up the replicas. The replica VMs are normally turned off anyway, so if backups do cause brief disk glitches it isn't going to interrupt any important services. My guess is this is a cluster-related issue anyhow, so having the replicas on a standalone machine removes that issue.

    HV Replica does allow hourly snapshots of the replicas, but it seems that it's not possible to change the frequency of these, so this isn't an efficient way of providing a decent retention time. For some reason, when I tried it, DPM would only see the replicas to back up if snapshots were turned off.

    I've only set this up today, so can't comment on the long term reliability, but so far so good.

    Tim


    • Edited by TimBoothby Friday, December 07, 2012 12:38 PM Typo
    Friday, December 07, 2012 12:36 PM
  • Hi Tim

    Sadly, backing up the replicas is not supported.

    Ref: http://blogs.technet.com/b/dpm/archive/2012/08/27/important-note-on-dpm-2012-and-the-windows-server-2012-hyper-v-replica-role.aspx


    My blog is at http://flemmingriis.com; let me know if you found the post or blog helpful or if it leaves room for improvement.

    Sunday, December 09, 2012 4:13 PM
  • Hi Flemming,

    Thanks for the warning. The backups appear to be running successfully, but I've not attempted to restore one. Reading that blog article it sounds like the replication process could be modifying the VHDs at the same time as DPM backing them up, so presumably they could be in a messed up state.

    Back to the original plan of cross fingers and wait then :-(

    Tim

    Tuesday, December 11, 2012 5:24 PM
  • Hi all,

    System Center DPM SP1 RTM has now been released, as per this link:

    http://social.technet.microsoft.com/Forums/en-US/dataprotectionmanager/thread/35a8d34e-1891-47c4-b1f8-64f4511c14ee/

    Does anyone know if the problem described in this post is fixed?


    MCSE, MCTS, VCP, AIS, MCITP

    Monday, December 24, 2012 11:07 PM
  • Hi,

    Some Windows 2012 code defects have been identified and the Windows team is hard at work getting them fixed, tested and eventually released. There is no ETA at this time, but I do know they are a high priority, so we just need to let the team finish the fixes and get them posted.

    Thanks in advance for your continued patience.


    Regards, Mike J. [MSFT]

    Tuesday, December 25, 2012 12:31 AM
  • Thank you Mike,

    We are looking forward to this stable version. We have already deployed Windows 2012 Hyper-V clusters for 2 clients, and at the moment they perform only direct guest backups. We need to configure hypervisor-level backups before the projects can be accepted :)



    Tuesday, December 25, 2012 7:27 PM
  • The problem is not fixed. I have deployed DPM 2012 with SP1 and this still causes the issue seen here.

    Sounds like this needs a Windows patch that Microsoft is working on. I have also deployed 2012 in production and am only doing guest-based backups now, because if I do host-based backups it brings down most of the VMs on the cluster.

    Thursday, December 27, 2012 3:04 AM
  • Glad Mike pointed this thread out to me before I did my cluster migration.  I'll be watching for updates. 
    Friday, December 28, 2012 3:16 PM
  • I'm having the same issue on one of my 2 clusters.

    Cluster1, 3x HP DL360 G7's connected to a Compellent SAN via 4Gb FC & McData 4700 switches

    Cluster2, 2x Dell R620's directly connected to a Dell MD3620f via 8Gb FC

    Both clusters run the same exact Windows Server 2012 Datacenter, imaged the same way at roughly the same time.  Cluster1 does not have any problems with backups, but cluster2 experiences these I/O problems after 8-24 hours of backups.  I've found that running one backup at a time manually doesn't seem to cause issues, but if I allow all of the backups to run on schedule and in parallel then Cluster2 will crash hard.

    Both clusters are being backed up by the same DPM 2012 SP1 (RTM) server.  The only real difference between them is the hardware, and the lack of a SAN Switch in Cluster2.  


    Lync/Asterisk blog: www.andrewparisio.com


    Thursday, January 03, 2013 10:24 PM
  • KB2791729 is available from support if you open a case; it has helped me with CSVs residing on iSCSI. I haven't seen any issues on FC. (Edit: so much for no problems on FC. The next post is FC :)


    Thursday, January 03, 2013 10:54 PM
  • I'm having the same issue, also with 2 clusters:

    cluster1 4x HP ML330 G6, 2x 8 Gbit FC Switch, HP P2000 G3

    cluster2 (testing) 2x HP ML110 G6, directly connected via 4 gbit FC to HP P2000 G3

    Sometimes a LUN disappears or is inaccessible (and I have to switch maintenance mode on/off on this LUN), and sometimes VMs on the affected Hyper-V host pause.

    Both clusters have problems with backup and I see these events.

    Cluster Shared Volume 'Volume6' ('HyperV Data 6') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.


    Thursday, January 03, 2013 11:07 PM
  • Just checking in, is there any eta available yet for this fix?
    Thursday, January 10, 2013 4:52 PM
  • Hi,

    I believe they are doing final validation of the fixes, so hopefully it will be released soon.  The fixes are from the Windows group, so I don't have much insight on their release schedule / plans.  When I learn more, I'll update the post again.


    Regards, Mike J. [MSFT]

    Thursday, January 10, 2013 5:42 PM
  • If we call into PSS is there a fix available or will they defer to just deploying guest agents until the patch is released?
    Sunday, January 13, 2013 3:32 AM
  • As of 1-11-13, the fix is not yet available to any customers except a few who are helping to test it.


    Regards, Mike J. [MSFT]

    Sunday, January 13, 2013 4:14 AM
  • I am experiencing the same issue with a Hyper-V Server 2012 cluster and DPM 2012 SP1 (with rollup 1) running in a Windows 2012 VM on the same cluster. The cluster storage is on an iSCSI SAN; however, backups are stored on a separate iSCSI NAS. This configuration worked perfectly fine when the same hardware was running Hyper-V 2008 R2 SP1 and DPM 2012 on a Windows 2008 R2 guest.

    Please make the fix available as soon as possible

    Mark

    Monday, January 14, 2013 3:48 AM
  • Hi All,

    This Windows 2012 fix was just released that should resolve two major issues that customers were reporting.

    Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
    http://support.microsoft.com/kb/2799728

    If you continue to see problems protecting Windows 2012 Hyper-V guests after installing the above hotfix, please open a support case for further investigation.


    Regards, Mike J. [MSFT]

    • Proposed as answer by Wagner Polachini Tuesday, January 15, 2013 10:54 AM
    • Unproposed as answer by TimBoothby Tuesday, January 15, 2013 4:28 PM
    Monday, January 14, 2013 6:26 PM
  • Excellent news Mike. I shall try it in the morning and report back.

    Thanks for being on the ball with this.

    Monday, January 14, 2013 9:04 PM
  • Thanks Mike!
    Monday, January 14, 2013 9:23 PM
  • Thanks Mike, will try this afternoon and confirm
    Monday, January 14, 2013 9:28 PM
  • Just updated both nodes, rebooted and started running a backup of all VMs on the cluster.

    Still receiving the following on both production LUNs

    "Cluster Shared Volume 'LUN2' ('LUN2') is no longer available on this node because of 'STATUS_IO_TIMEOUT(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished."

    Tuesday, January 15, 2013 12:03 AM
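    Since the post above reports a different status (STATUS_IO_TIMEOUT) from the one seen earlier in the thread (STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR), it can help to count which reason dominates on a given cluster. The following Python sketch assumes the event messages have already been exported as plain strings in the format quoted in this thread; `tally_pause_reasons` is a hypothetical helper, not part of any Microsoft tool.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "because of 'STATUS_NAME(hexcode)'" part of the event text.
_STATUS = re.compile(r"because of '(\w+)\(([0-9a-fA-F]+)\)'")


def tally_pause_reasons(messages: Iterable[str]) -> Counter:
    """Count CSV pause events by symbolic status name; unmatched text is skipped."""
    counts: Counter = Counter()
    for message in messages:
        match = _STATUS.search(message)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    events = [
        "Cluster Shared Volume 'LUN2' ('LUN2') is no longer available on this node "
        "because of 'STATUS_IO_TIMEOUT(c00000b5)'.",
        "Cluster Shared Volume 'Volume5' ('VOLUME NAME') is no longer available on "
        "this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'.",
    ]
    for status, count in tally_pause_reasons(events).most_common():
        print(status, count)
```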
  • I have installed the hotfix on the cluster nodes and the DPM backup appears to be working well so far; however, the issue can take a few hours to appear, so I will report back later. RichL_PLA, do you have any VDS/VSS providers installed on the hosts from the SAN manufacturer? If so, these may be causing a conflict unless they are Windows 2012 certified.

    Tuesday, January 15, 2013 12:39 AM
  • My first error came about 7-10 minutes after the backup started.

    No hardware VSS providers installed

    I also made sure to disable ODX per MS:

    After you install the hotfix, CSV volumes do not enter paused states as frequently. Additionally, a cluster’s ability to recover from expected paused states that occur when a CSV failover does not occur is improved.

    To avoid CSV failovers, you may have to make additional changes to the computer after you install the hotfix. For example, you may be experiencing the issue described in this article because of the lack of hardware support for Offloaded Data Transfer (ODX). This causes delays when the operating system queries for the hardware support during I/O requests.

    In this situation, disable ODX by changing the FilterSupportedFeaturesMode value for the storage device that does not support ODX to 1.
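    As an illustration of the change described above, here is a small Python sketch that only generates a .reg fragment as text; it does not touch the registry. The key path `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem` is an assumption that is not stated in this thread, so check it against KB2799728 before importing anything. The function name `odx_reg_fragment` is made up for this example.

```python
REG_HEADER = "Windows Registry Editor Version 5.00"

# Assumption: location of the value; confirm against KB2799728 before use.
FILESYSTEM_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"


def odx_reg_fragment(mode: int) -> str:
    """Build .reg text that sets FilterSupportedFeaturesMode.

    mode=1 is the value the quoted guidance uses to disable ODX; mode=0 is
    assumed here to be the setting that leaves ODX enabled.
    """
    if mode not in (0, 1):
        raise ValueError("mode must be 0 or 1")
    lines = [
        REG_HEADER,
        "",
        f"[{FILESYSTEM_KEY}]",
        f'"FilterSupportedFeaturesMode"=dword:{mode:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(odx_reg_fragment(1))
```

    Generating the text first makes it easy to review the exact change before applying it on each cluster node.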

    Tuesday, January 15, 2013 12:42 AM
  • You could also try serializing the backups assuming that you are using DPM http://technet.microsoft.com/en-us/library/ff634192.aspx

    So far no issues on the DPM backup I am running at the moment

    Regards

    Tuesday, January 15, 2013 1:07 AM
  • As I understand it, with CSV2 serialized backup is no longer necessary.
    Tuesday, January 15, 2013 1:12 AM
  • The KB2799728 hotfix worked for me. Thank you very much.

    Wagner M. Polachini - IT Infrastructure Analyst

    Tuesday, January 15, 2013 10:55 AM
  • Yes, serialization of backups is not required on Hyper-V 2012; however, it reduces storage I/O since only one VM is being backed up at a time. The DPM backups have been successful, with no errors since applying the hotfix to the hosts.
    Tuesday, January 15, 2013 12:04 PM
  • I installed the hotfix and still got STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021) on 4 out of 52 VM backups. No VMs were left in a paused state. I am using serialization and no hardware VSS providers on Fibre Channel storage.
    Tuesday, January 15, 2013 1:55 PM
  • No serialization here, and it happened on 3 of my 21 VM backups.
    Tuesday, January 15, 2013 2:00 PM
  • In addition, after applying this hotfix there seemed to be a severe memory leak on the node that owns the storage. Memory ballooned from a day-to-day 42GB to 128GB (99% utilization on the host).

    I stopped the backup and the memory dropped immediately. Running it again now (just a resume on a single VM) and the memory is slowly creeping back up again from 43 to 55 and climbing still.

    Tuesday, January 15, 2013 3:14 PM
  • Thanks Rich - you've put your finger on it. I've done the update and turned off ODX. I've done a number of backups successfully - there seemed to be some brief glitches in the availability of some of the VMs, but nothing crashed. 

    Then all the VMs on one of the nodes started flashing critical messages, shutting down, rebooting, migrating to other hosts etc. Looking into it, the host seemed to be out of memory, even with most of the guests offline. As with Rich, this node was the storage owner. Cancelling the in-progress backups immediately freed up the RAM.

    I agree with Rich's diagnosis - severe memory leak.



    • Edited by TimBoothby Tuesday, January 15, 2013 4:32 PM
    Tuesday, January 15, 2013 4:23 PM
  • Glad to hear second validation here as well.

    I opened a case with PSS and am awaiting a call back. I'll report my findings as soon as I know something

    Tuesday, January 15, 2013 4:25 PM
  • You can tell where my backups began
    Tuesday, January 15, 2013 5:00 PM
  • Here is the memory free on the storage owner node. I cleared the node of all guests prior to running this test. Task Manager and Performance Monitor don't show any particular process eating all the RAM, but something clearly is.

    Tuesday, January 15, 2013 6:03 PM
  • FYI, I'm working with the perf team and ran RAMMap; there seems to be a memory leak of sorts in the volume shadow copy process for the VM being backed up.

    Tuesday, January 15, 2013 8:43 PM
  • We have seen this, too.  Within our monitoring software, SQL Sentry, we noted that the memory ballooning is tied to the file cache - which I'm guessing is related to the shadow copy/vds/vss stuff.  We have just installed the released hotfix and we're working to see if the stability issues are resolved, which were a much bigger deal for us...and have left me sleep deprived.

    I've pasted a screenshot below from SQL Sentry Performance Advisor.  It shows the memory peaks with each VM being backed up on that host.

    One other thing - I had the host exhaust its memory when the pagefile was set to 4GB, but have since changed it to allow WS2012 to do whatever (system managed).  Not sure if that has helped, but can't say it has hurt either.  The host has 96GB and in VMM I had reserved 6GB...but when DPM kicked off, it just mowed right over all of it.  Sigh.
    • Edited by MarkLarma Tuesday, January 15, 2013 11:20 PM
    Tuesday, January 15, 2013 9:07 PM
  • I am experiencing the same issue with low memory on the hosts, especially when a VM with a large virtual disk is being backed up by DPM. The hotfix has stopped any errors appearing on the cluster or CSV disks; however, the VMM agent service crashes due to low memory on the host that has the VM being backed up, and when I tried to restart it while the backup was still running the server rebooted with bug check 0x000009e. The host was also the CSV owner at the time.

    Can anyone confirm if the hotfix also needs to be applied to the DPM server (which is running on Windows 2012)?

    Wednesday, January 16, 2013 12:20 AM
  • Hi All,

    I've been monitoring this closely and have been told that the memory leak issue is actively under investigation by the Windows team. At this time I don't have any more information.  Thanks for sharing your experience, it helps prioritize the efforts to find a solution.


    Regards, Mike J. [MSFT]

    Wednesday, January 16, 2013 12:45 AM
  • Hi everybody,

    I had a case opened with Microsoft for the problem where most (if not all) the RAM gets consumed during a VM backup with DPM. I saw the same behaviour as you guys where I saw that the VHDx being backed up was cached. In the case of the large VMs, it's problematic to say the least ;-). I've been told to use DynCache to control the size of the FS cache. I have yet to try it, will do today in conjunction with the hotfix KB2799728.

    Mathieu

    Wednesday, January 16, 2013 6:20 PM
  • I received a response from my case lead this evening that the product and development team are working on a resolution. Nothing more than that for the time being
    Thursday, January 17, 2013 5:15 AM
  • After installing KB2799728, I got this console error (on every server where I applied the KB). I can manage my clusters only remotely, from a server without KB2799728.

    I can also confirm the memory leak when a backup runs.


    Thursday, January 17, 2013 11:15 AM
  • This is a known issue with KB2750149. If you uninstall it, you will not receive that error when opening Failover Cluster Manager.
    Thursday, January 17, 2013 12:19 PM
  • Would the following registry items do anything to help?

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
    "LowMemoryThreshold"=dword:00001800

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization]
    "MemoryReserve"=dword:00001800

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization\AdditionalMemoryReserve]
    "FailOverClusteringMemReserve"=dword:00001000

    Thursday, January 17, 2013 8:34 PM
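    The values in the registry fragment above are hexadecimal. The short Python sketch below converts them to decimal. Reading the results as megabytes is an assumption that the post does not state; under that assumption the two 0x1800 entries would be 6 GB and the 0x1000 entry 4 GB. The helper name `dword_to_int` is made up for this example.

```python
def dword_to_int(reg_value: str) -> int:
    """Convert a .reg 'dword:XXXXXXXX' literal to an integer."""
    prefix = "dword:"
    if not reg_value.lower().startswith(prefix):
        raise ValueError(f"not a dword literal: {reg_value!r}")
    return int(reg_value[len(prefix):], 16)


settings = [
    ("LowMemoryThreshold", "dword:00001800"),
    ("MemoryReserve", "dword:00001800"),
    ("FailOverClusteringMemReserve", "dword:00001000"),
]

for name, raw in settings:
    value = dword_to_int(raw)
    # The GB figure is only meaningful if the unit really is MB (assumption).
    print(f"{name}: {value} ({value / 1024:g} GB if the unit is MB)")
```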
  • To answer my own question...it made no difference.  Come on, MS...where's the hotfix or a workaround that we don't have to call in for?
    Thursday, January 17, 2013 9:58 PM
  • I think this is crap!

    MS put out a new product that cannot be backed up. I have tried several backup programs, all with the same result: the CSV reports 0KB free and all VMs are stopped in a critical state.

    Everyone told me that using DPM should be safe. It is an MS product and it's a sure thing it will work. BUT IT IS THE SAME PROBLEM!

    PUBLISH A WORKING PATCH NOW! 

    Monday, January 21, 2013 8:48 AM
  • It's not a DPM issue but an NTFS driver issue. All backup solutions must be affected, since the fix is needed in a file system driver.
    Monday, January 21, 2013 8:55 AM
  • Same problem here. Fully patched 2012 cluster with DPM 2012 SP1 RTM + Rollup 1.
    Monday, January 21, 2013 9:39 AM
  • Hello Everyone,

    I continue to monitor this thread and I know how frustrating this problem is, but I can assure you that the Windows team is really hard at work fixing and testing fixes for the issues that have been uncovered. It appears that some customers are affected more than others due to scale and the various workloads placed on the cluster, Windows component interoperability timing etc. Another fix is in the works and will hopefully be released soon; however, as I have said in the past, I'm not privy to the Windows team's timetables. Thanks again for sharing your experiences, as that does sometimes help prioritize internal processes, although this one is already a very high priority.


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread. Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.

    Monday, January 21, 2013 11:28 PM
  • I did just try this KB that has been released. It did not resolve anything. 

    Br

    Patrik

    Tuesday, January 22, 2013 9:45 AM
  • I have tried several backup programs, all with the same result: the CSV reports 0 KB free and all VMs are stopped in a critical state.
    Everyone told me that using DPM should be safe. It is an MS product, so it's a sure thing it will work. BUT IT IS THE SAME PROBLEM!

    It's not a DPM issue but an NTFS driver issue. All backup solutions will be affected, because the fix has to go into a file system driver.

    Yes, I know that. But at first MS told us that this problem only occurred when using other backup software. 

    Is it possible to run 2012 VMs in a 2008 R2 Hyper-V cluster? I'm starting to think about reinstalling my Hyper-V servers and configuring a new 2008 R2 cluster. 

    Br

    Patrik

     
    Tuesday, January 22, 2013 11:58 AM
  • Is it possible to run 2012 VMs in a 2008 R2 Hyper-V cluster? I'm starting to think about reinstalling my Hyper-V servers and configuring a new 2008 R2 cluster. 

    Br

    Patrik

    Yes, this is fine. I have two clusters, one Windows 2012 with 39 VMs and one Windows 2008 R2 with 309 VMs, quite a few of which are Windows 2012.  Don't bother installing the Hyper-V integration components on top of them though, because the integration components built into Windows 2012 are already a newer version than what Windows 2008 R2 will install.

    I'm looking forward to a fix for this, and for another issue I won't go into, before I can migrate my main cluster to Windows 2012.

    Tuesday, January 22, 2013 12:10 PM
  • Is it possible to run 2012 VMs in a 2008 R2 Hyper-V cluster? I'm starting to think about reinstalling my Hyper-V servers and configuring a new 2008 R2 cluster.
    If your VMs are based on VHD (not VHDX), it's easy to migrate them back to 2008 R2. Even if the VM configuration turns out to be unreadable, just create a new VM and attach the necessary VHD files to it.

    • Edited by AndricoRus Tuesday, January 22, 2013 12:11 PM
    Tuesday, January 22, 2013 12:10 PM
  • After changing the ODX setting I get these errors in the log:

    Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    and

    Software snapshot creation on Cluster Shared Volume(s) ('\\?\Volume{846c5e2e-a28a-4610-8006-21cee18f6a27}\') with snapshot set id '{486ff065-0f08-4a7f-b3aa-4f5f5c565581}' failed with error 'HrError(0x0000139f)(5023)'. Please check the state of the CSV resources and the system events of the resource owner nodes.
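    For anyone collecting these events across nodes, a small parser can pull the volume name and status code out of the auto-pause message. This is only a sketch in Python: the regular expression is written against the exact wording quoted above, and the function name `parse_auto_pause` is made up for illustration.

```python
import re

# Pattern written against the auto-pause message text quoted in this thread.
AUTO_PAUSE = re.compile(
    r"Cluster Shared Volume '(?P<volume>[^']+)' \('(?P<resource>[^']+)'\) "
    r"is no longer available on this node because of "
    r"'(?P<status>[A-Z_]+)\((?P<code>[0-9a-fA-F]+)\)'"
)


def parse_auto_pause(message):
    """Return volume, resource, status name and numeric code, or None."""
    match = AUTO_PAUSE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # The code is logged as bare hex digits, without a 0x prefix.
    fields["code"] = int(fields["code"], 16)
    return fields


sample = ("Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer "
          "available on this node because of "
          "'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will "
          "temporarily be queued until a path to the volume is reestablished.")
event = parse_auto_pause(sample)
print(event["volume"], event["status"], hex(event["code"]))
```

    Messages that do not match the pattern return None, so the function can be run over a mixed event log export without raising.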

    Br
    Patrik

    Tuesday, January 22, 2013 12:57 PM
  • Same issue here.

    The fix did solve the problem of VMs going into a paused or stopped state.

    But the memory issue is still there while running backups of big VMs (400+ GB) using DPM.

    It eats up all the memory of the host that is holding the CSV containing the VM.

    Cheers,

    Ramon

    Wednesday, January 23, 2013 10:58 AM
  • I just received an update from my case owner that the product team is still working this with no ETA on delivery just yet.

    I was also notified of a temporary workaround that isn't very scalable outside of smaller environments; however, I am testing it now. The recommendation in the interim is to move all VMs to a single node in the cluster, make sure that node is also the owner of all CSVs, and then perform the backup.

    I'm validating this now.

    Wednesday, January 23, 2013 2:42 PM
  • Interesting workaround. That sounds wrong to me though as it's the node that is the CSV owner that has the memory leak. When I tried something like this all the VMs ended up showing critical memory alerts then shutting down and moving to other nodes in a not very graceful manner.

    The workaround I've been using on a 4 node cluster is to remove all VMs from one node, make this node the owner of the CSVs and then back up. All the RAM will get sucked out of this sacrificial node but the backups are working and the VMs aren't affected. 
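    As a rough illustration of that "sacrificial node" procedure, here is a planning sketch in Python. It only computes which VMs and CSVs would need to move; it does not call any cluster API. The node, VM and volume names are invented, and spreading VMs by count (rather than by memory) is a simplifying assumption.

```python
def plan_sacrificial_node(vms_by_node, csv_owner, sacrificial):
    """Return (vm_moves, csv_moves) that leave `sacrificial` with no VMs
    and make it the owner of every CSV."""
    others = [node for node in vms_by_node if node != sacrificial]
    if not others:
        raise ValueError("need at least one other node to host the VMs")
    # Track projected VM counts so the drained VMs are spread evenly.
    load = {node: len(vms_by_node[node]) for node in others}
    vm_moves = []
    for vm in vms_by_node[sacrificial]:
        target = min(others, key=lambda node: load[node])
        load[target] += 1
        vm_moves.append((vm, sacrificial, target))
    # Every CSV not already owned by the sacrificial node changes owner.
    csv_moves = [(csv, owner, sacrificial)
                 for csv, owner in csv_owner.items() if owner != sacrificial]
    return vm_moves, csv_moves


# Hypothetical 4 node cluster.
cluster = {"node1": ["sql01"], "node2": ["web01", "web02"],
           "node3": ["dc01"], "node4": ["app01", "app02"]}
owners = {"Volume1": "node1", "Volume2": "node4"}
vm_moves, csv_moves = plan_sacrificial_node(cluster, owners, "node4")
print(vm_moves)
print(csv_moves)
```

    A real plan would also have to check that the remaining nodes have enough free memory for the VMs they receive.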

    Tim

    Wednesday, January 23, 2013 2:55 PM
  • I too am skeptical.
    Wednesday, January 23, 2013 3:01 PM
  • After changing the ODX setting I get these errors in the log:

    Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    and

    Software snapshot creation on Cluster Shared Volume(s) ('\\?\Volume{846c5e2e-a28a-4610-8006-21cee18f6a27}\') with snapshot set id '{486ff065-0f08-4a7f-b3aa-4f5f5c565581}' failed with error 'HrError(0x0000139f)(5023)'. Please check the state of the CSV resources and the system events of the resource owner nodes.

    Br
    Patrik

    Anyone else getting these kinds of errors after applying KB and changing ODX setting? 

    Br
    Patrik

    Thursday, January 24, 2013 12:25 PM
  • http://support.microsoft.com/kb/2803748/en-us

    Thursday, January 24, 2013 12:37 PM
  • Anyone else getting these kinds of errors after applying KB and changing ODX setting? 

    Br
    Patrik

    Hi Patrik,

    I've checked a couple of nodes and no, I'm not seeing those errors.

    Tim

    Thursday, January 24, 2013 2:48 PM
  • RichL_PLA,

    Does the workaround of moving all the VMs and CSVs to one node work for DPM backups? I am too afraid to try, as we haven't had any host-based backups since migrating the cluster to Hyper-V 2012, so the safer option at the moment is to not run the backup.

    Thursday, January 24, 2013 2:55 PM
  • I too am having the memory leak issue, to the point that it crashes the host and all the VMs on that host go into a saved-critical state and jump ship.   Very frustrating.  I have applied KB2799728 and am now waiting on whatever the latest fix to this fiasco will be.  



    • Edited by Seth H. _ Thursday, January 24, 2013 9:01 PM
    Thursday, January 24, 2013 9:00 PM
  • Not meaning to sound pedantic, but if KB2799728 causes a memory leak, shouldn't the hotfix be removed?  Surely guaranteed memory leaks are a bigger issue than random virtual machines potentially going into a paused state during backup?

    I suppose a valid question is, are there people deploying this hotfix NOT getting the memory leak?

    Friday, January 25, 2013 9:44 AM
  • Not meaning to sound pedantic, but if KB2799728 causes a memory leak, shouldn't the hotfix be removed?  Surely guaranteed memory leaks are a bigger issue than random virtual machines potentially going into a paused state during backup?

    I suppose a valid question is, are there people deploying this hotfix NOT getting the memory leak?

    Doesn't appear to have resolved the CSV crash problem for me, seeing the issue with both DPM and Veeam. Memory leak wise we're also suffering so -2 for us!
    Friday, January 25, 2013 10:48 AM
  • The more I think about this, the more I just don't understand how this never showed up in testing.  The three bugs I've seen within the last month have been pretty big...this one, the Failover manager MSC crashing after the .net 4.5 update (there's a patch now thankfully) and another I saw posted regarding updating to the new DPM SP1 where creating another protection group blows up the MMC...I mean jeez...  I expect to see some bugs, but this level of sloppiness is not a good thing from MS.  I think we all expect better.

    On that note...I would say to the dev team, thanks for working hard on this and we look forward to the hotfix.  To the folks testing...please stay focused, hire more testers, etc.

    Friday, January 25, 2013 6:46 PM
  • We also have a case open with German Premier Support. On our side the issue is really easy to reproduce. We tried many different settings and scenarios. The only stable configuration for us at the moment is:

    - disable ODX  ( Set-ItemProperty hklm:\system\currentcontrolset\control\filesystem -Name "FilterSupportedFeaturesMode" -Value 1 )

    - uninstall hotfix KB2799728 to avoid the memory leak

    - enable per-host and per-LUN serialization in DPM ( http://technet.microsoft.com/en-us/library/hh757922.aspx )

    These settings seem to be really stable for our cluster while doing backups. We are currently in the middle of migrating from a 2008 R2 cluster, and the issue emerged gradually as we filled the new cluster with VMs, so I think it is closely related to load and IOPS on the CSV volumes and the storage network.
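    To show what per-host and per-LUN serialization means for scheduling, here is a toy sketch in Python. It is not DPM's scheduler and does not read or write any DPM configuration; the job tuples and the host and CSV names are invented. It simply groups backup jobs into waves so that no two jobs in the same wave share a host or a CSV.

```python
def serialize_backups(jobs):
    """Group (vm, host, csv) jobs into waves; within a wave no two jobs
    share a host or a CSV, so each host and each LUN handles one backup
    at a time."""
    waves = []
    for job in jobs:
        _, host, csv = job
        for wave in waves:
            # A wave is usable only if it has no job on this host or CSV.
            if all(host != h and csv != c for _, h, c in wave):
                wave.append(job)
                break
        else:
            # No existing wave can take the job, so start a new one.
            waves.append([job])
    return waves


# Hypothetical jobs: two hosts, two CSVs.
jobs = [("vm1", "hostA", "csv1"), ("vm2", "hostA", "csv2"),
        ("vm3", "hostB", "csv1"), ("vm4", "hostB", "csv2")]
waves = serialize_backups(jobs)
for number, wave in enumerate(waves, start=1):
    print(number, [vm for vm, _, _ in wave])
```

    With four jobs this yields two waves instead of four parallel backups, which matches the trade-off reported in this thread: backups take longer, but each CSV only sees one snapshot at a time.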

    Friday, January 25, 2013 10:19 PM
  • Is there any more information about this yet? We are just about to start image backups on our new 2012 cluster, but fortunately ran into this thread before starting them in production.

    We are waiting for a fix to come, and for someone in this thread to test it and confirm that it works, before we go to production.

    Tuesday, January 29, 2013 7:43 AM
  • Is there any more information about this yet?

    We are waiting for a fix to come, and for someone in this thread to test it and confirm that it works, before we go to production.

    PSS told me today that the hotfix is planned for release in approx. 2-3 weeks, after internal testing is complete...
    Tuesday, January 29, 2013 7:47 AM
  • just have to wait ...  because we have the same issue

    sys_admin

    Tuesday, January 29, 2013 9:58 AM
  • 6 node cluster running 88 VM's with iSCSI Storage on HP Lefthand with production workloads!

    Everything fine until we migrated heavier workloads to the cluster.

    Then.... we experienced the paused VM issue back in December.
    Then.... we applied the patch a few weeks ago and had the CSV IO Timeout issues every other day.
    Then.... we Disabled ODX yesterday and now have the memory leak issue.

    Server 2012 Hyper-V 3.0 has become a nightmare to administer with these problems.

    Come on Microsoft we need this memory leak fixed!!!



    Wednesday, January 30, 2013 5:13 PM
  • Just uninstall the hotfix and run serialized backups until the new hotfix is available. Backup is much slower, but everything should run stably.
    Thursday, January 31, 2013 10:33 AM
  • This hotfix has ended up being a huge problem for us. We had no memory leak problems before installing the hotfix. We had memory leak problems once the hotfix was installed, and now we're unfortunately still experiencing the memory leaks after uninstalling. BE WARNED! DO NOT INSTALL this hotfix mentioned!!!

    I called PSS and they have nobody available to help and there is no fix. We're basically screwed until we can find a place to migrate all of these VMs to so that we can completely reinstall Server 2012 at this point. This is beyond upsetting given that this is a generally available product at this point.


    Aaron Marks

    Sunday, February 03, 2013 6:47 AM
  • I was wondering if there was more information about the following:

    (1) Has anyone experienced this problem when using guest-based backups (instead of host-based backups)?

    (2) Has anyone experienced this problem when backing up to a destination that is not within the CSV hosting the Hyper-V cluster? In other words, a separate iSCSI target for example (one not part of the cluster).

    Thank you all for this excellent thread.

    Monday, February 04, 2013 2:15 PM
  • I'd just like to add my voice to the many that are experiencing this issue.  We are having the identical problem on our Hyper-V 2012 iSCSI cluster being backed up by DPM 2012 SP1 with update rollup 1.  I installed the hotfix and immediately we started having the memory leak issue on the node that is the owner of the CSV volumes.  

    I have enabled serialized backups in DPM so we'll see how things go tonight but this is a pretty serious problem for us.

    In regards to the question RJMPhD asked, we have not experienced this problem backing up Windows 2012 Hyper-V servers that are not clustered/attached to CSV volumes.  We have two 2012 Hyper-V servers that we run development workloads on and those are standalone with standard iSCSI volumes on the same SAN as the clustered servers we're experiencing problems with.  The standalone non-clustered 2012 servers are able to be backed up without any volumes or VMs going offline.

    Monday, February 04, 2013 9:55 PM
  • In regards to the question RJMPhD asked, we have not experienced this problem backing up Windows 2012 Hyper-V servers that are not clustered/attached to CSV volumes.  We have two 2012 Hyper-V servers that we run development workloads on and those are standalone with standard iSCSI volumes on the same SAN as the clustered servers we're experiencing problems with.  The standalone non-clustered 2012 servers are able to be backed up without any volumes or VMs going offline.

    I should have clarified; I was specifically asking about the question where the Hyper-V instances are running within the CSV, but the destination of the backup is not. Regardless, it sounds like this is a very serious problem and I can only imagine how frustrating it might be.

    Wednesday, February 06, 2013 1:07 PM
  • Hello,

    I think RJMPhD, your question fits into the environment we have. We are running a Hyper-V 2012 cluster with CSV on an iSCSI SAN and using DPM 2012 SP1 for backup. DPM is a separate physical machine with local disks (not on a SAN). We are experiencing all the issues described above. We also experienced the memory leak problem from the hotfix and unfortunately I have to confirm, uninstalling it does not prevent memory leaks. We had to reinstall that node from scratch. 

    Currently we are running as suggested above with serialized DPM backup. Our SAN does not support ODX so no need to disable that.

    BTW, the script mentioned in the MS KB (http://technet.microsoft.com/en-us/library/hh757922.aspx) has a bug. It assumes the CSV name starts with "Volume". Well, ours does not, and I ended up with an empty XML file. MS, if you hear this, please fix the script (look for the line "$filelist = dir $dir\Volume*" and change it to "$filelist = dir $dir\*" ).
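    The effect of that wildcard is easy to reproduce outside PowerShell. The Python sketch below mimics the two patterns against a list of made-up CSV folder names; lowercasing both sides is an assumption meant to imitate the case-insensitive matching of `dir`.

```python
from fnmatch import fnmatchcase


def list_csv_dirs(names, pattern):
    """Return the names that match a wildcard pattern, ignoring case."""
    return [name for name in names
            if fnmatchcase(name.lower(), pattern.lower())]


# Hypothetical contents of C:\ClusterStorage on two different clusters.
default_names = ["Volume1", "Volume2"]
renamed_names = ["DataVHD", "LogsVHD"]

print(list_csv_dirs(default_names, "Volume*"))   # default names are found
print(list_csv_dirs(renamed_names, "Volume*"))   # renamed CSVs are missed
print(list_csv_dirs(renamed_names, "*"))         # the proposed fix finds them
```

    With the original pattern a cluster whose CSVs have been renamed produces an empty list, which is consistent with the empty XML file described above.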

    Now a question to others:

    Even though we are now running serialized backup, using a CSV owner node that does not have (and never had) this dreaded hotfix installed we are still experiencing high memory consumption. While backing up all free memory on this host is taken up to 99-100%. When backup is finished memory usage drops by 4-5 GB. The VMs on this host do not seem to experience memory pressure. Is this behavior expected and if yes, can we control the amount of cache the backup of csv is using on the owner node ? 

    Wednesday, February 06, 2013 2:57 PM
  • Hello,

    I think RJMPhD, your question fits into the environment we have. We are running a Hyper-V 2012 cluster with CSV on an iSCSI SAN and using DPM 2012 SP1 for backup. DPM is a separate physical machine with local disks (not on a SAN). We are experiencing all the issues described above. We also experienced the memory leak problem from the hotfix and unfortunately I have to confirm, uninstalling it does not prevent memory leaks. We had to reinstall that node from scratch. 


    Interesting; thank you for the clarification. It sounds like you are using host-based backups --- is this true? I wonder about the pros/cons of host-based versus guest-based backups. Specifically, I wonder how live migration interacts with host-based backups; what happens within DPM when/if a guest migrates from a particular host? (Perhaps this is too far off topic and should be its own thread.)
    Wednesday, February 06, 2013 3:04 PM
  • Interesting; thank you for the clarification. It sounds like you are using host-based backups --- is this true? I wonder about the pros/cons of host-based versus guest-based backups. Specifically, I wonder how live migration interacts with host-based backups; what happens within DPM when/if a guest migrates from a particular host? 

    Just a quick answer so we are not cluttering this thread. Yes, we do use host-based backup. For us this is the best option, as our VMs are 99% development/test environments that are on separate domains, have TMGs blocking traffic, etc., and they change quite often. As for the problem with live migration: if I try to live migrate manually while the guest is being backed up, the migration fails but nothing bad happens. I can retry it later. The VMM dynamic optimizer seems to be clever enough to detect this, as I have not yet observed any failures during automatic migrations. All of this is based on fairly short-term observation, so I may not be getting the full picture.
    Wednesday, February 06, 2013 6:56 PM
  • I should have clarified; I was specifically asking about the question where the Hyper-V instances are running within the CSV, but the destination of the backup is not. Regardless, it sounds like this is a very serious problem and I can only imagine how frustrating it might be.

    Hi RJMPhD, sorry for misunderstanding what you were asking.  In our environment our DPM server is one of the few standalone physical servers, with a directly attached SAS array where it stores the backups.  So in our case yes, the Hyper-V instances are running within the CSV volumes but are being backed up to a destination that is outside of the CSV. 

    • Edited by HorusCG Thursday, February 07, 2013 4:31 PM
    Thursday, February 07, 2013 4:31 PM
  • Hi Everyone,

    I have received an update from the Windows folks regarding a fix for the outstanding issues (including the memory leak).  They are currently in the final round of thorough testing of the new fixes, which once validated will be released to the public in a new KB.  That KB will supersede and replace the original that was released a few weeks back.  Assuming testing goes well, we anticipate the fix being available by the end of the month, if not sooner.

    Again I would like to thank all of you for the participation on this thread sharing your experiences and workaround validation, and look forward to hearing feedback from the new fix once it's available.

    Stay tuned.... I will update the thread with the new KB number when the fix is available for download.


    Regards, Mike J. [MSFT]

    Thursday, February 07, 2013 5:13 PM
  • I have already exported and converted all VMs, re-installed the Hyper-V cluster with 2008 R2 and re-configured everything. I just imported the last VMs and am now installing DPM again to run backups. Hopefully it will be a lot better than on 2012.

    I know it is easy to complain, but I think Windows Server 2012 with Hyper-V will be great once they have fixed all the problems. I also want to say that I will never again be first to try out new MS products. I will wait about 6-12 months before trying.

    Br
    Patrik


    • Edited by boje_ Friday, February 08, 2013 11:13 AM
    Friday, February 08, 2013 11:12 AM
  • We have a smaller environment and could copy all VM's onto one Host. This doesn't seem to fix the memory leak issue though.

    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Wednesday, February 13, 2013 12:18 AM
  • Hi,

    Updated fix will be made available soon to address some of the issues you face. stay tuned.


    Regards, Mike J. [MSFT]


    Wednesday, February 13, 2013 2:59 PM
  • Hi,

    Updated fix will be made available soon to address some of the issues you face. stay tuned.


    Regards, Mike J. [MSFT]



    I see you deleted your previous post. I'm assuming we should expect it today?
    Wednesday, February 13, 2013 4:07 PM
  • That would be fantastic as I have a maintenance window tonight!
    Wednesday, February 13, 2013 9:59 PM
  • Not yet made it to the public KB - once it's available I'll provide the KB number.

    Regards, Mike J. [MSFT]

    Wednesday, February 13, 2013 10:15 PM
  • Hey Mike,

    I saw this hotfix talking about a handle leak, also related to DPM. Is this the one we are looking for?

    "Assume that you use the WmiPrvSE.exe process for performance data collection on a Windows 8-based or Windows Server 2012-based computer. In this situation, a handle leak may occur in one of the WmiPrvSE.exe instances. Additionally, Microsoft System Center 2012 features that rely on performance data (for example, System Center Virtual Machine Manager (SCVMM), Data Protection Manager (DPM), and System Center Operations Manager (SCOM)) may fail." http://support.microsoft.com/kb/2790831/en-us

    Best regards,

    Hans Vredevoort
    MVP Virtual Machine
    @hvredevoort
    www.hyper-v.nu



    Thursday, February 14, 2013 8:24 AM
  • Hi Hans

    I just applied this hotfix to my 2 servers and it did not solve the problem. Hopefully the hotfix Mike is talking about is released very soon.

    Simon


    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Thursday, February 14, 2013 9:05 AM
  • Thanks for trying this out Simon.

    Sure hope the real fix comes very soon!

    Best regards, Hans


    Senior Consultant and Architect Servers and Storage Solutions Nobel

    Thursday, February 14, 2013 11:05 AM
  • I was near the end of my maintenance last night when I saw your post, Hans.  Thank you so much for posting it as I'm sure you're anxious as well - but in reading it, it didn't appear to apply to this issue.  However, I will be looking into it as it nonetheless seems relevant.  Thanks!

    Thursday, February 14, 2013 3:38 PM
  • Hi Guys,

    The memory leak causing the most grief is in the Windows Cluster csvflt.sys and that is now resolved in the new fix to be released.  The fix is code complete and tested, we're just waiting for it to be published and made available to the public.... I'm hoping later today. 


    Regards, Mike J. [MSFT]

    Thursday, February 14, 2013 5:24 PM
  • I really hope this gets released today and doesn't go on into another weekend without a resolution.

    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Friday, February 15, 2013 11:00 AM
  • I really hope this gets released today and doesn't go on into another weekend without a resolution.

    Simon Holman
    Expeed Technology
    http://expeed.com.au


    You got that right...Mike is just being a tease with that hotfix.  :-)
    Friday, February 15, 2013 3:21 PM
  • Is this new fix supposed to supersede the original discussed hotfix? If so does that mean that we can leave the original hotfix installed and install this new one over it? Or are we going to have to uninstall the original hotfix before installing the new hotfix? Just asking so we can have this question out-of-the-way before the new hotfix is released.

    Aaron Marks

    Friday, February 15, 2013 5:42 PM
  • Hi All,

    The hold up on making the fix public is with getting the new KB article updated to include the new fixes and published.  Yes, the new fix will both supersede and replace the original fix. You can install it over the original fix, or simply install it as the only fix if installing it for the first time.  I know we have people working hard on the getting the KB finished and published, but still no solid ETA.


    Regards, Mike J. [MSFT]

    Friday, February 15, 2013 6:00 PM
  • @Mike, we're really struggling in waiting for this fix. I've had a case open for a few weeks now: 113020110184584

    Would it be possible for you to reach out to my case owner and let him know the new KB number for this hotfix so he can get it to me before it is released? I was previously supplied with another "private" hotfix numbered KB2791729 which didn't help the problem at all (possibly made it worse). I'm assuming that this must not be the same hotfix that you're recently mentioning. If you are willing to contact me over email to provide the new hotfix number, please contact me through my contact page on my blog: http://blog.aaronmarks.com/?page_id=50

    Thank you!


    Aaron Marks

    Saturday, February 16, 2013 9:46 AM
    • Proposed as answer by Aaron M Marks Saturday, February 16, 2013 8:41 PM
    Saturday, February 16, 2013 11:24 AM
  • Hi

    The Windows team has just released a V2 of the fix to address CSV backup issues, and it is available for download today.  This will address the known memory leak issue along with some other issues that were discovered during testing.

    This fix supersedes the original fix and includes all fixes contained in the original.

    Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
    http://support.microsoft.com/kb/2813630

    The Windows team is investigating other issues found during testing and not included in this release.  However they wanted to get this fix published since the memory leak issue is fixed and provide immediate relief. 

    Please make us aware of any issues you face after installing this fix, and again thanks for your continued patience while we continue our scaled-out testing.



    Regards, Mike J. [MSFT]


    Saturday, February 16, 2013 2:06 PM
  • Thanks Mike, I have applied the hotfix to the hosts and the DPM backup is running well so far without serialization.    I will report back once the backups have completed.

    Regards

    Saturday, February 16, 2013 3:06 PM
  • Hi all,

    I have applied the latest fix and all available updates to both Win 2012 Hyper-V hosts. I then backed up 13 VMs successfully, but on the CSV owner host only I see many instances of this error: "Unexpected failure. Error code: 48F@01000003". Keep in mind that I used to see this error before the fix as well. I used Veeam software to back up all the VMs.


    MCSE, MCTS, VCP, AIS, MCITP

    Saturday, February 16, 2013 10:02 PM
  • I have installed the patch and so far, so good. I have gotten past the point where it would cause issues previously.

    I'll rest easy once all backups are completed.


    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Saturday, February 16, 2013 10:08 PM
  • Hi all,

    The DPM backups have completed successfully with no errors on the cluster so the hotfix appears to have resolved the problem

    Regards

    Sunday, February 17, 2013 8:41 AM
  • I'm seeing the same thing here. All seems to be working happily.

    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Sunday, February 17, 2013 8:44 AM
  • UPDATE: I am now seeing the error

    Details: Cluster Shared Volume 'Volume2' ('ClusterStorage Volume 2') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    I was NOT seeing this error prior to installing the hotfix above.


    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Sunday, February 17, 2013 9:09 AM
  • Hi Simon,

    Are you using DPM for backups? If so, is it running on Windows 2012 and have you installed the hotfix on that server as well? I am not sure if it is required, considering the DPM server does not have CSV volumes; however, I am running DPM 2012 SP1 on Windows 2012 with the hotfix applied and haven't experienced any issues, so I'm just wondering if the patch is also needed on the DPM server.

    Regards

    Sunday, February 17, 2013 9:35 AM
  • Same...

    Cluster Shared Volume 'DataVHD' ('DataVHD') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.


    Aaron Marks

    Sunday, February 17, 2013 10:38 AM
  • I've been experiencing this issue on a two-node cluster as well and unfortunately KB2813630 hasn't helped. During the backup window today, the various CSV errors detailed above are still occurring.

    In addition, here's the graph of available memory on one of the nodes. It became the CSV owner at around 9am, after the backup that started around 4am crashed the other node by running it out of memory. At that point, I cancelled every pending job for VMs over about 75 GB in size, which is about the limit that can be transferred before the node runs out of memory. You can see clearly how each VM is represented as a dip on the graph.
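    The size cut-off described above can be expressed as a simple filter. This Python sketch uses the roughly 75 GB figure from this post as a default; the VM names and sizes are invented, and the right threshold will depend on how much memory the CSV owner node has.

```python
def split_jobs_by_size(vm_sizes_gb, limit_gb=75):
    """Split pending backup jobs into those to keep and those to cancel,
    based on VM size in GB."""
    keep = sorted(vm for vm, size in vm_sizes_gb.items() if size <= limit_gb)
    cancel = sorted(vm for vm, size in vm_sizes_gb.items() if size > limit_gb)
    return keep, cancel


# Hypothetical pending jobs and VM sizes in GB.
pending = {"dc01": 40, "web01": 60, "sql01": 400, "file01": 120}
keep, cancel = split_jobs_by_size(pending)
print("keep:", keep)
print("cancel:", cancel)
```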

    During this backup, ODX was not disabled; I've now disabled it on both nodes and will carry out some further testing tomorrow.

    Monday, February 18, 2013 1:31 AM
  • Hi GuySmith,

    What backup software are you using, and have you applied the hotfix to the backup server if it is Windows 2012?

    Regards

    Monday, February 18, 2013 2:24 AM
  • Hi Mark,

    It's DPM 2012 SP1 (4.1.3313.0) running on Windows Server 2008 R2 SP1 so no need to apply the hotfix on that side.

    Monday, February 18, 2013 2:29 AM
  • I installed this update (http://support.microsoft.com/kb/2813630) on a two-node cluster. After backing up with DPM 2012 SP1, no abnormalities were found.

    sys_admin


    Monday, February 18, 2013 5:43 AM
  • I installed this update (http://support.microsoft.com/kb/2813630) on a two-node cluster. After backing up with DPM 2012 SP1, no abnormalities were found.

    sys_admin



    As per numerous posts above, this update has been installed by a number of us (indeed, as you'll see we were waiting for it) and it doesn't seem to resolve the problem.
    Monday, February 18, 2013 8:28 AM
  • I didn't get the STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021) error until AFTER I installed 2813630.

    Before then I just had the memory leak issue.


    Simon Holman
    Expeed Technology
    http://expeed.com.au

    Monday, February 18, 2013 8:52 AM
  • We got the same auto pause error again tonight. I'm wondering at the moment if installing KB2791729 alongside KB2813630 might fix this "CSV_AUTO_PAUSE_ERROR". Has anyone tried this by chance? I don't know if I'm willing to waste any more time troubleshooting this without contribution from Microsoft.

    Did anyone else open a case with PSS and find how utterly lacking Microsoft's support is for MS Partners these days? I reported the complaint to my Microsoft tPAM, as we've had even worse issues with PSS for DPM. The DPM PSS team is so overloaded that you generally can't get a call back for 48-72 hours, even when they tell you 2 hours.


    Aaron Marks

    Monday, February 18, 2013 9:17 AM
  • I have enabled the CSV serialization that Stefan mentioned above, so we'll see if that resolves the issue.

    Simon Holman
    Expeed Technology
    Australian Web Hosting

    Monday, February 18, 2013 10:17 AM
  • I installed the hotfix Mike posted as well as the one that Hans listed, so I had hopes things would be good....  KB2790831 and KB2813630 (v2)

    I'm also getting a ton of errors.  I tried pushing things last night with doing the backups after VMM did an optimization so that some VMs weren't on the same node as the CSV and it went to a dark and evil place.  We were able to keep things stable before with having the VMs on the same node as their respective CSVs, but this obviously isn't ideal.

    I'm parsing through the logs and seeing VSS getting access denied errors when the backups start (Event ID 8194 - VSS is the source)

     

    Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied. This is often caused by incorrect security settings in either the writer or requestor process.

    Anyone else seeing this sort of thing in the system log of the local node where you see the csv pause errors?

     

     

    Monday, February 18, 2013 4:19 PM
  • MarkLarma, I get the exact same Event ID 8194 error that you do.

    I have 4 CSV volumes on a SAS attached HP san and one volume that resides on a Server 2012 SMB3 share.

    I thought it could have something to do with the SMB3 volume, but even if I move my VHDX files from it to the SAN and remove the SMB volume, I get the same error.

     

    Monday, February 18, 2013 6:03 PM
  • I also have a SAS attached SAN.  Our setup is the Intel Modular Server MFS25 with four of the MFS5520VI blades. (Each has dual x5560's, 96GB, 4 Intel Gig NICs, LSI SAS HBA).  The array is the Promise E610sD with two add-on shelves.  Anyway, all firmware is current and aside from DPM this thing is rock solid.

    I perused the logs a bit more and they have several errors in there just littering it up once things start going badly.  I'd be happy to send these to Microsoft if they'd like it.  I'd imagine, ToniKo, that you're seeing the same thing (as are a lot of folks I'm guessing). 

    I will be putting the VMs on the hosts with their storage and hopefully tonight it won't have issues.  I also wish everyone else luck with this issue...and wish Microsoft would make it so we didn't need luck :-)

    Monday, February 18, 2013 6:29 PM
  • Installing KB2813630 on the hosts and the DPM server, did not solve the memory leak problem for me.

     

    I am running a 3 node cluster.

    2 hosts containing all the vm's, 1 node containing the csv's.

     

    This setup worked well for small VMs, as long as I was backing up one VM at a time.

    That way the node holding only the CSVs has enough free memory to fill during a DPM backup.

    After installing KB2813630 I tried to back up one VM, which has a single 400 GB disk. After 30 minutes the memory usage had gone from 8 GB to 120 GB and I had to abort the backup; otherwise the host containing the CSVs would have crashed, causing all VMs to go down.
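
    To put the figures above in perspective, here is a rough back-of-the-envelope sketch of the growth rate. It assumes the growth is linear, which the post does not claim, and the 128 GB total is a purely hypothetical figure since the installed RAM is not stated.

```python
def leak_rate_gb_per_min(start_gb, end_gb, minutes):
    """Average memory growth in GB per minute over the observed window."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (end_gb - start_gb) / minutes


def minutes_until_full(current_gb, total_gb, rate):
    """Minutes left before memory is exhausted at a constant growth rate."""
    if rate <= 0:
        return float("inf")
    return max(total_gb - current_gb, 0.0) / rate


# Figures from the post: 8 GB -> 120 GB in 30 minutes.
rate = leak_rate_gb_per_min(8, 120, 30)

# HYPOTHETICAL: 128 GB of installed RAM; the post does not state the total.
remaining = minutes_until_full(120, 128, rate)

print(f"{rate:.2f} GB/min, roughly {remaining:.1f} min of headroom left")
```

    At that pace (a little under 4 GB per minute) there is very little time to react once a large backup is running, which matches the need to abort by hand.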

     

    Ramon


    Monday, February 18, 2013 7:04 PM
  • Hello All,

    I would like to comment on this error:

    Cluster Shared Volume 'Volume2' ('ClusterStorage Volume 2') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.


    STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR is generated when the csvfs filter attempts to retrieve the Copy On Write bitmap for a snapshot volume that has been cleaned up.  This error is most likely to occur on large scale Hyper-V deployments and is one of the issues we discovered after fixing other scale out problems addressed in the V2 fix. Because long haul testing is still ongoing, we did not want to hold up V2 of the fix that we just released, so the Windows group will release a more comprehensive V3 patch a little later to address that and other issues found during large scale testing.

    For any customers still experiencing the same symptoms as outlined in KB2813630 after installing the fix, please check binary versions on all nodes.

    File name      File version      File size    Date
    ===========    ==============    =========    ===========
    Csvflt.sys     6.2.9200.20626    205,824      06-Feb-2013
    Clussvc.exe    6.2.9200.20623    7,217,152    07-Feb-2013
    Ntfs.sys       6.2.9200.20623    1,933,544    07-Feb-2013

    If Binaries are correct on all nodes, please open a support case so we can investigate the issue further.
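
    For anyone checking several nodes, the comparison against the table above can be sketched as follows. This is only an illustration: it assumes you have already collected the file versions per node by some other means, the node inventory shown is hypothetical, and treating "older than the listed version" as the failure condition is my assumption.

```python
# Versions listed in the post for KB2813630 (v2).
EXPECTED = {
    "csvflt.sys": "6.2.9200.20626",
    "clussvc.exe": "6.2.9200.20623",
    "ntfs.sys": "6.2.9200.20623",
}


def parse_version(text):
    """Turn '6.2.9200.20626' into (6, 2, 9200, 20626) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def outdated_files(found, expected=EXPECTED):
    """Return sorted names of files that are missing or older than expected.

    `found` maps lower-case file names to version strings.
    """
    stale = []
    for name, wanted in expected.items():
        have = found.get(name)
        if have is None or parse_version(have) < parse_version(wanted):
            stale.append(name)
    return sorted(stale)


# HYPOTHETICAL inventory for one node; the values are made up for illustration.
node1 = {
    "csvflt.sys": "6.2.9200.20626",
    "clussvc.exe": "6.2.9200.16384",
    "ntfs.sys": "6.2.9200.20623",
}

print(outdated_files(node1))
```

    Comparing the versions as tuples of integers avoids the pitfalls of plain string comparison when a component has a different number of digits.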


    Regards, Mike J. [MSFT]

    Monday, February 18, 2013 7:58 PM
  • Mike,

    Thank you for your honesty and letting us know that MS recognizes and is working on addressing this error message.

    I checked all of those versions on all nodes and found them to be the same as the ones you mentioned. I mentioned my case number above (113020110184584). I don't have much confidence in the support I'm receiving, considering that PSS was days slower in getting back to me with the V2 fix than you were in posting it to these forums. Even then, they only got back to me because I requested it. PSS has gone downhill these days, and basically all you hear when speaking to them is how they have to go talk to premier support. As a Microsoft Partner, how are we supposed to go about getting real support? Years ago I used to get fantastic support from Microsoft. @Mike, do you have the ability to reach out to my case owner (an escalation engineer by the name of Satya) and ask if he can focus on this fix and work together with me on anything that you need?

    -Aaron


    Aaron Marks

    Tuesday, February 19, 2013 6:01 AM
  • Hi,

    I've applied KB2813630 (v2) and this does seem to have resolved the memory leak issue.

    I'm still seeing issues though. As far as I can see, CSV access is disrupted shortly after initiating a backup. The canary for me seems to be the Linux guest machines. I have a CentOS based MySQL server which went almost completely unresponsive; after 10 minutes of being unable to log onto it I had to hit the reset button. I had another CentOS machine which crashed and rebooted itself.

    I suspect that the Linux machines are just more sensitive to the interruption to the CSV, and the Windows guest machines are still having issues but are handling it better. Is anyone else brave / foolish enough to be running Linux guests and are you seeing similar issues?

    I have seen event ID 5120 logged against the CSV - Cluster Shared Volume 'VolumeX' is no longer available on this node because of 'STATUS_NETWORK_NAME_DELETED(c00000c9)'. All I/O will temporarily be queued until a path to the volume is reestablished. 

    I guess I now wait for v3 of the patch.

    Tim

    Tuesday, February 19, 2013 3:40 PM
  • I am getting event ID 5120 and 5217 errors on a Hyper-V 2012 cluster with KB2813630-v2 installed on the hosts. They appear on the CSV owner node shortly after the DPM backups start. In this environment the hotfix is not installed on the DPM server virtual machine, although the memory leak issue appears to be fixed.

    However, on a separate Hyper-V 2012 cluster where the hotfix is installed on both the hosts and the DPM server (VM), there are no cluster errors at all. Mike, can you confirm whether the hotfix also needs to be installed on the DPM server to resolve these issues?

    Wednesday, February 20, 2013 2:37 PM
  • Is there any more information from MS about the ETA of the v3 (vFinal?) of this fix?

    Tuesday, February 26, 2013 6:38 PM
  • I have a case open with Microsoft Premier Support and so far haven't heard an ETA of a final fix...
    Wednesday, February 27, 2013 10:15 AM
  • Hi, 

    We have a similar issue to the examples above with cluster shared volumes. The biggest issue is storage timeouts, and when we have the storage issues, Windows event counters stop writing counter info to the local disk on the host.

    One thing that seems to make a difference is flow control: when it is enabled on the switches (or on a path between the Hyper-V hosts that use CSV), it seems to affect Windows 2012 in a really bad way (slow live migrations, CSV timeouts).

    This is just an observation, so I think someone else should do some testing before everyone changes network configurations :-)

    Friday, March 01, 2013 2:51 AM
  • Hi all,

    I found the CSV errors event ID 5120 and 5217 occurred because the DPM server which is running on a virtual machine in the cluster was backing up itself.    Once the DPM VM was taken out of the protection group the errors stopped appearing.   Therefore the hotfix seems to have resolved all the issues but it will be interesting to find out what fixes are included in v3 of the hotfix.

    Friday, March 01, 2013 2:59 PM
  • Is there any ETA for a new hotfix/patch?

    I get random problems backing up a file server on our W2012 cluster.

    We have tried CSV on SAS-attached storage, iSCSI storage and an SMB3 share. Small VMs (200-500 GB) seem to back up fine.

    But our file server, with about 11 TB of storage on 3 different VHDX files, can't be backed up now.

    On CSV on DAS and iSCSI, the CSV volumes disappear and the cluster hangs.

    On the SMB3 share, the server that hosts the share dismounts the whole volume; it may have something to do with I/O bottlenecks.

    But it's starting to get frustrating not to have a backup of the server. The only backups we have now are shadow copies and a replicated server within Hyper-V 3. =/

     

    Monday, March 04, 2013 11:13 AM
  • Installing KB2813630 on the hosts and the DPM server, did not solve the memory leak problem for me.

     

    I am running a 3 node cluster.

    2 hosts containing all the vm's, 1 node containing the csv's.

     

    This setup was working good for small vm's. As long I was backingup one vm at a time.

    This way the node holing only the csv's has enough free memory to fill during DPM backup.

     

    After installing KB2813630 I tried to backup one vm, which has one disk of 400G. After 30 minutes the memory usage went from 8G to 120G and I had to abort the backup, otherwise the host containg the csv's crashed, causing al vm's to go down.

     

    Ramon


    It seems, I was also missing an update for the DPM Agent on the hosts. After updating this, the memory leak was solved.
    Thursday, March 07, 2013 8:43 AM
  • Is there any ETA for a new hotfix/patch?

    I get random problems backing up a file server on our w2012 cluster.

    We have tried CSV on a SAS atached storage, Iscsi storage and SMB3 share. It seems to backup small VMs fine.(200-500gb)

    But our file server with about 11TB storage on 3 different VHDX files, it cant be backed up now.

    If on CSV in DAS och ISCSI the CSV volumes dissaperas and the cluser hungs.

    In SMB3 share, the server that has that share, dismounts the whole volume, it can have something to do with IO bottle necks.

    But its starting to get frustrating to not have backup of the server, only backup we have now is shadow copyes and a replicated server within hyperv3. =/

     

    Hi there, did you apply KB2813630-v2 - and did it not resolve the issue?

    Thursday, March 07, 2013 10:41 PM
  • Same problem, I have applied this hotfix but unfortunately didn't solve the issue. I have also opened support case with MS but so far no answer.
    Friday, March 08, 2013 11:34 AM
  • Same Problem. Hotfix KB2813630-v2 on all nodes applied. This happend while backup runs: 

    Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    Friday, March 08, 2013 4:15 PM
  • Is there any ETA for a new hotfix/patch?

    I get random problems backing up a file server on our w2012 cluster.

    We have tried CSV on a SAS atached storage, Iscsi storage and SMB3 share. It seems to backup small VMs fine.(200-500gb)

    But our file server with about 11TB storage on 3 different VHDX files, it cant be backed up now.

    If on CSV in DAS och ISCSI the CSV volumes dissaperas and the cluser hungs.

    In SMB3 share, the server that has that share, dismounts the whole volume, it can have something to do with IO bottle necks.

    But its starting to get frustrating to not have backup of the server, only backup we have now is shadow copyes and a replicated server within hyperv3. =/

    Hi there, did you apply KB2813630-v2 - and did it not resolve the issue?

    Yeah, i double checked the files and they are there =/

    Csvflt.sys     6.2.9200.20626   205,824      06-Feb-2013
    Clussvc.exe  6.2.9200.20623   7,217,152   07-Feb-2013
    Ntfs.sys       6.2.9200.20623   1,933,544   07-Feb-2013

    Friday, March 08, 2013 9:11 PM
  • Does this issue also occur with non clustered Hyper-V 2012 servers? we have been experiencing issues since setting up our Hyper-V servers with local storage where machines will pause during backup operations (Veeam 6.5).

    We were recommended to install KB2791729 which we obtained from Microsoft but we held off installing as it was unreleased and not yet public and were concerned about possible side effects.

    Does this KB2813630-v2 patch replace KB2791729 ?

    Thanks

    Tuesday, March 12, 2013 1:20 PM
  • Add me to the list of having this issue.  I installed the KB2813630 fix when it came out a couple weeks ago, and the last two nights have had my cluster go down.  It worked OK until Sunday night.  I also have an MS case open on this.
    Tuesday, March 12, 2013 1:30 PM
  • We have applied KB2813630-v2 to all our Hyper-V 2012 cluster nodes and we are still getting events 5120 and 5142 when backing up with DPM 2012.
    Tuesday, March 12, 2013 4:04 PM
  • -Jasse-, we get same errors


    sys_admin

    Thursday, March 14, 2013 5:58 AM
  • We have apply KB2813630-v2 to all our Hyper-V 2012 cluster nodes and still we are getting events 5120 and 5142 when backuping with DPM 2012.
    Has there been an update since Tuesday, are you running CU1 as well? 
    Thursday, March 14, 2013 9:46 PM
  • I am also having the same issue. It is not always caused by DPM starting a backup. Today, I just went to the VMM server and tried to connect to console of a VM running on one node. That node started these same signature i/o issues. Hard boot of the node gets it back online. Come on MS. Help us out!
    Friday, March 15, 2013 8:24 PM
  • We have a similar problem here, in a Japanese environment as well.

    Before applying KB2813630:

    • The CSV disks of the 4 hosts connected by Fibre SAN seemed to have no problems during backup.
    • Some guests' data disks disappeared. There was no problem with the system disks. After rebooting the guest machine, the data disk came back.
    • The virtual DPM server's data disk, which is the target for backup data, always disappeared, so the backup always failed.

    After applying KB2813630:

    • The CSV disks of the 4 hosts connected by Fibre SAN seemed to have no problems during backup.
    • Some guests' backups succeeded.
    • Some guests' state changed to powered off.

    KB2813630 improved the situation, but it is not a complete solution. A further hotfix is needed.

    Monday, March 18, 2013 3:14 AM
  • I've been stalking this thread since it was started.  We have been battling this issue since DPM was in CTP1.  The latest advice we have been given is to disable TRIM which we have done in our lab and production environments (fsutil behavior set disabledeletenotify 1).  We continue to have issues after making this change.  We've had a case open on this for roughly 5 months...what a nightmare.  I'll update as we receive information.

    STATUS_CONNECTION_DISCONNECTED(c000020c)
    STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)
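
    Since several different status codes are being reported in this thread, it may help to tally them from exported event text. The following is a minimal sketch, assuming the event 5120 messages have already been exported as plain strings; the two sample messages are copied from posts in this thread, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the event 5120 wording quoted in this thread. The resource name in
# parentheses is optional because some posts quote the message without it.
PATTERN = re.compile(
    r"Cluster Shared Volume '(?P<volume>[^']+)'"
    r"(?: \('(?P<resource>[^']+)'\))?"
    r" is no longer available on this node because of "
    r"'(?P<status>[A-Z_]+)\((?P<code>[0-9a-fA-F]+)\)'"
)


def parse_event(message):
    """Return volume, resource, status and code as a dict, or None if no match."""
    match = PATTERN.search(message)
    return match.groupdict() if match else None


def tally_status(messages):
    """Count how often each status name appears across the given messages."""
    counts = Counter()
    for message in messages:
        parsed = parse_event(message)
        if parsed:
            counts[parsed["status"]] += 1
    return counts


samples = [
    "Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available "
    "on this node because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. "
    "All I/O will temporarily be queued until a path to the volume is reestablished.",
    "Cluster Shared Volume 'DISK1' ('DISK1') is no longer available "
    "on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. "
    "All I/O will temporarily be queued until a path to the volume is reestablished.",
]

for status, count in tally_status(samples).items():
    print(status, count)
```

    A per-status count like this makes it easier to tell whether a node is mostly hitting the auto pause error or the connection errors when reporting to support.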

    Tuesday, March 19, 2013 4:20 PM
  • I had the same issue and now I got some help from Microsoft. In my case it was a problem with ODX.

    First we tried to installing the latest hotfix for ODX (KB2796995) and rebooting the cluster nodes.

    Details regarding the ODX Hotfix :

    As per the research team this issue occurs because the copy engine incorrectly initializes regular copy chunks. Therefore, the copy engine restarts the entire copy process for the file when nonzero bytes are copied through the ODX. When the copy engine restarts, the destination file size is incorrectly set if all the following conditions are true:
    • The copy type is noncached.
    • Nonzero bytes are copied through the ODX.
    • The file size is not aligned to a sector boundary.

    But that did not do the trick for me, so we disabled ODX by setting the "FilterSupportedFeaturesMode" value for the storage device that does not support ODX to 1, and rebooted the cluster nodes.

    Location: HKLM\System\CurrentControlSet\Control\FileSystem\FilterSupportedFeaturesMode

    Now Everything works fine..
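
    For reference, the registry change described above could be scripted roughly like this. It is a sketch only: it just builds the `reg add` command line rather than running it, the helper name is hypothetical, and using 0 to turn ODX back on is my understanding of the setting rather than something stated in this post. Run the resulting command from an elevated prompt on each node, and reboot as described above.

```python
KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\FileSystem"
VALUE_NAME = "FilterSupportedFeaturesMode"


def odx_reg_command(disable):
    """Build the `reg add` argument list that disables (1) or enables (0) ODX."""
    data = "1" if disable else "0"
    return [
        "reg", "add", KEY,
        "/v", VALUE_NAME,
        "/t", "REG_DWORD",
        "/d", data,
        "/f",  # overwrite the existing value without prompting
    ]


# Print the command so it can be reviewed before anyone runs it.
print(" ".join(odx_reg_command(disable=True)))
```

    Printing the command instead of executing it keeps the sketch harmless on a production node; passing the list to `subprocess.run` would be the obvious next step once it has been reviewed.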


    Peo

    Wednesday, March 20, 2013 1:02 PM
  • Certain clusters I have setup experience no errors when the DPM backups run and others do.    They are built identically with similar hardware and I have also tried disabling ODX however it hasn't made any difference.    The VMs and CSVs remain online however events 5120 and 5217 appear on the cluster within the first minute of the DPM backups running.     CSV serialisation is not setup and the default 3 MaxAllowedParallelBackups is set on the DPM servers.

    Friday, March 22, 2013 10:04 AM
  • We were seeing the same on one cluster, events 5120 and 5217 being logged on the cluster within a minute or so of DPM backups running.

    Last night however, 5142 was logged repeatedly:

    Cluster Shared Volume 'Volume3' ('Cluster Disk 4') is no longer accessible from this cluster node because of error 'ERROR_TIMEOUT(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.

    A lot of the VMs that were coexisting on the volume died. There are 20 at the moment. Some became unresponsive, some blue screened, and some had weird symptoms such as not being able to log in, Hyper-V reporting it was unable to connect to configuration storage, etc.

    Can confirm the updated versions of Csvflt.sys, Clussvc.exe, Ntfs.sys are on each node (currently 6 node cluster, FC IBM DS3524 storage).

    There's nothing really relevant in the cluster logs, I see this for the same 5 VMs repeatedly:

    000006c0.00001abc::2013/03/22-03:13:58.915 INFO  [RCM [RES] SCVMM VMNAME embedded failure notifciation, code=0 _isEmbeddedFailure=false _embeddedFailureAction=0

    Tuesday, March 26, 2013 9:11 PM
  • How did you determine it was a problem with ODX?

    As another TechNet user so eloquently put it: If Windows 2012 Hyper-V is supposed to be the game changer MS say it is, I don't want to play anymore.

    Tuesday, March 26, 2013 9:27 PM
  • Add me to the list of having this issue.  I installed the KB2813630 fix when it came out a couple weeks ago, and the last two nights have had my cluster go down.  It worked OK until Sunday night.  I also have an MS case open on this.

    The temporary workaround of my case was also disabling ODX.  We still see the event 5120 messages, but haven't had the cluster go down.  Also eagerly waiting for a permanent fix.  If there is good news, I suppose, it's that MS is getting flooded with this problem (per the team lead of the support team), and therefore is a high priority.
    Wednesday, March 27, 2013 6:36 PM
  • That is, if we can actually suppose such a thing :( Did you restart your cluster nodes after disabling ODX? I'll do the same on our clusters, we can't afford for them to keep going down.

    Wednesday, March 27, 2013 9:04 PM
  • I can confirm that disabling ODX is a temporary fix. I still see 5120 events, but for now the VMs remain online. I also restarted my hosts after I applied KB2813630.
    Thursday, March 28, 2013 10:37 AM
  • Microsoft - Any updates when the final fix will be available? Current status? Please keep us informed!

    Friday, March 29, 2013 11:23 AM
  • Last I heard was mid May :-(
    Monday, April 01, 2013 2:08 PM
  • I'm getting the same warning, but with none of the mentioned KBs installed. The backups are completing, and none of the VMs are in a stopped or paused state.

    Cluster Shared Volume 'DISK1' ('DISK1') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    Thursday, April 18, 2013 8:51 AM
  • Also in the same boat with these errors, DPM 2012 SP1 UR2 + Windows 2012, 10 node cluster using CSV.

    We are currently migrating from 2008R2 cluster to 2012 so this is quite scary. Already had to fix 2 VM's which couldn't start.


    • Edited by -DeNMaN- Friday, May 03, 2013 1:02 AM
    Tuesday, April 23, 2013 3:19 AM
  • I'm just wondering: when the failures occur, does that only happen when all of the hosts in the cluster have a VM that they are hosting?

    Zarko

    Tuesday, April 23, 2013 6:33 AM
  • Hi all

    I have much the same problem, with one small difference, but first the details:

    2 node cluster with Hyper-V Server 2012
    HP SAN connected over iSCSI
    CSV 2 TB, connected via MPIO

    We have several different private clouds on the cluster for some project labs, all backed up with DPM 2012 SP1.

    When the backup runs we also get the 5120 event, but only the Windows 2012 VMs go off; the Windows 2008 R2 and Linux VMs do not.

    I tried the hotfix mentioned in the thread, with no luck.

    Thanks,

    Tom

    Friday, May 03, 2013 8:32 PM
  • We are seeing this error too. In January we saw it only on iSCSI devices, where the hotfix and disabling ODX worked.

    Now, on another cluster with Fibre Channel, we see the error again. The hotfix and disabling ODX did not do the trick.
    The only difference here is that we use DPM SP1; on the other cluster it was another backup solution (using VSS).

    So I hope we get some news on this.

    Thanks
    Patrick 

    Tuesday, May 07, 2013 2:03 AM
  • Last I heard was mid May :-(

    Any update from Microsoft on when this patch will be released? Is this still on for mid May, as per Marcus' post?

    Tuesday, May 07, 2013 10:31 AM
  • I'm also having this problem. The latest hotfix and disabling ODX did not solve anything. This is really frustrating.  I'm using serialized backups with dpm 2012 sp1. 
    Sunday, May 12, 2013 5:21 PM
  • Still have the problem.  Running a production environment on a RTM product, with no ability to back it up without crashing servers and corrupting databases.  No big deal.
    Tuesday, May 14, 2013 5:21 PM
  • I agree with Pete, we were really hoping to see something on patch Tuesday.
    Tuesday, May 14, 2013 5:29 PM
  • OK, it's been mid-May in Australia for 12 hours now!!! Where's my fix :)

    Marcus Krämer, where did you hear this from? Seems more and more likely this is going to be an SP1 fix.

    Wednesday, May 15, 2013 1:53 AM
  • Hi Paul,

    Have a look at this article. The hotfix was released today and it seems to solve the problems. I've already installed the patch via CAU and have not received any errors so far.

    http://support.microsoft.com/kb/2838669

    Let's hope MS finally got it right this time.

    I'll update you if I receive any errors.


    • Edited by Hummeldum Wednesday, May 15, 2013 10:12 AM Forget to paste Link ;)
    Wednesday, May 15, 2013 10:12 AM
  • I have been following this blog for a few months now. I opened a case with Microsoft last week after our main cluster failed with the same errors mentioned above. The technician just emailed me with the following fixes that were released today. I plan to install them in the next week or two when we have a maintenance window. Here is the information:

    Virtual machine enters a paused state or goes offline when you try to create a backup of the virtual machine on a CSV volume in Windows Server 2012:

    http://support.microsoft.com/kb/2824600

    Update that improves cluster resiliency in Windows Server 2012 is available

    http://support.microsoft.com/kb/2838669/EN-US

    You cannot add VHD files to Hyper-V virtual machines in Windows Server 2012

    http://support.microsoft.com/kb/2836402/EN-US

    Windows 8 and Windows Server 2012 update rollup: May 2013

    http://support.microsoft.com/kb/2836988



    Wednesday, May 15, 2013 4:37 PM
  • Hi Micheal,

    just for your information:

    So, in my opinion, to solve the problem http://support.microsoft.com/kb/2838669/EN-US is enough.

    Wednesday, May 15, 2013 5:31 PM
  • I was encountering 2 of the issues described in KB2838669.

    Before this KB, I was getting Failover Clustering timeout errors once a week when my DPM server started its snapshots.
    Yesterday I installed this KB on 1 of my nodes, and things went wrong: I encountered Failover Clustering errors 8 times in only 5 hours, starting from the beginning of my DPM snapshots. Worst of all, every virtual machine hosted on this node crashed (which was not the case when I had failover clustering errors before).

    The weirdest thing? All my DPM snapshots were successful anyway!!

    So, the result of the KB? I shouldn't have installed it :/

    My nodes run Windows Server 2012, and my DPM server runs DPM 2012 SP1. The only hotfix I had installed before on my Hyper-V hosts is KB2813630.



    • Edited by tena6ous Thursday, May 16, 2013 7:28 AM
    Thursday, May 16, 2013 7:25 AM
  • I'm experiencing a memory leak that I think is related to this thread, but I would like some feedback on what others are experiencing.  I have a 2 node Hyper-V 2012 cluster (full install) and I'm using DPM 2012 SP1 to back it up.  On the node that owns the CSV, there is an increase in memory that seems to coincide with my backups for time and amount of data transferred.  For large backups like Exchange, this fills up the server's memory and will crash the cluster if left alone.  The memory does not become available after the backups complete.  If I change the owner node on the CSV, the memory clears up immediately and I can even move the CSV back without issue.

    There may also be a small memory leak that is not related to the backup times, but dissipates when I change the CSV owner.

    I've installed all available updates on the two host servers (including those released yesterday) as well as hotfixes KB2813630-v2 and KB2838669.  I've also disabled ODX and serialized the backups.

    I'm not seeing related errors in Failover Cluster Manager, but I'm watching the servers like a hawk and changing the CSV owner node as needed to clear up the memory leak.

    My storage device is an EqualLogic PS6100X with the latest HIT Kit (4.5) installed.

    Is this what others are experiencing?  Any thoughts?

    This thread has been very helpful and I've been following it very closely for the past week or so!  Thank you all for your input! ^_^

    Thursday, May 16, 2013 4:34 PM
  • Hi Paul,

    have a look at this Article, the Hotfix was released today and it seems to solve the Problems. I've installed the Patch already via CAU and did not receive any Errors since now.

    http://support.microsoft.com/kb/2838669

    Lets hope the MS finally got it now.

    I'll update you when i receive any Errors.


    Hi Hummeldum,

    I tried generating a CAU update preview list and couldn't find the 2838669 update anywhere, and the May CU doesn't seem to include it according to the KB...

    Did you apply the update manually?


    David

    Friday, May 17, 2013 4:08 AM
  • I'm experiencing a memory leak that I think is related to this thread, but I would like some feedback on what others are experiencing.  I have a 2 node Hyper-V 2012 cluster (full install) and I'm using DPM 2012 SP1 to back it up.  On the node that owns the CSV, there is an increase in memory that seems to coincide with my backups for time and amount of data transferred.  For large backups like Exchange, this fills up the server's memory and will crash the cluster if left alone.  The memory does not become available after the backups complete.  If I change the owner node on the CSV, the memory clears up immediately and I can even move the CSV back without issue.

    There may also be a small memory leak that is not related to the backup times, but dissipates when I change the CSV owner.

    I've installed all available updates on the two host servers (including those released yesterday) as well as hotfixes KB2813630-v2 and KB2838669.  I've also disabled ODX and serialized the backups.

    I'm not seeing related errors in Failover Cluster Manager, but I'm watching the servers like a hawk and changing the CSV owner node as needed to clear up the memory leak.

    My storage device is an EqualLogic PS6100X with the latest HIT Kit (4.5) installed.

    Is this what others are experiencing?  Any thoughts?

    This thread has been very helpful and I've been following it very closely for the past week or so!  Thank you all for your input! ^_^

    We have almost the exact same setup and problem.  Windows 2012 cluster, 7 nodes running primarily SQL VMs.  Dell blade servers and EqualLogic PS6110XV with HIT Kit 4.5.  We have installed all the hotfixes including KB2838669, and disabled ODX as well.  A backup job triggers the memory leak, but not all the time.  Using RAMMap we can see the VMs show up and never release the memory.  We will sometimes max out 256 GB of memory in hours.  When I move the CSV to another node the problem follows the CSV.  My only fix is to reboot the node having the problem and then move the CSV back.  We have put in 80 hours with MS so far on this.

    We are using Veeam instead of DPM.

    This morning in Veeam I disabled the Dell EqualLogic VSS HW provider and am now using only MS CSV Shadow Copy.  I am going to see if that helps.



    • Edited by awinstead Friday, May 17, 2013 2:35 PM
    Friday, May 17, 2013 2:34 PM
  • I'm glad to know that I'm not the only one having issues with this!  It sounds like we're seeing slightly different symptoms, but likely from the same problem.  That might be due to the differences in scale as we are a pretty small outfit.

    In my environment, moving the CSV to a new owner frees up the memory immediately.  At first this went pretty smoothly, but now the CSV often goes completely offline briefly and sometimes causes my VMs to crash in the process.  After the move, though, the memory immediately starts growing again.

    For me it's not necessarily tied to backups, but to high disk activity (SQL, Exchange, DFS). If I can keep the VM and CSV on the same node, it seems happy. The fact that I only have two nodes might explain why moving the CSV sorts things out for me.
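    For anyone who wants to script the owner change rather than click through Failover Cluster Manager, here is a minimal sketch using the FailoverClusters cmdlets. The disk name 'Cluster Disk 1' and node name 'HV-NODE2' are made-up placeholders, not names from my environment, and this only moves ownership; it does nothing about the underlying leak.

```powershell
# Sketch only: list CSVs and their current owner, then move one to another node.
# 'Cluster Disk 1' and 'HV-NODE2' are hypothetical names; substitute your own.
Import-Module FailoverClusters

# Show which node currently owns each Cluster Shared Volume.
Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State

# Move ownership of a single CSV to a different node.
Move-ClusterSharedVolume -Name 'Cluster Disk 1' -Node 'HV-NODE2'
```

    Given that the CSV sometimes drops offline briefly during the move, it is probably wise to try this outside production hours first.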

    I haven't tried disabling the hardware VSS provider yet, but I've thought about trying that. I'd love a fix that would let me use all the new features, but for now I'd be content with a workaround that would let me sleep through the night without needing to get up and check on my servers!

    In the meantime, I'll be keeping a close eye on this thread.

    Friday, May 17, 2013 8:23 PM
  • We also invested quite some time with Premier Support, and we got it stable some time ago.

    Try these settings and see if they also work for you:

    - disable ODX

    - disable TRIM

    - use software VSS

    - disable NIC optimizations (offloading, VMQ, etc.)

    - use LUN and host serialization on DPM

    - use DPM 2012 SP1 CU2

    Yeah, you are back in 2005 :-) but for us it worked, and it really sucks if CSVs are not stable.
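    A rough sketch of how the first few items could be applied on a Server 2012 host follows. The ODX registry value and the fsutil switch are the documented Windows settings; the adapter name 'iSCSI-A' is a made-up placeholder, and which offloads are worth disabling is an assumption on my part, since the list above does not spell them out.

```powershell
# Sketch only: run elevated on each Hyper-V host, and test on one node first.

# Disable ODX (1 = disabled, 0 = enabled).
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' `
    -Name 'FilterSupportedFeaturesMode' -Value 1

# Disable TRIM/unmap delete notifications (1 = disabled, 0 = enabled).
fsutil behavior set DisableDeleteNotify 1

# Disable NIC offload features on one adapter.
# 'iSCSI-A' is a hypothetical adapter name; the choice of features is an assumption.
# These cmdlets restart the adapter by default, so expect a brief interruption.
Disable-NetAdapterVmq             -Name 'iSCSI-A'
Disable-NetAdapterLso             -Name 'iSCSI-A'
Disable-NetAdapterChecksumOffload -Name 'iSCSI-A'

# Verify the current state.
Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' `
    -Name 'FilterSupportedFeaturesMode'
fsutil behavior query DisableDeleteNotify
```

    Setting the values back to 0 (and using the matching Enable-NetAdapter* cmdlets) reverses the changes once a proper fix is available.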


    stefan

    Saturday, May 18, 2013 7:50 PM
  • Thank you Stefan, that gives me a few things to try that I haven't tried yet.

    I haven't opened a case with Microsoft yet, partly because I'm getting the impression that they have not yet been able to actually solve the problem (as opposed to providing workarounds).  For anyone who has worked with premier support, have they given you any indication on a timeline for these problems to be fixed?

    Also, can anyone confirm if downgrading to 2008 R2 would clear this up?  It would be a pretty big project to rebuild my guests and I'm not sure if that would be better than a stripped down 2012 or not.

    Wednesday, May 22, 2013 6:09 PM
  • I have applied the KB2838669 hotfix, along with the latest Windows updates, to the hosts and the DPM server (for good measure); however, the same error events 5120 and 5217 still occur on the cluster when the backup first starts.  There are no further issues with memory leaks or virtual machines pausing, but those were already resolved by KB2813630 when it was released.
    Thursday, May 23, 2013 12:17 PM
  • Jumping on:

    I manage a 6-node Hyper-V cluster (HP BL460 G8s connected through iSCSI to LeftHand P4800 storage), 64GB each, good validation reports, fully up to date, carrying about 60 VMs.

    I have experienced the 5120/5217 errors "STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)" and "snapshot set id [..] failed with error 'HrError(0x0000139f)(5023)'" since first using DPM 2012 SP1 to back up the VMs. Hotfix KB2813630 did not help, and neither does the latest KB2838669. In fact, this latest "resiliency improvement" hotfix actually seems to increase the number of errors during backup. Looks like the bug is still here.
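    To compare how often these events fire before and after a hotfix, something like the following can be run on each node. It is only a sketch: it assumes both event IDs land in the System log, and other providers could in principle reuse the same IDs, so treat the count as approximate.

```powershell
# Sketch only: list the last 24 hours of events 5120 and 5217 on this node.
# Assumption: both IDs are written to the System event log.
$since = (Get-Date).AddDays(-1)

Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 5120, 5217; StartTime = $since } `
    -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Select-Object TimeCreated, Id, MachineName, Message |
    Format-List
```

    Lining the timestamps up against the DPM job start times makes it easier to see whether the errors cluster around snapshot creation.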

    The DPM replicas seem to be OK, and the VMs neither crash nor enter unwanted (paused, incomplete) states.

    The errors also appear in our test cluster (2 nodes connected using fibre, no iSCSI).

    I'm thinking about opening a support call, unless I know this issue is recognised by Microsoft and it's still being worked on. Does anyone know, or could a Microsoft engineer keep us posted in this thread?

    Thursday, May 23, 2013 9:29 PM
  • I can only suggest that you do open a case. The symptoms can be the same but the cause different. In my case the support guys told me that the May update might not fix the issue for me, and they were right. The update did help a lot, since the CSV no longer goes offline and machines no longer crash during backup. I still get the CSV auto pause errors and some backups fail, but the stability of the virtual machines doesn't seem to be affected anymore. After applying the hotfix I did some extended xperf tracing for Microsoft so they can figure out what's wrong with my CSV. Your issue could be different than mine, so if you really would like this to be fixed, open a case! I'm very unhappy with the quality of the "product" Hyper-V CSV and urge you all to open support cases with Microsoft in order for them to finally get this right. I really don't understand why 2012 CSV was not rock solid from the beginning, since 2008 R2 CSV was also plagued with stability and backup issues...

    Friday, May 24, 2013 9:35 AM
  • Hello all,

    I also have lots of issues with CSVs and with DPM.

    At first I thought the CSV hung because of the DPM agent, so I removed the agent and applied all existing patches. Now the CSV is stable (the FC LUN zoning was wrong: only half of the hosts could reach the LUN directly, while the others were redirecting over the cluster network. Nothing pointed this out; even Test-Cluster showed a fully green result for the cluster disks). I reinstalled the agent and the issue came back with VM backups, but the CSV is no longer paused.

    I opened a call with Microsoft support, who asked me to apply these patches using the LDR branch (QFE):

    http://support.microsoft.com/kb/2838669/EN-US

    http://support.microsoft.com/kb/2795944/EN-US

    http://support.microsoft.com/kb/2837407/EN-US (?).

    For installing the LDR: http://social.technet.microsoft.com/wiki/contents/articles/3323.how-to-forcibly-install-the-ldr-branch-from-a-particular-hotfix-package.aspx

    Didn't have time to apply the LDR branch yet (should have been done with CAU hotfix plugin, but actually, the file version is from GDR and not LDR).

    Edit: BTW, this is off topic, but do you also see the VMM service crash when configuring VMM continuous protection in DPM?
    (Set-DPMGlobalProperty -KnownVMMServers vmmserver01.sogeti.local + DPM-VMM Helper Service configuration)

    Guillaume




    • Edited by Guigui38 Friday, May 24, 2013 10:47 AM
    Friday, May 24, 2013 10:39 AM
  • Correction: the cluster was stable for a week; it just hung...
    Friday, May 24, 2013 11:48 AM
  • Stefan, would you mind telling me what hardware you have?  Specifically, what type of SAN?
    Friday, May 24, 2013 8:54 PM
  • It's a bit early to say, but my testing seems to show that my memory problems may be tied to dynamic volumes.  I had major memory leaks every night when my system state backups kicked off (agent installed within VM guest) that corresponded to the amount of data being backed up.  I created fixed size volumes on my EqualLogic SAN and moved the biggest offenders over; so far I've not encountered this memory leak again.

    I do see other, slower memory leaks throughout the day on different VMs.  When I move my two dynamic volumes from one host to the other, the memory frees up immediately.  I do not seem to have this issue with VMs on the fixed volumes.

    After reading Stefan's post, I decided to read up a bit on TRIM.  That's when I got the idea that the problem could be a sort of conflict between TRIM and dynamic volumes.  I can't say for sure, but things are starting to look up for me.  If I can stabilize everything using fixed volumes, I might even be bold enough to try re-enabling ODX and non-serialized backups.

    Here's hoping my luck's changed!

    • Proposed as answer by JeanLouis Wednesday, May 29, 2013 5:20 PM
    • Unproposed as answer by JeanLouis Wednesday, May 29, 2013 5:21 PM
    Wednesday, May 29, 2013 1:43 AM
  • Hi All,

    I've been working with MS support on this issue for roughly 2 months now; unfortunately we still haven't got to the bottom of it.

    This is the latest hotfix for error 5120 V-3 http://support.microsoft.com/kb/2824600?wa=wsignin1.0

    Wednesday, May 29, 2013 5:29 PM
  • I too have been fighting issues with CSV, Windows 2012 Cluster, DPM, merging snapshot trees and general performance problems for several months now. I've worked with PSS on several of the issues and applied the hotfixes as they have come out, and things still aren't working as expected. I did, however, move every resource to a single node on the cluster, and that has at least made me stable. I can now merge in the background and do DPM backups, and the performance doesn't go into the toilet. Still waiting on a more robust solution, but glad that I can at least sleep at night.
    Wednesday, May 29, 2013 7:39 PM
  • Hello all,

    I see a lot of people are following this thread and sharing their own experience. This is good, and I hope it helps MS solve this problem.

    I think it will be helpful to clarify that the latest and most up-to-date fix for this problem is http://support.microsoft.com/kb/2838669/EN-US.

    KB2838669:

    • Includes all previously released hotfixes for CSV backup problems (2799728, 2801054, 2796995, 2813630, 2824600)
    • Updates all involved OS files (currently the updated files are Csvflt.sys, Clussvc.exe, Csvfs.sys, Fssagent.dll, Kernelbase.dll, Ntfs.sys, Rdbss.sys and Srv2.sys. More details in the KB article)
    • Is the only KB related to CSV included in "Recommended hotfixes and updates for Windows Server 2012-based Failover Clusters", http://support.microsoft.com/kb/2784261/en-us

     

    In my opinion the first action to solve this problem is to apply KB2838669.
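    To confirm the hotfix is really present on every node before moving on to the next steps, a quick check along these lines may help. It is a sketch that assumes it is run on a cluster node with the FailoverClusters module available and that remote WMI access to the other nodes works.

```powershell
# Sketch only: report whether KB2838669 is installed on each cluster node.
Import-Module FailoverClusters

foreach ($node in Get-ClusterNode) {
    $fix = Get-HotFix -Id 'KB2838669' -ComputerName $node.Name -ErrorAction SilentlyContinue
    if ($fix) {
        "{0}: KB2838669 installed on {1}" -f $node.Name, $fix.InstalledOn
    } else {
        "{0}: KB2838669 NOT found" -f $node.Name
    }
}
```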

    If the issue persists I suggest two other steps:

    1. If you installed a VSS HW provider on your host, force the DPM agent not to use it. You can set the registry key UseSystemSoftwareProvider as described in http://support.microsoft.com/kb/2462424/en-us
    2. Enable per-node and per-CSV backup serialization as if you were using DPM 2010. The procedure is described in http://technet.microsoft.com/en-us/library/hh757922.aspx

    Try step 1 first and evaluate the results; if it solves the problem, do not serialize your backups. Parallel backups should be faster, and configuring serialized CSV backup is no fun.
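    For step 1, a sketch of what this could look like on each node is below. The registry path is my understanding of the KB (UseSystemSoftwareProvider created as an empty key under the DPM agent key), so please check it against the KB article before relying on it.

```powershell
# Sketch only: run elevated on each protected Hyper-V node.

# List the VSS providers registered on this host (a hardware provider shows up here).
vssadmin list providers

# Assumption: UseSystemSoftwareProvider is an empty registry *key* (not a value)
# under the DPM agent key. Verify the exact path against KB2462424 first.
$agentKey = 'HKLM:\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Agent'
$flagKey  = Join-Path $agentKey 'UseSystemSoftwareProvider'

if ((Test-Path $agentKey) -and -not (Test-Path $flagKey)) {
    New-Item -Path $flagKey | Out-Null
}
```

    Removing the key again should let the agent go back to the hardware provider.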

    Last consideration: this thread started with problems in an iSCSI environment and many people have Dell EqualLogic, still iSCSI storage. Among those who still have this problem I'd like to know how many have EqualLogic or other iSCSI solutions and how many have a FC solution. Unfortunately I can't count you.

    Alberto

    Thursday, May 30, 2013 10:53 AM
  • Dell EqualLogic, iSCSI storage here. Installed KB2838669 and it did not help. I disabled the VSS HW provider a few days ago and that seems to have helped. Waiting on results from Microsoft so I can share them with Dell.  I still have TRIM and ODX disabled.  I will be enabling those again in a few weeks.

    My backups run much slower since disabling the VSS HW provider, plus I can’t use a proxy, so I don’t use the host resources.  At least things seem to be stable!


    Thursday, May 30, 2013 2:56 PM
  • Hello,

    EqualLogic here (PS6000, FW 6.0.2 and HIT 4.5)

    I started another 'lab exercise' this morning ...

    • uninstalled the EqualLogic HIT from one of my clusters
    • set up MPIO manually
    • deployed a new DPM and spawned 20 test VMs
    • set my DPM to allow 5 parallel Hyper-V backups
    • created 2 protection groups of 10 and 13 VMs ... 3 old VMs were left on the cluster

    ---

    So far it looks good: five VMs can be backed up simultaneously.

    The time it takes to complete the backup is acceptable for a 1Gb network.

    I will let this setup run over the weekend and see if it is stable.

    ---

    All servers got the latest Windows updates, and KB2838669 is installed on the nodes.


    This posting is provided "AS IS" with no warranties.

    Thursday, May 30, 2013 3:50 PM
  • Dell MD3620i using iSCSI here

    Thinking of reinstalling the OS on the host computers without Dell MPIO drivers myself.


    • Edited by brock_paul Thursday, May 30, 2013 5:17 PM
    Thursday, May 30, 2013 5:07 PM
  • Hello and good morning,

    After I removed all third-party software yesterday, I still get Error 5120 and Error 5217 in cluster manager.

    At the same time, one of the hosts shows VSS Error 12293 in the application log. The other host in my two-node cluster doesn't show any errors.

    The VSS error seems to occur at the end of the backup job.



    Friday, May 31, 2013 7:42 AM
  • Hello all,

    The cluster errors 5120 and 5217 seem to occur in both Fibre Channel and iSCSI SAN environments.  We are using IBM DS3950 (FC+iSCSI), DS3524 (FC) and DS3000 (iSCSI) SANs on separate clusters, and all of them experience the same issues when DPM starts the initial snapshot of the VMs.  KB2838669 has otherwise stabilised the other issues: memory leaks, VMs pausing and CSVs going offline during backups.

    Friday, May 31, 2013 3:47 PM
  • After having my setup run ove