DPM2012 drops the CSV cluster during backup operations

  • Question

  • Hi,

    We have a 2-node Hyper-V cluster and use DPM 2012 to back up the Hyper-V virtual machines. One of our nodes keeps dropping out of the cluster during backups with the following error:

    Event ID 5120

    Cluster Shared Volume 'Volume1' ('CSV1') is no longer available on this node because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    Previously I thought this was due to CSV serialization issues; however, those have now been resolved. Does anyone know how to resolve this? We are not using hardware-provided snapshots and would like to get DPM working with software snapshots in the first instance.

    Wednesday, October 17, 2012 9:58 PM

All replies

  • Hi,

    It sounds like your cluster networking configuration / infrastructure has a problem.  The error C000020C = "The specified network name is no longer available."
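
For readers unfamiliar with the hex value in the event text, here is a minimal sketch that splits it into the standard NTSTATUS bit fields (severity, customer flag, facility, code). The function and type names are made up for illustration; only the bit layout comes from the NTSTATUS format itself.

```python
from typing import NamedTuple


class NtStatus(NamedTuple):
    severity: str   # decoded from bits 31-30
    customer: bool  # bit 29, set for non-Microsoft codes
    facility: int   # bits 27-16
    code: int       # bits 15-0


_SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(value: int) -> NtStatus:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return NtStatus(
        severity=_SEVERITY[(value >> 30) & 0x3],
        customer=bool((value >> 29) & 0x1),
        facility=(value >> 16) & 0xFFF,
        code=value & 0xFFFF,
    )


# The value reported in Event ID 5120 above.
status = decode_ntstatus(int("c000020c", 16))
print(status)
```

The top two bits are both set, so this is an error-severity status with facility 0 and code 0x20C, which is what STATUS_CONNECTION_DISCONNECTED stands for in the event message.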

    Please review best practice guidelines for cluster networking configuration

    System Center Data Protection Manager 2010 Hyper-V protection: Configuring cluster networks for CSV redirected access
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;2473194

    http://blogs.technet.com/b/dpm/archive/2010/12/09/system-center-data-protection-manager-2010-hyper-v-protection-configuring-cluster-networks-for-csv-redirected-access.aspx

     


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread. Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.

    Wednesday, October 17, 2012 10:20 PM
    Moderator
  • I agree. I got these kinds of disconnects in combination with "small" network issues.
    In the end it was a mixture of several problems that produced the result you describe:

    a broken network port, a wrong IP address on a machine, and SRV I/O, which shouldn't be activated on 2012 without support... especially on broken ports ... ;)
    Thursday, October 18, 2012 7:02 PM
  • Hi,

    I have already seen this kind of issue with a misconfigured network card in an MSCS cluster (for example, the same gateway configured on every NIC).

    Do you use iSCSI or FC attachment?

    Ced


    Friday, October 19, 2012 2:58 PM
  • Hi, thanks for the replies.

    We use iSCSI for storage. Here is the output of our cluster network config after the modifications described in the links above. I ran a backup tonight and the same error appeared, and the servers on Node 1 were rebooted.

    Name                                                                     Metric                                    Role
    ----                                                                     ------                                    ----
    Cluster HeartBeat                                                          1100                                       1
    CSV Cluster                                                                 800                                       1
    DMZ                                                                        1200                                       1
    Host Access Cluster                                                       10000                                       3
    iSCSI                                                                     10100                                       0
    Live Migration                                                              900                                       1

    Only the DMZ and Host Access Cluster networks have default gateways configured (they have different gateways).
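
As a quick sanity check on the table above, the following sketch ranks the cluster-enabled networks by metric. It assumes, as I understand the linked article, that CSV redirected traffic prefers the cluster-enabled network with the lowest metric, and that Role 0 means the network is not used by the cluster while Role 1 and Role 3 are cluster-enabled. The helper names are hypothetical; the rows are copied from the output above.

```python
# Rows copied from the cluster network output: name, metric, role.
TABLE = """\
Cluster HeartBeat      1100   1
CSV Cluster             800   1
DMZ                    1200   1
Host Access Cluster   10000   3
iSCSI                 10100   0
Live Migration          900   1
"""


def parse_networks(text):
    """Return (name, metric, role) tuples; names may contain spaces."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 2)  # split off the last two columns
        if len(parts) != 3:
            continue
        name, metric, role = parts
        rows.append((name, int(metric), int(role)))
    return rows


def cluster_enabled_by_metric(rows):
    """Keep networks the cluster may use (role 1 or 3), lowest metric first."""
    usable = [row for row in rows if row[1:] and row[2] in (1, 3)]
    return sorted(usable, key=lambda row: row[1])


preferred = cluster_enabled_by_metric(parse_networks(TABLE))
for name, metric, role in preferred:
    print(f"{metric:>6}  role={role}  {name}")
```

With these values the CSV Cluster network (metric 800) sorts first and iSCSI (role 0) is excluded, which matches the intent of the configuration.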

    The other thing I am not sure about is that the cluster heartbeat network is cabled directly between the 2 nodes in the cluster, i.e. port 4 of node A is connected directly to port 4 of node B. The problem I see with this is that if one node drops, the cluster heartbeat network is lost on both nodes.

    Any ideas? I'm pretty sure there's a config problem somewhere; I'm just trying to work through it. All our network interfaces are 1 Gb, and we have 8 network cards in each host, configured as follows:

    2 - iSCSI

    1 - DMZ network

    1 - Internet network

    1 - CSV Cluster network

    1 - CSV cluster heartbeat

    1 - Internal Lan

    1 - Live Migration

    Sunday, October 28, 2012 11:19 AM
  • <Snip>

    One of our nodes continues to drop the cluster during backups with the following error: 

    Event ID 5120

    Cluster Shared Volume 'Volume1' ('CSV1') is no longer available on this node because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    <snip>

    So, only one node has this problem?

    What happens if you manually place the CSV in redirected mode when all the VMs are running on that node?


    Sunday, October 28, 2012 2:58 PM
    Moderator
  • hi Mike,

    Thanks for the feedback. I would love to perform this test; however, it would take all our services offline if the node were to fall over. Are there any other tests you could recommend that would have a smaller impact? We could place non-critical servers on the node, but not our whole farm.

    thanks

    Monday, October 29, 2012 12:11 AM
  • Hi Mike,

    I moved all servers onto the second node (which up to this point hadn't had any drops). During backup operations that node lost connectivity to the CSV cluster and all servers were restarted on that host.

    I feel that we are making some progress however something is still not right...

    Tuesday, October 30, 2012 1:18 PM
  • Yes. In a properly configured 2-node cluster (with hardware of the proper size, speed and class) you should be able to take one node down and have the other node handle the workload, including redirected mode during backups.  It could be that you need more CSV disks to spread the I/O out so that only a few guests are in redirected mode at one time.  Any way you look at this problem, it all boils down to reliable network connectivity and uninterrupted access to your iSCSI storage.  It could be time to take some network traces (including on the iSCSI network) and see where the breakdown occurs that causes the cluster to fail.

    Tuesday, October 30, 2012 5:44 PM
    Moderator
  • Hi, please find below the steps taken to resolve the issue:

    1. Changed network binding order to:

    Internal LAN
    Cluster Heartbeat
    CSV
    Live Migration
    iSCSI

    2. Updated the network card drivers to the latest version

    3. Added additional A/V exclusions for

    c:\windows\cluster
    q:\

    4. Applied following hotfixes

     http://support.microsoft.com/kb/2637197
     http://support.microsoft.com/kb/2639032
     http://support.microsoft.com/kb/2684681
     http://support.microsoft.com/kb/2687646
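
To confirm that both nodes ended up with the same hotfixes from step 4, a small check along these lines may help. It is only a sketch: the installed list is a hypothetical example that you would replace with each node's real list of installed updates, and the helper names are made up.

```python
import re

# Hotfix links from step 4 above.
HOTFIX_URLS = [
    "http://support.microsoft.com/kb/2637197",
    "http://support.microsoft.com/kb/2639032",
    "http://support.microsoft.com/kb/2684681",
    "http://support.microsoft.com/kb/2687646",
]


def kb_ids(urls):
    """Turn .../kb/<number> links into 'KB<number>' identifiers."""
    return ["KB" + re.search(r"/kb/(\d+)", url).group(1) for url in urls]


def missing_hotfixes(required, installed):
    """Return required KB ids absent from the installed list (case-insensitive)."""
    installed_set = {kb.upper() for kb in installed}
    return [kb for kb in required if kb not in installed_set]


# Hypothetical example only: replace with the real list from each node.
installed_example = ["KB2637197", "kb2684681"]

missing = missing_hotfixes(kb_ids(HOTFIX_URLS), installed_example)
print(missing)
```

Running it once per node with that node's installed updates shows at a glance whether either node is missing one of the four fixes.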

    Tuesday, November 20, 2012 10:02 PM