Missing a VM in Failover Cluster Manager

  • Question

  • We eventually had to cut the power while shutting down the second of the two hosts in the cluster, because it hung partway through shutdown. We were shutting down because problems kept piling up while we moved VMs between the two nodes.

    After boot, we no longer see one of the virtual machines in the Failover Cluster Manager. In Hyper-V Manager we see it, and it runs just fine.

    How can we add the missing VM back in Failover Cluster Manager? Is there a simple way?

    All the servers are Win 2008 R2.

    Maybe a simple question - I don't know - but we just don't have enough knowledge right now, nor has searching the net resulted in anything useful. Appreciate a bit of help holding our heads above water while we learn to swim.

     


    Bent Tranberg
    Wednesday, January 4, 2012 9:30 PM

Answers

  • Have you tried adding the machine manually? I mean "Configure a Service or Application" -> "Virtual Machine"; your VM should then be listed.
    • Marked as answer by Bent Tranberg Thursday, January 5, 2012 1:44 PM
    Thursday, January 5, 2012 10:34 AM
  • Please check this error.

    Error 05.01.2012 10:25:15 Microsoft-Windows-FailoverClustering 1049 IP Address Resource Cluster IP address resource 'Cluster IP Address' cannot be brought online because a duplicate IP address '10.10.1.16' was detected on the network.  Please ensure all IP addresses are unique.

     

    Now, can you tell us where the VHD and config file of the MAIL VM are located?

    If they are located on the CSVs, I think the cluster simply lost the configuration of this machine.

    The solution is:

    1- Shut down the MAIL VM.

    2- Go to the cluster console, right-click the cluster name, choose Add a service or application, choose Virtual Machine, then select the MAIL VM.

    3- It will be made highly available, and you can start it.

     


    Regards, Samir Farhat Infrastructure Consultant
    • Marked as answer by Bent Tranberg Thursday, January 5, 2012 1:49 PM
    Thursday, January 5, 2012 12:37 PM
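
For anyone who would rather script step 2 of the answer above than click through the wizard, here is a minimal Python sketch that only builds the PowerShell command line; it does not run anything. It assumes the FailoverClusters PowerShell module is installed on the node and that its Add-ClusterVirtualMachineRole cmdlet accepts the VM name through -VirtualMachine; please confirm both with Get-Help on your own system. The helper name build_add_vm_command and the VM name 'Mail_M' (taken from the Hyper-V Manager name mentioned later in this thread) are illustrative only.

```python
def build_add_vm_command(vm_name, cluster=None):
    """Return a PowerShell command line that would make an existing Hyper-V VM
    highly available. The command is only built here, never executed."""

    def ps_quote(value):
        # PowerShell single-quoted strings escape a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"

    parts = [
        "Import-Module FailoverClusters;",  # assumed to be installed on the node
        "Add-ClusterVirtualMachineRole",    # assumed cmdlet, check with Get-Help
        "-VirtualMachine",
        ps_quote(vm_name),
    ]
    if cluster:
        # Optional: target a named cluster instead of the local one.
        parts += ["-Cluster", ps_quote(cluster)]
    return " ".join(parts)


# 'Mail_M' is a placeholder; use the name shown in Hyper-V Manager.
print(build_add_vm_command("Mail_M"))
```

As in the manual steps, shut the VM down before running the resulting command in an elevated PowerShell session on a cluster node.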

All replies

  • The following three workarounds sync the correct VM state with the cluster:

    1. In Hyper-V Manager, resume/start the VM that is in a saved state. Then manually save the VM in Hyper-V Manager. Most of the time, this triggers the cluster to show the true VM state.
    2. The second workaround is similar. Instead of starting the saved VM manually, shut it down in Hyper-V Manager. At times, this releases the VM's hung state within Failover Cluster Manager.
    3. The third option is more involved. Use Sysinternals Process Monitor to locate the VMWP.exe process associated with the troublesome VM. Killing this process makes the VM crash and restart on another cluster node, which syncs the VM state in Failover Cluster Manager. It's not the best option, but sometimes a hammer is necessary. It also beats having to kill other cluster services that affect every VM on a node.

     

    See the link below for more information:

    http://searchservervirtualization.techtarget.com/tip/Clustering-problems-with-Hyper-V-VM-configuration-files-VM-states


    Gopi Kiran
    Wednesday, January 4, 2012 9:48 PM
  • Thanks so far. Give us some time to go through that.

    In the meantime...

    My boss insisted I post this picture to explain the situation more clearly, and I think that might be a good idea. We see Mail_M in Hyper-V Manager, but the SCVMM Mail Resources entry is gone from the list of Services and applications in Failover Cluster Manager.


    Bent Tranberg
    Thursday, January 5, 2012 8:10 AM
  • Bent,

    Could you post the logs from Event Viewer?


    Gopi Kiran
    Thursday, January 5, 2012 9:35 AM
  • You mean what I see in Cluster Events in Failover Cluster Manager? I saved it, opened it in Event Viewer, and saved it as tab delimited text. Is that ok?

    To shorten my post, I have removed the text after the event ID whenever it was identical to the text of the preceding event with the same event ID. I also cut away long runs of repetitive events where you see the ellipses.

     

     

     

    Level Date and Time Source Event ID Task Category

    Error 05.01.2012 10:25:15 Microsoft-Windows-FailoverClustering 1205 Resource Control Manager The Cluster service failed to bring clustered service or application 'Cluster Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.

    Error 05.01.2012 10:25:15 Microsoft-Windows-FailoverClustering 1069 Resource Control Manager Cluster resource 'Cluster IP Address' in clustered service or application 'Cluster Group' failed.

    Error 05.01.2012 10:25:15 Microsoft-Windows-FailoverClustering 1049 IP Address Resource Cluster IP address resource 'Cluster IP Address' cannot be brought online because a duplicate IP address '10.10.1.16' was detected on the network.  Please ensure all IP addresses are unique.

    Error 05.01.2012 10:25:07 Microsoft-Windows-FailoverClustering 1069

    ...
    ...
    ... 

    Error 05.01.2012 09:24:43 Microsoft-Windows-FailoverClustering 1205

    Error 05.01.2012 09:24:43 Microsoft-Windows-FailoverClustering 1069

    Error 05.01.2012 09:24:43 Microsoft-Windows-FailoverClustering 1049

    Error 05.01.2012 09:24:35 Microsoft-Windows-FailoverClustering 1069

    ...
    ...
    ... 

    Error 04.01.2012 14:04:33 Microsoft-Windows-FailoverClustering 1205

    Error 04.01.2012 14:04:33 Microsoft-Windows-FailoverClustering 1069

    Error 04.01.2012 14:04:33 Microsoft-Windows-FailoverClustering 1049

    Error 04.01.2012 14:04:25 Microsoft-Windows-FailoverClustering 1069

    Error 04.01.2012 14:04:25 Microsoft-Windows-FailoverClustering 1049

    Critical 04.01.2012 13:59:40 Microsoft-Windows-FailoverClustering 1135 Node Mgr Cluster node 'VSH2' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

    Error 04.01.2012 13:06:52 Microsoft-Windows-FailoverClustering 1205

    ...
    ...
    ... 

    Error 04.01.2012 12:00:38 Microsoft-Windows-FailoverClustering 1205

    Error 04.01.2012 12:00:38 Microsoft-Windows-FailoverClustering 1069

    Error 04.01.2012 12:00:37 Microsoft-Windows-FailoverClustering 1049

    Error 04.01.2012 12:00:25 Microsoft-Windows-FailoverClustering 1069

    Error 04.01.2012 12:00:25 Microsoft-Windows-FailoverClustering 1049

    Critical 04.01.2012 11:54:36 Microsoft-Windows-FailoverClustering 1146 Resource Control Manager The cluster resource host subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually due to a problem in a resource DLL. Please determine which resource DLL is causing the issue and report the problem to the resource vendor.

    Error 04.01.2012 11:54:36 Microsoft-Windows-FailoverClustering 1230 Resource Control Manager Cluster resource 'SCVMM AP Configuration' (resource type '', DLL 'vmclusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.

    Critical 04.01.2012 11:49:35 Microsoft-Windows-FailoverClustering 1146

    Error 04.01.2012 11:49:35 Microsoft-Windows-FailoverClustering 1230

    Critical 04.01.2012 11:44:33 Microsoft-Windows-FailoverClustering 1146

    Error 04.01.2012 11:34:23 Microsoft-Windows-FailoverClustering 1205

    Error 04.01.2012 11:34:23 Microsoft-Windows-FailoverClustering 1069

    Error 04.01.2012 11:34:23 Microsoft-Windows-FailoverClustering 1049

    ...
    ...
    ... 

    Error 04.01.2012 11:24:22 Microsoft-Windows-FailoverClustering 1069

    Critical 04.01.2012 11:24:22 Microsoft-Windows-FailoverClustering 1564 File Share Witness Resource File share witness resource 'File Share Witness' failed to arbitrate for the file share '\\sm\QuorumFolder_Do_Not_Delete'. Please ensure that file share '\\sm\QuorumFolder_Do_Not_Delete' exists and is accessible by the cluster.

    Error 04.01.2012 11:24:21 Microsoft-Windows-FailoverClustering 1069

    Warning 04.01.2012 11:24:21 Microsoft-Windows-FailoverClustering 1562 File Share Witness Resource File share witness resource 'File Share Witness' failed a periodic health check on file share '\\sm\QuorumFolder_Do_Not_Delete'. Please ensure that file share '\\sm\QuorumFolder_Do_Not_Delete' exists and is accessible by the cluster.

    Error 04.01.2012 11:23:39 Microsoft-Windows-FailoverClustering 1069 Resource Control Manager Cluster resource 'SCVMM vs2005Bent' in clustered service or application 'SCVMM vs2005Bent Resources' failed.

    Error 04.01.2012 11:19:39 Microsoft-Windows-FailoverClustering 1069

    Error 04.01.2012 11:19:19 Microsoft-Windows-FailoverClustering 1069

    Critical 04.01.2012 11:05:36 Microsoft-Windows-FailoverClustering 1135 Node Mgr Cluster node 'VSH2' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

    Critical 04.01.2012 10:48:51 Microsoft-Windows-FailoverClustering 1135

    Critical 04.01.2012 10:41:47 Microsoft-Windows-FailoverClustering 1135

    Error 04.01.2012 10:33:51 Microsoft-Windows-FailoverClustering 1205

    Error 04.01.2012 10:33:51 Microsoft-Windows-FailoverClustering 1069

    Error 04.01.2012 10:33:51 Microsoft-Windows-FailoverClustering 1049

    ...
    ...
    ... 

    Error 04.01.2012 08:33:10 Microsoft-Windows-FailoverClustering 1069

    Error 04.01.2012 08:33:10 Microsoft-Windows-FailoverClustering 1049

     


    Bent Tranberg
    Thursday, January 5, 2012 10:22 AM
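
A long export like the one above is easier to read once it is tallied by event ID. The following is a minimal Python sketch, assuming the tab-delimited file keeps the column order shown in the header row (Level, Date and Time, Source, Event ID, Task Category, then the message). The sample rows are copied from the post and re-joined with tabs; the function name summarize is illustrative only.

```python
import csv
import io
import re
from collections import Counter

# Rows copied from the export above; tab positions are an assumption.
SAMPLE = "\n".join([
    "Level\tDate and Time\tSource\tEvent ID\tTask Category",
    "Error\t05.01.2012 10:25:15\tMicrosoft-Windows-FailoverClustering\t1069\t"
    "Resource Control Manager\tCluster resource 'Cluster IP Address' in "
    "clustered service or application 'Cluster Group' failed.",
    "Error\t05.01.2012 10:25:15\tMicrosoft-Windows-FailoverClustering\t1049\t"
    "IP Address Resource\tCluster IP address resource 'Cluster IP Address' "
    "cannot be brought online because a duplicate IP address '10.10.1.16' "
    "was detected on the network.  Please ensure all IP addresses are unique.",
    "Error\t05.01.2012 10:25:07\tMicrosoft-Windows-FailoverClustering\t1069",
    "Critical\t04.01.2012 13:59:40\tMicrosoft-Windows-FailoverClustering\t1135",
])

DUPLICATE_IP = re.compile(r"duplicate IP address '([0-9.]+)'")


def summarize(text):
    """Count events per (level, event ID) and collect IPs reported as duplicates."""
    counts = Counter()
    duplicate_ips = set()
    rows = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in rows:
        # Skip the header, blank lines, and rows too short to carry an event ID.
        if len(row) < 4 or not row[3].strip().isdigit():
            continue
        counts[(row[0].strip(), int(row[3]))] += 1
        # Rows whose text was trimmed from the post have no message column.
        message = row[5] if len(row) > 5 else ""
        match = DUPLICATE_IP.search(message)
        if match:
            duplicate_ips.add(match.group(1))
    return counts, duplicate_ips


counts, duplicate_ips = summarize(SAMPLE)
for (level, event_id), n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(level, event_id, n)
print("duplicate IPs:", sorted(duplicate_ips))
```

To run it on a real export, read the saved file into a string and pass that to summarize instead of SAMPLE.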
  •  

    Are you running Exchange 2010 or Exchange 2010 Service Pack 1?

    This can be done using the Failover Cluster Manager GUI: the fix is to simply disable and re-enable “Allow clients to connect through this network” on the affected cluster network.

     


    Gopi Kiran
    • Edited by Gopi Kiran Thursday, January 5, 2012 11:41 AM
    Thursday, January 5, 2012 11:38 AM
  •  

    We run Microsoft Exchange Server 2010, version 14.0.639.21, according to "Programs and Features" in Control Panel, so I believe there is no service pack at all.

    http://social.technet.microsoft.com/wiki/contents/articles/exchange-server-and-update-rollups-builds-numbers.aspx

    I am not sure I understand. Is this disable/enable thing you mention a fix for the duplicate IP address problem?


    Bent Tranberg
    Thursday, January 5, 2012 2:34 PM
  •  

    Bent,

    If you have any issues in the future with duplicate IP addresses or IP Address resource availability, follow this article. You can find the resolution based on the event ID. Hope this helps.

    Microsoft Exchange Server 2010 Service Pack 1 (SP1): there are some important fixes in SP1.
    Gopi Kiran
    Thursday, January 5, 2012 4:44 PM
  • Thank you, Andrea, that finally solved it! This is exactly the kind of thing we were looking for, but failed to find, since we are rather unfamiliar with the terminology, concepts, and details of this virtualization technology. We'll learn.

    Thanks to all that replied. We will work on the other issues you pointed out.


    Bent Tranberg
    Thursday, January 5, 2012 7:53 PM
  • In Windows Server 2012, in Failover Cluster Manager:

           Right-click the cluster name -> Roles -> Configure Roles -> Next -> select "Virtual Machine" under Select Role.

           Check whether the missing VM is listed and add it.

    Hope you are smiling now..!

    • Proposed as answer by Geejacob Monday, December 24, 2012 7:58 AM
    Monday, December 24, 2012 7:57 AM
  • Hi Gopi Kiran,

    I recently faced a similar situation. In my 4-node cluster (FCM), one VM failed and would not come online despite multiple attempts: moving the VM to another node, restarting all nodes, and stopping and restarting the VM. Nothing succeeded. When I checked in Hyper-V Manager, there was no VM with that name, so I started digging through the CSVs. On one of the CSVs there is a folder with that name, but there are no config files in its Virtual Machines folder; all the VHDs and pass-through disks are still there, unaffected. I am sorry that I cannot provide screenshots; I hope I have explained the issue clearly.

    Is there a better and faster solution? The issue occurs very often, and for a few VMs the impact is severe.


    • Edited by parven001 Friday, November 7, 2014 2:20 PM
    Friday, November 7, 2014 2:19 PM