Quick Migration takes forever on Hyper-V with Failover Cluster Feature (Offline Pending)

  • Question

  • Hi.  I am having a problem at a customer site.

    It is exactly the same problem as the one described in the following post:
    http://social.technet.microsoft.com/Forums/en-US/winserverhyperv/thread/2a1825f3-7a1c-466c-b08b-7388928bb8b6

    In my specific case, my configuration is this:
    I have a five-node WS 2008 SP2 Enterprise x64 failover cluster. All nodes have the Hyper-V role and the Failover Clustering feature installed, with several VMs configured for HA. Migration from any server to any other fails every time. The VM goes to a saved state and the VM configuration stays in the Offline Pending state forever. The disk resource stays online as well.

    When I try to take the disk resource for this VM offline, it takes forever and never goes offline.  I can't simulate a failure either.  I can't start the VM once it is in the "saved" state, even when this host has the cluster disk online.

    Storage is Fibre Channel, and each VM has its own LUN. MPIO is configured on all hosts, and I have already tested with and without both paths available (just to troubleshoot the problem).

    One virtual network is configured with an identical name on all five hosts.  All cluster tests pass without any failure.  The cluster is a node-majority cluster, because that is what is recommended for five nodes.

    When I installed everything, it worked as expected; this behavior began later, after a week or so.  All hosts and VMs are up to date via Windows Update.

    In that post, the user reinstalled the host, but in my case reinstalling is not a good option: my customer is already in production and it would be too difficult.  Besides, I am not sure the problem would stay away after reinstalling.

    Any help will be greatly appreciated.

    Regards,

    Jose Angel Rivera.
    Tuesday, August 25, 2009 3:26 PM

All replies

  • Hello,

    Does this happen on multiple virtual machines, or just one?

    Thanks,

    Nathan Lasnoski
    Wednesday, August 26, 2009 3:41 AM
  • This happens on all of the virtual machines, even if I create a new one.
    Wednesday, August 26, 2009 12:46 PM
  • Are you able to move storage between the servers independent of Hyper-V?  I've seen issues like this when I've had problems with the SAN handing off drives to other hosts in the cluster.
    Wednesday, August 26, 2009 3:35 PM
  • Looks like something is stuck.  I found a way to actually move the desired VM, but only by forcing it:

    For example, if I want to move VM1 from Host1 to Host2, I have to do this:

    In the Failover Cluster Management console, under "Services and Applications", I select VM1 and choose "Move virtual machine to another node", with Host2 as the destination.  The "Offline Pending" problem appears again, so I open Task Manager and end the "clussvc.exe" process (Microsoft Failover Cluster Service); the VM then migrates to the desired host.

    This is not a real solution, only a workaround until I get better results or someone can help me with the actual fix.
    Wednesday, August 26, 2009 4:39 PM
  • Are there any errors in the event log when the failover is stuck in "offline pending"?  Also, have you seen this?

    http://blogs.msdn.com/robertvi/archive/2009/03/25/during-quick-migration-the-virtual-machine-configuration-resource-stays-in-offline-pending.aspx

    • Marked as answer by balboa41 Thursday, August 27, 2009 2:40 PM
    Wednesday, August 26, 2009 5:25 PM
  • My problem is solved, at last!!!

    It was related to the issue described in the post you suggested earlier, "http://blogs.msdn.com/robertvi/archive/2009/03/25/during-quick-migration-the-virtual-machine-configuration-resource-stays-in-offline-pending.aspx".  I checked carefully and found that, on some of the hosts, the symlinks pointed to paths on the local (c:\) drive.  Whenever I tried to move a VM from such a host, the original host copied the .XML file with the local path to the other hosts and, I assume, that is how a small problem became a big one.

    What the post does not explain is that you must use a command prompt to see the files in "C:\ProgramData\Microsoft\Windows\Hyper-V\Virtual Machines", because Windows Explorer normally does not show all of them.  From there, issue a "DIR" command.  Open the command prompt as administrator so that you can delete the files that are causing the problem.
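    As an alternative to reading the "DIR" output by eye, here is a minimal Python sketch that lists the symbolic links in that folder and flags the ones whose target is on a local drive.  It only reports; it deletes nothing.  The function names and the default drive letter "C" are my own assumptions for illustration, not part of Hyper-V or of the linked post, and it still needs to be run from an elevated prompt.

```python
import os

# Folder named in this post; adjust it if your hosts keep VM configuration elsewhere.
VM_CONFIG_DIR = r"C:\ProgramData\Microsoft\Windows\Hyper-V\Virtual Machines"


def list_symlinks(directory):
    """Return sorted (name, target) pairs for each symbolic link directly inside directory."""
    links = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                links.append((entry.name, os.readlink(entry.path)))
    return sorted(links)


def links_on_drive(links, drive_letter="C"):
    """Keep only the links whose target path starts with the given drive letter."""
    prefix = drive_letter.upper() + ":"
    flagged = []
    for name, target in links:
        # On Windows, os.readlink may return an extended-length path such as \\?\C:\...
        cleaned = target[4:] if target.startswith("\\\\?\\") else target
        if cleaned.upper().startswith(prefix):
            flagged.append((name, target))
    return flagged


if __name__ == "__main__":
    # Print every symlink that points at the local drive (assumed here to be C:).
    for name, target in links_on_drive(list_symlinks(VM_CONFIG_DIR)):
        print(f"{name} -> {target}")
```

    Any link it prints is a candidate for the cleanup described in the linked post; check each one against that post before deleting anything.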

    Now to the recommendations: 

    Hyper-V or Failover Cluster Management should not let you make a VM highly available if its configuration files are on a local path.  When you create a new VM, you *must* set the paths for the snapshot and .vhd files to be on the SAN, on the same LUN.  If you don't, and you leave the files on the c:\ drive and then make the VM highly available (manually, through Failover Cluster Management), you get a warning, but it lets you create it anyway.  The result is chaos: once you try to migrate this badly configured VM to another host, the whole cluster has problems migrating VMs.  That is bad in every respect, because if you are running, let's say, 20 VMs on each host and you are forced to reboot one of the hosts, you can imagine the inconvenience.  A user who knows everything I explain in this post will normally put a VM and its associated files on the same drive in the SAN, but a novice, or someone who has never had this problem, can easily make the wrong configuration, and there is no easy way to fix it afterwards.

    This issue should at least be well documented, or better, fixed.  After you spend a lot of money on hardware and technical work to build a cluster, it should work without problems.  A warning is not enough: it should be an error, or at least the cluster should stop the user from migrating the wrongly configured VM and explain the issue.  A Windows server configured as a cluster should not let you make this kind of mistake, which affects the whole migration system (one of the most important features of the cluster) and leaves a VM in a state where you cannot restart it without affecting all of the others.

    I hope this helps others who run into this issue.
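    The check I would have liked the tools to perform can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: the function name, the labels, and the example paths are hypothetical, and it compares drive letters only, so it will not understand volumes mounted without a drive letter.

```python
import ntpath


def storage_problems(paths, local_drives=("C:",)):
    """Return a list of problems found in one VM's file locations.

    paths maps a label (e.g. "config", "vhd", "snapshots") to a Windows path.
    local_drives lists the drive letters treated as local, non-shared storage.
    """
    problems = []
    drives = {}
    for label, path in paths.items():
        # ntpath parses Windows-style paths regardless of where the script runs.
        drive = ntpath.splitdrive(path)[0].upper()
        drives[label] = drive
        if drive in local_drives:
            problems.append(f"{label} is on local drive {drive}")
    distinct = sorted(set(drives.values()))
    if len(distinct) > 1:
        problems.append("files are spread across drives: " + ", ".join(distinct))
    return problems


if __name__ == "__main__":
    # Hypothetical VM whose snapshot folder was left on the local drive.
    vm1 = {
        "config": r"E:\VM1",
        "vhd": r"E:\VM1\disk.vhd",
        "snapshots": r"C:\ProgramData\Microsoft\Windows\Hyper-V\Snapshots",
    }
    for problem in storage_problems(vm1):
        print(problem)
```

    An empty result means every file is on the same non-local drive, which is the layout I recommend before making the VM highly available.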

    Thanks,

    Jose Angel Rivera
    Thursday, August 27, 2009 2:40 PM