All VMs pause when certain nodes own the CSV

  • Question

  • Hi.

    So I've added 2 nodes to a 6-node Server 2016 Hyper-V cluster. Hardware-wise they are the same servers (Dell 730s). At first all looked fine: VMs ran on those nodes and could live migrate to and from them with no issues. But when one of these two nodes gets ownership of the CSV volume on which the VHDs of the VMs reside, all VMs on the entire cluster stop. Cluster validation returns only minor warnings due to updates. I had pending updates on the cluster when I added these nodes - I updated the two additional nodes before they were part of the cluster, and the plan was to do a CAU run once they had joined. But that plan fell flat when one node went into maintenance and CSV ownership switched over to one of the new nodes. Since then I have tested this on the other new node as well (on a weekend) and the same thing happens there.

    Can these updates actually be the problem, or is there anywhere else I need to look?
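
    For anyone wanting to reproduce this in a controlled window: CSV ownership can be inspected and moved with the FailoverClusters PowerShell module. This is a minimal sketch; the volume and node names are placeholders for your environment.

    ```powershell
    # List all Cluster Shared Volumes with their current owner node and state
    Get-ClusterSharedVolume | Format-Table Name, OwnerNode, State

    # Deliberately move a CSV to one of the suspect nodes to trigger the symptom
    # ("Cluster Disk 1" and "NEWNODE1" are placeholder names)
    Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node "NEWNODE1"
    ```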

    Thursday, November 21, 2019 2:51 PM

All replies

  • Hi,

    Thank you for posting in forum!

    1. Please provide more details:
       1) What warnings does the cluster validation report contain?
       2) Please post the Failover Clustering events from Event Viewer.
    2. I recommend you evict the newly added nodes and add them again after the cluster has been updated. Then test again to see whether the same issue occurs.
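
    To gather the requested diagnostics, the failover clustering event channel and the cluster debug log can be pulled with PowerShell. A sketch, assuming a reachable cluster and a `C:\Temp` destination folder:

    ```powershell
    # Recent events from the Failover Clustering operational channel
    Get-WinEvent -LogName "Microsoft-Windows-FailoverClustering/Operational" -MaxEvents 50

    # Generate the detailed cluster debug log for the last 30 minutes
    # (written per node to the destination folder)
    Get-ClusterLog -UseLocalTime -TimeSpan 30 -Destination C:\Temp
    ```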

    Hope this can help you. Please let us know if you would like further assistance.

    Best Regards,

    Lily Yang


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.


    Friday, November 22, 2019 7:22 AM
  • If by "pending updates" you mean that you have run the update but not rebooted the system, then you are in a bit of limbo.  If you haven't rebooted, you have partially applied updates.  Once you have applied an update, you should reboot as soon as possible, particularly in a clustered environment.  You should not be trying to do major maintenance with a cluster in a partially updated state - that is operating in a completely untested environment.

    tim

    Friday, November 22, 2019 1:04 PM
  • Hi,

    Just checking in to see if the information provided was helpful. 

    Please let us know if you would like further assistance.

    Best Regards,

    Lily


    Monday, November 25, 2019 2:10 AM
  • Hi,

    Was your issue resolved? 

    If you resolved it using our solution, please "mark it as answer" to help other community members find the helpful reply quickly.

    If you resolved it using your own solution, please share your experience and solution here. It will be very beneficial for other community members who have similar questions.

    If no, please reply and tell us the current situation in order to provide further help.

    Best Regards,

    Lily


    Wednesday, November 27, 2019 2:30 AM
  • Hey, sorry for the late reply - I was on sick leave last week.

    First of all, I now have all hosts patched up to date.

    I re-added the servers to the cluster, and this seems to have solved the issue for one of the nodes. The other one still has the problem.

    I created another test LUN on the storage and also have a few VMs running on the internal S2D volume. These all work without issue on the remaining problematic host. The witness LUN also fails over to this host with no problem. It is just this one LUN that is giving me headaches.

    When this host owns the LUN, all the VMs stop, but I can still access its contents - both from the owning node and from all other nodes.

    Seems very weird to me.

    Currently, cluster validation only returns minor warnings, but obviously I haven't run the storage-specific tests yet. I will run those on the weekend and see what they come up with.
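
    For reference, the storage-specific validation can be limited to the suspect LUN so the rest of the cluster storage stays online. A sketch ("Cluster Disk 1" is a placeholder name); note that the targeted disk is taken offline for the duration of the test, so this needs a maintenance window:

    ```powershell
    # Run only the Storage category of cluster validation against one disk
    Test-Cluster -Include "Storage" -Disk "Cluster Disk 1"
    ```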

    Tuesday, December 3, 2019 11:46 AM