CSV - CLUSTERING Hyper-V Hosts!

  • Question

  • I have a 4-node cluster of Hyper-V hosts on Win2012R2 with Cluster Shared Volumes (CSVs).

    I have around 22 VMs spread across these nodes, sitting on the CSVs.

    Every now and then the VMs go into a failed state and I have to cold boot my hosts to get them back online.

    Is it because of CSV? I have a VM which is on the C drive and it remains fine, no issues ever.

    Please advise: what's the best way for VMs to be highly available when moving to Win2019?

    Thanks a lot

    PS I have a 1TB SAN


    SV

    Wednesday, August 21, 2019 2:01 PM

Answers

  • Hi,

    Yes, you are right.

    In a cluster without CSV, only one node can access a disk LUN at a time, so multiple disks (LUNs) are required if VMs are to be migrated between nodes.

    In contrast, CSV is the best way to provide high availability for many clustered roles.

    With Cluster Shared Volumes, storage is simplified because multiple nodes can access the same disk at once and fewer disks are needed overall. CSV can also reduce potential disconnection time when performing a live migration of VMs. Cluster Shared Volumes also offer resiliency benefits by creating multiple connections between the nodes and the shared disk, meaning that if one part of the network goes down, communication can continue through another part of the network.

    In addition, I just want to confirm the current situation. Do you have any questions or concerns?

    Highly appreciate your effort and time. 

    Best regards,

    Michael


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com

    • Marked as answer by vai2000 Thursday, August 29, 2019 3:18 PM
    Tuesday, August 27, 2019 9:06 AM
    Moderator

All replies

  • What SAN do you use?

    Does each node have redundant paths to the SAN?

    What is the state of the CSV when the VMs have failed?


    Wednesday, August 21, 2019 3:40 PM
  • Hi,

    Thanks for your question.

    May I ask for more information about your current situation? Please also try the following troubleshooting steps to find more clues about this issue:

    1) Are any error messages shown in the cluster events, or in Event Viewer regarding the cluster?

    2) Have any resources failed, or are there any errors under the failed clustered VMs?

    3) Please check whether any resource under the clustered VM is offline, and use Show Critical Events on that resource.

    4) Are all the VMs in the failed state?

    5) Please also check whether the node currently hosting the VMs has a resource shortage (high CPU or low memory) or other system issues.

    6) I agree with LaMerk: please check the CSV storage and the connectivity between the nodes and the storage. Are there redundant paths from the nodes to the SAN?

    7) To check the CSV used by the VMs, we need to collect the event logs about this disk on the owner node. First remove the disk from CSV back into Available Storage, then put it into maintenance mode. Then check Event Viewer on the CSV owner node for any logs regarding the disk or the file system.

    8) In addition, please check the VMs' configuration files and whether each VM can access and read its VHD on the shared storage.

    I highly appreciate your effort and time. If you have any questions or concerns, please feel free to let me know.

    Best regards,

    Michael



    Thursday, August 22, 2019 8:08 AM
    Moderator
  • What's the recommended approach?

    1. Should I carve out a LUN for each VM? The problem was that VMs used to bloat and run out of space, so we chose CSV as one big, fat pool of space.

    CSV doesn't seem promising over time, though.

    What does MSFT recommend?

    Thanks a lot


    SV


    • Edited by vai2000 Thursday, August 22, 2019 5:26 PM
    Thursday, August 22, 2019 5:25 PM
  • We are seeing these errors...

    Cluster Shared Volume 'Volume1' ('Cluster Disk 2 - CSV') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.
    Cluster Shared Volume 'Volume1' ('Cluster Disk 2 - CSV') has entered a paused state because of '(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    Cluster Shared Volume 'Volume1' ('Cluster Disk 2 - CSV') has entered a paused state because of '(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished.

    Thanks


    SV

    Thursday, August 22, 2019 5:36 PM
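
The three codes in the messages above are standard Windows error values, and all three point toward a timeout or a lost connection on the path to storage rather than toward CSV itself. The following is a minimal sketch, not an official tool, that pulls the volume, resource name, and code out of a message of that shape and looks the code up. The function name `decode_csv_event` and the table `KNOWN_CODES` are made up for illustration, and the meanings are paraphrased from the Windows error headers.

```python
import re

# Meanings of the three codes quoted in the post, paraphrased from the
# Windows headers (winerror.h for Win32 codes, ntstatus.h for NTSTATUS codes).
# This table is illustrative and deliberately covers only those three codes.
KNOWN_CODES = {
    "1460": "ERROR_TIMEOUT: the operation returned because the timeout period expired",
    "c000020c": "STATUS_CONNECTION_DISCONNECTED: the transport connection is now disconnected",
    "c00000b5": "STATUS_IO_TIMEOUT: the I/O operation did not complete before the time-out period expired",
}

# Matches the message shape shown in the post:
#   Cluster Shared Volume '<volume>' ('<resource>') ... '(<code>)'
MESSAGE_RE = re.compile(
    r"Cluster Shared Volume '(?P<volume>[^']+)' \('(?P<resource>[^']+)'\) "
    r".*?'\((?P<code>[0-9A-Fa-f]+)\)'"
)


def decode_csv_event(message):
    """Return volume, resource, code and meaning, or None if the text does not match."""
    match = MESSAGE_RE.search(message)
    if match is None:
        return None
    code = match.group("code").lower()
    return {
        "volume": match.group("volume"),
        "resource": match.group("resource"),
        "code": code,
        "meaning": KNOWN_CODES.get(code, "unknown code (not in this table)"),
    }


if __name__ == "__main__":
    sample = (
        "Cluster Shared Volume 'Volume1' ('Cluster Disk 2 - CSV') has entered "
        "a paused state because of '(c00000b5)'. All I/O will temporarily be "
        "queued until a path to the volume is reestablished."
    )
    print(decode_csv_event(sample))
```

Feeding it the cluster events exported from each node may help show whether the timeouts cluster around one node or one path, which is what the questions about redundant SAN paths are getting at.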
  • Hi SV,

    In your troubleshooting, please also check the free space on the CSV. Normally the threshold is 80%; if usage goes above that, your VMs pause because the CSV enters a paused state. A cluster is a machine with many moving parts, so we really have to check bit by bit, even the obvious things.

    How many fibre switches do you have in your config?


    Kassoka

    Thursday, August 22, 2019 5:54 PM
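
The free-space check suggested above can be scripted. This is a rough sketch only: the 80% figure is the threshold quoted in the post, not a value taken from Microsoft documentation, and the path `C:\ClusterStorage\Volume1` is an assumed CSV mount point that will need adjusting to the actual volume. The helper names `used_percent` and `check_volume` are hypothetical.

```python
import shutil


def used_percent(total_bytes, free_bytes):
    """Percentage of the volume in use, given its total and free bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * (total_bytes - free_bytes) / total_bytes


def check_volume(path, threshold_percent=80.0):
    """Return (used percentage, True if usage is above the threshold).

    threshold_percent defaults to the 80% quoted in the thread; treat it
    as an assumption rather than a documented limit.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    pct = used_percent(usage.total, usage.free)
    return pct, pct > threshold_percent


if __name__ == "__main__":
    # Assumed CSV mount point; replace with the real path on a cluster node.
    csv_path = r"C:\ClusterStorage\Volume1"
    try:
        pct, over = check_volume(csv_path)
        state = "ABOVE threshold" if over else "ok"
        print(f"{csv_path}: {pct:.1f}% used ({state})")
    except OSError as exc:
        print(f"Could not read {csv_path}: {exc}")
```

Run on a schedule from each node, something like this may give early warning before a volume fills up, although the errors posted earlier look more like path timeouts than a space problem.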
  • Thanks. Is CSV really the best way to get high availability, though?

    Back in the day we used to have individual LUNs for all resources.


    SV

    Thursday, August 22, 2019 7:24 PM
  • Hi,

    Just checking in to see if the information provided was helpful. Please let us know if you would like further assistance.

    Best Regards,

    Michael



    • Marked as answer by vai2000 Thursday, August 29, 2019 3:18 PM
    • Unmarked as answer by vai2000 Thursday, August 29, 2019 3:18 PM
    Thursday, August 29, 2019 9:41 AM
    Moderator
  • Michael, Thanks a lot for your input

    We have had a bad experience with CSV: most of the time the disk pauses and causes total havoc. It's not graceful...

    In the old-school way, with an independent LUN for each VM (I don't need live migration), I just failed over to the other node if one node went bad.

    I have had a great experience with single LUNs for each of my clustered resources, and we have now bought a new SAN which can expand a LUN without tearing it all up (as in the old style), so I can expand my LUN if my VM is hogging more space.

    Thanks again


    SV

    Thursday, August 29, 2019 3:18 PM
  • Hi SV,

    Thanks for your reply. 

    Actually, using an independent LUN for each VM is also a feasible option. CSV has advantages in some respects.

    Is there anything else we can do for you? If you need further assistance, please feel free to let me know.

    Highly appreciate your effort and time. 

    Best regards,

    Michael 



    Friday, August 30, 2019 3:33 AM
    Moderator