CSV STATUS_IO_TIMEOUT when starting a specific VM

  • Question

  • Hi, 

    I have the strangest problem with a customer's infrastructure that I can't seem to get to the bottom of.

    Server OS Windows Server 2019 Standard

    Two servers in the cluster access the CSV on a Dell EMC SC277984 via iSCSI over a 10Gb backbone.

    Long story short: when a specific VM is rebooted, the act of powering it on causes the CSV to drain and produces a myriad of errors and mayhem. It takes the entire estate down for about 15 minutes while everything resumes and powers up again.

    This error is found in the event logs.

    Cluster Shared Volume 'Volume1' ('Cluster Disk 2') has entered a paused state because of 'STATUS_IO_TIMEOUT(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished.
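
    When many of these messages have to be triaged across exported logs, it can help to pull the volume, disk and status code out of the text. The following is a minimal sketch in Python that assumes the message keeps exactly the wording quoted above; the function name and the returned keys are illustrative, not part of any Microsoft tooling.

```python
import re

# Matches the CSV auto-pause message text as quoted in this thread.
# Assumption: other builds may word the message differently.
PAUSE_RE = re.compile(
    r"Cluster Shared Volume '(?P<volume>[^']+)' \('(?P<disk>[^']+)'\) "
    r"has entered a paused state because of "
    r"'(?P<status>[A-Z0-9_]*)\((?P<code>[0-9a-fA-F]{8})\)'"
)


def parse_csv_pause(message):
    """Return the fields of a CSV pause message, or None if it does not match."""
    match = PAUSE_RE.search(message)
    if match is None:
        return None
    return {
        "volume": match.group("volume"),
        "disk": match.group("disk"),
        # The status name can be empty when only the hex code is logged.
        "status": match.group("status") or None,
        "code": int(match.group("code"), 16),
    }


message = (
    "Cluster Shared Volume 'Volume1' ('Cluster Disk 2') has entered a paused "
    "state because of 'STATUS_IO_TIMEOUT(c00000b5)'. All I/O will temporarily "
    "be queued until a path to the volume is reestablished."
)
print(parse_csv_pause(message))
```

    Keeping the code as an integer makes it easy to group events by status when several different pause reasons appear in the same export.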

    I have tried rebooting the SAN, rebooting the hosts, and running a Cluster Validation Test (which passes with no errors).

    I'm keen to try removing the role itself from the cluster, recreating the VM in Hyper-V from scratch and re-attaching its original disks, but I haven't had the opportunity to try this yet.

    Wednesday, October 2, 2019 2:12 PM

All replies

  • Hi,

     

    Thanks for your question.

    Based on my experience, please try the following troubleshooting steps to see if they help.

     

    1) Please check the state of the CSV resources and identify which resource under the CSV is offline. Right-click it and select Show Critical Events to diagnose the failure.

     

    2) Meanwhile, continue collecting the System logs in Event Viewer on both nodes.

     

    3) Please also check the network connectivity between the nodes and the cluster shared storage. You can refer to this blog (https://techcommunity.microsoft.com/t5/Failover-Clustering/Troubleshooting-Cluster-Shared-Volume-Auto-Pauses-8211-Event/ba-p/371994). One of the common auto-pause reasons is STATUS_IO_TIMEOUT, caused by problems with intra-cluster communication over the network. It occurs when the SMB client observes that an I/O is taking longer than 1-4 minutes (depending on the I/O type). When an I/O times out, the SMB client attempts to fail it over to another channel in a multichannel configuration; if all channels are exhausted, it fails the I/O back to the caller.

     

    So, we can focus on the SMB events on the nodes and the storage server. Please check whether the SMBclient and SMBServer event logs contain any error messages. They are located here:

     

    Applications and services logs > Microsoft > Windows > SMBclient

    Applications and services logs > Microsoft > Windows > SMBServer

     

     

    4) Regarding the shared storage, make sure that the problematic CSV can be brought back online. If it cannot, remove it from Cluster Shared Volumes so that it returns to Available Storage, and put it into maintenance mode. Then switch to the storage server that owns the disk.

     

    On the storage server, open the Disk Management console and check the disk that backs the virtual disk used by the cluster.

     

    5) Meanwhile, we suggest patching your nodes with the latest updates.

     

    6) In addition, here are threads describing a situation similar to yours; please check them to see if they help.

    https://social.technet.microsoft.com/Forums/en-US/4f935ec9-f39a-4a4e-a083-2bd763dd58c6/how-to-fix-event-id-5217-and-5120-when-backup-up-a-vm-on-csv-with-dpm?forum=dpmhypervbackup

    https://techcommunity.microsoft.com/t5/System-Center-Blog/Support-Tip-Hyper-V-hosts-fail-and-log-Event-ID-5120-when-being/ba-p/348021
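
    Steps 2 and 3 amount to lining up the SMB and System events from both nodes against each CSV pause. As a rough sketch (not an official tool), the Python below pairs every pause event with whatever was logged in the five minutes before it, which covers the 1-4 minute SMB timeout mentioned in step 3. It assumes the logs have already been exported and loaded into dictionaries; the keys "time", "node", "log" and "event_id" are hypothetical, and 5120 is the event ID named in the linked articles.

```python
from datetime import datetime, timedelta

CSV_PAUSE_EVENT_ID = 5120  # auto-pause event ID named in the linked articles


def events_before_pauses(events, window=timedelta(minutes=5)):
    """Pair each CSV pause event with the events logged shortly before it.

    `events` is a list of dicts with the hypothetical keys "time" (datetime),
    "node", "log" and "event_id", merged from the exports of both nodes.
    """
    ordered = sorted(events, key=lambda e: e["time"])
    pairs = []
    for pause in ordered:
        if pause["event_id"] != CSV_PAUSE_EVENT_ID:
            continue
        start = pause["time"] - window
        # Keep everything inside the window except the pause event itself.
        preceding = [
            e for e in ordered
            if e is not pause and start <= e["time"] <= pause["time"]
        ]
        pairs.append((pause, preceding))
    return pairs


# Made-up sample records, for illustration only.
sample = [
    {"time": datetime(2019, 10, 2, 14, 0, 0), "node": "NODE1", "log": "SMBClient", "event_id": 1},
    {"time": datetime(2019, 10, 2, 14, 9, 30), "node": "NODE2", "log": "SMBClient", "event_id": 2},
    {"time": datetime(2019, 10, 2, 14, 12, 0), "node": "NODE1", "log": "System", "event_id": 5120},
]
for pause, preceding in events_before_pauses(sample):
    print(pause["time"], [(e["node"], e["log"], e["event_id"]) for e in preceding])
```

    If the same SMB or storage events show up before every pause, and only when the problem VM is powered on, that narrows the search to the path that VM's disks use.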

     

    I hope the above information helps.

     

    I highly appreciate your effort and time. If you have any questions or concerns, please feel free to let me know.

     

    Best regards,

    Michael


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com

    Thursday, October 3, 2019 9:40 AM
    Moderator
  • Hi,

    How are things going? Has your issue been resolved?

    Please feel free to let me know if you need further assistance.

    Best regards,

    Michael



    Thursday, October 10, 2019 2:20 AM
    Moderator
  • Hi,

     

    Just checking in to see if the information provided was helpful. Please let us know if you would like further assistance.

     

    Best Regards,

     

    Michael



    23 hours 41 minutes ago
    Moderator
  • Hi

    I'm taking down the infrastructure on Wednesday evening and will be running cluster validation with the CSV in maintenance mode, and I plan on rebuilding the VM that's causing the issue.

    Regards,

    Carl 

    23 hours 19 minutes ago