Clustered disks go to failed state during storage failover

  • Question

  • Clustered disks go to failed state during storage failover

    Q: Other than this reg key, are there any other settings that increase the time before Windows or the cluster declares the disk dead?

    OS: Windows Server 2016
    Failover Cluster installed
    MPIO is NOT installed
    Cluster validation results: No failures
    Nodes in cluster: 2
    Note the disk is NOT the quorum disk

    Hardware: VMware virtual machine
    Hardware OS: ESXi 6.7 U3
    Shared Disk: Physical Mode RDM
    vSphere SCSI Controller: VMware Paravirtual
    vSphere SCSI Bus Sharing: Physical
    VMs on separate hosts

    Storage: HPE 3PAR-9450
    Storage Type: FC SAN

    We are using HPE Peer Persistence to replicate the LUN between one 3PAR-9450 and another of the same type. When we attempt to fail over the RDM, we get the following error most of the time, but not always.

    Cluster resource 'Cluster Disk 1' of type 'Physical Disk' in clustered role '<redacted>' failed. The error code was '0x80070490' ('Element not found.').

    Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

    The ESXi hosts pick up the failed-over RDM LUN and the datastore that the VMDK files reside on without issue. The LUN ID and NAA are presented consistently between the two storage arrays.

    If I do a cold boot of both Windows VMs, the cluster can be brought back online.

    • Changed type mtrohde Thursday, October 1, 2020 1:20 PM
    Thursday, October 1, 2020 1:19 PM

All replies

  • As an aside, there was no way to select 'High Availability (Clustering)' as the desired forum; only Technet Sandbox Forum was offered as a choice.
    Thursday, October 1, 2020 1:22 PM