Connection problem between Hyper-V servers and storage

  • Question

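  • Production environment: two HP servers (Windows 2008) are joined in a cluster and act as Hyper-V servers. The virtual machines use SnapDrive to connect to two LUNs on NetApp storage: one of 400 GB (the VM OS disks) and one of 1000 GB (the VM data disks).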

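    Today the VMs suddenly restarted on their own. On inspection, one of the servers is missing a disk under 'Storage\Disk Management', and under 'storage\snapdrive\disk' the connection to one LUN has been lost. Trying to reconnect it fails with the error: 'The selected LUN is already mapped to the local host'. Part of the system log follows: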

    Event ID: 1038  Ownership of cluster disk 'Disk G:\' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.

    Event ID:1069  Cluster resource 'Disk G:\' in clustered service or application 'a3d8511b-6232-44a9-9c47-5e65851e2e09' failed.

    Event ID:61110  ONTAP DSM was unable to communicate with the logical unit on DSM ID 03000102.  The DSM will attempt a fail-over.  The data section of this log entry contains the NTSTATUS code.

    Event ID:15   The device, \Device\Harddisk3\DR3, is not ready for access yet.

    Event ID:61034   The multipath logical unit /vol/vol3/qtree2/{15489d81-9dc6-4a97-82fb-10e7a8c40d34}.rws on storage system CN-COQ-Storage2 disconnected.

    The cluster events also contain many identical errors:

    Event ID:5121   Cluster Shared Volume 'VHDOS' ('Disk F:\') is no longer directly accessible from this cluster node. I/O access will be redirected to the storage device over the network through the node that owns the volume. This may result in degraded performance. If redirected access is turned on for this volume, please turn it off. If redirected access is turned off, please troubleshoot this node's connectivity to the storage device and I/O will resume to a healthy state once connectivity to the storage device is reestablished.



    July 22, 2014 10:02


  • Hello,

    Thank you for your reply! Restarting the node does resolve the problem. But I would like to know the root cause; could you explain it?

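    My guess is that a momentary FCP reset caused SnapDrive to fail to reconnect, which is why the node has to be restarted to restore the SnapDrive connection. Because MPIO is in place, the system found another path when the FCP reset occurred and restored data access, so there was no prolonged loss of access. However, the switch logs look normal, so what I most want to know is what caused this interruption between the cluster and the storage.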
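    To test a guess like this, it may help to sort the events quoted in the question by the layer that logged them, then check in the real logs which layer reported a problem first. Below is a minimal Python sketch of that grouping. The layer labels and the `EVENT_LAYER` table are my own illustrative assumptions based on the message texts above, not official event source names.

```python
from collections import defaultdict

# Assumed mapping from the event IDs quoted in the question to the layer
# that appears to log them. The labels are illustrative, not official names.
EVENT_LAYER = {
    61110: "multipath (ONTAP DSM)",
    61034: "multipath (ONTAP DSM)",
    15: "disk",
    1038: "failover cluster",
    1069: "failover cluster",
    5121: "failover cluster",
}


def group_by_layer(event_ids):
    """Group event IDs by their assumed layer; unmapped IDs go to 'unknown'."""
    groups = defaultdict(list)
    for eid in event_ids:
        groups[EVENT_LAYER.get(eid, "unknown")].append(eid)
    return dict(groups)


if __name__ == "__main__":
    # Event IDs in the order they were quoted in the question.
    seen = [1038, 1069, 61110, 15, 61034, 5121]
    for layer, ids in group_by_layer(seen).items():
        print(f"{layer}: {ids}")
```

    If the multipath-layer events turn out to carry earlier timestamps than the cluster events, that would be consistent with a path-level interruption rather than a cluster-side fault, though it would not prove it.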

    July 24, 2014 8:24