CSV stuck in Redirected Access


  • Monday, December 17, 2012 1:59 AM
     
     

    Hello everybody,

    I've searched for topics about redirected access but still can't fix my issue.

    I have a small Hyper-V cluster: two IBM X3850X5 servers running Windows Server 2008 R2 Enterprise, an EMC Clariion CX4-240 FC storage array and FC switches. Only Hyper-V, SCVMM 2012 and the multipath application (PowerPath) are installed on the servers. There were two VMs on node A and one on node B.

    I created a CSV disk and a quorum disk. They worked normally for two weeks until last Saturday, when the CSV became Online (Redirected Access) on node A. Both VMs failed over to node B. The root cause is still unknown.

    I tried turning off redirected access, pausing and resuming node A, and rebooting. None of these worked.

    In the multipath application, both the CSV and witness disks are Alive. In storage management, both servers and both LUNs are present in the same storage group.

    Both disks are present in Disk Management, but they are stuck Offline on node A. Their status on node B is Reserved.

    Because I am concerned about partition corruption, I have not brought the disks Online in Disk Management. What should I do now?


    • Edited by Ge_Feng Monday, December 17, 2012 2:11 AM
    • Edited by Ge_Feng Monday, December 17, 2012 2:12 AM
    • Edited by Ge_Feng Monday, December 17, 2012 5:32 AM

All Replies

  • Monday, December 17, 2012 4:15 AM
     
     
    I would start with root cause. If you don't know what happened last Saturday, it's possible that the issue is still affecting the cluster. Once you determine root cause, troubleshooting will be easier.
  • Monday, December 17, 2012 4:29 AM
     
     

    Good point, and I really agree with Ted. But the EMC vendor just said: "Hey! Since the storage, the multipath application and even Windows Disk Management show the shared disk as alive, there's no issue on our side." On the other hand, we didn't find any hardware warning on the front panel or in the event log. I'll call the IBM hotline, but I don't expect too much from them.

  • Monday, December 17, 2012 4:38 AM
     
     
    Have you checked the cluster logs and/or the server event logs for entries that could help you tell where the problem came from?
  • Monday, December 17, 2012 5:46 AM
     
     

    I just searched the System and Application logs and found something.

    At local time 9:22 AM, the multipath application reported an error:

    EMC PowerPath Error: emcp_xcryptd: Error: Failed to start the daemon with err 0

    Then it was shut down:

    EMC PowerPath Info: Service: Service SHUTDOWN Event Ended at 1:22:40 AM

    And the critical events for the affected CSV disk show that redirected access started at 9:30 AM:

    Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer directly accessible from this cluster node. I/O access will be redirected to the storage device over the network through the node that owns the volume.

    I also found another pair of PowerPath and CSV disk error entries earlier, around 8:45 AM, so these events appear to be related. I will contact the storage vendor again.
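    The correlation described above can be sketched in a few lines of Python. This is only an illustration, not something that was run on the cluster: the record layout, the `correlate` helper, the 15-minute window and the "FailoverClustering" source label are all assumptions. The messages and times are the ones quoted in this thread, with Saturday taken as 2012-12-15.

```python
from datetime import datetime, timedelta

# Hypothetical hand-entered records: (timestamp, source, message).
# Times and messages are taken from the posts in this thread; the tuple
# layout and the source labels are assumptions made for this sketch.
EVENTS = [
    (datetime(2012, 12, 15, 9, 22), "PowerPath",
     "emcp_xcryptd: Error: Failed to start the daemon with err 0"),
    (datetime(2012, 12, 15, 9, 30), "FailoverClustering",
     "Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer "
     "directly accessible from this cluster node."),
]


def correlate(events, window=timedelta(minutes=15)):
    """Pair each PowerPath error with CSV redirection events that
    follow it within `window`."""
    errors = [e for e in events if e[1] == "PowerPath" and "Error" in e[2]]
    redirects = [e for e in events if "no longer directly accessible" in e[2]]
    pairs = []
    for err in errors:
        for red in redirects:
            gap = red[0] - err[0]
            if timedelta(0) <= gap <= window:
                pairs.append((err[0], red[0], gap))
    return pairs


for err_time, red_time, gap in correlate(EVENTS):
    print(f"PowerPath error at {err_time:%H:%M} -> "
          f"CSV redirected at {red_time:%H:%M} ({gap} later)")
```

    A pairing like this only shows that the events are close in time; it does not prove that one caused the other.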

  • Monday, December 17, 2012 4:00 PM
     
     Answered
    Yeah that sounds like a good path. See what they come back with. You could also run a cluster validation to check and make sure the cluster services are seeing everything as healthy. Let us know how it goes with the storage vendor.
    • Marked As Answer by Ge_Feng Wednesday, December 19, 2012 1:34 AM
  • Wednesday, December 19, 2012 1:33 AM
     
     Answered

    Wulaaaaa, it's resolved. Let me lay the story out clearly:

    Incident:

    At local time 2012-12-15 9:22 AM, the multipath application reported an error on one node:

    EMC PowerPath Error: emcp_xcryptd: Error: Failed to start the daemon with err 0

    Then the PowerPath service was shut down:

    EMC PowerPath Info: Service: Service SHUTDOWN Event Ended at 1:22:40 AM

    After a few minutes, the node couldn’t reach the shared disk:

    Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer directly accessible from this cluster node. I/O access will be redirected to the storage device over the network through the node that owns the volume.

    In Windows Disk Management and EMC PowerPath, the shared disk still looked normal, but Failover Cluster Manager couldn’t access the disk from the affected node. The disk status became Online (Redirected Access). Redirected access could not be turned off manually. Pausing and resuming the node didn’t work, and neither did rebooting.

    Root Cause:

    According to EMC documents, the PowerPath error code was generated because its Encryption with RSA component doesn’t work without sufficient configuration. In my case, the strangest thing is that Windows Disk Management and EMC PowerPath still saw the shared disk on the affected node, but Failover Cluster Manager didn't.

    Resolution:

    The issue was resolved after removing the Encryption with RSA component from PowerPath and rebooting the server as required. The affected shared disk came Online in Failover Cluster Manager.

    The RSA Encryption module is installed on the other node, too. This Saturday I will live migrate all VMs and remove the module there to ensure this won't happen again.

    • Marked As Answer by Ge_Feng Wednesday, December 19, 2012 1:34 AM
  • Wednesday, December 19, 2012 2:51 AM
     
     
    Glad to hear you got it working! Thanks for posting the solution.
  • Sunday, February 24, 2013 10:01 AM
     
     

    Dear Ge_Feng

    We had been doing the complete installation of PowerPath until we came upon this article. Our issue is solved, so far so good.

    Thanks a million for posting this. I was scratching my head for over 2 days, wondering whether it was the storage or the Microsoft cluster.

    Cheers !

    Lewis



  • Sunday, May 05, 2013 9:44 PM
     
     

    Hi Ge_Feng,

    Just to confirm that this solution works for PowerPath 5.5.SP1 build 512 on Server 2008 R2 against a CX310 EMC Clariion.

    I was lucky enough to use the proper google-fu combination and came to this post on my third search. I have deactivated the RSA Encryption as suggested and everything is working like a charm now.

    Regards.


    JSP