Cluster Validation fails - SCSI 3 Persistent Reservation

    Question

  • I have Dell PowerEdge M600 and M610 blades connected to an EqualLogic PS4000.

     

    They all iSCSI boot off the storage. When running the cluster validation, Dell advises us to:

    1. Bring the w2k8-cluster-witness disk online, initialize it and then set it back offline
    2. Bring the w2k8-cluster-data disk online, initialize it and then set it back offline.

    Then run the cluster validation. If I do this, the SCSI-3 Persistent Reservation test passes.

    But if I run cluster validation after the cluster has been created, it fails the SCSI-3 Persistent Reservation test.

    The error in the cluster report is:

    Failed to access cluster disk 0 from node server1.play.net status 31
    Cluster Disk 0 does not support Persistent Reservations. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Please contact your storage administrator or storage vendor to check the configuration of the storage to allow it to function properly with failover clusters.

    I have upgraded the PS4000 firmware to the latest edition, with no change.

    Has anyone else seen this problem? Is this how you have performed your validations?

    Thanks

     

    Nyobi

    Friday, May 14, 2010 12:31 PM

Answers

  • Hi Nyobi

    You will need to take the cluster validation report, along with the storage validation log from all nodes, to Dell and ask them to investigate this.

    Validation should pass at all times for supportability.

     


    Gaurav Anand Visit my Blog @ http://itinfras.blogspot.com/
    Tuesday, May 18, 2010 5:58 AM

All replies

  • During validation of a Windows 2008 cluster you might encounter the failure mentioned in the subject. Here's more information on what SCSI-3 Persistent Reservation is and why Windows 2008 clustering wants it:

    http://www.servercare.nl/Lists/Posts/Post.aspx?ID=71

     

    Cheers,

    Sylvia - MSFT

    This posting is provided "AS IS" with no warranties, and confers no rights.

    Tuesday, May 18, 2010 2:11 PM
  • For those who may be struggling with this type of problem:

    After raising a support call with Dell EqualLogic, applying several Microsoft patches and doing a firmware update on the EqualLogic, the problems were still there.

    So I broke out the trusty Wireshark and, lo and behold, I have massive packet loss between my servers and the EqualLogic.

    After more investigation, I found out that the switches our company has just bought and implemented aren't on the Switch Support Matrix for EqualLogic.

    So we have another support call raised with Brocade regarding the switches, as they state that they do support iSCSI.

    They are being reasonably attentive and are coming onsite to do further investigations, and hopefully we will have some type of solution.

    I will advise when we get a resolution for this issue.

    Moral of the story:

    Don't let non technical people design solutions and purchase hardware for said solutions.

    Or

    Be thankful for fools, they create unending amounts of work, which keep me employed. :)

    Thursday, July 22, 2010 1:26 PM
  • I'm having the same problem with my EqualLogic PS6000. Support has told me the same thing: update the firmware and install the Microsoft patches. I've installed the patches and will be upgrading my firmware as soon as possible.

     This link explains the same problem and states that Equallogic works....

     http://blogs.technet.com/b/askcore/archive/2009/04/15/windows-2008-failover-cluster-validation-fails-on-validate-scsi-3-persistent-reservation.aspx

     I'm currently using the StarWind target software, which is working in my lab environment; however, I'd rather not have to purchase the full version if I can avoid it.

     I will let you know if I find a solution.

     

    Tuesday, August 3, 2010 1:41 PM
  • I've updated the firmware on our PS6000 SAN to the latest version and the problem still exists for me. 

    Have you had any luck at your end with the switches?

    Thursday, August 19, 2010 4:41 PM
  • By "latest version" do you mean EqualLogic firmware 4.3.6?   EqualLogic support has told us that this version is necessary to resolve some persistent reservations issues.

    Martin

    Thursday, August 19, 2010 6:50 PM
  • Yes, I upgraded to 4.3.6 last weekend (by the way, the upgrade only works properly via a serial connection due to a bug) and I am still having the same problem. The exact error I get from the cluster validation tool is as follows.

    Disk bus type does not support clustering, disk partition style is MBR, disk partition type is BASIC   Bus type is SCSI

     

    Friday, August 20, 2010 9:51 AM
  • Another update.

     

    Well, I still have the same problem. I have been working with Dell EqualLogic constantly for several months now and we still can't resolve the SCSI Persistent Reservation problem.

    The problem only occurs if I boot the blades off the storage (diskless mode). If I boot the blades off local storage, creating the cluster works every time. I thoroughly tested the cluster validation tool both before and after cluster creation, from multiple nodes, and before and after resource failovers.

    However, as soon as I boot off the storage, the validation tool fails. I have reduced the components down to:

     

    1. 1 switch (Brocade Fast Iron X series 424, which is supposed to be supported by Dell)

    2. 1 NIC in each blade for iscsi, 1 NIC on the storage

    3. No Dell HIT tools

    4. No Microsoft MPIO

    And it still fails.

    I have spent countless hours in WebEx sessions with Dell technicians and they can't understand why it fails. Dell have released a new critical firmware update for the storage and for the Dell HIT tools today. I am going to give that a go. The final step is to get Dell onsite.

    When we resolve it...I will let you all know the fix.

     

    Wednesday, September 29, 2010 10:54 AM
  • Another update :)

     

    Dell have obtained a Brocade Switch in the X-series, which is on the supported list and are attempting to build the environment to reproduce the problem.

    I am attempting to obtain a Cisco switch and test.

    Results pending.

    Monday, October 4, 2010 6:34 PM
  • We now have a resolution/work around for the problem.

    The root cause of the problem is the IQN that is passed to the EqualLogic. If you boot diskless and the first 34 characters of your IQNs are identical, then you will have this problem when clustering on Windows 2008 R2 Datacentre Edition.

    Somewhere between the Microsoft iSCSI Initiator and the Broadcom driver/boot ROM, the IQN is being truncated to 34 characters. So when you attempt a cluster validation, the Persistent Reservation (PR) requests going to the storage appear to the EqualLogic to come from the same host, because the first 34 characters of the IQNs are identical.

    So, the workaround is to make sure your IQNs are unique within the first 34 characters. I have adjusted mine to reflect the server names.
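
    To check a set of initiator names against this rule before changing anything, a small script can help. The following is only a minimal sketch in Python, assuming the 34-character limit observed above; the function name and the example IQNs are made up for illustration and are not taken from my environment.

```python
from collections import defaultdict

# Truncation length observed in this thread; treat it as an assumption
# and adjust it if your driver/boot ROM behaves differently.
PREFIX_LEN = 34


def find_prefix_collisions(iqns, prefix_len=PREFIX_LEN):
    """Return {prefix: [iqns...]} for every prefix shared by more than one IQN."""
    groups = defaultdict(list)
    for iqn in iqns:
        # Only the first prefix_len characters would survive truncation.
        groups[iqn[:prefix_len]].append(iqn)
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}


# Hypothetical IQNs: the node name comes too late, so both truncate identically.
colliding = [
    "iqn.1991-05.com.microsoft:w2k8-cluster-node-server1.play.net",
    "iqn.1991-05.com.microsoft:w2k8-cluster-node-server2.play.net",
]

# Hypothetical IQNs: the node name comes first, so the truncated forms differ.
unique = [
    "iqn.1991-05.com.microsoft:server1-w2k8-cluster.play.net",
    "iqn.1991-05.com.microsoft:server2-w2k8-cluster.play.net",
]

for label, names in (("colliding", colliding), ("unique", unique)):
    clashes = find_prefix_collisions(names)
    print(label, "->", "collision" if clashes else "ok")
    for prefix, members in clashes.items():
        print("  shared prefix:", prefix)
        for member in members:
            print("    ", member)
```

    Putting the distinguishing part of the name (here the server name) directly after the colon keeps it inside the first 34 characters.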

    The IQN field is supposed to accept 255 characters. It is unclear at the moment whether the problem is a bug in the Microsoft or the Broadcom component. But given the time it took to prove to Dell that there actually was a problem, I am happy to go with the workaround.

    For anyone else having this problem, I hope it helps.

    Cheers

    Note: once you change your IQNs in the Broadcom ROM and in the Microsoft iSCSI Initiator, you will have to remove any iSCSI devices already mounted, especially from the "Favourites" list, as it stores the IQN used in the original connection. You can then reconnect to the devices.

    On the EqualLogic I checked the IQN reservations by connecting to the storage node with PuTTY and running the following commands.

    su ec bash

    (Be careful what you do here, as it has elevated privileges and you could really break your storage.)

    iscsi_test -i -check

     

     

    Wednesday, November 3, 2010 1:29 PM
  • I disabled all kinds of TCP and UDP offloading in the properties of the network card.

    Deselected Client for Microsoft Networks and File and Printer Sharing.

    Enabled jumbo frames, and voila.

    Works fine now.

    Friday, November 1, 2013 4:09 PM