Resources in Failover clustering randomly go offline

  • Question

  • We are using a 4-node Windows Server 2012 failover cluster for the file servers in our environment.

    For the third time this month, we have noticed that some disks went offline in the Cluster Admin console.

    Failover did not happen, and the resources were unavailable until we manually brought them back online.

    I will send a screenshot in the next message, as I can't attach it here.

    Note: Storage is HPE 3PAR. We are using thin provisioning on the disks, and the disks keep filling up, which triggers auto-extend. I would also like to know whether there is a Windows config option to tell Windows not to offline the LUN when the 3PAR NAKs the IOP.

    The event logs in the cluster show the following sequence for the disks going offline.

    Warning event ID 51 was reported last night: An error was detected on device \Device\Harddisk18\DR18 during a paging operation.

    Error event ID 150 was then triggered: Disk 18 has reached a logical block provisioning permanent resource exhaustion condition.

    Eventually this resulted in error events 1038 and 1069: Ownership of cluster disk 'XXXPrd7' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.
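
    The sequence above (51, then 150, then 1038 or 1069) can be checked mechanically when triaging exported logs. The following is a minimal Python sketch, not a Windows or cluster API. It assumes the event IDs have already been exported into a time-ordered list; the function name `find_exhaustion_chain` and the short descriptions in `EVENTS` are illustrative assumptions.

```python
# Event IDs are the ones quoted in the post; the descriptions paraphrase
# the logged messages and are only used for readability.
EVENTS = {
    51: "I/O error during a paging operation",
    150: "thin provisioning permanent resource exhaustion",
    1038: "cluster disk ownership unexpectedly lost",
    1069: "cluster resource failed",
}


def find_exhaustion_chain(event_ids):
    """Return True if 51, then 150, then 1038 or 1069 occur in that order.

    Other events may appear in between; only the relative order matters.
    """
    stages = [{51}, {150}, {1038, 1069}]
    stage = 0
    for event_id in event_ids:
        if stage < len(stages) and event_id in stages[stage]:
            stage += 1
    return stage == len(stages)


if __name__ == "__main__":
    # Hypothetical, time-ordered export of event IDs for one disk.
    observed = [51, 150, 1038, 1069]
    print(find_exhaustion_chain(observed))
```

    A match suggests that the disk went offline after the array reported space exhaustion, rather than because of an unrelated path or ownership problem, but it does not prove the cause.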

    I would appreciate any input. Thanks.

    - Shailesh

    Thursday, November 28, 2019 11:36 PM

All replies

  • As you can see below, the cluster shows up as partially running.
    Friday, November 29, 2019 12:04 AM
  • Hi,

    Thanks for posting in our forum!

    >>I would also like to know if there is a windows config option to tell windows not to offline the lun when the 3par NAKs the IOP.

    What do you mean by "NAKs the IOP"?

    >>As we could see below the cluster shows up as partially running.

    I cannot see any screenshot of your cluster.

    Regards,

    Daniel



    Friday, November 29, 2019 7:05 AM
    Moderator
  • "Storage is HPE 3Par. We are using Thin provisioning on the disks and the disks keep filling up to result in auto extend. I would also like to know if there is a windows config option to tell windows not to offline the lun when the 3par NAKs the IOP."

    Sounds like a question to ask of HPE.  If HPE is 'blocking' IO to its system while it is auto-extending, only they can address that.  Clusters have to have access to their shared storage.  Anything that interrupts that is considered a major issue.  HPE is not new to clusters.  They most likely have some way to resolve this.


    tim

    Saturday, November 30, 2019 2:48 PM
  • Thank you for the input. I have involved HPE.

    It was my first post, and it did not accept the screenshot.

    Anyway, I have another important question: why did failover not happen when 2 of the 8 disks for the same file server were offline? I understand it was due to incorrectly set dependencies. I was unable to find any best-practices article on setting up dependencies for a file server cluster.

    Please let me know if my following understanding is correct:

    - In the cluster admin console, on the Dependencies tab of the file server properties, I need to add all the shared volumes as dependent resources with AND as the logical operator.

    - And ensure that, in the properties of each shared volume, only the root drive is added as a dependency.
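
    To make the AND operator in the first point concrete, here is a minimal Python sketch of how AND and OR dependency expressions evaluate when 2 of 8 disks are offline. It only models the logical operator; it is not the cluster's own logic, and the disk names and function names are hypothetical.

```python
def and_dependency_met(required, online):
    """AND semantics: every listed resource must be online."""
    return all(online.get(name, False) for name in required)


def or_dependency_met(required, online):
    """OR semantics: at least one listed resource must be online."""
    return any(online.get(name, False) for name in required)


# Hypothetical state: 8 disks for one file server, 2 of them offline.
disk_state = {f"Disk{i}": True for i in range(1, 9)}
disk_state["Disk3"] = False
disk_state["Disk7"] = False
required = list(disk_state)

print(and_dependency_met(required, disk_state))  # False
print(or_dependency_met(required, disk_state))   # True
```

    With AND across all disks, losing any single disk means the dependency is no longer satisfied for the file server resource. Whether the cluster then restarts the resource or fails the role over depends on the failure policy settings, which this sketch does not model.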

    Thanks

    - S

    Monday, December 2, 2019 12:33 AM
  • We need to understand how you are configuring your shares.  A share automatically configures a dependency upon the disk on which the share is created.  Problems come into play if you try to create two different shares on the same volume.  If you do that, you have different 'macro' resources (the offered share) dependent upon the same 'micro' resource (the underlying disk).  Generally speaking, if you configure shares properly, the cluster will create the proper dependencies automatically.

    tim

    Monday, December 2, 2019 2:01 PM
  • Thanks, and yes. The shares are not configured appropriately.

    For some reason I am not able to add the images to explain it better. I will still try again.

    Also, for the server in question, the shares are on the same volume. See fig below.

    The issue we faced was that failover did not happen because of the dependencies, and a particular application that we have requires all these volumes to be online. The state is in the image below:

    Tuesday, December 3, 2019 6:26 AM
  • No images.

    Do you have different shares defined on the same volume?  If you have defined two file share roles on the cluster, and they both reference the same disk resource, that creates dependencies between the two roles.


    tim

    Tuesday, December 3, 2019 2:01 PM
  • Thanks, Tim, for your time, but this is really unfortunate. I wanted to give you a clear picture, so I wrote a long, detailed post with images and spent 20 minutes on it. When I tried to post it, I again got an error saying that I cannot post images until my account is verified. I looked it up, and it points to a thread where you have to plead to have your account verified. The MS account I am using here is the one I used to set up Azure etc. And why let me insert the images in the first place, only to error out? I will have to fend for myself, I guess.

    - S

    Tuesday, December 3, 2019 10:32 PM
  • See https://social.microsoft.com/Forums/en-US/66736ec9-2874-4b1c-8cd0-7736689f1859/how-to-verify-technet-account-to-post-questions-with-pictures for information on how to expedite the verification of your account.


    tim

    Wednesday, December 4, 2019 1:37 PM
  • On it, Tim. Once I have the details, I will send them. Thanks.

    - S

    Thursday, December 5, 2019 5:04 AM