How to remove warning regarding use of No Majority (Disk Only) Quorum model

    Question

  • Hi

    Before installing the first of a number of WS2008 failover clusters, I thought long and hard about the pros and cons of the available quorum models. My company also has a large enterprise SAN with a completely redundant SAN fabric. In short, it is not a "single point of failure".

    After careful consideration, I have concluded that the No Majority (Disk Only) quorum model is the best way to go.

    From my point of view, the Majority quorum model is not the best option, mainly because if you lose the majority, you lose the whole cluster.
    With No Majority (Disk Only), we would have to lose the entire SAN or all of the SAN fabric. In the end I have to conclude that we are more likely to lose all power or connectivity to an entire data center than to lose the SAN. In fact, the SAN is probably the last man standing in case of a major disaster...

    But now I am facing a warning (at the Quorum Configuration node) on all my WS2008 clusters, telling me that the quorum disk is a single point of failure for the cluster... call me sensitive, but that is just plain annoying!

    How can this warning be removed? Does anyone know of a patch or registry value that sort of "acknowledges" that you have considered the situation thoroughly and are aware of the official recommendation, but that the Disk Only quorum model is actually the best option in your case?

    Thanks!


    Jakob Nielsen
    JAK
    Wednesday, November 4, 2009 6:23 PM

Answers

  • There is no way to remove this warning, other than to change the quorum model to a recommended quorum model.

    If you are running a cluster with an even number of nodes, I don't see a good reason why you would choose the "Disk Only" quorum model over the "Node+Witness Disk" quorum model. Take a 2-node cluster with a Node+Witness model. You have a total of 3 votes (1 for each node plus 1 for the witness) and you only need 2 votes to maintain the cluster. Since you have a reliable SAN, you'll always have the witness vote available, which means that you need at least one of the nodes available in order to have a functional cluster... no nodes = no cluster, which of course is the same as with your Disk Only quorum.
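
    The vote arithmetic above can be sketched in a few lines of Python. This is only an illustration of the majority rule described in this post, not how the cluster service is implemented; the function names `votes_needed` and `has_quorum` are made up for the example.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(nodes_up: int, total_nodes: int, witness_up: bool) -> bool:
    """Node+Witness model: each node and the witness disk hold one vote.

    Simplified assumption: a vote counts if its holder is reachable.
    """
    total_votes = total_nodes + 1  # all nodes plus the witness disk
    votes_present = nodes_up + (1 if witness_up else 0)
    return votes_present >= votes_needed(total_votes)


# 2-node cluster: 3 votes in total, 2 needed.
print(votes_needed(3))
# One node plus the witness disk is enough to keep the cluster up.
print(has_quorum(nodes_up=1, total_nodes=2, witness_up=True))
# Both nodes without the witness disk is also enough.
print(has_quorum(nodes_up=2, total_nodes=2, witness_up=False))
```

    With both nodes up, the witness disk can be lost without losing the cluster, which is the case Disk Only cannot survive.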

    The benefits of this new quorum model show when you actually have trouble connecting to the witness disk. The SAN certainly protects the disk, but what if you have some sort of filesystem corruption or some other issue accessing the volume/filesystem on the disk? With the Disk Only quorum model, your cluster would fail because it cannot connect to this disk, and you would not be able to bring the cluster back online until the problem was corrected. With the Node+Witness quorum model, the disk would simply be one vote that is unavailable, and since you have the nodes online, the cluster would remain online.

    The problem can get bigger if you're using bigger clusters. I'll give a real-world example where the Disk Only quorum model can cause major issues. I have a customer that chose to implement the "Disk Only" quorum model for their 8-node cluster. This customer experienced network driver issues on the node that owned the quorum disk, which caused the other nodes to lose network connectivity to the quorum owner. As a result, 7 of their 8 cluster nodes terminated the cluster service at the same time, at random intervals, because of this network glitch. If they had been using the Node+Witness quorum model, they would only have lost that single vote when the node lost network access.
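
    To make the difference concrete, here is a rough Python comparison of the two models when the quorum/witness disk cannot be reached. It is a simplified model of the reasoning in this post (one vote per node, one for the disk) and ignores details such as disk ownership moving between nodes; `QuorumModel` and `cluster_stays_up` are hypothetical names used only for this sketch.

```python
from enum import Enum


class QuorumModel(Enum):
    DISK_ONLY = "No Majority (Disk Only)"
    NODE_AND_WITNESS = "Node+Witness Disk"


def cluster_stays_up(model: QuorumModel, nodes_up: int,
                     total_nodes: int, disk_reachable: bool) -> bool:
    """Simplified survival check for a cluster under each quorum model."""
    if nodes_up == 0:
        return False  # no nodes = no cluster, whatever the model
    if model is QuorumModel.DISK_ONLY:
        # The disk is the only vote that matters.
        return disk_reachable
    # Node+Witness: strict majority of (nodes + 1 witness) votes.
    total_votes = total_nodes + 1
    votes_present = nodes_up + (1 if disk_reachable else 0)
    return votes_present > total_votes // 2


# 8-node cluster where 7 nodes lose contact with the quorum disk owner.
for model in QuorumModel:
    up = cluster_stays_up(model, nodes_up=7, total_nodes=8,
                          disk_reachable=False)
    print(f"{model.value}: {'online' if up else 'offline'}")
```

    Under these assumptions the 7 remaining nodes hold 7 of 9 votes in the Node+Witness model and stay online, while Disk Only goes offline as soon as the disk is out of reach.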

    If you are running a cluster with an odd number of nodes, I actually do agree with you that the Disk Only quorum model can make more sense. Personally, though, I would just add or remove a node and go with a Node+Witness model.

    Hope this helps.


    Visit my blog about multi-site clustering - http://msmvps.com/blogs/jtoner
    Wednesday, November 4, 2009 7:34 PM
    Moderator

All replies

  • Hi Jakob,

    I think you also have to take the integrity of the disk into account, not only its availability. Experience with Windows 2000 and early 2003 clusters has shown that disk corruption is critical to the cluster even if the disk is physically available.
    Bringing the cluster back online wasn't "a piece of cake" either.
    This was the reason why the disk/node majority model was introduced.

    This is not answering your questions, but maybe helpful for your risk consideration.

    Bye
    ThorstenWujek
    Wednesday, November 4, 2009 6:43 PM
  • Hi Thorsten

    Thanks for your reply,

    The Clus DB is present on the C: drive of each cluster node, in the registry on each cluster node, and finally on the quorum disk (when using the Disk Only model, not on a witness file share).

    A corrupted quorum disk is very easy to fix, and such rare issues can be fixed very fast. I must admit that, apart from troubleshooting labs, I have never seen or experienced this situation myself.

    Of course you should exclude the quorum disk from antivirus scanning... but when all that is done, I do not really see what all the fuss is about?


    Regards


    Jakob Nielsen
    Wednesday, November 4, 2009 7:00 PM
  • It has happened in production environments several times, and you lose your whole cluster, with the risk of breaching your SLAs. But if it works for your SLAs, then it is the right choice.

    Regards

    Thorsten
    ThorstenWujek
    • Proposed as answer by Frank_Be Wednesday, December 17, 2014 4:30 PM
    Wednesday, November 4, 2009 7:39 PM
  • I would agree with John T. And to give you another example of how Disk Only can affect a cluster:

    Disk Only still has a single point of failure... namely the file system itself.

    Corrupt that, and your cluster is down hard.

    Do the same with Node+Witness Disk, and only the witness disk will fail to come online; your cluster is still operational.


    Go with Node+Witness Disk if you have an even number of nodes !

    rgds,
    edwin.
    Thursday, November 5, 2009 11:16 AM
    Moderator