Windows 2008 R2 Cluster refuses to start on one node.

    Question

  • Hi,

    I'm having an interesting error on a SQL cluster.

    The system is two Windows 2008 R2 SP1 nodes running SQL 2008 R2 in an active/passive (A/P) cluster configuration.

    After a crash of unknown cause, one of the nodes refuses to start. The event log gives me the following error:

    Eventid: 1574 - The failover cluster database could not be unloaded. If restarting the cluster service does not fix the problem, please restart the machine.

    The KB articles about it suggest restarting the server, and that has been done. I even took the whole cluster down and started the nodes in different orders, without any success.

    Furthermore, when I look under HKLM in the registry, I cannot find the "cluster" key. I've been toying with the idea of exporting the "HKLM\cluster" key from the working cluster node.

    The cluster log clearly indicates that something is fishy:

    ---------------

    00000ad0.0000093c::2011/09/07-13:04:58.867 INFO  [NETFT] Disabling IP autoconfiguration on the NetFT adapter.
    00000ad0.0000093c::2011/09/07-13:04:58.867 INFO  [NETFT] Disabling DHCP on the NetFT adapter.
    00000ad0.0000093c::2011/09/07-13:04:58.867 DBG   [NETFT] Disabling DHCP on NetFT interface name ethernet_11.
    00000ad0.0000093c::2011/09/07-13:04:58.867 INFO  [CS] Starting DM
    00000ad0.0000093c::2011/09/07-13:04:58.867 INFO  [DM] Node 1: Reading quorum config
    00000ad0.0000093c::2011/09/07-13:04:58.867 DBG   [DM] Unloading Hive, Key \Registry\Machine\Cluster.restored, discardCurrentChanges true
    00000ad0.000005b4::2011/09/07-13:04:58.867 INFO  [CS] Disabling connection security.
    00000ad0.00000960::2011/09/07-13:04:58.867 DBG   [NETFTAPI] received NsiAddInstance  for 169.254.1.91
    00000ad0.0000093c::2011/09/07-13:04:58.867 INFO  [DM] Key \Registry\Machine\Cluster.restored does not appear to be loaded (status STATUS_OBJECT_NAME_NOT_FOUND(c0000034))
    00000ad0.0000093c::2011/09/07-13:04:58.867 WARN  [DM] Node 1: Failed to unload restored hive from the registry with error STATUS_INVALID_PARAMETER(c000000d)
    00000ad0.0000093c::2011/09/07-13:04:58.867 INFO  [DM] Node 1: loading local hive
    00000ad0.0000093c::2011/09/07-13:04:58.867 ERR   [DM] Node 1: failed to unload cluster hive, error 2.
    00000ad0.0000093c::2011/09/07-13:04:58.867 ERR   Hive unload failed (status = 2)
    00000ad0.0000093c::2011/09/07-13:04:58.882 DBG   Hive unload failed: set netft heartbeat interval to 900 seconds
    00000ad0.0000093c::2011/09/07-13:04:58.882 ERR   Hive unload failed (status = 2), executing OnStop
    00000ad0.0000093c::2011/09/07-13:04:58.882 INFO  [DM]: Shutting down, so unloading the cluster database.
    00000ad0.0000093c::2011/09/07-13:04:58.882 INFO  [DM] Shutting down, so unloading the cluster database (waitForLock: false).
    00000ad0.0000093c::2011/09/07-13:04:58.882 WARN  [DM] Trying to Unload when no Hive is loaded, ignored
    00000ad0.0000093c::2011/09/07-13:04:58.882 ERR   FatalError is Calling Exit Process.
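In a full cluster log the failure lines are easy to miss among the INFO and DBG noise. A minimal Python sketch for pulling out only the ERR and WARN entries follows. The line layout it parses (process.thread::timestamp LEVEL [COMPONENT] message) is inferred from the excerpt above, not from a documented specification, and the names `parse_line`, `filter_levels` and `filter_cluster_log.py` are hypothetical.

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Line layout inferred from the excerpt above (an assumption, not a documented format):
#   <pid>.<tid>::<yyyy/mm/dd-hh:mm:ss.mmm> <LEVEL> [<COMPONENT>] <message>
# The [COMPONENT] tag is optional ("ERR   Hive unload failed ..." has none) and is
# sometimes followed by a colon ("[DM]: Shutting down ...").
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?:\[(?P<component>[A-Z]+)\]:?\s+)?"
    r"(?P<msg>.*)$"
)


@dataclass
class Entry:
    pid: str
    tid: str
    timestamp: str
    level: str
    component: Optional[str]
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None for lines that do not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["pid"], m["tid"], m["ts"], m["level"], m["component"], m["msg"])


def filter_levels(lines: Iterable[str], levels=("ERR", "WARN")) -> List[Entry]:
    """Keep only entries whose level is in `levels` (ERR and WARN by default)."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level in levels]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_cluster_log.py cluster.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for e in filter_levels(f):
            print(e.timestamp, e.level, e.component or "-", e.message)
```

Run against the excerpt above, this would keep the [DM] warnings about the restored hive, the "failed to unload cluster hive, error 2" error and the final FatalError line, which is the chain that ends with the service exiting.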


    MCITP, MCP, VCP, AASE & Insane!
    Wednesday, September 07, 2011 1:18 PM

Answers

  • On the problem node, inspect the %systemroot%\cluster directory and check whether the file 'clusdb' exists. If it does not, you can copy the file from a working node. If you do this, you will need to delete any clusdb.X.container and clusdb.blf files before starting that node.

    Thanks


    Chuck Timon, Senior Support Escalation Engineer (SEE), Windows Beta Engineer, Microsoft Corporation
    Thursday, September 08, 2011 4:18 PM
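The file-level part of this answer can be sketched in Python as below. This is an illustration, not a Microsoft tool: `restore_clusdb` and its `overwrite` flag are hypothetical, the directory and the healthy clusdb copy are passed in rather than hard-coded, and it assumes the Cluster service is already stopped and the hive unloaded (otherwise the copy fails with the "file is in use" error reported later in the thread).

```python
import shutil
from pathlib import Path
from typing import List


def restore_clusdb(cluster_dir, healthy_clusdb, overwrite: bool = False) -> List[str]:
    """Copy a working node's clusdb into the problem node's cluster directory.

    cluster_dir:    the problem node's %systemroot%\\cluster directory
    healthy_clusdb: a copy of 'clusdb' taken from a working node
    overwrite:      hypothetical switch; the answer covers only a *missing* clusdb,
                    so an existing file is left alone unless this is True
    Returns the names of the stale files that were deleted.
    Assumes the Cluster service is stopped and the hive unloaded; otherwise
    the copy fails because clusdb is in use.
    """
    cluster_dir = Path(cluster_dir)
    target = cluster_dir / "clusdb"
    if target.exists() and not overwrite:
        return []

    shutil.copy2(healthy_clusdb, target)

    # Per the answer: delete any clusdb.X.container and clusdb.blf files
    # before the node's Cluster service is started again.
    stale = sorted(cluster_dir.glob("clusdb.*.container")) + [cluster_dir / "clusdb.blf"]
    removed = []
    for path in stale:
        if path.exists():
            path.unlink()
            removed.append(path.name)
    return removed
```

Keeping the replacement opt-in through `overwrite` mirrors the answer, which only covers a missing clusdb; taking a backup of the problem node's cluster directory before replacing anything is prudent.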

All replies

  • How about evicting the problematic node and then adding it back to the cluster?
    Thursday, September 08, 2011 8:50 AM
  • I tried to copy the clusdb file but got an error message saying the file is in use. Do I need to stop the Cluster service to be able to copy it, or is there another nifty way around it?


    MCITP, MCP, VCP, AASE & Insane!
    Monday, September 12, 2011 12:17 PM
  • You need to stop the Cluster service and unload the hive using regedit.

    Chuck


    Chuck Timon, Senior Support Escalation Engineer (SEE), Windows Beta Engineer, Microsoft Corporation
    Monday, September 12, 2011 12:18 PM
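The order in this reply (stop the service, unload the hive, then copy clusdb) can be sketched as a short Python script. It assumes the Cluster service's short name is `clussvc` and that the hive is mounted at `HKLM\Cluster`; `reg unload` is used as the command-line counterpart of regedit's File > Unload Hive. `release_clusdb` is a hypothetical helper that only lists the commands unless `dry_run=False`, and it must run from an elevated prompt on the problem node.

```python
import subprocess
from typing import List

# Assumptions (not stated in the thread): the Cluster service's short name is
# "clussvc" and the cluster hive is mounted at HKLM\Cluster. Run elevated on the
# problem node only.
STEPS: List[List[str]] = [
    ["net", "stop", "clussvc"],          # 1. stop the Cluster service
    ["reg", "unload", r"HKLM\Cluster"],  # 2. unload the hive (regedit: File > Unload Hive)
]


def release_clusdb(dry_run: bool = True) -> List[str]:
    """List (default) or run the steps that release the lock on clusdb."""
    report = []
    for cmd in STEPS:
        line = " ".join(cmd)
        if not dry_run:
            # Non-zero exit codes are reported, not raised: 'net stop' fails if the
            # service is already stopped, and 'reg unload' fails if the hive is not loaded.
            result = subprocess.run(cmd, check=False)
            line += f"  -> exit code {result.returncode}"
        report.append(line)
    return report


if __name__ == "__main__":
    for step in release_clusdb(dry_run=True):
        print(step)
```

Once both steps succeed, clusdb should no longer be locked, and it can then be replaced as described in the answer.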
  • Did copying this file over fix the issue?
    Thursday, October 20, 2011 2:22 AM
  • I had the same issue on a Windows 2008 cluster, and copying the file over worked.

    It would be interesting to know what caused this issue.

    Tuesday, March 13, 2012 12:57 AM