Unable to manually fail over groups

  • Question

  • We are currently running a two-node cluster on Windows Server 2003 and are unable to manually move groups from one node to the other. However, if we shut down the node that is currently hosting the groups, all of the groups successfully fail over to the second node. The groups contain SQL Server 2005 resources.

    When we try to move a group manually, all of the resources appear to move over, but then SQL Server and SQL Server Agent fail to start. These are the errors that appear in Event Viewer:

    [sqsrvres] OnlineThread: Error 5b4 bringing resource online.
    [sqsrvres] OnlineThread: ResUtilsStartResourceService failed (status 5b4)

    Those errors just indicate a timeout, though, and I'm not sure that increasing the timeout period would be the solution: when we shut down the node hosting the resources, they all fail over to the second node immediately.
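    The "status 5b4" in these messages is a hexadecimal Win32 error code. A minimal Python sketch of decoding it, assuming the resource DLL prints the code in hex without a 0x prefix; the one-entry lookup table is hand-written for illustration and is not a complete list of Windows error codes:

```python
from __future__ import annotations

# Hand-written subset of Win32 error codes, for illustration only.
WIN32_ERRORS = {
    1460: "ERROR_TIMEOUT: This operation returned because the timeout period expired.",
}


def decode_status(hex_status: str) -> tuple[int, str]:
    """Convert a hex status string such as '5b4' into (decimal code, description)."""
    code = int(hex_status, 16)  # '5b4' -> 0x5B4 -> 1460
    return code, WIN32_ERRORS.get(code, "unknown (not in this sample table)")


if __name__ == "__main__":
    code, text = decode_status("5b4")
    print(code, text)
```

    0x5B4 is decimal 1460 (ERROR_TIMEOUT), which is consistent with the reading above: the resource did not come online within its allowed time.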

    Any ideas would be appreciated.
    Tuesday, March 9, 2010 3:52 PM

All replies

  • Hi,
    Has anything changed with the cluster recently, e.g. IP addresses or service account details?

    Thanks,
    Andrew
    Tuesday, March 9, 2010 4:48 PM
  • Check the dependencies for the SQL Server (Engine) resource. The Agent depends on the Engine, so that is likely why the Agent won't start.
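    Because a cluster resource cannot come online until everything it depends on is online, one failure cascades up the dependency chain. A minimal Python sketch of that propagation, using a hypothetical group layout (the resource names and dependencies below are assumptions, not taken from the poster's cluster):

```python
from __future__ import annotations

# Hypothetical cluster group: each resource maps to the resources it depends on.
DEPENDS_ON = {
    "SQL IP Address": [],
    "SQL Network Name": ["SQL IP Address"],
    "Disk S:": [],
    "SQL Server": ["SQL Network Name", "Disk S:"],
    "SQL Server Agent": ["SQL Server"],
}


def blocked_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every resource that cannot come online because `failed` did not,
    i.e. all direct and indirect dependents of `failed`."""
    blocked: set[str] = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for resource, deps in depends_on.items():
            if current in deps and resource not in blocked:
                blocked.add(resource)
                frontier.append(resource)
    return blocked


if __name__ == "__main__":
    print(sorted(blocked_by("SQL Server", DEPENDS_ON)))      # ['SQL Server Agent']
    print(sorted(blocked_by("SQL IP Address", DEPENDS_ON)))
```

    With this layout an Engine failure blocks only the Agent, while a failure lower in the chain, such as the IP address, blocks the network name, the Engine, and the Agent.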

    Did you by chance use any existing domain groups, such as Domain Admins, as your security groups during installation? That can mess up a SQL failover cluster.


    Geoff N. Hiten Principal Consultant Microsoft SQL Server MVP
    Tuesday, March 9, 2010 5:57 PM
    Moderator
  • Nothing has changed on either server as far as network, account, or cluster configuration goes. However, Windows updates were installed a couple of weeks ago. I'm not sure whether the problem existed before that, since we only discovered it about a week ago. Manual failovers did work in the past, though.

    It does make sense that the SQL Server Agent isn't starting since it is dependent on the Engine, but that wouldn't explain why the Engine isn't starting.

    The SQL Server services run under an account we created in Active Directory, and the Domain Admins group was removed from each server after the servers were added to the domain.

    The part I find really strange is that the resources do move to the other node and start up, but only when the node they are running on is shut down and they are forced to move. It is only when I go into Cluster Administrator and tell the group to move from one node to the other that it doesn't work.

    Thanks for the help guys!

    Tuesday, March 9, 2010 6:31 PM
  • What errors do you see in the SQL Server Errorlogs?
    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://troubleshootingsql.wordpress.com
    Twitter: www.twitter.com/banerjeeamit
    SQL Server FAQ Blog on MSDN: http://blogs.msdn.com/sqlserverfaq
    Thursday, March 11, 2010 9:38 AM
  • Which groups did you assign to SQL Server when it was installed on the cluster? The accounts aren't as critical as the groups. You should see the groups as security principals in the Logins list.

    Geoff N. Hiten Principal Consultant Microsoft SQL Server MVP
    Friday, March 12, 2010 6:15 PM
    Moderator
  • Running into a similar issue...

    I have a SQL Server 2000 cluster on Windows Server 2003 SP2. Networking requested a change of IP addresses on the hosts (physical servers) and of the cluster IP, which all seemed to work fine following KB230356 and KB244980. The resource IPs all came online, but the SQL Server, SQL Server Agent, and Full-Text services hang at Online Pending...

    The application event log shows two notable errors:

    [sqsrvres] OnlineThread: ResUtilsStartResourceService failed (status 5b4)

    [sqsrvres] OnlineThread: Error 5b4 bringing resource online.

    I am able to bring the databases online from the command line via sqlservr.exe -c -s SQL1 (or SQL2), if that helps...

    Help!

    Thursday, July 28, 2011 12:47 PM
  • The solution for me was to change the Pending timeout for each SQL Server instance from 180 to 360 seconds. It now seems to take close to 5 minutes to come online...

    Reference I used to resolve my issue: http://www.sqldbadiaries.com/2010/12/15/one-more-day-with-the-sql-server-cluster-resource-not-coming-online
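    The arithmetic behind this fix: if a resource needs about 5 minutes to come online, a 180-second Pending timeout expires first and the cluster reports a timeout (status 5b4), whereas 360 seconds leaves headroom. A minimal Python sketch; the millisecond figures assume, as we understand the Windows clustering documentation, that the value is stored in the resource's PendingTimeout property in milliseconds (default 180000):

```python
def exceeds_pending_timeout(online_seconds: float, timeout_seconds: int) -> bool:
    """True if a resource that needs `online_seconds` to start would be
    failed by the cluster before it finishes coming online."""
    return online_seconds > timeout_seconds


if __name__ == "__main__":
    observed = 5 * 60  # "close to 5 mins", as reported in the post above

    for timeout in (180, 360):
        # Assumption: PendingTimeout is held in milliseconds, shown in seconds in the GUI.
        print(f"timeout {timeout} s ({timeout * 1000} ms): "
              f"times out = {exceeds_pending_timeout(observed, timeout)}")
```

    Raising the timeout only helps when the resource genuinely needs longer to start; it does not explain why startup became that slow in the first place.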

    Thursday, July 28, 2011 1:25 PM
  • Actually, I agree with Geoff that, to me, it sounds like a dependency is wrong, but I would point towards your shared storage. Ensure that the SQL Server resource in the cluster group has dependencies set on all of the shared storage (disk resources) assigned to the group. My view is that shutting down the other node removes its reservations on the disks.
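    The check suggested here boils down to a set difference: every disk resource in the SQL group should appear among the SQL Server resource's dependencies. A minimal Python sketch with hypothetical resource names; in practice the two lists would be read by hand from Cluster Administrator:

```python
from __future__ import annotations

# Hypothetical snapshot of one cluster group (names are illustrative only).
GROUP_DISKS = {"Disk S:", "Disk T:", "Disk U:"}
SQL_SERVER_DEPENDENCIES = {"SQL Network Name", "Disk S:", "Disk T:"}


def missing_disk_dependencies(group_disks: set[str], sql_deps: set[str]) -> set[str]:
    """Return the disks in the SQL group that the SQL Server resource
    does not list as dependencies."""
    return group_disks - sql_deps


if __name__ == "__main__":
    print(sorted(missing_disk_dependencies(GROUP_DISKS, SQL_SERVER_DEPENDENCIES)))
```

    Any disk this reports (here the hypothetical "Disk U:") would be a candidate to add as a dependency of the SQL Server resource.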

    Unfortunately, your cluster runs Windows Server 2003 and SQL Server 2005, so we are not dealing with persistent reservations, and disk arbitration can be a little flaky at times.

    I'm also interested in how your storage is being provisioned to your cluster. Are we talking about DAS by any chance?

    Regards,
    Mark Broadbent.
    Sunday, August 14, 2011 5:46 PM