SQL server 2012 cluster failover takes a long time to failover

  • Question

  • I have a question about what to expect as a normal failover time for SQL Server 2012 clustering. My environment is as follows:

    2 - Windows Server 2008 R2 Enterprise
    2 - SQL Server 2012 Standard (installed as an active/passive cluster)
    SAN storage for the SQL install locations.

    Without any real load on the server, the failover takes about 21 seconds. I have tested a manual failover with "Target Recovery Time (Seconds)" set to 0 and 1. I have also run some updates against a test table to see whether anything would increase (or decrease) the failover time. Whatever I do, the failover takes about 20-21 seconds to complete.

    Monday, July 15, 2013 8:13 PM

Answers


  • "Target Recovery Time (Seconds)" only influences how often checkpoints are performed. It helps the recovery process finish within a given time limit, but it does not help with long-running transactions. It is not directly related to clustering.
    Tuesday, July 16, 2013 7:41 AM

All replies

  • Hello,

    What does the cluster.log say for the time it takes to bring up all of the resources in the resource group (i.e. is it taking 21 seconds in the log or is it taking 5)? How are you measuring the time?
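
    One rough way to measure the outage from the client side, as a minimal sketch: probe the SQL Server TCP port at a fixed interval during a manual failover and take the longest run of failed probes. The host name in the usage note is hypothetical, and an open port only shows that the service is listening, not that every database has finished recovery, so treat the result as a lower bound.

```python
import socket
import time


def is_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples):
    """Return the longest run of failed probes, in seconds.

    samples is a time-ordered list of (timestamp_seconds, reachable) tuples.
    An outage runs from the first failed probe to the next successful one.
    An outage still in progress at the end of the samples is not counted.
    """
    longest = 0.0
    outage_start = None
    for timestamp, reachable in samples:
        if not reachable and outage_start is None:
            outage_start = timestamp
        elif reachable and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest


def probe(host, port, duration_s, interval_s=0.25):
    """Probe host:port every interval_s seconds for duration_s seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), is_port_open(host, port)))
        time.sleep(interval_s)
    return samples
```

    For example, start `longest_outage(probe("sqlcluster.example.local", 1433, 120))` just before triggering the failover (the host name is a placeholder; 1433 is the default port for a default instance) and compare the result with the timings in cluster.log.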

    -Sean


    Sean Gallardy

    Monday, July 15, 2013 8:41 PM
  • The startup time of SQL Server is highly dependent on the transaction log files. On startup, recovery must scan the active portion of the log, looking for transactions that need to be rolled forward or back.

    Please check the SQL Server log file.

    • Proposed as answer by Fanny Liu Wednesday, July 17, 2013 3:10 AM
    Monday, July 15, 2013 8:47 PM
  • A cluster failover is never instantaneous. Even initiating it can take about 10-30 seconds, so your failover time is average, or even better than average. It can grow considerably, because all uncommitted transactions must be rolled back; with large, long-running transactions, that can take minutes. All connections to the database are also temporarily dropped. If you have an HA SLA with a high number of nines, you need careful design and must avoid long transactions. Just making something clustered is not enough for a high number of nines.
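
    To put the "number of nines" in perspective, here is a small sketch that converts an availability target into a yearly downtime budget and counts how many failovers of a given length fit into it. It assumes a 365-day year and that failovers are the only source of downtime, which is optimistic.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def downtime_budget_seconds(nines):
    """Allowed downtime per year for an availability of `nines` nines.

    3 nines means 99.9%, 5 nines means 99.999%.
    """
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


def failovers_within_budget(nines, failover_seconds):
    """Whole number of failovers of the given length that fit in the budget."""
    budget = downtime_budget_seconds(nines)
    return int(budget // failover_seconds)
```

    With the 21-second failover from the question, `failovers_within_budget(5, 21)` gives 15: at five nines the whole yearly budget is about 315 seconds, so roughly fifteen such failovers use it up before any other downtime is counted.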
    • Proposed as answer by Fanny Liu Wednesday, July 17, 2013 3:10 AM
    Monday, July 15, 2013 9:44 PM
  • How many shared drives on the SAN are allocated to your cluster? Sometimes, because of a SAN issue (not necessarily your case), the drives take a while to come up, and so the cluster takes a while to come up completely. SQL Server depends on the shared drives, so if the delay is on the SAN side, the services will be delayed in coming online.

    If you can share the logs from startup, the exact cause can be narrowed down. I am also not sure whether 21 seconds is acceptable to live with during a failover. Failover time also depends on the complexity of your cluster.


    Tuesday, July 16, 2013 6:23 AM
  • Try to isolate which part of the failover is taking a long time.

    Disks taking a long time to fail over - check with the SAN admin

    SQL taking a long time to come online - check the log files and VLFs

    Is the failover process itself taking a long time, or is SQL Server taking a long time to come online on the other node?
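
    The isolation step above can be sketched as a small timeline breakdown: collect a timestamp for each milestone (from cluster.log and the SQL Server error log) and compute how long each phase took. The event labels below are hypothetical names for illustration, not strings that appear in either log.

```python
from datetime import datetime


def phase_durations(events):
    """Return (phase_name, seconds) for each gap between consecutive events.

    events is a time-ordered list of (label, datetime) tuples.
    """
    durations = []
    for (start_label, start_time), (end_label, end_time) in zip(events, events[1:]):
        seconds = (end_time - start_time).total_seconds()
        durations.append((f"{start_label} -> {end_label}", seconds))
    return durations


def slowest_phase(events):
    """Return the (phase_name, seconds) pair with the largest duration."""
    return max(phase_durations(events), key=lambda item: item[1])


if __name__ == "__main__":
    # Illustrative timestamps only; replace them with values from your own logs.
    timeline = [
        ("failover started", datetime(2013, 7, 16, 10, 0, 0)),
        ("disks online", datetime(2013, 7, 16, 10, 0, 2)),
        ("SQL Server accepting connections", datetime(2013, 7, 16, 10, 0, 21)),
    ]
    for name, seconds in phase_durations(timeline):
        print(f"{name}: {seconds:.0f} s")
```

    If the disk phase dominates, that points at the SAN; if the gap before SQL Server accepts connections dominates, look at the transaction log and VLFs.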

    Tuesday, July 16, 2013 7:20 AM

  • It only takes a couple of seconds for the drives to flip over. I am trying to get the SQL service to come up faster on the second node.

    Tuesday, July 16, 2013 3:20 PM
  • thanks!
    Tuesday, July 16, 2013 3:20 PM
  • Thanks. Before testing, I assumed I could get it to fail over in whatever number of seconds I put in this field. That wasn't the case. 21-22 seconds is too long for my environment to be without SQL Server.
    Wednesday, July 17, 2013 2:14 PM
  • Under all circumstances, with clustering or mirroring, you will have a small outage. The outage is generally 10-30 seconds, depending on your configuration.
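
    Since the outage cannot be removed entirely, one common mitigation is to have the application retry for longer than the expected failover window. The following is a minimal sketch of such a retry loop, not anything specific to SQL Server: `call_with_retry` is a hypothetical helper, and a real database driver raises its own exception types, which you would catch instead of `ConnectionError`.

```python
import time


def call_with_retry(operation, max_wait_s=60.0, delay_s=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it succeeds or max_wait_s has elapsed.

    Retries on ConnectionError only. When the next retry would fall past
    the deadline, the last ConnectionError is re-raised to the caller.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + max_wait_s
    while True:
        try:
            return operation()
        except ConnectionError:
            if clock() + delay_s > deadline:
                raise
            sleep(delay_s)
```

    With `max_wait_s` set comfortably above the 10-30 second window, a query issued during a failover is delayed rather than failed, provided the operation is safe to repeat.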

    You might want to look at failover cluster with availability groups.

    http://msdn.microsoft.com/en-us/library/ff929171.aspx

    Wednesday, July 17, 2013 3:32 PM
  • I would also check out this white paper.

    http://msdn.microsoft.com/en-us/library/jj215886

    Wednesday, July 17, 2013 3:48 PM