Time to manually failover?

  • Question


  • When I force Mirroring to fail over while watching the Mirroring Monitor, I notice something interesting. Before I fail over, the servers are synchronized and there is zero unsent log and zero unrestored log. When I fail over, the mirror changes to "restoring", stays there for about four minutes, and switches to "Principal, restoring". The unsent log and unrestored log numbers then jump to values in the tens of megabytes, and we spend time restoring.

    We wait about five minutes between requesting the failover and having the servers back up again, answering requests.

    Why does failover take so long? We've been thinking that failover shouldn't take so long because the servers think they're synched. What causes the log spooling spike? Is the state reflected by the Mirroring Monitor inaccurate?

    Thursday, March 8, 2007 7:09 PM

All replies

  • Can you give us more information on your configuration?  What is the operating mode?  Do you have transactions running when you manually fail it over?
    Saturday, March 10, 2007 11:50 PM
    Moderator
  • Hi, Michael!

    We're using SQL Server 2005 Standard Edition across two machines, each with two dual-core Opteron processors. Each machine is running Windows Server 2003 Win64, and has 16 gigs of memory.

    We're running the "high availability" operating mode.

    We certainly observe this behaviour when there are transactions running and we manually fail over. I can't remember for sure whether we also observe it when no transactions are running at the point of the failover, but I believe we do.

    .B ekiM

    Sunday, March 11, 2007 4:35 AM
  • I've found out that the servers involved are set to use a non-zero checkpointing interval. We've got it set to checkpoint every ten minutes.

    We have observed the unexpectedly slow fail over with no transactional load on the servers.
    Monday, March 12, 2007 6:59 PM
  • Manual failover is not going to be the same as a fast failover.  For a manual failover via ALTER DATABASE there are a variety of synchronizations and catchups that occur to make the operation kinder to the connected users. 

    For a fast failover (kill process or break network connection), the only real reason I would expect to have a significant amount of time is if there is a redo backlog.  There are performance counters to monitor that so you should be able to track what is going on during the time interval on the mirror.
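
As a rough illustration of why a redo backlog translates into failover time, here is a minimal sketch that divides a redo queue size by a redo rate. It assumes readings like the "Redo Queue KB" and "Redo Bytes/sec" mirroring counters; the function name and the sample numbers are hypothetical, not measurements from this thread.

```python
def estimate_redo_seconds(redo_queue_kb: float, redo_bytes_per_sec: float) -> float:
    """Estimate how long the mirror needs to work through its redo backlog.

    redo_queue_kb      -- backlog size in kilobytes (assumed counter reading)
    redo_bytes_per_sec -- observed redo throughput in bytes per second
    """
    if redo_queue_kb < 0 or redo_bytes_per_sec < 0:
        raise ValueError("counter readings cannot be negative")
    if redo_queue_kb == 0:
        return 0.0  # nothing left to redo
    if redo_bytes_per_sec == 0:
        return float("inf")  # backlog exists but redo is making no progress
    return redo_queue_kb * 1024 / redo_bytes_per_sec


# Hypothetical sample: a 40 MB backlog redone at 1 MB/s.
seconds = estimate_redo_seconds(40 * 1024, 1024 * 1024)
print(f"estimated redo time: {seconds:.0f} s")
```

This is only a linear estimate; the real redo rate can vary during recovery, so sampling the counters several times over the failover window gives a better picture.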

    Wednesday, March 14, 2007 6:19 PM
  • Hi Peter!

    Thanks for your response. Should we then simply take the server down, instead of failing over with the Mirroring Manager, when we want to do maintenance?

    .B ekiM
    Wednesday, March 14, 2007 8:00 PM
  • That might actually work out faster in many cases.

    Wednesday, March 14, 2007 9:05 PM