AG testing scenario

  • Question

  • Hi,

    We have set up a test SQL Server 2016 AG with 2 nodes in sync mode and a single DB. Now we would like to run some tests. When we disable the primary node's NIC, we can see that the AG moves from Node1 to Node2; our application freezes briefly, but after 10 seconds it can be used again.

    My question:

    Is it allowed in a SQL AG to shut down the SQL service on the primary node, or to shut down the server that is the primary node?

    Thanks


    Shahin

    Monday, February 24, 2020 3:26 PM

Answers

  • I killed the SQL service from the Services console. The problem is that when I kill SQL Server on Node1 while Node1 is primary, the AG won't move to Node2, and Node2 shows as Resolving. At the same time, Failover Cluster Manager shows this error:

    The Cluster service failed to bring clustered role 'SQLAG' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

    I ran get-resource when the above failed and saw this:

    When I do the same test and kill SQL on Node2 (when it is primary), the AG moves to Node1 with no problem.


    Shahin

    The problem has been resolved. I changed Max failures from 1 to 10 in 6 hours, and now everything works.
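
For readers wondering why raising that limit helps, here is a rough sketch of how a "maximum failures in the specified period" policy behaves. It is a toy model under my own assumptions (a simple sliding window, and the hypothetical names `FailoverPolicy` and `on_failure`), not the actual WSFC implementation. The idea is that with a limit of 1 in 6 hours, an earlier test failover can use up the allowance, so the next failure inside the window leaves the role failed instead of moving it.

```python
from datetime import datetime, timedelta


class FailoverPolicy:
    """Toy model of a clustered role's 'max failures in period' setting (assumed behavior)."""

    def __init__(self, max_failures, period_hours):
        self.max_failures = max_failures
        self.period = timedelta(hours=period_hours)
        self.failures = []  # timestamps of failures still inside the window

    def on_failure(self, now):
        """Record a failure; return True if the role would still be failed over."""
        # Forget failures that have aged out of the sliding window.
        self.failures = [t for t in self.failures if now - t < self.period]
        self.failures.append(now)
        return len(self.failures) <= self.max_failures


start = datetime(2020, 2, 25, 9, 0)

strict = FailoverPolicy(max_failures=1, period_hours=6)
print(strict.on_failure(start))                        # True: first failure, role moves
print(strict.on_failure(start + timedelta(hours=1)))   # False: limit already used up

relaxed = FailoverPolicy(max_failures=10, period_hours=6)
print(relaxed.on_failure(start))                       # True
print(relaxed.on_failure(start + timedelta(hours=1)))  # True: still under the limit
```

In a test lab where you trigger several failures in a row, a higher limit keeps the tests from blocking each other; what value is right for production is a separate decision.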


    • Edited by Shahin Tuesday, February 25, 2020 12:43 PM Problem solved
    • Marked as answer by Shahin Tuesday, February 25, 2020 12:43 PM
    Tuesday, February 25, 2020 11:46 AM

All replies

  • You can shut off or stop the SQL service on the primary.

    Before the secondary is ready to act as primary, it must apply the log it received from the primary. You may want to monitor this, for example the redo queue on the secondary.

    Also, when you are shutting off the server, what's the workload like? If it's running a larger workload, that could be one reason.

    Otherwise, 10 seconds for a single DB sounds a little higher than normal, but it is normal to see an application blip when the SQL service rolls from one replica to another.
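
As a rough illustration of monitoring how much log the secondary still has to apply, the sketch below pairs a T-SQL query against `sys.dm_hadr_database_replica_states` with a small Python helper. The helper name `estimated_redo_seconds` is made up for this example, and dividing queue by rate only gives a ballpark figure, not a guaranteed failover time.

```python
# T-SQL to run on the secondary replica (for example from SSMS or sqlcmd).
# Sizes are reported in KB and rates in KB/sec.
REDO_QUERY = """
SELECT DB_NAME(database_id) AS database_name,
       synchronization_state_desc,
       log_send_queue_size,   -- log not yet sent to this secondary
       redo_queue_size,       -- log received but not yet redone
       redo_rate              -- redo throughput
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1;
"""


def estimated_redo_seconds(redo_queue_kb, redo_rate_kb_per_sec):
    """Ballpark time for the secondary to finish redo, or None if unknown."""
    if redo_queue_kb is None:
        return None  # the DMV returned NULL for the queue size
    if redo_queue_kb == 0:
        return 0.0   # nothing left to apply
    if not redo_rate_kb_per_sec:
        return None  # rate is NULL or zero, so no estimate is possible
    return redo_queue_kb / redo_rate_kb_per_sec


# Example with made-up numbers: 2048 KB queued, redone at 1024 KB/sec.
print(estimated_redo_seconds(2048, 1024))  # 2.0
```

If the estimate is consistently large under your test workload, that would be one place to look when failover takes longer than expected.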



    Hope it Helps!!


    • Edited by Stan210 Monday, February 24, 2020 3:50 PM
    Monday, February 24, 2020 3:49 PM
  • Thanks for the reply,

    Both DBs are in sync, but when I shut down the SQL service on the primary and try to open SSMS with the listener, I get this error:


    Shahin

    Monday, February 24, 2020 7:29 PM
  • Can you connect to the secondary directly and check the AG status?

    Hope it Helps!!

    Monday, February 24, 2020 7:36 PM
  • Administrative Tools -> Failover Cluster Manager -> Services and applications -> Other Resources -> -> Click Enable auto-start.

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/


    Tuesday, February 25, 2020 5:29 AM
  • Is it allowed in a SQL AG to shut down the SQL service on the primary node, or to shut down the server that is the primary node?

    Yes, you are allowed to shut down the primary node. There are two shutdown scenarios for a database, clean and unexpected:

    Clean: when a clean shutdown occurs, the primary does not change the synchronized state in the cluster registry.  Whatever the current synchronization state is at the time the shutdown was issued remains sticky.   This allows clean failovers, AG moves and other maintenance operations to occur cleanly.

    Unexpected: the primary can't change the state if the unexpected action occurs at the service level (SQL Server process terminated, power outage, etc.).   However, if the database is taken offline for some reason (log writes start failing), the connections to the secondaries are terminated, and terminating the connection immediately updates the cluster registry to NOT SYNCHRONIZED.  Something like a failure to write to the log (LDF) could be as simple as an administrator incorrectly removing a mount point.  Adding the mount point back to the system and restarting the database restores the system quickly.

    For more detail, you can refer to this article: How It Works: Always On–When Is My Secondary Failover Ready?
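
To see the failover-ready state that this reply describes, one option is to query `sys.dm_hadr_database_replica_cluster_states`, which exposes an `is_failover_ready` flag per database and replica. The sketch below is only an illustration: the helper `replicas_ready_for_failover` and the sample rows are hypothetical, and it assumes you have already fetched the query results as tuples.

```python
# T-SQL that lists, per replica and database, whether a failover without
# data loss is currently possible according to the cluster state.
FAILOVER_READY_QUERY = """
SELECT ar.replica_server_name,
       drcs.database_name,
       drcs.is_failover_ready
FROM sys.dm_hadr_database_replica_cluster_states AS drcs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drcs.replica_id;
"""


def replicas_ready_for_failover(rows):
    """Return replica names whose databases all report is_failover_ready = 1.

    rows: iterable of (replica_server_name, database_name, is_failover_ready).
    """
    ready = {}
    for replica, _database, flag in rows:
        # A replica counts as ready only if every one of its databases is ready.
        ready[replica] = ready.get(replica, True) and bool(flag)
    return sorted(name for name, ok in ready.items() if ok)


# Made-up sample rows, shaped like the query output above.
sample = [("NODE1", "TestDB", 1), ("NODE2", "TestDB", 0)]
print(replicas_ready_for_failover(sample))  # ['NODE1']
```

Checking this before and after each shutdown test should show whether the state stayed synchronized or was flipped to not ready.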


    MSDN Community Support

    Tuesday, February 25, 2020 5:57 AM
  • I ran a different test, and it looks like when Node2 is primary I can shut down the SQL service and the AG moves to Node1.

    But when Node1 is primary and I shut down the SQL service on this node, the AG won't move to Node2, and when I connect directly to Node2 I can see that the AG says Resolving:


    Shahin

    Tuesday, February 25, 2020 10:37 AM
  • Hi Uri,

    I cannot find the option you have mentioned!


    Shahin

    Tuesday, February 25, 2020 10:47 AM
  • How are you shutting down, via SSCM? If you really want to mimic a disaster situation, kill the SQL Server process on the primary from Task Manager and then see whether failover happens. A proper shutdown should not be seen by WSFC as a failover event.

    Cheers,

    Shashank


    Tuesday, February 25, 2020 11:36 AM