Issue: FE server freezes/hangs

  • Question

  • We would appreciate your assistance here.

    We have a pool of 10 SFB 2015 FE servers. An incident took place in which the OS on one of the servers hung/froze: the server was pingable, but we could not RDP to it or use any remote commands to reboot it.
    (The OS hung because of a faulty application installed on the server, so nothing was logged in the Windows event logs; at the Lync application level, the only thing logged was HTTPS requests not responding.)

    In this case, the impact was that users registered on this server as their primary registrar could not sign in, as there was no failover to the secondary registrar.

    The issues here are:

    1. We could not detect the issue: nothing was reported by SCOM, and the general checks and the Windows Fabric all seemed OK.

    2. There was no failover to the secondary registrar (or promotion of the secondary registrar to primary), because the server was not detected as 'broken': it was pingable but could not be reached over RDP. Hence there were no sign-ins, as the server was not responding to sign-in requests.


    The issue was only discovered through user complaints.

    Any idea how we can tackle these two points?

    1. How can we discover the issue in its early stages?

    2. Why isn't it failing over to the secondary registrar? And if it works as designed, how can I force a failover, or take the server out, to tackle the issue ASAP?

    Thanks in advance.





    • Edited by Doody88 Monday, June 11, 2018 2:37 PM
    Monday, June 11, 2018 10:18 AM

All replies

  • Hi Doody88,

    Did you enable remote management and Remote Desktop, as in the following screenshot?

    Please run Get-CsManagementStoreReplicationStatus in the SFB Management Shell and check the replication status.

    1. How to discover the issue in early stages

    No, I cannot find an effective way to detect this issue in its early stages. You can only deploy SFB monitoring, check the monitoring reports, and collect the users' feedback.
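
As a possible complement to the monitoring reports, an external synthetic probe may catch the symptom described in the question (server answers ping, but HTTPS requests get no response). The following is only a minimal sketch in Python and is not an SFB or SCOM feature: the function name `probe_front_end`, the three status strings, port 443, and the 5-second timeout are all assumptions chosen for illustration.

```python
import socket
import ssl


def probe_front_end(host, port=443, timeout=5.0, verify_tls=True):
    """Classify a server as 'down', 'hung', 'error', or 'responding'.

    Hypothetical helper: 'hung' means the TCP connection was accepted but
    the TLS handshake or the HTTP reply did not arrive within the timeout.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "down"  # refused, unreachable, or connect timed out

    try:
        context = ssl.create_default_context()
        if not verify_tls:
            # Only for lab use, e.g. when probing by IP address.
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
        with context.wrap_socket(sock, server_hostname=host) as tls:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))
            reply = tls.recv(64)
    except socket.timeout:
        return "hung"  # connected, but no handshake or reply in time
    except OSError:
        return "error"  # TLS or socket failure other than a timeout
    finally:
        sock.close()  # harmless if the TLS wrapper already closed it

    return "responding" if reply.startswith(b"HTTP/") else "hung"
```

A scheduler could call `probe_front_end("<FE server FQDN>")` for each Front End server every minute and raise an alert on a `hung` result. Whether a given kind of OS hang still accepts TCP connections depends on the failure, so the classification is a heuristic rather than a definitive health check.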

    2. Why isn't it failing over to the secondary registrar? And if it works as designed, how can I force a failover, or take the server out, to tackle the issue ASAP?

    Yes, you could change the registrar pool state manually with the Set-CsRegistrarConfiguration cmdlet. You could refer to the following link.

    http://masteringlync.com/2013/11/04/understanding-pool-failover-more-than-you-wanted-to-know-most-likely/

    Note: Microsoft is providing this information as a convenience to you. The sites are not controlled by Microsoft. Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there. Please make sure that you completely understand the risk before retrieving any suggestions from the above link.


    Best Regards,
    Leon Lu


    Please remember to mark the replies as answers if they helped. If you have feedback for TechNet Subscriber Support, contact tnsf@microsoft.com.


    Click here to learn more. Visit the dedicated forum to share, explore and talk to experts about Microsoft Teams.

    Tuesday, June 12, 2018 5:48 AM
  • Hello Leon-Lu,

    Yes, remote management and Remote Desktop are enabled.

    For point number 2, I did not mean 'pool' failover. Let me try to explain: the routing group is assigned to 3 servers: {primary registrar, secondary registrar, secondary idle}. In my case, the primary registrar's OS hung/froze, in the sense that it did not respond to HTTPS requests from other servers. So yes, it was pingable, but we could not RDP to it, and the users registered on this server as their primary registrar were not able to sign in.

    My question is: why did these impacted users not fail over to the secondary registrar? How can we force users who fail to sign in to a certain server to sign in on their secondary registrar?

    P.S. When I checked the PoolFabric, it was OK, which means it did not detect that there was an issue with this server. The only solution was to reboot the server.

    Wednesday, June 13, 2018 8:40 AM
  • Hi,

    Please run Get-CsPoolFabricState and check the Health and Status of the three FE servers. If one server hangs, check the other two servers: their Health and Status must be OK and Up.



    Best Regards,
    Leon Lu



    Thursday, June 14, 2018 10:31 AM
  • Hello Leon,

    Thanks for answering and for your efforts.

    Well, yes, I did check the pool fabric, and it showed that all servers, including the hung one, were in a normal state: OK and Up.

    But the server was still hung and users were not able to sign in, and as you can see, this was not detected.

    So, any clue how to force this hung server to step down and the secondary to take over, other than restarting it?

    Thursday, June 14, 2018 11:44 AM
  • Hi,

    You could run the following cmdlet on another FE server. It moves all services to other Front End Servers in the pool and takes this server offline.

    Invoke-CsComputerFailOver -ComputerName <Front End Server that crashed or hung>
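
To act on this quickly, one option is to have a monitoring job suggest the command once a server has failed several health checks in a row. The sketch below (in Python) is only an illustration under stated assumptions: the class name `FailoverAdvisor`, the status string `"responding"`, and the threshold of 3 are hypothetical, and the thread does not confirm that the cmdlet succeeds against a server whose OS is hung. It only prints a suggestion for an operator and does not run anything itself.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverAdvisor:
    """Track consecutive failed health checks per server (hypothetical helper)."""

    threshold: int = 3  # assumed number of consecutive failures before alerting
    failures: dict = field(default_factory=dict)

    def record(self, server, status):
        """Record one check result; return a suggested command or None.

        Any status other than "responding" counts as a failure. The
        suggestion is returned once, when the threshold is first reached.
        """
        if status == "responding":
            self.failures[server] = 0
            return None
        count = self.failures.get(server, 0) + 1
        self.failures[server] = count
        if count == self.threshold:
            # Cmdlet taken from the reply above; an operator runs it manually
            # from another Front End server.
            return f"Invoke-CsComputerFailOver -ComputerName {server}"
        return None


if __name__ == "__main__":
    advisor = FailoverAdvisor(threshold=3)
    for result in ["hung", "hung", "hung"]:
        suggestion = advisor.record("fe01.example.com", result)
        if suggestion:
            print("Suggested action:", suggestion)
```

Requiring several consecutive failures is a deliberate choice to avoid suggesting a failover after a single slow response; the right threshold depends on the check interval and on how long users can tolerate failed sign-ins.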



    Best Regards,
    Leon Lu



    Friday, June 15, 2018 10:42 AM
  • Hi,

    Is there any update on this issue? If the reply was helpful to you, please mark it as an answer; it will help others who have a similar issue.


    Best Regards,
    Leon Lu



    Friday, June 15, 2018 11:41 AM
  • Hello Leon, 

    I appreciate your input, I will check and certainly give you feedback :)

    Tuesday, June 19, 2018 11:25 AM
  • OK, waiting for your update.

    Best Regards,
    Leon Lu



    Monday, July 2, 2018 8:45 AM