Strange issue in Node Failover

  • Question

  • We have a three-node Lync 2013 Enterprise front-end pool. For web services we use a load balancer, but for SIP we use DNS load balancing. While patching the servers we ran into a strange issue.

    First Node:

    We patched the first node first, stopping the Lync services on it with the following command:

    Stop-CsWindowsService -Graceful

    We waited until all services had stopped cleanly, patched the server (Windows and Lync CUP), rebooted it, and started the Lync services again. No complaints.
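
    The per-node sequence described above could be sketched roughly as follows, run from the Lync Server Management Shell on the node being patched. The readiness check with Get-CsPoolUpgradeReadinessState is not one of the steps mentioned in the post; it is included here as an assumption about what one would usually verify before taking a front end out of a pool.

    ```powershell
    # Assumption: run in the Lync Server Management Shell on the front end to be patched.
    # Not part of the steps described in the post: check the pool state first
    # and look for "State : Ready" in the output before continuing.
    Get-CsPoolUpgradeReadinessState

    # Drain existing sessions and stop the Lync services on this node only.
    Stop-CsWindowsService -Graceful

    # ... install the Windows and Lync updates, then reboot the server ...

    # After the reboot, start the services and confirm that they are running.
    Start-CsWindowsService
    Get-CsWindowsService
    ```

    Running the readiness check again before moving on to the next node may help show whether the pool has settled after each restart.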

    Second Node:

    Same Procedure, no complaints

    Third Node:

    Same procedure, lots of complaints...

    Some users started losing their connection and failed to reconnect, and this continued until we started the services again on the third node. Normally the clients should be redirected to the other front-end servers, but that did not happen.

    We have checked and double checked our DNS, and everything seems to be OK.

    The SRV record _sipinternaltls._tcp points to sip.sipdomain.com, which is an A record that resolves to each node in the pool. The pool FQDN is also registered in DNS (A records) with the IP address of each node. lyncdiscoverinternal.sipdomain.com points to the load balancer.
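
    A minimal sketch of how the DNS setup described above could be double-checked from a client or server. It assumes a machine that has the Resolve-DnsName and Test-NetConnection cmdlets (Windows 8.1 / Windows Server 2012 R2 or later); the names are the ones used in this post and stand in for the real domain.

    ```powershell
    # Names taken from the post; replace them with your own SIP domain.
    $sipName = 'sip.sipdomain.com'

    # SRV record that internal clients use for automatic sign-in.
    Resolve-DnsName -Name '_sipinternaltls._tcp.sipdomain.com' -Type SRV

    # A records behind the name; DNS load balancing expects one per front end.
    $addresses = Resolve-DnsName -Name $sipName -Type A |
        Where-Object { $_.IPAddress } |
        Select-Object -ExpandProperty IPAddress

    # Check that each address accepts TCP connections on the SIP/TLS port 5061.
    foreach ($ip in $addresses) {
        Test-NetConnection -ComputerName $ip -Port 5061 |
            Select-Object ComputerName, RemotePort, TcpTestSucceeded
    }
    ```

    With one node stopped, its address would be expected to fail the port test while the other two succeed.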


    Answers provided come from personal experience and carry no warranty of success. I, like everybody else, make mistakes.

    Thursday, September 26, 2019 1:16 PM

Answers

  • Hi Killerbe,

    Most of the work for the pool failover involves failing over the Central Management store, if it is required. This is important because the Central Management store must be functional when the pool’s users are failed over.

    You could check whether the CMS replication status was True (up to date) at the time the third node was taken down.

    In addition, you could review the event log on the front-end servers and analyze the error messages.
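
    As a rough sketch, both checks could be run from the Lync Server Management Shell on a front-end server. The 4-hour window is an assumption; adjust it so that it covers the time the third node was down.

    ```powershell
    # Replication state of the Central Management store for every replica.
    # UpToDate should be True for each front end.
    Get-CsManagementStoreReplicationStatus |
        Select-Object ReplicaFqdn, UpToDate, LastStatusReport

    # Errors (level 2) and warnings (level 3) from the "Lync Server" event log.
    # Assumption: a 4-hour window covers the time the third node was down.
    Get-WinEvent -FilterHashtable @{
        LogName   = 'Lync Server'
        Level     = 2, 3
        StartTime = (Get-Date).AddHours(-4)
    } -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, ProviderName, Message
    ```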


    Best Regards,
    Sharon Zhao


    Please remember to mark the replies as answers if they helped. If you have feedback for TechNet Subscriber Support, contact tnsf@microsoft.com.

    Friday, September 27, 2019 6:00 AM
    Moderator

All replies

  • I was able to retrieve a client log file and saw that the client is redirected from the first node to its home server, which happens to be the third node (down at that moment). That is why the connection fails. Why are the other nodes unaware that the third node is offline?

    Thursday, September 26, 2019 1:54 PM
  • Hi Killerbe,

    Is there any update on this case?

    Please feel free to drop us a note if there is any update.

    Have a nice day!


    Best Regards,
    Sharon Zhao


    Wednesday, October 2, 2019 7:21 AM
    Moderator