Issues with Windows Fabric


  • Hello,

    Quick background: we have a supported but non-recommended configuration with 2 Front End (FE) and 2 Back End (BE) servers, along with a single SQL instance.

    I know this is not recommended and that we should have 3 Front End servers for proper quorum, but we don't, and as much as I have tried, that is not changing any time soon (mostly because we have set up a side-by-side installation of Skype for Business Server 2015 with 3 FE, 2 BE and mirrored SQL).
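    As a rough illustration of why three servers is the usual recommendation, the sketch below applies the generic strict-majority quorum rule. This is the textbook majority rule, not necessarily the exact voting logic Windows Fabric applies to a Lync pool; the function names are illustrative only.

```python
def majority_quorum(voters: int) -> int:
    """Smallest strict majority of `voters` under the generic majority rule."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a strict majority still remains."""
    return voters - majority_quorum(voters)


if __name__ == "__main__":
    # Compare a 2-server pool with the recommended 3-server pool.
    for n in (2, 3):
        print(f"{n} voters: quorum needs {majority_quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

    Under this rule, 2 voters need both present (no failure tolerated), while 3 voters still have a majority after losing one.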

    We have been getting issues on the Lync system with errors 22/23/24. When this happens we run Get-CsPoolFabricState, and each time the replica instances for the MCUFactory service show as follows:

    Address: FE1 - Primary: 3, Secondary: 3

    Address: FE2 - Primary: 3, Secondary: 3

    From everything I know, that is incorrect: they shouldn't have the same numbers. Or is it actually correct?

    I ask because the recommendation to fix this was to try Reset-CsPoolRegistrarState -ResetType QuorumLossRecovery

    and, if that failed, to shut everything down and bring it back up from a completely powered-down state.

    Well, the command made no change. So we shut down all servers, brought up the first Front End server, and got:

    Address: FE1 - Primary: 6, Secondary: 0

    We waited about 10 minutes for everything to settle, then brought up the second server; after about another 10 minutes it went to:

    Address: FE1 - Primary: 4, Secondary: 2

    Address: FE2 - Primary: 2, Secondary: 4

    Then, about 10 to 15 minutes later, it went to:

    Address: FE1 - Primary: 3, Secondary: 3

    Address: FE2 - Primary: 3, Secondary: 3
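    One way to sanity-check these numbers is to tally them per snapshot. The sketch below assumes (our assumption, not something Get-CsPoolFabricState states) that each MCUFactory partition has one primary replica and, once both servers are up, one secondary on the other server. On that reading, 6 primaries with FE1 alone implies 6 partitions, and the pool-wide totals should stay at 6 primaries and 6 secondaries while the per-server split rebalances. The names SNAPSHOTS, pool_totals and primaries_even are hypothetical.

```python
# Per-server (primary, secondary) replica counts for MCUFactory, as reported
# by Get-CsPoolFabricState in the snapshots above.
SNAPSHOTS = {
    "FE1 only": {"FE1": (6, 0)},
    "after FE2 starts": {"FE1": (4, 2), "FE2": (2, 4)},
    "10-15 minutes later": {"FE1": (3, 3), "FE2": (3, 3)},
}


def pool_totals(nodes: dict) -> tuple:
    """Sum primary and secondary replicas across every server in one snapshot."""
    primaries = sum(p for p, _ in nodes.values())
    secondaries = sum(s for _, s in nodes.values())
    return primaries, secondaries


def primaries_even(nodes: dict) -> bool:
    """True when every server hosts the same number of primary replicas."""
    return len({p for p, _ in nodes.values()}) == 1


if __name__ == "__main__":
    for label, nodes in SNAPSHOTS.items():
        print(label, pool_totals(nodes), primaries_even(nodes))
```

    With the reported numbers, both two-server snapshots total 6 primaries and 6 secondaries, and the final 3/3 split is the only one with primaries spread evenly. The tally only checks internal consistency; it cannot tell whether Windows Fabric considers the service healthy or in quorum loss.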

    Windows Fabric shows error 18527: Inbuild FT is in quorum loss state.

    Windows Fabric also shows error 18433: Failed to send deletereplica to node, with an "is not exact match" fault.

    I have also seen error 18432: Inbuild failoverunit of namingservice:Namingstoreservice.

    Any ideas on whether this is a case of broken quorum? If so, what can we do to resolve it? We always end up at 3-3; other services are fine.

    One recommendation is to run a full reset (Reset-CsPoolRegistrarState -ResetType FullReset), but the docs say not to do that without opening a case with Microsoft, so I thought I would come here and see if anyone has any ideas before I do that :)



    Friday, July 3, 2020 10:33 AM

All replies

  • Hi Nicholas_Farmer,

    You can check the replication status of the servers with this command: Get-CsManagementStoreReplicationStatus.

    If the status is False, you can try to force replication with this command: Invoke-CsManagementStoreReplication.

    A full reset rebuilds the local Skype for Business Server databases, and the process is long and resource-intensive. If this is your production environment, you must be very careful with this operation.

    For reference, you can read this article: https://lyncdude.com/2014/12/29/simple-understanding-of-lync-windows-fabric-failover/.

    Hope it will be helpful.

    Note: Microsoft is providing this information as a convenience to you. The sites are not controlled by Microsoft. Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there. Please make sure that you completely understand the risk before retrieving any suggestions from the above link.


    Best Regards,
    Sharon Zhao


    Please remember to mark the replies as answers if they helped. If you have feedback for TechNet Subscriber Support, contact tnsf@microsoft.com.

    Monday, July 6, 2020 6:02 AM
  • Hi,

    We are aware that a full reset is a long and intensive task with associated risks. I have also read all the articles on understanding Lync Windows Fabric failover.

    Instead of a full reset, we attempted a quorum reset over the weekend. While running, the command timed out without completing, but the MCUFactory counts have since stayed at 4-2 and 2-4. Different Fabric-related errors are appearing now, but quorum is at least reporting correctly.

    Thanks.

    Monday, July 6, 2020 9:11 AM
  • Hi Nicholas_Farmer,

    Have you deployed pool pairing in your environment?

    If not, we recommend configuring it for disaster recovery. For more details, please refer to this article: https://docs.microsoft.com/en-us/skypeforbusiness/plan-your-deployment/high-availability-and-disaster-recovery/disaster-recovery.


    Best Regards,
    Sharon Zhao



    Friday, July 10, 2020 9:29 AM