Bug/design flaw in SfB desktop client registration during DR?

  • Hi all,

    I'm wondering whether anyone has noticed the same odd (and quite annoying) SfB desktop client behavior during pool failover. We observed it during a managed DR test.
    Consider the following scenario:
    - two enterprise pools located in two physical locations
    - HLB configured to send traffic to the primary pool servers or, if they are down, to the secondary pool servers
    - complete failover invoked while both pools are online (federation reroute, CMS failover, pool failover, RGS export/import)
    - at this stage all functionality is restored on the 2nd pool and no strange behavior occurs
    - the 1st data center is then completely isolated at the network level
    - HLB traffic reaches the secondary pool servers

    From that moment on, every sign-in to the SfB desktop client takes exactly 2 minutes. Sign out, and signing back in takes 2 minutes again.
    I went through the log files with Snooper and found that server discovery worked fine: lyncdiscover.domain.com (HLB) hits the 2nd pool web services. Yet what is returned to the (remote) SfB desktop client is the 1st pool Edge access FQDN, which is obviously down. The 2nd pool Front Ends should be aware of this, because the 1st pool shows PoolState=FailedOver in the Get-CsRegistrarConfiguration cmdlet output.
    The SfB client is hardcoded to keep trying this FQDN for two minutes before it tries the 2nd pool Edge FQDN.
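
The two-minute delay is what a sequential fallback with a fixed per-endpoint timeout produces. The Python sketch below is a hypothetical model, not SfB client code: it assumes the client tries the access Edge FQDNs in the order it received them and abandons an unreachable one only after a fixed 120-second timeout (the value observed in the Snooper logs). The Edge FQDNs are placeholders.

```python
from dataclasses import dataclass


@dataclass
class EdgeEndpoint:
    fqdn: str        # access Edge FQDN handed out by autodiscover (placeholder)
    reachable: bool  # False once the 1st data center is isolated


def sign_in_delay(endpoints, timeout_s=120.0):
    """Hypothetical model: seconds spent before reaching a live Edge.

    Each unreachable endpoint costs the full timeout before the client
    moves on to the next one; a reachable endpoint is treated as instant.
    """
    elapsed = 0.0
    for ep in endpoints:
        if ep.reachable:
            return elapsed, ep.fqdn
        elapsed += timeout_s  # dead Edge: wait out the whole timeout
    raise RuntimeError("no reachable Edge pool")


# Order observed during DR: 1st pool Edge first, even though it is down.
dr_order = [EdgeEndpoint("edge1.domain.com", False),
            EdgeEndpoint("edge2.domain.com", True)]
print(sign_in_delay(dr_order))  # (120.0, 'edge2.domain.com')
```

Putting the 2nd pool Edge first in the list models the immediate sign-in that was expected after failover.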

    This didn't affect mobile clients; they registered instantly.
    I believe mobile clients use the same preferred discovery method (lyncdiscover.domain.com for remote clients), but I'm not sure whether they register via Edge or directly with the Front End via the RP (UCWA application on IIS).

    Has anyone ever performed a similar DR test?

    Thanks,
    Tuesday, September 11, 2018 8:53 PM

All replies



  • This didn't affect mobile clients, they registered instantly.
    I believe mobile clients use the same preferred discovery method (lyncdiscover.domain.com for remote clients), but not sure whether they register via Edge or directly with FrontEnd via RP (UCWA application on IIS)?

    Yes, SfB mobile clients use the same discovery method. Whether internal or external users sign in, the SfB mobile client accesses the reverse proxy and then IIS on the FE servers. It does not access the Edge server, which is different from your desktop client.

    We recommend that you pair two data centers in the same world region, with high-speed links between them. Otherwise, a disaster can cause higher data loss because of latency in data replication.



    Best Regards,
    Leon Lu


    Please remember to mark the replies as answers if they helped. If you have feedback for TechNet Subscriber Support, contact tnsf@microsoft.com.


    Click here to learn more. Visit the dedicated forum to share, explore and talk to experts about Microsoft Teams.

    Wednesday, September 12, 2018 7:26 AM
  • hi Leon,

    Thanks. We didn't have problems with mobile clients.
    We also don't have a problem with latency (1 ms) or data replication between these two paired pools.

    When the primary DC goes down and we fail over to the secondary DC, this will recur:
    a 2-minute wait during the sign-in process for remote SfB desktop client users.
    I didn't check whether this affects internal clients or IP phones.

    Either our deployment is misconfigured, or this is a bug or a design flaw.
    I realize users are still homed on the 1st pool, but since its PoolState=FailedOver, I wouldn't expect the 2nd pool web services to send clients to the 1st pool Edge servers, whose next hop is the 1st pool Front Ends that were taken out of service.
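
The expectation can be written as a small lookup. This is a hypothetical sketch of the poster's expected behavior, not the documented SfB selection logic; the pool names, Edge FQDNs and the Active/FailedOver state strings (mirroring the PoolState shown by Get-CsRegistrarConfiguration) are illustrative placeholders.

```python
# Hypothetical topology data; names are placeholders, not real pools.
POOL_STATE = {"pool1.domain.com": "FailedOver", "pool2.domain.com": "Active"}
BACKUP_POOL = {"pool1.domain.com": "pool2.domain.com",
               "pool2.domain.com": "pool1.domain.com"}
EDGE_FOR_POOL = {"pool1.domain.com": "edge1.domain.com",
                 "pool2.domain.com": "edge2.domain.com"}


def expected_edge(home_pool: str) -> str:
    """Edge FQDN the poster expected web services to hand out.

    If the user's home pool is FailedOver, advertise the Edge that
    serves its backup pool instead of the home pool's own Edge.
    """
    pool = home_pool
    if POOL_STATE.get(pool) == "FailedOver":
        pool = BACKUP_POOL[pool]
    return EDGE_FOR_POOL[pool]


print(expected_edge("pool1.domain.com"))  # edge2.domain.com
```

In the DR test described in this thread, the answer actually returned corresponded to the 1st pool Edge (edge1 in this model), not the backup pool Edge.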


    Wednesday, September 12, 2018 1:31 PM
  • Hi 485 Ambiguous,

    You could install Fiddler and check the network capture to see whether the SfB client still connects to the primary pool first, times out, and then reconnects to the backup pool.
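
For that Fiddler check, a short script can flag whether the captured autodiscover answer still points at the dead Edge. This is a minimal sketch under assumptions: the sample XML and its element names (SipServerExternalAccess, SipClientExternalAccess with an fqdn attribute) approximate a typical Lync/SfB autodiscover user response and should be verified against your own capture; the FQDNs are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumed response shape; replace with the body captured in Fiddler.
SAMPLE = """<AutodiscoverResponse>
  <User>
    <SipServerExternalAccess fqdn="edge1.domain.com" port="443"/>
    <SipClientExternalAccess fqdn="edge1.domain.com" port="443"/>
  </User>
</AutodiscoverResponse>"""


def external_access_fqdns(xml_text):
    """Map each *ExternalAccess element name to its fqdn attribute."""
    root = ET.fromstring(xml_text)
    result = {}
    for elem in root.iter():
        local = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        if local.endswith("ExternalAccess"):
            result[local] = elem.get("fqdn")
    return result


def points_at_failed_pool(xml_text, failed_edge_fqdn):
    """True if any external access entry still names the failed Edge."""
    return failed_edge_fqdn in external_access_fqdns(xml_text).values()


print(points_at_failed_pool(SAMPLE, "edge1.domain.com"))  # True
```

Matching on the local element name keeps the check working whether or not the captured XML carries a default namespace.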


    Best Regards,
    Leon Lu


    Monday, September 17, 2018 10:22 AM
    Hi,

    Do you have any updates? If the reply helped you, please mark it as the answer.


    Best Regards,
    Leon Lu


    Monday, September 17, 2018 10:27 AM
    I don't think I need another network trace to confirm what happened.
    And we certainly won't invoke failover just to collect log files.

    We have reviewed, and communicated to users, what the experience will be before, during and after failover and failback.
    One of our sources of information was
    https://docs.microsoft.com/en-us/skypeforbusiness/plan-your-deployment/high-availability-and-disaster-recovery/user-experience
    This post-failover registration attempt against the primary pool Edge servers is not documented anywhere.

    The section "User experience during failover" states: "When the user logs back in, they will log in to the backup pool."
    I don't think that statement is correct.
    Tuesday, September 18, 2018 6:53 AM
    Hi,

    We have not received a bug report about this issue.

    I still suggest that you analyze the network capture.


    Best Regards,
    Leon Lu


    Friday, September 28, 2018 9:47 AM
    Have a look at minutes 52:50-53:55.

    This is exactly what we have in place, yet the user experience is not seamless.

    Simple URLs and lyncdiscover are taken care of in a DR scenario when the 1st DC is down.
    Lyncdiscover CAN always be resolved by the client.
    Microsoft advertises running in a DR scenario as seamless for clients, which is a false statement.

    Let me repeat: the remote users' SfB desktop app keeps hammering the 1st pool Edge (which is dead) for two minutes until it times out and tries the 2nd pool Edge. It gets the 1st pool Edge FQDN from the 2nd FE pool web services (lyncdiscover).
    This behavior is not seen with the SfB mobile app (which also locates services via lyncdiscover) running in DR mode, so I'm assuming this is a bug or design flaw in SfB or the SfB desktop app.

    I'm not saying anyone aims to run in DR mode (pool failover) for an extended period, but if this is the experience, Microsoft should update the documentation.


    Tuesday, October 2, 2018 4:13 PM