Pool failover does not work. Need help.

  • Question

  • This is the second time I am trying to configure pool failover, and I am not sure what I am doing wrong. The first time I tried with Standard Edition FE pools; now it is with Enterprise Edition FE pools.

    In Topology Builder I configured pool resiliency with automatic failover and failback, 30 seconds for both, and published the topology.

    After I ran Install-CsDatabase and Step 2 in the Deployment Wizard, I started the Backup Service on all servers and ran Invoke-CsBackupServiceSync for both pools. When I run Get-CsBackupServiceStatus, everything looks good.
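
    A minimal sketch of the sync-and-verify step described above. The pool FQDNs are placeholders that do not come from the original post, and the loop is only an illustration of running the same checks against both pools:

    ```powershell
    # Placeholder pool FQDNs (assumption); substitute the real Front End pool names.
    $pools = "pool1.domain.local", "pool2.domain.local"

    foreach ($pool in $pools) {
        # Show which pool is paired as the backup of this pool.
        Get-CsPoolBackupRelationship -PoolFqdn $pool

        # Trigger a backup synchronization for this pool.
        Invoke-CsBackupServiceSync -PoolFqdn $pool

        # Report the Backup Service state for this pool.
        Get-CsBackupServiceStatus -PoolFqdn $pool
    }
    ```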

    I created and configured an additional SRV record that points to the second pool, with Priority and Weight 10.
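
    One way to double-check what the client can resolve is to list the SRV records from a domain-joined machine. This is a sketch only: the SIP domain is a placeholder, and the record name `_sipinternaltls._tcp` is assumed to be the internal SRV record in use here, which the post does not state. Lower Priority values are preferred under general DNS SRV semantics; whether the client actually falls back to the second record is not established in this thread.

    ```powershell
    # Placeholder SIP domain (assumption); substitute the real internal SIP domain.
    $sipDomain = "domain.local"

    # Query the internal SRV records and show target, priority, weight, and port.
    # The filter drops any additional A records returned alongside the SRV answer.
    Resolve-DnsName -Name "_sipinternaltls._tcp.$sipDomain" -Type SRV |
        Where-Object { $_.Type -eq 'SRV' } |
        Sort-Object Priority |
        Select-Object NameTarget, Priority, Weight, Port
    ```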

    When I stop all services on Pool1, my client disconnects and never reconnects. After a very long wait it prompts me for a password, but I still cannot sign in. Failover does not work.

    I am afraid to run Invoke-CsManagementServerFailover because the first time I tried it (as stated above), the command completely destroyed my deployment. I had to restore the databases from backup and had a lot of headaches with everything else.

    Can someone help? What am I doing wrong?


    Thank you. Eric.

    Monday, June 6, 2016 4:15 PM

All replies

  • Which pool hosts your Central Management store (CMS)?

    Here is a good article about pool failover during DR: https://technet.microsoft.com/en-us/library/jj204678%28v=ocs.15%29.aspx?f=255&MSPPError=-2147217396

    If the pool that hosts the CMS fails, the CMS has to be moved to the backup pool.

    Is your Edge pool also in failover?

    Did you test on internal or external network?


    Please mark as helpful if you find my contribution useful or as an answer if it does answer your question. That will encourage me - and others - to take time out to help you. Thank you! Off2work

    Monday, June 6, 2016 4:54 PM
  • The pool that I stopped services on hosts the CMS. It is not failing over, and the user is logged out.

    At this time I have only one Edge server and both pools use this server.

    I saw the article before and, when I run the tests, everything looks good, but I am worried about running a test failover. The user does not stay logged in, so I believe something is wrong.

    I test on internal network.


    Thank you. Eric.

    Monday, June 6, 2016 5:24 PM
  • Hi,

    Then you have to do as described in the article above and create the CMS store on the backend pool.


    Off2work

    Monday, June 6, 2016 5:47 PM
  • Ok, I finally figured out the article.

    I do not need to failover the Edge server.

    Steps 1, 2, and 3 are good.

    Step 4 tells me what to do in case I use Lync 2010 or 2013.

    Step 5 tells me what to do if I have Mirror SQL. I do not have it.

    Step 6 tells me what to do if CMS is on Lync 2010 - N/A

    Step 7 tells me how to failover the pool.

    The problem is that the client does not stay logged in, but it should. I tried a failover with my previous deployment and it destroyed my SfB deployment completely. I do not want to do it again until I figure out why the client does not stay logged in. Maybe that will solve my problem with pool failover.

    Can you help?


    Thank you. Eric.


    • Edited by KPABA Monday, June 6, 2016 6:23 PM
    Monday, June 6, 2016 6:17 PM
  • Hi,

    Then you have to do as described in the article above and create the CMS store on the backend pool.




    I do not use Mirror database. That is not applicable for my situation.

    Thank you. Eric.

    Monday, June 6, 2016 6:18 PM
  • Hi,

    When you stop the services in the main pool to simulate a pool failure, pool failover does not happen automatically. It needs to be initiated with Invoke-CsPoolFailOver; only then will clients sign in to the backup pool temporarily, until you bring back the main pool with Invoke-CsPoolFailBack.

    https://technet.microsoft.com/en-us/library/jj205189.aspx
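
    A minimal sketch of the manual failover and failback described above. The pool FQDN is a placeholder, not a value from this thread. If the failed pool also hosts the CMS, the TechNet article linked earlier in the thread covers failing over the CMS first.

    ```powershell
    # Placeholder FQDN of the failed primary pool (assumption); substitute the real name.
    $failedPool = "pool1.domain.local"

    # Fail the users homed on the failed pool over to its backup pool.
    # -DisasterMode is used when the primary pool cannot be reached.
    Invoke-CsPoolFailOver -PoolFqdn $failedPool -DisasterMode -Verbose

    # Later, once the primary pool is healthy again, move the users back.
    Invoke-CsPoolFailBack -PoolFqdn $failedPool -Verbose
    ```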


    Linus

    Monday, June 6, 2016 7:00 PM
  • I thought that the user should stay logged in, or at least should be able to sign back in. When I test it, it signs me out and I cannot sign in.

    Thank you. Eric.

    Monday, June 6, 2016 7:05 PM
  • The user will not stay signed in. The user gets signed out and will be able to sign back in once you run the Invoke-CsPoolFailOver cmdlet.


    Linus

    Monday, June 6, 2016 7:22 PM
  • If the user is signed out and I have to run the failover manually, why is there an option to fail over to another pool automatically?

    I always thought that user should be able to use Lync client, but with some limitations.


    Thank you. Eric.

    Monday, June 6, 2016 7:25 PM
  • Yes, it is documented like that, but in practice it is not an automatic scenario. Pasting the info from TechNet:

    The Automatic failover and failback for Voice option and the associated time intervals in Topology Builder apply only to the voice resiliency features that were introduced in Lync Server 2010. Selecting this option does not imply that the pool failover discussed in this document is automatic. Pool failover and failback always require an administrator to manually invoke the failover and failback cmdlets, respectively.

    https://technet.microsoft.com/en-us/library/jj204773(v=ocs.15).aspx


    Linus

    Monday, June 6, 2016 7:54 PM
  • Hi Eric,

    I have run many DR tests in the past and have not had any major issues. In theory, if the deployment is sound and both pools have been technically validated, this should just work.

    Here's what happens:

    - When a pool fails, users are signed out after a short period

    - Once the automatic failover threshold is met, the client will be able to sign in to the backup pool. This should happen automatically. Make sure your pool is working and users can sign in under BAU, and make sure your DNS records are correct

    - At this point users are in a limited functionality mode and you have the option to fail over the pool

    - When you fail over the pool there are a few things you need to consider, so make sure you read up on the docs

    - Once you have failed over, user functionality should be restored

    Here is a summary of the process and links to TechNet docs - https://ucgeek.co/2016/06/summary-pool-failover-failback-skype-business/


    Andrew Morpeth
    Lync and Skype for Business Consultant (NZ)
    Blog: http://ucgeek.co
    Twitter: @AndrewMorpeth

    Please remember, if you see a post that helped you please click "Vote As Helpful" and if it answered your question, please click "Mark As Answer"

    Andrew Morpeth MVP

    Monday, June 6, 2016 8:44 PM
  • Andrew, thank you for your reply.

    What does, "Users can sign in under BAU" mean? What is 'BAU'?

    I believe that the DNS records are correct, but users cannot login after the failure. I will double check everything in the morning.

    The article that you sent me and the one above (https://technet.microsoft.com/en-us/library/jj204678%28v=ocs.15%29.aspx?f=255&MSPPError=-2147217396) are very similar, but the Microsoft article says to run step 6 if the CMS is on Lync 2010, while your article just says to run it. Should I run it or not?

    The autodiscover record for clients is lyncdiscoverinternal.domain.local and it points to the primary pool. You said: "Once the automatic fail over threshold is met, the client will be able to sign in to the backup pool", but how will it work if the record points to the failed pool?


    Thank you. Eric.

    Tuesday, June 7, 2016 12:48 AM
  • Hi KPABA,

    Based on my understanding, if you are running Skype for Business Server 2015 or Lync Server 2013 in your environment, you can follow both the TechNet link and Andrew Morpeth's link above.

    Andrew Morpeth's link is for Skype for Business Server 2015. If it is Lync Server 2010, you need to follow step 6 in the TechNet link above.

    Best Regards 


    Please remember to mark the replies as answers if they help, and unmark the answers if they provide no help. If you have feedback for TechNet Support, contact tnmff@microsoft.com.

    Eason Huang
    TechNet Community Support

    Tuesday, June 7, 2016 10:42 AM
  • Step 4 of Andrew's article says:

    Check the CMS status – if ActiveMasterFQDN and ActiveFileTransferAgents are empty then you will need to fail over the CMS
    • Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus

    When I stop services and run "Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus" command, ActiveMasterFQDN and ActiveFileTransferAgents are empty. Does it mean that I have to run CMS failover? But, at the same time, Microsoft says that I have to run CMS failover only if CMS is on Lync 2010.
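
    The check quoted above can be sketched as a small script. This only reports what the cmdlet returns and does not perform any failover. The property names are assumed to match the labels shown in the cmdlet output, and the messages are illustrative.

    ```powershell
    # Read the Central Management store status.
    $cms = Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus

    # Apply the rule quoted from the article: both values empty means
    # no active CMS master is currently reported.
    $masterMissing = [string]::IsNullOrEmpty($cms.ActiveMasterFqdn)
    $agentsMissing = -not $cms.ActiveFileTransferAgents

    if ($masterMissing -and $agentsMissing) {
        Write-Output "No active CMS master or file transfer agent is reported."
    } else {
        Write-Output "Active CMS master: $($cms.ActiveMasterFqdn)"
    }
    ```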


    Thank you. Eric.

    Tuesday, June 7, 2016 1:53 PM