Distribution Point Offline - how long before SCCM client connects to alternative DP?

  • Question

  • Hi, I'm doing some testing on my SCCM 2012 setup and have been attempting to test distribution point resiliency and fall-back.

    I have a single primary site with two distribution points. One is in the main datacentre, in the same boundary group as the clients, and tagged as a fast connection. The second DP is in the DR datacentre, in a separate boundary group (no client subnets are in this group), and has the allow fallback option checked. Both DPs have the same content.

    After shutting down the main distribution point, I kicked off an install of a package on a Windows 7 client. It attempts to download and stays in that state for as long as I leave it. Looking in the logs I can see both DPs returned to the client (the DR DP tagged as REMOTE); the client then attempts to connect to the main DP and just keeps retrying over and over.

    I thought that if it couldn't connect to the main DP it would then fall back to the DR DP, but it doesn't appear to do this. Is there a timeout before it falls back?

    I'm also currently trying adding the DR DP to the main boundary group and tagging its connection as slow, so the main DP would still be used first. Again both DPs are returned to the client when installing software, and the client attempts to connect to the main DP over and over without using the DR DP, which is online.

    Is this normal behaviour or do I have a configuration issue?

    Appreciate your help.


    Carl

    Friday, April 5, 2013 5:47 AM

Answers

  • Hi, just to update on this thread, I raised a support call with Microsoft and the end result is that SCCM 2012 clients won't fall back to an alternative DP when a DP is offline. The fallback is only for when content isn't on a DP. The 8hr timeout doesn't appear to be in effect anymore.

    What I have managed to get working and tested is removing our production DP (also the primary site server) from the production content boundary group; clients then fall back to the DR DP, as this is the only other DP available with content.

    I've managed to perform the update to remove the site server from the boundary group while the primary site server is offline, by using a PowerShell script to connect to the SCCM provider on the DR site server (DP/MP/SUP), as our site database is off-box. This works well: the changes replicate to the SQL replica in DR that the DR MP uses, and when clients fail over to the DR MP they begin using the DR DP and packages can be installed etc.

    How funny, I just fixed this at a Client this week.

    This is default client behavior, as MS CSS probably told you: the client thinks the Distribution Point is coming back online soon, so it waits, for good reasons. For some reason I keep thinking "7 days", not 8 hours, but I may be wrong.

    I have a workaround for this; it just requires a change to the Distribution Point's DNS record. Head to the DNS server, find the record for the Distribution Point that is down, and change the IP address to that of a different member server. Once the client flushes its DNS cache, it gets the updated record and tries to connect to the Distribution Point at the changed IP address. That induces what the client thinks is a severe error, which makes it move on to the next Distribution Point in the list it got from the Management Point. Once you've recovered the Distribution Point and it is back online, change its IP address back in DNS, or just let the Distribution Point update its own DNS record when it boots up (if configured to do so), and voilà, you are back in business.

    Test, test and test again before ever putting something from "the web" into your production environment. I just implemented this at a client to solve their issues with their DR procedure.
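
    When testing the DNS workaround above, it can help to confirm what the DP's name currently resolves to from a client's point of view. The following is only a minimal sketch: the host name and addresses in the usage note are hypothetical, and the result reflects whatever the local resolver (including its cache) returns at that moment.

```python
import socket


def resolved_addresses(hostname, resolver=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses the host name currently resolves to."""
    # gethostbyname_ex returns (canonical_name, alias_list, ip_address_list).
    _name, _aliases, addresses = resolver(hostname)
    return set(addresses)


def dns_redirected(hostname, real_ip, resolver=socket.gethostbyname_ex):
    """True when the DP's record no longer points at the DP's real address."""
    return real_ip not in resolved_addresses(hostname, resolver)
```

    For example, `dns_redirected("dp01.example.local", "10.0.0.10")` (both values made up) should return True while the record points at another member server, and False again once the record has been restored.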


    Rob Marshall | UK | My Blog | WMUG | File CM12 Feedback | CM12 Docs | CM12 Release Notes

    Friday, May 10, 2013 8:30 PM
  • Hi, just to update on this thread, I raised a support call with Microsoft and the end result is that SCCM 2012 clients won't fall back to an alternative DP when a DP is offline. The fallback is only for when content isn't on a DP. The 8hr timeout doesn't appear to be in effect anymore.

    What I have managed to get working and tested is removing our production DP (also the primary site server) from the production content boundary group; clients then fall back to the DR DP, as this is the only other DP available with content.

    I've managed to perform the update to remove the site server from the boundary group while the primary site server is offline, by using a PowerShell script to connect to the SCCM provider on the DR site server (DP/MP/SUP), as our site database is off-box. This works well: the changes replicate to the SQL replica in DR that the DR MP uses, and when clients fail over to the DR MP they begin using the DR DP and packages can be installed etc.
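
    The behaviour described in this post can be summarised as a toy model. This is only a sketch of what was observed in this thread, not Configuration Manager's actual selection logic; the class, the field names, and the assumption that the list is ordered fast-before-slow are all made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DistributionPoint:
    name: str
    has_content: bool
    online: bool
    in_client_boundary_group: bool  # False means fallback-only


def first_choice(dps):
    """Pick the DP the modeled client sticks to (assumed order: fast first)."""
    preferred = [d for d in dps if d.in_client_boundary_group and d.has_content]
    fallback = [d for d in dps if not d.in_client_boundary_group and d.has_content]
    # Online status is deliberately ignored here: in the tests reported above,
    # an offline DP that holds the content still blocked fallback.
    ordered = preferred or fallback
    return ordered[0] if ordered else None


def can_download(dps):
    """True only if the DP the client sticks to happens to be online."""
    dp = first_choice(dps)
    return dp is not None and dp.online


main = DistributionPoint("MAIN", has_content=True, online=False,
                         in_client_boundary_group=True)
dr = DistributionPoint("DR", has_content=True, online=True,
                       in_client_boundary_group=False)

stuck = can_download([main, dr])          # main DP offline but still listed
after_removal = can_download([dr])        # main DP removed from boundary group
no_content = can_download([replace(main, has_content=False), dr])
```

    In this model `stuck` is False, while `after_removal` and `no_content` are True, which matches the two cases where the client was seen to use the DR DP.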

    Wednesday, May 8, 2013 1:56 AM

All replies

  • I believe it will try for 8 hours until it changes to a new DP.

    Kent Agerlund | My blogs: blog.coretech.dk/kea and SCUG.dk/ | Twitter: @Agerlund | Linkedin: Kent Agerlund | Mastering ConfigMgr 2012 The Fundamentals

    • Proposed as answer by Ralph de Vos Friday, April 5, 2013 11:57 AM
    • Unproposed as answer by Carlitog Saturday, April 6, 2013 3:21 AM
    Friday, April 5, 2013 6:38 AM
  • Thanks Kent, I've left an install running and it's been 4hrs so far, so 8hrs is looking right.

    So even though you can have multiple distribution points, if a distribution point goes offline, there's no way for clients to fail over to a DP that's online without an 8 hour delay?

    Cheers

    Friday, April 5, 2013 10:13 AM
  • I've just checked and it's now been trying for almost 24hrs; it is still trying to connect to the offline DP and hasn't attempted to use the second DP (tagged as remote).
    Saturday, April 6, 2013 3:20 AM
  • Hi Carl,

    As you have done all the setup, I assume you have also selected the option "Allow clients to use a fallback source location for content" in the package deployment properties.

    If you haven't selected it yet, please do so.

    Regards,

    Manohar Pusala

    Monday, April 8, 2013 5:42 PM
  • Thanks Manohar, I've just checked the packages and while I thought I had this option ticked, it turns out I didn't.

    I'll enable this option and then run through the same tests again over the next few days and let you know the results.

    Appreciate the help.

    Thursday, April 11, 2013 4:47 AM
  • Update - I've ticked "Allow clients to use a fallback source location for content" and retried the test, but the client still won't use the fallback DP. It just keeps trying the DP that is offline (15hrs before I stopped the test).

    I'm not sure if this is expected or not, but it doesn't appear to be ideal on the face of it.

    Tuesday, April 16, 2013 10:48 PM
  • I'm a bit stumped on this, as there doesn't appear to be a way for SCCM clients to fail over to an alternative DP if a DP is offline. We only have two DPs, one in the main datacentre and one in DR, as we have high-speed links to all our sites.

    However, if the main DP goes down I want the clients to be able to use the DP in DR. Even if I add the DR DP to the same content location boundary group (tagged as slow connection and the main DP as fast), shut down the primary site server/DP and then kick off an install, the client still keeps trying to install from the main DP.

    I left this for 16 hours and still the client doesn't use the DR DP. If I remove the content from the primary DP and then kick off an install, the client does install successfully from the DR DP, so it doesn't have an issue installing from this DP, but it just won't try

    Anyone else tried this out or have some other ideas?

    Thursday, April 18, 2013 2:23 AM
  • "Even if I add the DR DP to the same content location boundary group (tagged as slow connection and the main DP as fast) [...] I left this for 16 hours and still the client doesn't use the DR DP. If I remove the content from the primary DP and then kick off an install, the client does install successfully from the DR DP so it doesn't have an issue installing from this DP but it just won't try"


    Bring up the properties of a deployment type -> content tab and check the deployment options. Is it set to "Do not download"?

    Torsten Meringer | http://www.mssccmfaq.de

    Thursday, April 18, 2013 6:42 AM
  • Hi Torsten, I'm testing with a standard package as opposed to the new application model. However I have the following options selected on the distribution points tab for the deployments.

    Thursday, April 18, 2013 7:57 AM
  • Hi Rob, thanks for sharing your experience with this. I'll test this out in our environment and see if I get the same behaviour.

    I think for the scenario where our primary site server (production DP) is down but the site database is online, I'll use the PowerShell script to remove the site system via the DR SMS Provider.

    But where both the main primary site server and the site database are down, this would be a good workaround, as the SMS Provider update wouldn't then be an option.

    Saturday, May 11, 2013 12:04 AM
  • I like the DR SMS Provider work-around too :-) I'll note this as well thanks.

    Rob Marshall | UK | My Blog | WMUG | File CM12 Feedback | CM12 Docs | CM12 Release Notes

    Saturday, May 11, 2013 7:25 AM
  • I wrote the solution out here: http://wmug.co.uk/wmug/b/r0b/archive/2013/05/16/unsticking-clients-from-unavailable-distribution-points.aspx

    And made a reference to you and this thread for your DR Provider solution where the SQL is remote and accessible.


    Rob Marshall | UK | My Blog | WMUG | File CM12 Feedback | CM12 Docs | CM12 Release Notes

    Monday, May 20, 2013 3:51 PM
  • Hi Rob, good comprehensive write up and thanks for the reference!
    Tuesday, May 21, 2013 3:07 AM
  • Hi Everyone,

    Thanks for asking this question, and providing answers.

    This new behaviour is going to be a problem for me!

    Our production environment is System Center Configuration Manager 2007 R3. It is a single site, but our network topology has one large site (5,000 clients) and 3 small sites (50-300 clients).  The large site has 5 DPs, and each small site has 1 local DP.  The sites are separate Active Directory sites.

    When designing our topology, I created a proof of concept and simulated a small site where I had switched off the distribution point; within seconds, the client failed over to another distribution point - GREAT!

    With this new behaviour, to get high availability, I'm going to need two distribution points at each small site.  That's 3 additional servers that haven't been planned or budgeted for, and money is tight.

    I can assign multiple DPs to each boundary group, but I can think of no way of making a client at a small site use a local DP, and only cross the WAN to a second DP if necessary.

    Any ideas are welcome!

    Anwar

    Wednesday, July 24, 2013 4:53 PM
  • "This new behaviour is going to be a problem for me!"
    Not sure what you are referring to here; the eight hour timeout has been there since SMS 2003 to my knowledge and definitely existed in 2007.

    Jason | http://blog.configmgrftw.com

    Wednesday, July 24, 2013 9:22 PM
  • Hi Jason,

    Don,

    I couldn't find the documentation (SCCM is huge), but I certainly identified the behaviour I wanted:

    • try the local distribution point
    • if it isn't available, fail over to a distribution point in another boundary

    I then confirmed this by testing.

    Strange that Microsoft reiterate this odd behaviour

    http://blogs.technet.com/b/wemd_ua_-_sms_writing_team/archive/2008/11/25/clarifying-retry-behavior-for-distribution-points.aspx

    Question:  Does Configuration Manager continue to hand out distribution points as available when they are not? 

    Answer:  Yes.  Although Configuration Manager periodically monitors site systems and therefore knows when they are not responding, it continues to hand out distribution points to clients even if it detects that they are not responding.  It’s up to the administrator to monitor the site systems manually (using the Site System Status home page) or automatically using Operations Manager or equivalent, and then either correct any problems or delete the failed distribution point.
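
    Since the answer above leaves monitoring to the administrator, a simple reachability probe is one way to notice a dead DP early. This is a minimal sketch only: it assumes the DP serves content over HTTP on TCP port 80 (adjust for HTTPS), the function name is made up, and a successful TCP connect says nothing about whether the content itself is healthy.

```python
import socket


def dp_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to the DP can be opened in time."""
    try:
        # create_connection raises OSError (refused, timed out, DNS failure).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

    A scheduled job could call `dp_reachable("dp01.example.local")` (a hypothetical name) for each DP and raise an alert when it returns False.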

    Thursday, July 25, 2013 7:12 AM
  • Hi, you can always manually remove the DP at the WAN site from the associated content boundary group if it's offline for some time, so that the clients fall back to a central DP (if the DP is set up to allow fallback).

    BranchCache might be another option.

    Thursday, July 25, 2013 12:27 PM
  • You can change the 8 hour interval with a script, please see here
    http://happysccm.com/?p=187

    We set ours to 120 seconds; we don't have any slow links and can't think of a reason it should be anything more. So if we have any network issues the clients will retry a different DP within 4 minutes.

    Tuesday, May 6, 2014 11:30 PM
  • Totally unsupported so use at your own risk/peril.

    Jason | http://blog.configmgrftw.com

    Tuesday, May 6, 2014 11:38 PM