Client needs 99.999% uptime including no downtime during SharePoint and other patches

  • Question

  • They want to use a secondary data center to provide this, but I am unsure how to do this. Can anyone provide me with any solutions including third party. Also how to work with the DNS.

    Friday, January 22, 2010 4:35 PM

Answers

  • Hi jgiangiulio,

     

    It’s not possible to update or patch SharePoint without incurring some amount of downtime. What we can do is minimize the downtime.

    I recommend you read this blog as it shares some good tips to reduce the downtime:

    http://blogs.msdn.com/toddca/archive/2008/08/02/the-zero-downtime-sharepoint-patching-myth.aspx

    As Jeff said, there are also some 3rd-party solutions, which you can search for on the internet. I have not tested any of them, so I am not sure whether they can meet your requirement.

     

    For more information, please refer to:

    http://technet.microsoft.com/en-us/library/ee514459.aspx

     

    Hope this helps.

     

    Lu Zou

    TechNet Subscriber Support in forum

    If you have any feedback on our support, please contact mtngfb@microsoft.com  

     

     

     

    • Marked as answer by Lily Wu Friday, January 29, 2010 9:42 AM
    Monday, January 25, 2010 6:31 AM
  • Five nines allows about 26 seconds of unplanned downtime per month.  Repointing DNS to an alternate farm, whether in a failure scenario or by a manual switch, and having those DNS changes flush from each of the servers in the domain will likely take longer than that. Given that you would need a second farm offsite to support a no-downtime patching regime, you will have to keep it in a hot state, so things like binary-level replication will be essential; log shipping won't help you here, as it's just too slow.  Of course, you could keep your farm in a read-only state for a long time while you ship logs and allow DNS to filter through, but that rather defeats the purpose of stipulating five nines: if the farm is read-only, it's not actually operating as planned.
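
The downtime figure above can be checked with a little arithmetic. The following is a minimal sketch, assuming a 30-day month and a 365-day year; the function name `downtime_budget_seconds` is made up for illustration and is not part of any SharePoint tooling.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_seconds(nines: int, period_days: float) -> float:
    """Allowed downtime, in seconds, for an availability of `nines` nines.

    Five nines (99.999%) means an unavailability of 10**-5 of the period.
    """
    unavailability = 10 ** -nines
    return unavailability * period_days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed period lengths: 30-day month, 365-day year.
    for n in (3, 4, 5, 6):
        month = downtime_budget_seconds(n, 30)
        year = downtime_budget_seconds(n, 365)
        print(f"{n} nines: {month:.1f} s per month, {year / 60:.2f} min per year")
```

With these assumptions, five nines works out to roughly 26 seconds per month, or a little over five minutes per year, which is the budget any failover has to fit inside.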

    Your DR manager must have an awful lot of money to spend, and it sounds like they do not understand the difference between planned and unplanned downtime in architecting MOSS platforms.  Planned downtime is when it's acceptable to patch, and it's simply planned in.  Five nines is something you have to architect for in your primary farm, but there's a huge difference between uptime and DR.  Also, what do they mean by five nines? If you have to reset the index, or take a query server out of rotation with no visible farm impact, does that breach your five-nines SLA?

    Regards

    John Timney
    • Marked as answer by Lily Wu Friday, January 29, 2010 9:42 AM
    Monday, January 25, 2010 9:34 AM
  • Hi.

    You will need to bring your client back to earth.
    Let me quote Joel Spolsky:

    Really high availability becomes extremely costly. The proverbial "six nines" availability (99.9999% uptime) means no more than 30 seconds downtime per year. That's really kind of ridiculous. Even the people who claim that they have built some big multi-million dollar superduper ultra-redundant six nines system are gonna wake up one day, I don't know when, but they will, and something completely unusual will have gone wrong in a completely unexpected way, three EMP bombs, one at each data center, and they'll smack their heads and have fourteen days of outage.

    Think of it this way: If your six nines system goes down mysteriously just once and it takes you an hour to figure out the cause and fix it, well, you've just blown your downtime budget for the next century. Even the most notoriously reliable systems, like AT&T's long distance service, have had long outages (six hours in 1991) which put them at a rather embarrassing three nines ... and AT&T's long distance service is considered "carrier grade," the gold standard for uptime.



    Regards,
    Magnus


    My blog: InsomniacGeek.com
    • Marked as answer by Lily Wu Friday, January 29, 2010 9:42 AM
    Monday, January 25, 2010 6:56 PM

All replies

  • Big (expensive) topic here....

    A manual solution would be to:
    - Build a farm that mirrors the production farm in the new DC (names and everything would need to be identical)
    - Put the DR farm in "Read Only" mode
    - Set up SQL log shipping from the prod farm to the DR farm
    - At failure time: put the DR farm in R/W
    - Manually change DNS to point to the DR farm.  It's very important to keep the TTL on the domain set low so that it doesn't take long for the change to propagate.
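
To see why the TTL matters in the last step, here is a rough sketch of the worst case for a DNS cutover. It assumes well-behaved caching resolvers that honor the TTL; the function name and the sample values (60 seconds to edit the record, three candidate TTLs) are hypothetical.

```python
def worst_case_dns_cutover_seconds(ttl_seconds: float,
                                   record_update_seconds: float) -> float:
    """Upper bound on how long a caching resolver may keep the old address.

    A resolver that cached the record just before the change can keep
    serving it for up to one full TTL, so the bound is the time taken to
    update the record plus the TTL.
    """
    return record_update_seconds + ttl_seconds


if __name__ == "__main__":
    # Hypothetical values: 60 s to edit the record, several candidate TTLs.
    for ttl in (30, 300, 3600):
        total = worst_case_dns_cutover_seconds(ttl, record_update_seconds=60)
        print(f"TTL {ttl} s: up to {total:.0f} s before all clients follow the change")
```

Clients or resolvers that ignore the TTL can take longer than this bound, so it is best read as an optimistic estimate.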

    3rd party solutions that do this very well are:

    1. http://neverfailgroup.com/
    2. http://www.doubletake.com/english/Pages/default.aspx


    I hope this helps and answers your question-

    Jeff DeVerter, MCSE
    Rackspace
    blog:http://www.social-point.com
    twitter: http://www.twitter.com/jdeverter
    Friday, January 22, 2010 8:09 PM
  • Those are the solutions I have used in the past, using log shipping and mirroring. The only problem, in my experience, is that the failover causes downtime: usually about 5 minutes per content DB, plus the DNS repropagating.
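
As a rough illustration of what those numbers mean against a five-nines target, the sketch below adds up the failover time and compares it with a yearly budget. It assumes the content databases fail over one after another and uses a 365-day year; the function names and the sample inputs (4 content databases, 60 seconds for DNS) are hypothetical.

```python
# Five nines over a 365-day year: about 315 seconds of allowed downtime.
FIVE_NINES_YEARLY_BUDGET_S = 1e-5 * 365 * 24 * 60 * 60


def failover_downtime_seconds(content_dbs: int,
                              per_db_seconds: float = 300.0,
                              dns_seconds: float = 0.0) -> float:
    """Estimated outage for one failover.

    Assumes databases fail over sequentially at `per_db_seconds` each
    (about 5 minutes, per the post above), followed by DNS repropagation.
    """
    return content_dbs * per_db_seconds + dns_seconds


def years_of_budget_consumed(downtime_seconds: float) -> float:
    """How many years of five-nines budget a single outage uses up."""
    return downtime_seconds / FIVE_NINES_YEARLY_BUDGET_S


if __name__ == "__main__":
    # Hypothetical farm: 4 content databases, 60 s for DNS to repropagate.
    outage = failover_downtime_seconds(4, dns_seconds=60)
    print(f"Estimated outage: {outage:.0f} s")
    print(f"Five-nines budget consumed: {years_of_budget_consumed(outage):.1f} years")
```

Even a single content database at about 5 minutes uses most of a year's five-nines allowance on its own.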
    Friday, January 22, 2010 8:46 PM
  • I've heard MS spends over $250k to provision the services for each client on MS Online SharePoint, and I don't think they even offer 5 9s. 
    SharePoint MVP | Developer | Administrator | Speaker-- Twitter -- Blog - http://nextconnect.blogspot.com
    Friday, January 22, 2010 9:04 PM
  • OK, since SQL Server 2012 is out now, how can we implement Always On technology to meet these expectations?

    /* Server Support Specialist */

    Saturday, July 20, 2013 9:21 AM
  • Hi Albert,

    I believe that, regardless of SQL Server's newly added functionality, delivering five nines is still just about unrealistic. To put it into perspective, all you need is a correlation error and, to an end user, the SharePoint site could be down.

    As mentioned in earlier posts, anyone trying to deliver this for SharePoint should re-evaluate their customer's expectations.

    There's always SharePoint Online hosted by Microsoft, but even that can have downtime.
    Saturday, July 20, 2013 11:30 AM