BPOS Mailflow issues for North America

    Question

  • So does anyone else have something other than: "The BPOS Operations team is investigating alerts indicating service degradation for Exchange Online mail flow for organizations served from this region. Users in affected organizations may be experiencing delays when trying to send or receive e-mail using Outlook, OWA, or mobile devices. The BPOS Operations team is actively working to determine the root cause and restore service."

     

    I have opened a ticket with MS already but was curious if someone else had an idea of what is happening...

     

    Seems like there have been a lot of issues/outages lately...

     

    Thanks,

     

    Justin

    Tuesday, May 10, 2011 5:58 PM

Answers

  • Here is the information from the health dashboard on yesterday's outage:

    Time     | Status                  | Description
    6:58 PM  | Additional information  | The BPOS Operations team has resolved the problem affecting Exchange Online mail flow for customers served from this region. Companies may see delays in delivering messages while mail flow returns to normal. A full post mortem of this incident is being compiled by the Operations Team, and will be available via Microsoft Online Services Technical Support as part of our standard incident management process and upon completion of the root cause investigation.
    5:45 PM  | Additional information  | Customers may still be experiencing slight delays in sending or receiving emails, however the source of the problem has been corrected and mailflow conditions returning to normal operations.
    3:00 PM  | Performance degradation | The BPOS Operations team continues to investigate service degradation issues with Exchange Online mail flow for organizations served from this region. They have identified the source of the problem and have implemented a fix. Customers may experience a delay in sending or receiving emails, however that delay should be improving as the fix takes effect.
    12:00 PM | Performance degradation | The BPOS Operations team continues to investigate service degradation issues with Exchange Online mail flow for organizations served from this region. The next service update will be provided within 2 hours if the issue is not resolved.
    9:40 AM  | Performance degradation | The BPOS Operations team is investigating alerts indicating service degradation for Exchange Online mail flow for organizations served from this region. Users in affected organizations may be experiencing delays when trying to send or receive e-mail using Outlook, OWA, or mobile devices. The BPOS Operations team is actively working to determine the root cause and restore service.

    What I can tell you is that services were not completely down as I was able to send test messages to some customers, one of which was received in a timely manner.  Outages like this take time to investigate to find the cause.  If we were to flip a switch there would be lost mail.  The issue is still under investigation and this particular outage was resolved at approximately 7:00 pm PDT.  Users should not have lost emails as they were queued either locally in Outboxes/Drafts or on the HUB servers.

    Those who got the "Our offices are closed now" message when calling in were actually getting a default message in our IVR.  This has now been changed to reflect that all lines are busy and access to a support agent will be delayed.

    If you opened a service request about this particular outage, you may request a post incident report.  If you did not open a ticket regarding this particular outage and wish a post incident report, you may wish to open a new SR and make the request specifically for the outage on May 10th.

    Friday, May 13, 2011 12:40 AM
    Moderator

All replies

  • agreed.. this is the 2nd time in 2 weeks or so.. first time around, only 1 user was affected (that i know) from my org. this time we're all affected. have not received or sent out email successfully in the past 2.5 hours...
    Tuesday, May 10, 2011 6:14 PM
  • Agreed.  One more Here.

     

    Thanks

    Chris

    Tuesday, May 10, 2011 6:20 PM
  • Another one here...

     

    John

     

    Tuesday, May 10, 2011 6:22 PM
  • Yes, we too are experiencing it. It looks like it started approx. 13:15 EST 05-10-2011.

     

    Tuesday, May 10, 2011 6:25 PM
  • This seems to be region wide, will just have to wait till they fix the issue.  It's causing us to have to use thumb drives right now.
    Tuesday, May 10, 2011 6:25 PM
  • Down here as well and have some pretty upset VIPs.
    Tuesday, May 10, 2011 6:30 PM
  • We are not sending or receiving email at all right now...
    Tuesday, May 10, 2011 6:30 PM
  • I am located in Sao Paulo, Brazil, and all email accounts have been affected by this interruption too.
    Tuesday, May 10, 2011 6:33 PM
  • We are having the same problem too.  I called the BPOS 24/7 support line and was told that their office is closed, please try again later.
    Tuesday, May 10, 2011 6:33 PM
  • We did our migration by departments and it seems that the last few groups we did have had the most issues with connectivity (with the exception now being that everyone is having issues).

    I am guessing they are on different clusters but according to MS Tech Supp there is no way for them to tell us what clusters the users are on. Really annoying since the last group is all the senior management...


    Tuesday, May 10, 2011 6:33 PM
  • I spoke to a live rep about 30 mins ago and she reported to me that it is a known issue they're working on.  She did not have an ETA however.
    Tuesday, May 10, 2011 6:34 PM
  • Yay cloud services!

     

    So glad we switched away from our inhouse Exchange box to this...  We get to pay more and have less reliability!  Awesome!

    Tuesday, May 10, 2011 6:38 PM
  • We're having problems too.  Again.  I second what Justin said about the frequency of service outages and "impairments" lately.  We are growing very frustrated, as are our end users.
    Tuesday, May 10, 2011 6:39 PM
  • With you on that Mike Bvo...

     

    Face Palm...Apply directly to forehead.

    Tuesday, May 10, 2011 6:39 PM
  • wow.. just tried calling again and I got the "office currently closed" message. 
    Tuesday, May 10, 2011 6:40 PM
  • now on hold again....
    Tuesday, May 10, 2011 6:42 PM
  • Exactly.  My baseline expectation was that this service would be at least AS functional as, say, Hotmail or some other free email service.  Needless to say, I have been bitterly disappointed.
    Tuesday, May 10, 2011 6:42 PM
  • We are also having a domain-wide mail delivery issue beginning around 1:00pm ET.  We are in the PA area of the United States.  Second day on the service, this is going to be nice to have to explain why this is better!!
    Tuesday, May 10, 2011 6:43 PM
  • Same issue here with users worldwide.  No email for entire Company.  Can't even send general email to notify everyone of the issues...  Going to be fun tracking down the right people to let them know..
    Tuesday, May 10, 2011 6:49 PM
  • So did we just experience the switch from BPOS to Office 365?
    Tuesday, May 10, 2011 6:56 PM
  • I migrated our company to Exchange Online from in-house Exchange 2003 last October, and I'm sorry to say that I regret everything I ever said about how this would be better.  It has been far worse in terms of both performance and reliability.  I hate to be so harsh, but I am deeply frustrated.  Email is the one thing that everyone from the guys in the factory up to the CEO uses.  The c-suite execs hardly use anything BUT email.  It has a bigger impact on IT's reputation with end-users and business leaders than anything else, and these constant service outages and "impairments" have got all of us in IT panicked.  We're actively looking at migration paths back to in-house email.

    Kevin Baker

    Tuesday, May 10, 2011 6:57 PM
  • Just spoke to MS. No ETA on a fix.
    Tuesday, May 10, 2011 7:01 PM
  • same here.  this is very disappointing & frustrating at the same time.. we had to get some important emails out by 3pm.. 
    Tuesday, May 10, 2011 7:01 PM
  • I've been added to the call back list (yes, they have a manual list) so will update if/when I hear back from them. support team didn't have (or didn't want to share) any idea of scope or the cause.
    Tuesday, May 10, 2011 7:04 PM
  • why do they call this a "service degradation" when it's actually a "service interruption"?

     

    looking at this - https://health.noam.microsoftonline.com

     

    they have been pretty unreliable last few weeks..

    Tuesday, May 10, 2011 7:04 PM
  • Yes, submitted Online Ticket inside Admin Center.

    Tuesday, May 10, 2011 7:04 PM
  • I've been with Microsoftonline for two weeks now, two outages in that time and the boss looks at me like I'm a dolt. I was THIS close to signing with Intermedia.

     

    Rich

    Tuesday, May 10, 2011 7:05 PM
  • Are we all calling the same number? 1-866-676 6546

    Same problem, e-mails are sent at ridiculous delays, or they're lost altogether.

    Not a very good thing to happen on a busy work day.

    Tuesday, May 10, 2011 7:06 PM
  • Yep, down since about 12:15 EST.  One ticket, two calls, no ETA on up-time.  Health status shows lots of mailflow degradation in last few weeks.  Wish I could get more info into what's happening.
    Tuesday, May 10, 2011 7:07 PM
  • Me too.  Entire office is down... This is bad.
    Tuesday, May 10, 2011 7:08 PM
  • Same issue, opened a ticket, no ETA but said I should get an email within an hour with an update.  This is looking really bad, we just purchased this solution at the end of the year and started migrating our many users over.  We have over 1100 webmail only users we are getting ready to migrate and we have migrated over 250 full users already with about 100 of those left and it's really looking like we made a bad choice going with this solution.  Only reason we moved away from old one was we were using a 10 year old on premise pop/imap/webmail appliance with no support since it was so old and no storage so everybody was using PSTs.  We really wanted to use Exchange but don't have the in house support and knowhow to set up or manage it so thus we went with this hosted solution and it's really looking like maybe we picked the wrong company to go with.  We simply cannot have things like this happening hardly ever, let alone multiple times per month. This is outrageous!  Fix this MS!

    Tuesday, May 10, 2011 7:12 PM
  • how will you receive the email when it doesn't work! lol

    hopefully MS will fix this SOON... too much heat for all of us

    Tuesday, May 10, 2011 7:14 PM
  • I just received an email!!! Maybe it's working again????
    Tuesday, May 10, 2011 7:17 PM
  • We're getting e-mail now!
    Tuesday, May 10, 2011 7:17 PM
  • I received 1 email now.
    Tuesday, May 10, 2011 7:17 PM
  • We've been on BPOS since 2009 and this is the first time email service has been completely non-functional (granted there have been performance issues), but in general I would say the service has been exceptionally reliable.
    Tuesday, May 10, 2011 7:18 PM
  • Just chiming in with the same issue; my IT team's excellent reputation is suffering and I can't mitigate with an ETA or explanation of the cause.  I am amazed that the health report (https://health.noam.microsoftonline.com) calls this a 'service degradation' when it is clearly an outage.  I guess this is how MS maintains their relatively high availability numbers!?

    Tuesday, May 10, 2011 7:21 PM
  • we just had a business merger... Today was not the day for the email to go down... We thought we were getting away from the frustration by coming to BPOS.

    Tuesday, May 10, 2011 7:22 PM
  • Just received an e-mail, not one of the tests I sent earlier .... but at least it's progress.
    Tuesday, May 10, 2011 7:22 PM
  • So far it's just 1 email.... And I know that's not all the email that's been held back... so not really up yet... just 1 little piece.... maybe they are rationing us all.... "only one email for u"

    Tuesday, May 10, 2011 7:23 PM
  • No email yet
    Tuesday, May 10, 2011 7:25 PM
  • Yep, we received three emails and none were my tests. 
    Tuesday, May 10, 2011 7:26 PM
  • Able to send & receive now; I am connected in San Antonio TX. No previous emails have flowed (either still hung up in a queue or lost?).
    Tuesday, May 10, 2011 7:27 PM
  • They are calling this a degradation.  This is a full blown service interruption/outage.  They need to admit it.  I have about 45 users with 2 different companies that just signed up in the past 2 months, and like the rest of you we have been without the ability to send or receive e-mails since about 12:30pm EST.  My clients are ready to jump ship to google apps or some other system. Not to mention the black eye and loss of service fees I get as their IT consultant.  But we are just 45 users on BPOS so we are probably no big deal for Microsoft since they just bought Skype for 8B.   --- JUST VENTING.

    Tuesday, May 10, 2011 7:28 PM
  • Yay!  The cloud!!!  We switched from our own infrastructure about 9 months ago, and MSOS is far worse.  WAY more downtime, way slower.  I get responses to emails I send before I get the one I sent and was copied on!  You'd think they'd be the best at running their own products........
    Tuesday, May 10, 2011 7:28 PM
  • Yeah, and imagine how much better Skype will be soon too ;)
    Tuesday, May 10, 2011 7:31 PM
  • We are down too.
    Tuesday, May 10, 2011 7:32 PM
  • One email only here in Vancouver BC.

    FWIW I'm with Jay Wilson. BPOS has given our small firm with 5 locations access to better email service than we could normally afford. BPOS hasn't been that bad. This is the first big splat.

    Tuesday, May 10, 2011 7:33 PM
  •  new test email sent to myself delivered... old email missing..
    Tuesday, May 10, 2011 7:33 PM
  • Received a test that I sent myself.....looks like emails that were sent during downtime are gone though :(
    Tuesday, May 10, 2011 7:33 PM
  • Same here... I wonder if emails during the 2 hour blackout will actually go anywhere....
    Tuesday, May 10, 2011 7:37 PM
  • they could be in the queue most likely... 
    Tuesday, May 10, 2011 7:38 PM
  • I agree with KJ_BPOS. BPOS has been good for my Organization too. Much better performance than what was available in house for fraction of the cost.

    • Edited by mmpai Tuesday, May 10, 2011 7:42 PM
    Tuesday, May 10, 2011 7:38 PM
  • Absolutely.  I ran an inhouse Exchange server from 1999 to 2009 and it went down more times than I could count on my fingers during that time... this is the first major outage since I started using BPOS.  I would not overreact to it unless you have no other choice to save face.
    Tuesday, May 10, 2011 7:40 PM
  • I agree. BPOS has been good for my Organization too. Much better performance than what was available in house for fraction of the cost.

    We are down still, no mail flowing
    Tuesday, May 10, 2011 7:40 PM
  • Had a user call to inform us they received 3 emails sent at 3:30 PM EDT - old email pending
    Tuesday, May 10, 2011 7:40 PM
  • Received an email at 343 EST...email was originally sent at 126 EST...there is hope!
    Tuesday, May 10, 2011 7:43 PM
  • Message sent from BPOS to my test mailbox in Office 365 Beta was delivered. Email sent to Yahoo at 3:35 was delivered.
    Tuesday, May 10, 2011 7:46 PM
  • No email here EST

    Hosted exchange is getting close to their 99.9% scheduled uptime with financially backed service level agreements

    Tuesday, May 10, 2011 7:47 PM
  • Jay. Test emails I sent never bounced back, all depends on how often the mail server that is sending will retry delivery.
    Tuesday, May 10, 2011 7:48 PM
  • Starting to trickle. Received an email sent at 10:13 just now at 12:48

    Tuesday, May 10, 2011 7:49 PM
  • Message sent from BPOS to my test mailbox in Office 365 Beta was delivered. Email sent to Yahoo at 3:35 was delivered.
    My inbound is working (new messages only), but no outbound messages are reaching their destinations.
    Tuesday, May 10, 2011 7:50 PM
  • Senior Exec indicated mail received.. hopefully the issue is resolved
    Tuesday, May 10, 2011 7:53 PM
  • the same problem on our site no email for the last 3 hours plus.

    Any answer from MS in this regards?

    Tuesday, May 10, 2011 8:01 PM
  • Had info from Microsoft support that the issue has been resolved.
    Tuesday, May 10, 2011 8:02 PM
  • Latest from MS at 2PM:

    "The BPOS Operations team continues to investigate service degradation issues with Exchange Online mail flow for organizations served from this region. The next service update will be provided within 2 hours if the issue is not resolved."

     

    Tuesday, May 10, 2011 8:03 PM
  • Another outage here.  System wide.  Started with only one user, apparently.  Outgoing and incoming fail.  Receive no errors.  Started service request, came back up and delivered only a few emails then stopped again.  Was assured the emails have all been spooled since outage, but concerned that some are getting through, while others are not.  Spooling should be all or nothing, until outage is resolved.  hmmm...
    Tuesday, May 10, 2011 8:03 PM
  • Email trickling in for my company.  Located in SLC, Utah.

    Still waiting to see if all the email sent gets delivered...

    Tuesday, May 10, 2011 8:05 PM
  • Had info from Microsoft support that the issue has been resolved.

    Chug... it took 20 minutes for my test email to be received after being sent.....

     

    I'm not sure "resolved" is the word I would use.  Some of our users are also experiencing an issue where items that are sent appear in their drafts folder instantly after sending them... occurs in OWA and Outlook 2011 for Mac.

    Tuesday, May 10, 2011 8:06 PM
  • As of 1:00PM MST From MS:

    The BPOS Operations team continues to investigate service degradation issues with Exchange Online mail flow for organizations served from this region. The next service update will be provided within 2 hours if the issue is not resolved.

     

    I really truly hope that once 365 drops these types of issues cease to happen.

    Tuesday, May 10, 2011 8:14 PM
  • Yay!  The cloud!!!  We switched from our own infrastructure about 9 months ago, and MSOS is far worse.  WAY more downtime, way slower.  I get responses to emails I send before I get the one I sent and was copied on!  You'd think they'd be the best at running their own products........
    That's a good point.  Delivery has always been slow for us.  If there is an attachment, (small 1 meg word doc or whatever) forget it.  We're talking hours.  Maybe just one hour, but we've seen as high as 7 hours to receive an email with an attachment.
    Tuesday, May 10, 2011 8:20 PM
  • If anyone would like the link to the SLA - http://microsoftvolumelicensing.com/DocumentSearch.aspx?Mode=3&DocumentTypeId=37

    This is a costly outage. 4+ hours

    Tuesday, May 10, 2011 8:20 PM
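For a rough sense of what a 4+ hour outage means against the 99.9% uptime figure mentioned earlier in the thread, here is a minimal sketch. It assumes uptime is measured over a 30-day month and that the whole outage counts as downtime. Those are assumptions, not terms taken from the SLA; the document linked above defines how downtime and credits are actually calculated, so the function names and numbers here are illustrative only.

```python
# Rough SLA arithmetic. Assumptions (not from the SLA itself): uptime is
# measured over a 30-day month, and the full outage counts as downtime.

MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime a given uptime percentage permits in `days` days."""
    total_minutes = days * MINUTES_PER_DAY
    return total_minutes * (100.0 - sla_percent) / 100.0


def achieved_uptime_percent(outage_minutes: float, days: int = 30) -> float:
    """Uptime percentage for the period if `outage_minutes` were lost."""
    total_minutes = days * MINUTES_PER_DAY
    return 100.0 * (total_minutes - outage_minutes) / total_minutes


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.9)
    uptime = achieved_uptime_percent(4 * 60)
    print(f"99.9% allows about {budget:.1f} minutes of downtime per 30 days")
    print(f"A 4-hour outage leaves about {uptime:.2f}% uptime for the month")
```

Under those assumptions, a single 4-hour outage (240 minutes) uses several times the monthly allowance of roughly 43 minutes.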
  • Update:  Some outgoing are trickling in, but time/date stamp is wrong.  Delivery takes ~20 minutes and date stamp shows it as only 2 minutes...  ugh.
    Tuesday, May 10, 2011 8:23 PM
  • Oh ... but per Microsoft it's not an outage ... it's a service degradation.

     

    Tuesday, May 10, 2011 8:29 PM
  • If anyone would like the link to the SLA - http://microsoftvolumelicensing.com/DocumentSearch.aspx?Mode=3&DocumentTypeId=37

    This is a costly outage. 4+ hours


    great find! Thanks!!!
    Tuesday, May 10, 2011 8:29 PM
  • Update:  Some outgoing are trickling in, but time/date stamp is wrong.  Delivery takes ~20 minutes and date stamp shows it as only 2 minutes...  ugh.

    Same here.
    Tuesday, May 10, 2011 8:29 PM
  • email flow seems to have stopped
    Tuesday, May 10, 2011 8:31 PM
  • Same here, and still missing a very large quantity of mail that was sent and received all afternoon.

    Tuesday, May 10, 2011 8:34 PM
  • received some new email and replies to my email sent to others in our Organization in the last 5 minutes. Looks like new email is working. waiting for email from earlier in the day to be delivered
    Tuesday, May 10, 2011 8:42 PM
  • I can only send emails to my domain in BPOS.

    On Twitter there is a message saying the servers will be back in 30 minutes. I don't believe it.

    What happened?

    Regards,

    Tuesday, May 10, 2011 8:42 PM
  • Oooh feel the power of the "cloud".... that airplane flying sooo fast... Practice Accelerator for Business Productivity Online Suite
    Tuesday, May 10, 2011 8:48 PM
  • emails starting to trickle in EST
    Tuesday, May 10, 2011 8:49 PM
  • Cloud Power!!!!!!
    Tuesday, May 10, 2011 8:51 PM
  • Looks good here (EDT) for new messages in/out.. earlier messages missing.(received 1 from 12:45 PM EDT)
    Tuesday, May 10, 2011 9:05 PM
  • Justin,

    We are in the same boat, email has been either non existent or very slow since 9:30 a.m. PST.

    I have a ticket open; they said they would call me on the hour to let me know status, but it hasn't happened that way. It's been two hours since I called them last. Spoke to the folks that manage our network and they have had several customers calling in with the same issues.

    I'd feel better just having a clue as to what the issue is or a time frame as to when it would be resolved.
    We moved to the hosted services from an in house server to avoid this type of issue. Thought going w/Microsoft directly would be a good idea...

    Tuesday, May 10, 2011 9:22 PM
  • Justin,

    We are in the same boat, email has been either non existent or very slow since 9:30 a.m. PST.

    I have a ticket open; they said they would call me on the hour to let me know status, but it hasn't happened that way. It's been two hours since I called them last. Spoke to the folks that manage our network and they have had several customers calling in with the same issues.

    I'd feel better just having a clue as to what the issue is or a time frame as to when it would be resolved.
    We moved to the hosted services from an in house server to avoid this type of issue. Thought going w/Microsoft directly would be a good idea...


    Thanks for your reply. It's actually kind of nice to know some of these issues we have been having over the last month since we migrated aren't limited to our experience. MS has generally been horrible in updating the MS Health Dashboard with info and they consistently display inaccurate levels of interruption...This issue is not a performance degradation by any means. Mail does not work.

    I would take a look at the SLA syabrough posted above. Maybe we can get some credit to help recoup the costs of this issue.

    MS also said they would post an update on the issue 2 hours after their last post. Been 2.5 hours and no new info.

    What a joke.

    As of right now an email I sent 50 minutes ago just got delivered from an external address to my BPOS account. Internal emails sent over 3 hours ago are just starting to be delivered although it seems to be to the tune of one an hour.

    Tuesday, May 10, 2011 9:31 PM
  • We have had intermittent or no access this afternoon.... We talked with our 1000 or so clients around North America with similar experiences

    E-mails from after 1230p are now beginning to flow. I have received 12 emails sent between 1230 and 130pm EST today....

    Apparently the forefront queueing as well as internal queueing is working.

    Now for the explanation....

    Jim Canfield Champion Solutions Group www.championcloudservices.com
    Tuesday, May 10, 2011 9:36 PM
  • Justin,

    We are in the same boat, email has been either non existent or very slow since 9:30 a.m. PST.

    I have a ticket open; they said they would call me on the hour to let me know status, but it hasn't happened that way. It's been two hours since I called them last. Spoke to the folks that manage our network and they have had several customers calling in with the same issues.

    I'd feel better just having a clue as to what the issue is or a time frame as to when it would be resolved.
    We moved to the hosted services from an in house server to avoid this type of issue. Thought going w/Microsoft directly would be a good idea...


    Thanks for your reply. It's actually kind of nice to know some of these issues we have been having over the last month since we migrated aren't limited to our experience. MS has generally been horrible in updating the MS Health Dashboard with info and they consistently display inaccurate levels of interruption...This issue is not a performance degradation by any means. Mail does not work.

    I would take a look at the SLA syabrough posted above. Maybe we can get some credit to help recoup the costs of this issue.

    MS also said they would post an update on the issue 2 hours after their last post. Been 2.5 hours and no new info.

    What a joke.

    I've yet to even get a phone call or an email (well maybe I did?!) regarding my support ticket.  This is the biggest outage I've yet to experience with MS, but I will say that although I'm willing to laugh off a few hours of shoddy performance today, this had better be 100% fixed by the time I wake up, otherwise support tickets will start turning into service cancellation tickets rapidly.  I cannot go more than 8 hours in this condition without being forced to consider alternatives.  At 24 hours, I'll be plotting a migration path to someone else.  At 48 hours I'll have already begun migrating.  Email is our business lifeblood.  It's almost as important as the internet itself.
    Tuesday, May 10, 2011 9:39 PM
  • Any news yet?  I'm seeing some emails coming through but still having major delays.  Could somebody from MS please tell us exactly what the problem has been with this lately?  Are there not enough servers to handle the mail flow and you guys are adding more?  What is the cause and what action is being taken to fix this for real instead of just band-aiding it?
    Tuesday, May 10, 2011 10:05 PM
  • Any news yet?  I'm seeing some emails coming through but still having major delays.  Could somebody from MS please tell us exactly what the problem has been with this lately?  Are there not enough servers to handle the mail flow and you guys are adding more?  What is the cause and what action is being taken to fix this for real instead of just band-aiding it?


    3:00PM The BPOS Operations team continues to investigate service degradation issues with Exchange Online mail flow for organizations served from this region. The next service update will be provided within 2 hours if the issue is not resolved.

    Nothing new yet...Maybe the hamsters died?


    Tuesday, May 10, 2011 10:07 PM
  • I just got off the phone with a tech. Apparently he has been told that the load balancing stopped working properly on a large number of N. American servers. They don't know why, but that is what seems to have started the whole thing. Now email is being pushed out to other healthy servers, so we are seeing email from this a.m. and this afternoon come in. Of course those transport hubs are now overloaded; he expects that by later tonight everything will be back to normal.

    Let us hope he is right. 

    Tuesday, May 10, 2011 10:14 PM
  • I just got off the phone with a tech. Apparently he has been told that the load balancing stopped working properly on a large number of N. American servers. They don't know why, but that is what seems to have started the whole thing. Now email is being pushed out to other healthy servers, so we are seeing email from this a.m. and this afternoon come in. Of course those transport hubs are now overloaded; he expects that by later tonight everything will be back to normal.

    Let us hope he is right. 


    Thanks for the ray of light...More and more mail seems to be doing a little better but not at all where it should be. Still about 2 hours behind...


    MS just emailed me to say they have resolved the issue affecting mail flow...
    Tuesday, May 10, 2011 10:15 PM
  • Have seen a similar issue yesterday with one customer and today with another. Have also had customers reporting that their mail address is being rejected as an unidentified "MX" record
    Wednesday, May 11, 2011 2:39 PM
  • Just got off the phone with support, closed the ticket, everything seems normal.  I told him this was the worst outage yet, he agreed and mentioned it affected over 3000 customers, so they were on it.... hope that never happens again!
    Wednesday, May 11, 2011 4:46 PM
  • If you have a ticket open and would like more information on what happened, you can request a Post Incident Report (PIR).
    Wednesday, May 11, 2011 7:01 PM
  • If you have a ticket open and would like more information on what happened, you can request a Post Incident Report (PIR).

    Asked under my support ticket. Hopefully I hear from them soon as the higher ups want answers.
    Wednesday, May 11, 2011 7:49 PM
  • If you have a ticket open and would like more information on what happened, you can request a Post Incident Report (PIR).

    they won't post it on the health dashboard or on the forums?
    Wednesday, May 11, 2011 7:53 PM
  • If you have a ticket open and would like more information on what happened, you can request a Post Incident Report (PIR).

    they won't post it on the health dashboard or on the forums?


    MS answer: The BPOS Operations team has resolved the email queuing issues affecting customers served from this region. The team will perform a complete post mortem of the incident which will be delivered to customers upon request.


    I will post what they send me when they send it but haven't heard anything yet.

    Wednesday, May 11, 2011 7:55 PM
  • The BPOS Operations team is investigating alerts indicating service degradation for access to Exchange Online mailboxes hosted on Cluster 42. Users in affected organizations may be experiencing difficulties connecting to their mailbox via Outlook, OWA and other MAPI-based e-mail clients. The BPOS Operations team is actively working to determine the root cause and restore service.

    Seriously ???? Again???!!!!

    Wish MS could tell us what cluster my users were on...No point in mentioning it unless I can correlate the info to my organization.

    Wednesday, May 11, 2011 8:46 PM
  • Webmail is down now...  and we haven't received an outlook e-mail for 30 minutes.  Hope this isn't the same thing we went thru two days ago.
    Thursday, May 12, 2011 4:38 PM
  • Oh look, an old term gets new life.  Turns out we are all cluster-f***ed...  (Sorry, i couldn't help myself.)
    Thursday, May 12, 2011 4:39 PM
  • And mail is back down again...
    Utah, USA


    Attempting to find the SPF record using a DNS TEXT record query.
      ExRCA wasn't able to find the SPF record.

     


    That's the error I get when running SMTP test...

    https://www.testexchangeconnectivity.com/

    Thursday, May 12, 2011 4:45 PM
  • absolutely pathetic, MS.
    Thursday, May 12, 2011 5:02 PM
  • Yep.  Down for us as well. 

    Seattle, USA

    Thursday, May 12, 2011 5:03 PM
  • I think this is just crazy that we cannot get any support or real information.  It is very hard to answer questions about what is going on without any information from Microsoft. 

    We are all Technical so just please explain what is going on with our services.

     

     

    Thursday, May 12, 2011 5:07 PM
  • I think this is just crazy that we cannot get any support or real information.  It is very hard to answer questions about what is going on without any information from Microsoft. 

    We are all Technical so just please explain what is going on with our services.

     

     


    Agreed. Does anyone official from MS even read these forums or are they just a dumping ground for us to lament their services?
    Thursday, May 12, 2011 5:08 PM
  • Start tweeting your complaints.  Use hashtag of #microsoftdown - a failure of this magnitude (especially on two consecutive days) should make the evening news.
    Thursday, May 12, 2011 5:13 PM
  • down in so cal again 
     
    Thursday, May 12, 2011 5:14 PM
  • I've sent tips to both gizmodo and engadget...lord knows they like bashing MS at every turn. I'd suggest everyone send reports to other tech-focused websites & blogs to get the word out there about how unreliable this service has become.  Make sure to mention that BPOS is a precursor to Office 365. 
    Thursday, May 12, 2011 5:16 PM
  • This is the 3rd straight day of issues and we've had intermittent problems since we got on board back in January. Sick and tired of it. I'll be looking for another solution. Microsoft can't even run their own email application!

     

    Furious in Pittsburg, KS....

    Thursday, May 12, 2011 5:20 PM
  • Twice this week for my organization. Issue affected all users two days ago and again today we are currently down.

     

    Thursday, May 12, 2011 5:23 PM
  • Third time in 48 hours in Ohio. Tuesday PM, overnight/this morning(from around midnight to 8am) and now.

    Absolutely ridiculous

    Thursday, May 12, 2011 5:27 PM
  • This needs to stop now, degraded and down service basically all week is not what we are paying for.  The tech support on the line is not giving proper explanation as to what the issue is. Having a support rep at MS tell me it went from orange to red and back again is disrespectful and has me seriously questioning why we should continue using this service. How about a real explanation and a real fix?
    Thursday, May 12, 2011 5:27 PM
  • Groan.

    We have been on this service for 2 Weeks... and now SERIOUSLY considering simply turning on our internal Exchange server again.

    I cant believe it is this bad.  I am getting crucified here.  

    (Boston)


    Thursday, May 12, 2011 5:29 PM
  • we are trying to figure out how to migrate back to in-house. it will suck but MS is not using any lube here and my bum is sore and tired...
    Thursday, May 12, 2011 5:33 PM
  • Down 2+ hours as well. NorCal; our org. and our customers...
    Thursday, May 12, 2011 5:33 PM
  • Absolutely ridiculous.  My free Google Apps account has significantly better uptime than this BPOS junk.  I have a large client we just finished migrating last week, they are not impressed, nor happy.  
    Thursday, May 12, 2011 5:36 PM
  • Oh look, an old term gets new life.  Turns out we are all cluster-f***ed...  (Sorry, i couldn't help myself.)

    I noticed that you didn't add a happy face.  No happy faces here, either.  Back to in-house servers we go, I suppose.  This string of incidents will set the cloud/offsite model back months, if not years, i fear..  This may be good for us on the ground, but bad for the hosted model forward momentum.

    I am blown away at the lack of coverage in the searches, media, blogs, etc.  Google BPOS down and you get nothing current!  We need to get the word out, i agree.

    You know we are in a bad place when the best explanation i can give my clients is a copy of this blog.  BTW: Apparently misery doesn't love company.  Did i mention that most of my clients are attorneys?

     

    Thursday, May 12, 2011 5:36 PM
  • In less than a year on BPOS, i've had about 5-6 issues already that affected more than 50% of my users.   On average we had one of these 1-2 times a year when hosting our own.   This is just happening too much.

    Thursday, May 12, 2011 5:40 PM
  • Oh look, an old term gets new life.  Turns out we are all cluster-f***ed...  (Sorry, i couldn't help myself.)

    I noticed that you didn't add a happy face.  No happy faces here, either.  Back to in-house servers we go, I suppose.  This string of incidents will set the cloud/offsite model back months, if not years, i fear..  This may be good for us on the ground, but bad for the hosted model forward momentum.

    I am blown away at the lack of coverage in the searches, media, blogs, etc.  Google BPOS down and you get nothing current!  We need to get the word out, i agree.

    You know we are in a bad place when the best explanation i can give my clients is a copy of this blog.  BTW: Apparently misery doesn't love company.  Did i mention that most of my clients are attorneys?

     


    I have personally sent tips to Engadget and Gizmodo but who knows what good that will do. Send them out to any and all tech blogs you can. This definitely needs to get some coverage.
    Thursday, May 12, 2011 5:40 PM
  • Has anyone checked the OFFICE365 beta accounts.  Is mail flowing on them.  I have a friend whose 365 account is up and running fine.  That worries me even more.  Maybe they are all getting ready to go leave for Tech Ed next week so they can try to sell the online services even more. 

    Thursday, May 12, 2011 5:40 PM
  • This is the 3rd straight day of issues and we've had intermittent problems since we got on board back in January. Sick and tired of it. I'll be looking for another solution. Microsoft can't even run their own email application!

     

    Furious in Pittsburg, KS....


    AMEN
    Thursday, May 12, 2011 5:45 PM
  • Story should be getting some coverage soon from www.ZDNet.com

    Edit: here is the link!

    http://www.zdnet.com/blog/btl/microsoft-bpos-office-365-uptime-sparking-customer-angst-pleas-for-help/48680

    Thursday, May 12, 2011 6:05 PM
  • here's another article I found on google

     

    http://www.pcmag.com/article2/0,2817,2385076,00.asp

    Thursday, May 12, 2011 6:32 PM
  • Very disappointing! I was told by MS tech. that the issue began around 4:00 a.m. pst. No reason given for the continued issues. They say they will notify me hourly, but I am not holding my breath. 

     

    Thursday, May 12, 2011 6:34 PM
  • From the Americas Dashboard:

    12:39 PM Performance degradation The BPOS Operations team has resolved service degradation for Exchange Online mail flow for organizations served from this region. Email is flowing without delay for ~50% of customers. The team continues to closely monitor mail queues for remaining impacted customers while message delivery returns to normal. Next update will be within one hour or when new information is available.
    11:39 AM Performance degradation The BPOS Operations team has resolved service degradation for Exchange Online mail flow for organizations served from this region. While service is restored, users may experience ~15 minute delays on email sent since 9am PDT. Next update will be within one hour or when new information is available.
    10:58 AM Service interruption The BPOS Operations team is working to resolve service degradation for Exchange Online mail flow for organizations served from this region. Users in affected organizations will experience ~40 minute delays when trying to send or receive e-mail using Outlook, OWA, or mobile devices. The BPOS Operations team is actively working to restore service. Next update will be within one hour or when new information is available.

    Just got one test message that was finally delivered after an hour!!!! Come on MS!!! More like BS!
    Thursday, May 12, 2011 6:42 PM
  • From the Americas Dashboard:

    12:39 PM Performance degradation The BPOS Operations team has resolved service degradation for Exchange Online mail flow for organizations served from this region. Email is flowing without delay for ~50% of customers. The team continues to closely monitor mail queues for remaining impacted customers while message delivery returns to normal. Next update will be within one hour or when new information is available.
    11:39 AM Performance degradation The BPOS Operations team has resolved service degradation for Exchange Online mail flow for organizations served from this region. While service is restored, users may experience ~15 minute delays on email sent since 9am PDT. Next update will be within one hour or when new information is available.
    10:58 AM Service interruption The BPOS Operations team is working to resolve service degradation for Exchange Online mail flow for organizations served from this region. Users in affected organizations will experience ~40 minute delays when trying to send or receive e-mail using Outlook, OWA, or mobile devices. The BPOS Operations team is actively working to restore service. Next update will be within one hour or when new information is available.

    Why am I always part of the 50% of customers that are delayed :)
    Thursday, May 12, 2011 6:44 PM
  • In the same boat here - I just got my first test message sent from the outside at 1pm Eastern at 2:40pm Eastern.  When they say 50% of their customers are affected, do they mean the 50% with active e-mail accounts?  Also, what is with this 40 minute delay joke, it has absolutely not been the case all day!
    Thursday, May 12, 2011 6:46 PM
  • In the same boat here - I just got my first test message sent from the outside at 1pm Eastern at 2:40pm Eastern.  When they say 50% of their customers are affected, do they mean the 50% with active e-mail accounts?  Also, what is with this 40 minute delay joke, it has absolutely not been the case all day!

    I sent 3 tests from GMAIL to my Cloud account over 2 hours ago.  Still haven't received them.
    Thursday, May 12, 2011 6:48 PM
  • http://twitter.com/#!/search?q=%23bpos

    some more people rowing the same boat as us...

    Thursday, May 12, 2011 6:54 PM
  • Same here.
    Thursday, May 12, 2011 6:57 PM
  • As it happens, I'm in the middle of a trial of Exchange Online as our company was looking to move to it in the near future. Between this and the difficulties for users who do not use Outlook, OWA or mobile device that supports Exchange Active Sync, I'm starting to second guess this potential migration. At least to BPOS... Maybe.
    Thursday, May 12, 2011 7:09 PM
  • DON'T DO IT!!!!!!
    Thursday, May 12, 2011 7:17 PM
  • same issues here at our law firm for the past three days. Support has been very elusive about the cause of the problem. MS is really hurting our business and I have been asked to look for alternate email solutions after using this for only 6 months. I hope we will be entitled to credit for the breach of their SLA.
    Thursday, May 12, 2011 7:21 PM
  • Honestly, I'm glad we only have one small client and ourselves using BPOS.  We'd likely be fired if any of our clients' on-premise Exchange servers were this unavailable/unreliable and we kept them so in the dark about the root cause and steps we were taking to prevent future issues.
    Thursday, May 12, 2011 7:27 PM
  • We're a worldwide corporation using this. If it doesn't improve, we may have to go back to in-house Exchange.
    Thursday, May 12, 2011 7:28 PM
  • I assume that messages sent during these outages will eventually be delivered and they are not being lost in the cloud.  Can anyone confirm my assumption?
    Thursday, May 12, 2011 7:29 PM
  • email Steve Ballmer.  Let him know what you think: steveb@microsoft.com
    Thursday, May 12, 2011 7:30 PM
  •  

    We are testing BPOS for wider deployment ... looks like MS has shot itself in the foot with these constant outages.

     

    - Why dont they share any information with their users?

    - Why not call an outage an outage? Whats this BS about service degradation?

    - Why do you need admin credentials to even see the system status? They should make it a publicly available page - IF they could keep their servers and service up, that would be their best advertising - considering how badly they are doing, they are better off keeping their service status under lock and key - wait, I think I answered my own question.

    - We shouldn't have to pay for this poor service - hotmail users probably have better uptime than we are experiencing right now.

     

    Absolutely ridiculous - MS cant even run their own product on their own servers.

    Thursday, May 12, 2011 7:30 PM
  • We are getting messages intermittently, both current and from a few hours ago.
    Thursday, May 12, 2011 7:30 PM
  • I can't believe this.  I just assured my clients that This was the best thing for them.  I look like an idiot.  Microsoft, Please help us provide greater reporting to our clients.

     

    Communication is the next best thing to being operational.

    Thursday, May 12, 2011 7:51 PM
  • We started migration last October, and completed coexistence in February.  Based on my experience over the past seven months, I would absolutely not recommend Exchange Online to anyone for any reason.  We are actively planning to move back to in-house Exchange servers, which we ran for years without anything like the sort of problems we've had with Exchange Online.
    Thursday, May 12, 2011 7:55 PM
  • What I dislike almost as much as the actual service failure - is the lack of timely updates (Which they backdate at a later time), overestimation of service restoration, and general lack of information regarding what is going on and what is being done.

    'We know we have a problem and are working on it' is not good enough. 

    This is like getting a root canal.  We are still seeing hours-old messages leaking through. Sort of.  VIPs are about to force my hand on this service - and I currently do not blame them in the least.

    Takes a lot to alienate a customer this much after only 2 weeks.



    Thursday, May 12, 2011 8:05 PM
  •  

    We are testing BPOS for wider deployment ... looks like MS has shot itself in the foot with these constant outages.

     

    - Why dont they share any information with their users?

    - Why not call an outage an outage? Whats this BS about service degradation?

    - Why do you need admin credentials to even see the system status? They should make it a publicly available page - IF they could keep their servers and service up, that would be their best advertising - considering how badly they are doing, they are better off keeping their service status under lock and key - wait, I think I answered my own question.

    - We shouldn't have to pay for this poor service - hotmail users probably have better uptime than we are experiencing right now.

     

    Absolutely ridiculous - MS cant even run their own product on their own servers.

    agreed on all points
    Thursday, May 12, 2011 8:13 PM
  • "The BPOS Operations team continues to monitor email flow in the environment. Email queues continue to drain, but we still see delays of up to 3 hours based on the significant amount of email that is queued. Next update will be within one hour or when new information is available"

     

    3 hours?  I have emails I sent 4 hours ago, I still haven't received. 

    Thursday, May 12, 2011 8:16 PM
  • Here we go again.  I'm in the #microsoftdown tweeting...  Microsoft, can you please explain what is going on here? This feels almost like the PSN outage here... No concrete information for weeks and in a few days they are going to tell us we've all been hacked and change your password.

     

    Honestly, I've started looking at alternatives, not that I'm switching yet, but wow, this is 2011, not 1985.  Email should be lightning fast.  Anything less than perfect is pathetic unfortunately.

    Thursday, May 12, 2011 8:18 PM
  • Agreed, the lack of transparency and timely communication is pissing me off more than anything else.  I don't need all the gory details for an occasional service hiccup, but a) this service's hiccups have been far more than occasional since we migrated and b) we're heading into four days of impaired and intermittently failed service.  It's past time that somebody posted an honest and at least somewhat detailed post on the Online Services Team Blog or somewhere -- explaining just exactly what the hell is going on with Exchange Online.  Email is mission-critical and customer-facing for almost all businesses, and most of us came to Exchange Online after hosting our own Exchange infrastructure, so it's not like we're incapable of understanding an honest explanation.
    Thursday, May 12, 2011 8:58 PM
  • Is there a place where I can get the information when something like this happens? I've been out of the office and had no idea what was going on until the owner of my company called me about it.

     

    Is there a place where microsoft will put current issues and notify us when something goes wrong?

    Thursday, May 12, 2011 9:16 PM
  • Is there a place where I can get the information when something like this happens? I've been out of the office and had no idea what was going on until the owner of my company called me about it.

     

    Is there a place where microsoft will put current issues and notify us when something goes wrong?


    Theoretically, that's what https://health.noam.microsoftonline.com is for, although a lot of us feel like MS needs to be a lot more specific and honest on that dashboard.
    Thursday, May 12, 2011 9:25 PM
  • Credit where credit's due, this is a good (if belated) start:

    Time Description
    3:12 PM Performance degradation

    This is a short update on work underway to resolve problems that have occurred with the Exchange Online Service on May 12 2011 and the actions that the team is taking to resolve these problems. Starting at 9:10am PDT, service monitoring detected malformed email traffic on the service. This malformed email traffic resulted in problems sending and receiving email until 10:03am PDT, when the problem was rectified. The offending mail was removed from the service, and service restored. Email was delayed by ~45minutes during this time. A second issue was detected via monitoring at 11:35am PDT, with email stuck in end users outboxes. The issue was remediated at 12:04pm PDT. During this time, more than 1.5 million messages had queued on the service awaiting delivery. This email is now flowing through the system, however because of this large volume of email; we are experiencing delays of as long as 3 hours. The team continues to work to fully resolve the issue, and will provide a full post mortem of this incident following service restoration, and also will provide additional updates on how our service level agreement (SLA) was impacted.
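
    To put numbers on the two windows in that update, here is a small Python sketch that works out how long each issue lasted from the times quoted above. The helper names and labels are my own, and it assumes both times in a pair fall on the same day (PDT).

```python
def to_minutes(clock):
    """Convert a time such as '9:10am' or '12:04pm' to minutes since midnight."""
    clock = clock.strip().lower()
    suffix, body = clock[-2:], clock[:-2]
    if suffix not in ("am", "pm"):
        raise ValueError(f"expected an am/pm suffix: {clock!r}")
    hours, minutes = (int(part) for part in body.split(":"))
    hours %= 12            # 12am -> 0, 12pm -> 0 before the pm shift
    if suffix == "pm":
        hours += 12
    return hours * 60 + minutes


def window_minutes(start, end):
    """Length of a same-day window in minutes."""
    return to_minutes(end) - to_minutes(start)


# Times taken from the 3:12 PM dashboard update; labels are mine.
windows = {
    "malformed email traffic": ("9:10am", "10:03am"),
    "email stuck in outboxes": ("11:35am", "12:04pm"),
}

for label, (start, end) in windows.items():
    print(f"{label}: {window_minutes(start, end)} minutes")
```

    That gives 53 minutes for the first window and 29 minutes for the second, before counting the hours of queue backlog that followed.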

    Thursday, May 12, 2011 9:26 PM
  • Yes - you can go to https://health.noam.microsoftonline.com

    and login with an account with admin rights to your service.

    I warn you that this is not very informative.

     

     

    Thursday, May 12, 2011 9:27 PM
  • http://health.noam.microsoftonline.com

    You must log in with admin credentials.  It is usually updated with information 30 minutes to an hour after the initial problem is seen.  They put this up after many people asked for it about 6 months ago, I think?  I remember getting a call from an MS manager asking me questions about 'how I would make their service better' and my response was 'communicate with us when there's a problem, we're IT people too!'  Their response was to slap a pretty band-aid on and call it a feature.

    Thursday, May 12, 2011 9:29 PM
  • Credit where credit's due, this is a good (if belated) start:

    Time   Description
    3:12 PM Performance degradation

    This is a short update on work underway to resolve problems that have occurred with the Exchange Online Service on May 12 2011 and the actions that the team is taking to resolve these problems. Starting at 9:10am PDT, service monitoring detected malformed email traffic on the service. This malformed email traffic resulted in problems sending and receiving email until 10:03am PDT, when the problem was rectified. The offending mail was removed from the service, and service restored. Email was delayed by ~45minutes during this time. A second issue was detected via monitoring at 11:35am PDT, with email stuck in end users outboxes. The issue was remediated at 12:04pm PDT. During this time, more than 1.5 million messages had queued on the service awaiting delivery. This email is now flowing through the system, however because of this large volume of email; we are experiencing delays of as long as 3 hours. The team continues to work to fully resolve the issue, and will provide a full post mortem of this incident following service restoration, and also will provide additional updates on how our service level agreement (SLA) was impacted.


    Well it's about time they give us a little more info!!!!
    Thursday, May 12, 2011 9:54 PM
  • It looks as if we are finally back up.  Hopefully for good.  Malformed email was the reason.  Several.  Is that an attack or what?  Load balancing issues were also given as a somewhat abstract reason for the outages. *Yes, I said outages, not degradations.  

    Here is the latest from MS

    3:12 PM Performance degradation The BPOS Operations team continues to monitor email flow in the environment. 80% of email queues have drained and new email is being sent and received without delays. Next update will be within one hour or when new information is available.
    Thursday, May 12, 2011 10:26 PM
  • Hope so.  I also hope this is a rare occurrence.  Three times in one week.  We are evaluating and are pleased, except these outages are throwing cold water on us.  I really like the e-mail coexistence, which gives us the ability to have a limited migration of our users.  We also have a lot of Mac users.   We looked hard at Gapps, Intermedia and Rack.  Intermedia's attraction is Ex 2010, but they have bad posts (worse than this board) and you have to do a big bang migration.

     

    I would guess that MS will refund some fees for these outages

    Thursday, May 12, 2011 11:48 PM
  • Here is the information from the health dashboard on yesterday's outage:

    Time   Description
    6:58 PM Additional information The BPOS Operations team has resolved the problem affecting Exchange Online mail flow for customers served from this region. Companies may see delays in delivering messages while mail flow returns to normal. A full post mortem of this incident is being compiled by the Operations Team, and will be available via Microsoft Online Services Technical Support as part of our standard incident management process and upon completion of the root cause investigation.
    5:45 PM Additional information Customers may still be experiencing slight delays in sending or receiving emails, however the source of the problem has been corrected and mailflow conditions returning to normal operations.
    3:00 PM Performance degradation The BPOS Operations team continues to investigate service degradation issues with Exchange Online mail flow for organizations served from this region. They have identified the source of the problem and have implemented a fix. Customers may experience a delay in sending or receiving emails, however that delay should be improving as the fix takes effect.
    12:00 PM Performance degradation The BPOS Operations team continues to investigate service degradation issues with Exchange Online mail flow for organizations served from this region. The next service update will be provided within 2 hours if the issue is not resolved.
    9:40 AM Performance degradation The BPOS Operations team is investigating alerts indicating service degradation for Exchange Online mail flow for organizations served from this region. Users in affected organizations may be experiencing delays when trying to send or receive e-mail using Outlook, OWA, or mobile devices. The BPOS Operations team is actively working to determine the root cause and restore service.

    What I can tell you is that services were not completely down as I was able to send test messages to some customers, one of which was received in a timely manner.  Outages like this take time to investigate to find the cause.  If we were to flip a switch there would be lost mail.  The issue is still under investigation and this particular outage was resolved at approximately 7:00 pm PDT.  Users should not have lost emails as they were queued either locally in Outboxes/Drafts or on the HUB servers.

    Those who got the "Our offices are closed now" message when calling in were actually getting a default message in our IVR.  This has now been changed to reflect that all lines are busy and access to a support agent will be delayed.

    If you opened a service request about this particular outage, you may request a post incident report.  If you did not open a ticket regarding this particular outage and wish a post incident report, you may wish to open a new SR and make the request specifically for the outage on May 10th.

    At this point I will be closing this thread. 

    Friday, May 13, 2011 12:40 AM
    Moderator
  • http://blogs.technet.com/b/msonline/archive/2011/05/13/update-on-bpos-standard-email-issues.aspx

     

    Here is the copy of the blog article.

    Update On BPOS-Standard Email Issues

    DaveT_MSFT

    12 May 2011 5:47 PM

    I lead the engineering organization responsible for BPOS. My team builds, operates and supports our BPOS service, and over the last few days, we have not satisfied our customers’ needs. On Tuesday and today we experienced three separate service issues that impacted customers served from our Americas data center. All of these issues have been resolved and the service is now running smoothly. These incidents were unique to BPOS and not related to Office 365 or any other Microsoft services.

    I’d like to apologize to you, our customers and partners, for the obvious inconveniences these issues caused.  We know that email is a critical part of your business communication, and my team and I fully recognize our responsibility as your partner and service provider. We will provide a full post mortem, and will also provide additional updates on how our service level agreement (SLA) was impacted.   We will be proactively issuing a service credit to our impacted customers.

    I also want to provide more detail about the recent issues.

    On Tuesday at 9:30am PDT, the BPOS-S Exchange service experienced an issue with one of the hub components due to malformed email traffic on the service.   Exchange has the built-in capability to handle such traffic, but encountered an obscure case where that capability did not work correctly.  The result was a growing backlog of email.  By 12:00am PDT, the malformed traffic was isolated and the mail queues cleared.  The delays encountered by customers varied, on the order of 6-9 hours.   Short term mitigation was implemented and a fix was under development.

    At 9:10am PDT today, service monitoring again detected malformed email traffic on the service. The problem was resolved at 10:03am, but users experienced email delays of up to 45 minutes during this time. A second, related issue was detected via monitoring at 11:35am PDT, resulting in email stuck in some end users’ outboxes. The issue was remediated at 12:04pm PDT. During this time, more than 1.5 million messages had queued on the service awaiting delivery. The backlog was 90% clear by 4:12 PM, but because of this large backlog of email, customers may have experienced delays of as long as 3 hours. We are implementing a comprehensive fix to both problems.

    As a result of Tuesday’s incident, we feel we could have communicated earlier and been more specific. Effective today, we updated our communications procedures to be more extensive and timely. We understand that it is critical for our customers to be as fully informed as possible during service-impacting events. We will continue to improve the timeliness and specificity of our communications. The primary mechanism for communicating to our customers on issues has been and will continue to be the Service Health Dashboard. For North America, that dashboard is at https://health.noam.microsoftonline.com/.

    In an unrelated incident, starting at 1:04am PDT, service monitoring detected a failure in the Domain Name Service (DNS) hosting the http://mail.microsoftonline.com domain. This failure prevented users from accessing Outlook Web Access hosted in the Americas, and partially impacted some functionality of Microsoft Outlook and Microsoft Exchange ActiveSync devices. The team diagnosed and fixed an underlying problem in the servers hosting Domain Name Service (DNS) for the http://mail.microsoftonline.com domain, and restored service at 4:52am PDT. The team identified a number of improvements in our handling of problems associated with DNS, and will make a full post mortem of this incident available through Microsoft Support.

     As I’ve said before, all of us in the BPOS team and at Microsoft appreciate the serious responsibility we have as a service provider to you, and we know that any issue with the service is a disruption to your business – that’s not acceptable.  I want to assure you that we are investing the time and resources required to ensure we are living up to your – and our own – expectations for a quality service experience every day.

    As always, if you are experiencing any service issues, we encourage customers to check the Service Health Dashboard for the latest information or contact our customer support team. Our customer support is available 24 hours a day by telephone or via Service Requests submitted from the Microsoft Online Services Administration Center.

     

    Dave Thompson

    Corporate Vice-President, Microsoft Online Services

    Friday, May 13, 2011 2:20 AM