Intermittent "Your mailbox appears to be unavailable. Try to access it again in 10 seconds" in OWA 2010

    Question

  • We have an "interesting" intermittent problem affecting one of our OWA (2010) installations.

    We are in the middle of a migration from Exchange 2003 to Exchange 2010 (same forest / organisation).  The 2010 setup is 2 x CAS/HT and 2 x Mailbox in a DAG configuration, with all databases currently mounted on one host.  Fewer than 20 users are on Exchange 2010 during this UAT phase.  All servers are on Exchange 2010 SP2 Rollup 4.

    Intermittently, after logging in to OWA, the 2010 users are presented with the "Your mailbox appears to be unavailable.  Try to access it again in 10 seconds" message.  Pressing refresh in the browser brings up the mailbox instantly.  This happens across various browsers and client types.  Once the mailbox has loaded, the session continues normally without any perceived problems.  This has been ongoing for days.

    All required Exchange services are running on all four servers (test-servicehealth comes back clean).

    I have restarted the CAS/HT servers.  BPA seems clear of anything that might affect OWA.

    Get-MailboxDatabaseCopyStatus shows that the databases are all mounted, and the copies are healthy.

    Test-MapiConnectivity is successful to all databases.

    Test-OutlookConnectivity is successful for all scenarios.

    Test-OutlookWebServices is successful for all tests, with responses received from the correct services.

    Event logs show nothing obvious, on CAS or Mailbox servers.

    All I'm seeing is a long "time-taken" (over 20 seconds) for the initial GET request in the IIS logs on the CAS.

    Any help or advice would be appreciated.

    Many thanks in advance!
    Friday, January 25, 2013 4:04 PM

Answers

All replies

  • Test-ServiceHealth
    Make sure all the necessary services are running.

    Om

    Friday, January 25, 2013 7:54 PM
  • "All required Exchange services are running on all four servers (test-servicehealth comes back clean)."
    Friday, January 25, 2013 7:56 PM
  • Is this happening with all the mailboxes or only with a particular mailbox?

    Om

    Friday, January 25, 2013 8:00 PM
  • Most, if not all of them. Accessed on the LAN or via the web.
    Friday, January 25, 2013 8:03 PM
  • During off-business hours, restart the Information Store service once; this sometimes works.

    Om

    Friday, January 25, 2013 8:11 PM
  • I'll restart the information store service on both mailbox servers later and test again. Thanks Om.
    Friday, January 25, 2013 8:17 PM
  • Information store restarted on both nodes, databases back on same node as before the service restart.

    So far so good, but will test over the next couple of days (users not back in until Monday) and report the results back here.

    Thanks again.

    Friday, January 25, 2013 11:53 PM
  • Hi thezookeeper

    Thanks for the reply

    Also, I found another thread in which the poster started the "Microsoft Exchange Information Store" service and the issue was resolved.

    http://social.technet.microsoft.com/Forums/en-US/exchange2010/thread/d0977cb3-646b-44ca-b30b-34d2abe58a6e

    So, if there are no further questions, please mark Om's post as the answer and close this thread.

    Cheers


    Zi Feng
    TechNet Community Support

    Monday, January 28, 2013 7:56 AM
  • The information stores were both running, but I restarted them as per Om's advice.

    Unfortunately, this didn't work.  Now that users are testing the system, the message is intermittently appearing again.

    Below is an example IIS log entry from this morning.  The "time-taken" (logged in milliseconds) is over 29 seconds before the message appears, yet the sc-status is 200; a refresh takes you straight in, and the "time-taken" drops to next to nothing:

    #Fields: cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken
    GET /owa/ &ex=E122 443 DOMAIN\user x.x.x.x Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.17+(KHTML,+like+Gecko)+Chrome/24.0.1312.56+Safari/537.17 200 0 0 29093

    There is nothing in the event logs or our other monitoring to suggest there was a problem.

    Any further suggestions would be appreciated.

    Many thanks once again.

    Monday, January 28, 2013 10:29 AM
  • Any luck if you perform an iisreset /noforce?

    Om

    Monday, January 28, 2013 12:20 PM
  • Are you suggesting on Mailbox servers or CAS?  For CAS, I actually rebooted both servers, and the problem was still present.

    I haven't tried an IISRESET on the Mailbox servers, but I believe the communication is RPC between CAS and Mailbox servers, so IISRESET on there should have no effect?

    Thanks again.


    • Edited by thezookeeper Monday, January 28, 2013 12:27 PM more specific that IIS reset referred to is on mailbox servers
    Monday, January 28, 2013 12:27 PM
  • It was for the CAS servers.

    If you have a DAG configured, can you move the active database copy to the second DAG node?


    Om

    Monday, January 28, 2013 12:36 PM
  • Thanks for your persistence on this one Om.

    Yes, I can move the active databases to the 2nd node.  I did that on Friday night when I was restarting the information store service.

    Tasks performed to restart information store:

    • Restart information store service on node 2
    • Move databases to node 2
    • Restart information store service on node 1
    • Move databases back to node 1

    I'd rather keep the databases on the current node, to keep as many variables static as possible, but if switching them to node 2 can be diagnostic, I'm happy to give it a try!

    Thanks again.

    Monday, January 28, 2013 12:43 PM
  • How did this start? Any maintenance or patching?

    Om

    Monday, January 28, 2013 12:52 PM
  • This is a completely new environment that we've built, to replace an ageing Exchange 2003 organisation.

    We've built the servers in the same forest (and domain) as the Exchange 2003 servers, everything else seems fine, all our other tests are fine, but the UAT users that we've migrated across are reporting this issue.

    Monday, January 28, 2013 1:06 PM
  • How do you loadbalance your two CAS Servers?

    Martina Miskovic

    Monday, January 28, 2013 1:21 PM
  • Cisco ACE.

    We're source IP sticky, and the same issue affects clients via both CAS.

    Monday, January 28, 2013 1:23 PM
  • See if you can reproduce the error when bypassing the ACE (use the local hosts file).
    If you can't, then I think you can be pretty sure that the problem lies somewhere in your ACE.

    Martina Miskovic

    Monday, January 28, 2013 1:34 PM
  • Is any HTTP caching set on the load balancers?

    If you can repro this for 1 user, bypass the load balancer via a hosts file and test.

    Better still, take the LB out of the equation and test.


    Sukh

    Monday, January 28, 2013 1:35 PM
  • Many thanks for your answers and suggestions so far.

    I understand what you guys are saying, but the timeout is happening between the CAS and the mailbox.  The load balancer is out of the equation at this point.  I can see no way of the load balancer causing a 29 second wait for a GET request to /owa, that is logged in the IIS logs on the CAS.

    I have read elsewhere, that potentially session idle timeouts can cause this, on the firewall between the CAS and mailbox, and I have a change request in for tomorrow night to remove that for TCP/135 and all high ports.

    Unfortunately, because of the intermittent nature of the problem, I'd have to try and remove the ACE from everybody's connection, and right now that isn't possible - we load balance the connection to ISA 2006, then again to CAS.

    I think it's important to focus on what's actually causing the time-taken in the IIS logs to be so high, and that sits behind all that infrastructure.

    What are your thoughts?

    Thanks again.

    Monday, January 28, 2013 1:44 PM
  • I have read elsewhere, that potentially session idle timeouts can cause this, on the firewall between the CAS and mailbox, and I have a change request in for tomorrow night to remove that for TCP/135 and all high ports.


    Are you saying that you have a firewall between CAS and MB?
    That is not a supported configuration.

    I still think you should run some tests without the ACE in the middle; using the local hosts file on a few computers is an easy way to do just that.

    Martina Miskovic

    Monday, January 28, 2013 1:49 PM
  • Can you clarify the firewall between your mbx & cas servers?

    If you have a repro do that test for a user.

    Now, if you do have a firewall between the MBX and CAS servers, that may well be your issue. It wasn't mentioned before.


    Sukh

    Monday, January 28, 2013 1:55 PM
  • Just wondering, any redirection configured for OWA?

    Om

    Monday, January 28, 2013 1:57 PM
  • Thanks Martina.  I'll make sure any tests I perform are done taking the ACE and ISA out of the equation.
    Monday, January 28, 2013 2:12 PM
  • We have a Juniper between the CAS and mailbox servers.  Full IP is open bidirectionally between the two sets of IP addresses, and with all rules logging, no packets have been recorded as blocked.

    We have 3 other Microsoft Exchange setups using the same infrastructure and firewall setup (2 x 2007, 1 x 2010), without this particular issue being presented.

    Sorry, I'm not sure what you mean by "if you have a repro do that test for a user".  Happy to perform any tests, but just unsure what you're suggesting.

    Many thanks!

    Monday, January 28, 2013 2:16 PM
  • We have the majority of users still on 2003, and we do have a rule in ISA publishing the legacy Exchange.  Is that what you meant?

    Thanks again.

    Monday, January 28, 2013 2:17 PM
  • To isolate the problem, try this quick test:

    1. Create a new test database.

    2. Create a new test user.

    3. Create a test mailbox for this test user in the test database.

    Now, try to access this mailbox via OWA.


    Om

    Monday, January 28, 2013 2:33 PM
  • I'll give that a try, thanks Om.

    Be aware though, that our users are spread across multiple databases, and there is a mix of migrated and new.

    Monday, January 28, 2013 2:43 PM
  • What I'm saying is that if you can reproduce the issue with a user, then do the tests with the hosts file for that one user.

    I'm not fully convinced this is an Exchange issue. It's either your Juniper or the LB.

    If you do the hosts file test, we can rule out the LB. As for the Juniper, either shut that down or place the CAS on the same LAN/subnet as the MBX and test again.


    Sukh

    Monday, January 28, 2013 2:45 PM
  • Thanks for the clarification Sukh.

    The biggest issue we have is that it's intermittent.  I've not experienced it myself since restarting the information stores on Friday night, but other users have.  It doesn't happen every time they log in either, so narrowing it down is a bit difficult.

    I'm going to make sure that all testing I do myself is direct to CAS, avoiding the load balancers, and we'll see what happens.

    Unfortunately, taking the Juniper out of the equation is not an option right now.  If all other diagnostics show that there's no other possible diagnosis, we'll have to go down that road, but for now I have to wait for my change tomorrow night for the idle timeout, and see how it goes!

    I understand that having the firewall there is not a supported configuration, but right now I have to work with it, and try to get things running as normal.

    Thanks again.

    Monday, January 28, 2013 2:55 PM
  • How about moving the CAS to the MBX subnet, or building a new MBX on the same subnet as the CAS for testing purposes?

    I suppose you should let the change go through first.


    Sukh

    Monday, January 28, 2013 2:58 PM
  • Thanks Sukh, I could probably get a mailbox server in the same network zone as the CAS for testing.  I agree that I need to wait for the change to be implemented first.

    Thank you again for your responses!

    Monday, January 28, 2013 3:01 PM
  • I've marked Om's suggestion to restart the information store services as the answer, because we have had only one example of the issue since performing those actions, and I'm going to treat that as a one off, unrelated to the originally reported issue.  That was happening regularly, to multiple people, throughout each day.

    Many thanks for all suggestions and help!

    • Proposed as answer by Rehan Miah Friday, July 26, 2013 9:44 PM
    • Unproposed as answer by Rehan Miah Friday, July 26, 2013 9:44 PM
    Thursday, January 31, 2013 11:19 AM
  • I got this error; however, in my case it was due to the mailbox database being dismounted.
    You can check the status of the database by entering the following command:

    Get-MailboxDatabaseCopyStatus -Identity "mailbox database name"

    This will tell you whether the database is dismounted. If it is, you need to mount it, which can be done with the following command:

    Mount-Database -Identity "mailbox database name"

    This should mount the database, and you should no longer see the error message.

    Friday, July 26, 2013 9:44 PM