Exchange 2010 MP causing mailbox corruption / quarantine?

  • Question

  • Last week we imported the latest Exchange 2010 MP, version 14.3.38.2. We had previously been running the Feb 2011 version. Within an hour of the MP import, our Exchange mailbox server started reporting many mailboxes as quarantined. We had over 50 events like the one below:

    Log Name:      Application
    Source:        MSExchangeIS
    Date:          6/24/2012 3:04:19 PM
    Event ID:      10018
    Task Category: General
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      MBSERVER003.domain.com
    Description:
    The mailbox for user a8d95d3f-2109-4d55-942f-69ac8c05ec04: /o=Our Organization/ou=First Administrative Group/cn=Recipients/cn=Jdoe has been quarantined. Access to this mailbox will be restricted to administrative logons for the next 6 hours.
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="MSExchangeIS" />
        <EventID Qualifiers="49158">10018</EventID>
        <Level>2</Level>
        <Task>6</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2012-06-24T22:04:19.000000000Z" />
        <EventRecordID>1196226</EventRecordID>
        <Channel>Application</Channel>
        <Computer>MBSERVER003.domain.com</Computer>
        <Security />
      </System>
      <EventData>
        <Data>a8d95d3f-2109-4d55-942f-69ac8c05ec04: /o=Our Organization/ou=First Administrative Group/cn=Recipients/cn=Jdoe</Data>
        <Binary>5B444941475F4354585D000026000000FF802600000000000002180000001D48201000000000FD5C201000000000BD5F201000000000</Binary>
      </EventData>
    </Event>

    After struggling with moving mailboxes and attempts at repairing our mailbox databases, we spoke to Microsoft, who said to try shutting down the SCOM agent. The support folks said that they've seen this happen before. We did what they suggested and the events disappeared. Wow.

    At this point it seems it must be the newly imported Exchange MP. Has anyone run into this issue before?
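
    To gauge how many quarantine events a server has logged, something like the following sketch can be run on the mailbox server. The provider name and event ID come from the event above; the 24-hour look-back window is an assumption to adjust.

    ```powershell
    # Count MSExchangeIS quarantine events (ID 10018) in the Application log.
    $since = (Get-Date).AddHours(-24)   # assumed look-back window

    $events = Get-WinEvent -FilterHashtable @{
        LogName      = 'Application'
        ProviderName = 'MSExchangeIS'
        Id           = 10018
        StartTime    = $since
    } -ErrorAction SilentlyContinue      # no matches is reported as an error

    # @() makes the count 0 when nothing was found.
    "Quarantine events since ${since}: $(@($events).Count)"

    # Show when each mailbox was quarantined and the event text.
    $events | Select-Object TimeCreated, Message | Format-List
    ```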


    Orange County District Attorney

    Monday, June 25, 2012 2:13 PM

Answers

  • They gave me two options; I'm going with #1 myself.

    Workaround 1:

    Disable these monitors from SCOM:

    KHI: The database copy is low on database volume space and continues to grow. The volume is under 25% free.

    KHI: The database copy is low on database volume space and continues to grow. The volume has reached error levels under 16% free.

    KHI: The database copy is low on database volume space and continues to grow. The volume has reached critical levels 8% free.

    KHI: Failed to execute Troubleshoot-DatabaseSpace.ps1
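
    Before creating overrides, it can help to confirm how these monitors appear in your management group. This is only a sketch: it assumes the SCOM 2012 OperationsManager module and an existing management group connection (on SCOM 2007 R2 the cmdlets differ), and it only lists the monitors, it does not disable them.

    ```powershell
    # Assumes the SCOM 2012 OperationsManager module on a management server.
    Import-Module OperationsManager

    # Display names are taken from the list above; wildcards match the three
    # "low on database volume space" variants.
    Get-SCOMMonitor -DisplayName 'KHI: *database volume space*',
                                 'KHI: Failed to execute Troubleshoot-DatabaseSpace.ps1' |
        Select-Object DisplayName, Enabled |
        Format-Table -AutoSize
    ```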

    Workaround 2:

    On EVERY affected Exchange server, we can change the constants being used by the script. These are stored in the StoreTSConstants.ps1 file.

    The file is located in C:\Program Files\Microsoft\Exchange Server\V14\Scripts (the installation path might be different on the customer's machines).

    You can open it in Notepad and change the values. Toward the bottom of the file, you should see these defaults:

    # The percentage of disk space for the EDB file at which we should start quarantining users.

    $PercentEdbFreeSpaceDefaultThreshold = 25

    # The percentage of disk space for the logs at which we should start quarantining users.

    $PercentLogFreeSpaceDefaultThreshold = 25

    # The percentage of disk space for the EDB file at which we are at alert levels.

    $PercentEdbFreeSpaceAlertThreshold = 16

    # The percentage of disk space for the EDB file at which we are at critical levels.

    $PercentEdbFreeSpaceCriticalThreshold = 8

    #The number of hours we can wait before running out of space.

    $HourDefaultThreshold = 12

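    Ideally, the values above should be tuned to the customer's environment. To begin, however, make the following changes, working from the top of the list:

    $PercentEdbFreeSpaceDefaultThreshold: change 25 to 10

    $PercentLogFreeSpaceDefaultThreshold: change 25 to 10

    $PercentEdbFreeSpaceAlertThreshold: change 16 to 10

    $PercentEdbFreeSpaceCriticalThreshold: do NOT change 8

    $HourDefaultThreshold: change 12 to 2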

    After making these changes, you will need to monitor the servers and might have to fine-tune these settings further.

    The thing to keep in mind while doing this is that both $PercentEdbFreeSpaceDefaultThreshold and $PercentLogFreeSpaceDefaultThreshold NEED to be greater than or equal to $PercentEdbFreeSpaceAlertThreshold.
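
    As a quick sanity check, the suggested starting values and the constraint above can be sketched like this. The variable names are the ones from StoreTSConstants.ps1 as quoted above; $thresholdsOk is a hypothetical helper name used only for this illustration.

    ```powershell
    # Suggested starting values from Workaround 2 (tune for your environment).
    $PercentEdbFreeSpaceDefaultThreshold  = 10
    $PercentLogFreeSpaceDefaultThreshold  = 10
    $PercentEdbFreeSpaceAlertThreshold    = 10
    $PercentEdbFreeSpaceCriticalThreshold = 8    # left unchanged
    $HourDefaultThreshold                 = 2

    # Both "Default" thresholds must be >= the alert threshold.
    $thresholdsOk =
        ($PercentEdbFreeSpaceDefaultThreshold -ge $PercentEdbFreeSpaceAlertThreshold) -and
        ($PercentLogFreeSpaceDefaultThreshold -ge $PercentEdbFreeSpaceAlertThreshold)

    if ($thresholdsOk) { 'Thresholds are consistent.' } else { Write-Warning 'Default thresholds must be >= the alert threshold.' }
    ```

    With the values above, both comparisons are 10 >= 10, so the check passes. Lowering either default threshold below 10 without also lowering the alert threshold would break the constraint.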


    Orange County District Attorney

    Tuesday, June 26, 2012 6:34 PM

All replies

  • Not sure and we have not imported the latest Exchange MP.  So you are telling me the fix is to bounce your SCOM agents on Exchange 2010 servers?  That's rich.


    Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/

    Monday, June 25, 2012 3:17 PM
  • Actually, the answer is to stop or disable the SCOM agent on the mailbox server. Once you do that and restart the Information Store service, the mailboxes all behave and don't appear to be corrupted. According to Microsoft, Exchange quarantines a mailbox if it detects either of the following:

    • A thread that is doing work for the mailbox has crashed.
    • More than 5 threads allocated to process the mailbox have not progressed for a long time.

    My guess is that SCOM is doing something to our mailboxes that Exchange detects as one of these two conditions.
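
    To see which mailboxes are currently flagged, a sketch along these lines can be run in the Exchange Management Shell. It relies on the IsQuarantined property returned by Get-MailboxStatistics; the database name is a placeholder.

    ```powershell
    # Run in the Exchange Management Shell.
    # 'Mailbox Database 01' is a placeholder; substitute your own database name.
    Get-MailboxStatistics -Database 'Mailbox Database 01' |
        Where-Object { $_.IsQuarantined } |
        Select-Object DisplayName, MailboxGuid, IsQuarantined
    ```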

    Orange County District Attorney

    Monday, June 25, 2012 3:22 PM
  • I am wondering if this was occurring before with the previous Exchange 2010 MP.  Did they say it was?  This is a pretty big issue if you have to bounce the IS service because of possible workflow issues / synthetic transaction issues of the Exchange 2010 MP.

    After you bounced the IS service, have you seen this issue since and was it happening on all of your mailbox servers?


    Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/

    Monday, June 25, 2012 4:45 PM
  • Actually, no. It just started with the import of the new MP. We only have one Mailbox Server and that's the box we're seeing it on.

    I'm going to open a case with PSS in SCOM to see if they've seen anything like this.


    Orange County District Attorney

    Monday, June 25, 2012 5:07 PM
  • We had this same problem last Thursday.  After struggling through 6 hours of database repairs, just like the original poster, Microsoft support finally found that it's being caused by the "C:\Program Files\Microsoft\Exchange Server\V14\Scripts\troubleshoot-databasespace.ps1" script, which SCOM runs when you first install the MP. They had us disable the SCOM agent and the user quarantining stopped.

    If you take a look in your event log under "Applications and Services Logs", there's the "Microsoft-Exchange-Troubleshooters/Operational" log.  In there you'll find events about database space starting right around the time you installed the new MP.
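
    A sketch for pulling those events from PowerShell instead of Event Viewer, run on the mailbox server. The log name is the one quoted above; the start time is a placeholder for whenever the MP was imported.

    ```powershell
    # Placeholder: set this to roughly when the new MP was imported.
    $importTime = Get-Date '2012-06-24 14:00'

    # List troubleshooter events logged since the import, oldest first.
    Get-WinEvent -FilterHashtable @{
        LogName   = 'Microsoft-Exchange-Troubleshooters/Operational'
        StartTime = $importTime
    } -ErrorAction SilentlyContinue |
        Sort-Object TimeCreated |
        Select-Object TimeCreated, Id, LevelDisplayName, Message |
        Format-List
    ```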

    To me this looks like a bug in the new MP.  It looks like SCOM is executing the script even though it's disabled by default.  So far I haven't found any way to override the default thresholds in the script. I've changed the defaults in C:\Program Files\Microsoft\Exchange Server\V14\Scripts\storetsconstants.ps1 on each of the servers, but the script run by SCOM uses its own temp directory.

    I'm pursuing the issue with PSS too, I'll report any findings.


    Tuesday, June 26, 2012 2:12 PM
  • I finally opened a case with PSS and they've confirmed basically what you've said Dave. They've told me they are aware of the issue and are working on a fix.

    Orange County District Attorney

    Tuesday, June 26, 2012 2:25 PM
  • It's not a bug. Apparently the script that does this was originally set to time out after five minutes, and they changed the timeout to something much larger, so now the script is finally completing (working).  MSFT is aware of the issue and is working on it.  I will say that until I know how to curb this behavior, this new MP is not going into production.  I would have a hard time explaining to my management that a script from the Exchange 2010 MP caused mail outages.

    Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/

    Tuesday, June 26, 2012 3:36 PM
  • So they fixed the timeout "bug" in the MP, I get that.  Now why is the script running in the 1st place if the rule is disabled when you first install the MP?

    Tuesday, June 26, 2012 5:14 PM
  • I am not certain which module is kicking off the script.  That is something we want to find out so that we can disable it; if we can't, then hopefully we can change some thresholds and/or the time interval.

    Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/

    Tuesday, June 26, 2012 5:17 PM
  • Bugs me that the MP is still available for download. I installed it this morning and noticed the issue within 30 mins.

    Tuesday, June 26, 2012 5:48 PM
  • I just received a couple of Workarounds for the issue from PSS. Hopefully this will allow me to start monitoring the box again.

    Orange County District Attorney

    Tuesday, June 26, 2012 6:03 PM
  • What was it Sandy?

    Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/

    Tuesday, June 26, 2012 6:25 PM
  • They gave me two options; I'm going with #1 myself. Both workarounds are quoted in full in the Answers section above.


    Orange County District Attorney

    Tuesday, June 26, 2012 6:34 PM
  • Sounds about right from what I have been told.  Thank you Sandy for sharing this.

    Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/

    Tuesday, June 26, 2012 6:36 PM
  • Sandy can you send me an email please to my personal email address.  Thanks.

    Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/

    Wednesday, June 27, 2012 4:45 AM
  • We had the exact same issue. I guess Microsoft pulled the latest Exchange 2010 Management Pack, because the Pinpoint URL http://systemcenter.pinpoint.microsoft.com/en-US/applications/microsoft-exchange-server-2010-monitoring-management-pack-12884902079 (which points to http://www.microsoft.com/en-us/download/details.aspx?id=692) doesn't work anymore.

    This kind of action within an MP should be disabled by default, and the functionality should come from a single rule. Now it's enabled by default and the functionality comes from the Correlation Engine with those four monitors?

    KHI: The database copy is low on database volume space and continues to grow. The volume is under 25% free.
    KHI: The database copy is low on database volume space and continues to grow. The volume has reached error levels under 16% free.
    KHI: The database copy is low on database volume space and continues to grow. The volume has reached critical levels 8% free.
    KHI: Failed to execute Troubleshoot-DatabaseSpace.ps1

    I hope Microsoft will release a fix ASAP.
    • Edited by Satak Wednesday, June 27, 2012 1:48 PM
    Wednesday, June 27, 2012 1:47 PM
  • I was told they are going to fix the issue and then republish the MP soon.

    Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/

    Wednesday, June 27, 2012 3:33 PM
  • Good to hear! I'll be looking for it.

    Orange County District Attorney

    Wednesday, June 27, 2012 4:22 PM
  • Friday, June 29, 2012 6:35 AM