Determining cause of back pressure events due to version buckets

    Question

  • Several times a week, our Exchange 2007 server (SP2 UR4) experiences back pressure events (event ID 15004, see below) due to high "version buckets". What is the best method of determining the root cause of this condition?

    Event Type: Warning
    Event Source: MSExchangeTransport
    Event Category: ResourceManager
    Event ID: 15004
    Date:  10/31/2012
    Time:  2:29:06 PM
    User:  N/A
    Computer: EXCHANGE
    Description:
    Resource pressure increased from Normal to High.

    Resource utilization of the following resources exceed the normal level:
    Version buckets = 246 [High] [Normal=80 Medium=120 High=200]

    Back pressure caused the following components to be disabled:
    Inbound mail submission from Hub Transport servers
    Inbound mail submission from the Internet
    Mail submission from the Pickup directory
    Mail submission from the Replay directory
    Mail submission from Mailbox servers
    Mail delivery to remote domains

    The following resources are in the normal state:
    Queue database and disk space ("E:\EXCHSRVR\TransportRoles\data\Queue\mail.que") = 12% [Normal] [Normal=95% Medium=97% High=99%]
    Queue database logging disk space ("E:\EXCHSRVR\TransportRoles\data\Queue\") = 15% [Normal] [Normal=95% Medium=97% High=99%]
    Private bytes = 7% [Normal] [Normal=71% Medium=73% High=75%]
    Physical memory load = 50% [limit is 94% before message dehydration occurs.]

    Wednesday, October 31, 2012 7:41 PM

Answers

  • Hi,

    Based on my experience, this version bucket issue occurs because of an unexpectedly high volume of incoming messages, spam attacks, problems with the message queue database integrity, or hard disk performance.

    When the warning appears in the application log again, we could check the queues for a large number of messages or for very large messages, or we could check the size of the mail.que database.

    We could also run Get-MessageTrackingLog -Start "***" -End "***" to check the mail flow at the time the issue happens.
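
    As a rough sketch of that check, assuming the Exchange Management Shell on the Hub Transport server and the event time from the question, the tracking log can be grouped per minute and sorted by size. The 5-minute window on either side, the variable names, and the en-US date format are illustrative assumptions, not part of the original advice.

    ```powershell
    # Window around the 15004 event (2:29:06 PM on 10/31/2012 in the question).
    # The +/- 5 minute window is an assumption; widen it as needed.
    $eventTime = Get-Date "10/31/2012 2:29:06 PM"
    $start = $eventTime.AddMinutes(-5)
    $end   = $eventTime.AddMinutes(5)

    # Pull RECEIVE events (messages accepted by transport) for the window.
    $log = Get-MessageTrackingLog -Start $start -End $end -EventId RECEIVE -ResultSize Unlimited

    # Messages received per minute: a spike here suggests a burst of mail.
    $log | Group-Object { $_.Timestamp.ToString("yyyy-MM-dd HH:mm") } |
        Sort-Object Name |
        Select-Object Name, Count

    # Ten largest messages in the window: very large items are another possible trigger.
    $log | Sort-Object TotalBytes -Descending |
        Select-Object Timestamp, Sender, TotalBytes, MessageSubject -First 10
    ```

    If the per-minute counts and the largest messages both look ordinary, that would point away from message volume and toward the queue database or disk.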

    Thanks,

    Andy

    Friday, November 02, 2012 7:13 AM
    Moderator
  • Same here. The 2007 servers exist only to support legacy systems that still use Outlook 98 (arrggh), plus the journal mailbox for Enterprise Vault, which is, I believe, what's causing the back pressure issue. Just 5 mailboxes. Everyone else is on 2010, and I've never seen a back pressure event on the 2010 servers in the year or so since they've been installed.

    I believe the back pressure is occurring whenever the journal mailbox receives a particularly large message at the same time that it's processing what amounts to bulk mail generation during our nightly sales processing cycle. We're doing org level journaling, so it's got messages coming in from all over the world, and a few seem to be very large (> 20 MB). Not too often, but several times a day.

    The problem may go away once I move that journal mailbox over to the 2010 environment. That won't happen until Q1 next year, though. Thanks for the advice. I appreciate it.

    Friday, November 02, 2012 3:46 PM
  • Yes, I think that's what's going on here. The legacy systems generate what amounts to hundreds of messages during our nightly middleware processing of store sales. That, combined with the occasional very large journal message, creates a back pressure situation that lasts only a few seconds, but long enough to generate the event log entry, which then trips an alert in SCOM.

    Not much I can do at this time since we're in a system freeze during Q4, but it's on my radar for early next year. Thanks for the advice.

    Mike

    Friday, November 02, 2012 3:55 PM

All replies

  • Did you implement the recommended changes already?

    http://blogs.technet.com/b/exchange/archive/2008/05/14/3405502.aspx

    New maximum database cache size guidance for Exchange 2007 Hub Transport Server role

    Read through that whole article; it discusses the possible causes of version bucket issues.

    Wednesday, October 31, 2012 8:11 PM
  • Yes, the DatabaseMaxCacheSize is already at the recommended size of 536870912, and the VersionBucket*Threshold values are at the normal/medium/high defaults of 80, 120, and 200.

    The server hardware is fairly decent for a 4-year-old box: dual quad-core procs, 8 GB RAM, 15k SAS drives in a RAID-1 array. No performance issues observed.

    Wednesday, October 31, 2012 8:45 PM
  • That does seem adequate. What is going on as far as message traffic during these alerts?
    Wednesday, October 31, 2012 9:27 PM
  • Well, to be honest, I could use a little advice in that area. Outside of pulling tracking logs, I'm not sure how to monitor or report on message traffic. Which counters should I be monitoring? Thanks!
    Thursday, November 01, 2012 2:22 AM
  • While the events are happening, I would start with the queues and see what's going on. It could be a large mailing, PF replication, etc.
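
    A minimal sketch of that queue check, assuming it is run in the Exchange Management Shell on the Hub Transport server itself while the event is in progress (the choice of columns and the top-10 cutoff are arbitrary):

    ```powershell
    # Queues on the local Hub Transport server, busiest first.
    Get-Queue | Sort-Object MessageCount -Descending |
        Select-Object Identity, DeliveryType, Status, MessageCount, NextHopDomain

    # Ten largest messages currently queued on this server.
    Get-Message -ResultSize Unlimited | Sort-Object Size -Descending |
        Select-Object FromAddress, Size, Queue, Subject -First 10
    ```

    Because the events are short, the queues may already have drained by the time this runs, so an empty result does not rule out a burst.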

    If you want to use perfmon:

    http://technet.microsoft.com/en-us/library/bb201704(v=exchg.80).aspx

    Monitoring Hub Transport Servers

    Thursday, November 01, 2012 2:52 PM
  • Hard to catch them, as they're usually very short, less than a minute. I've got perfmon going now, waiting for the next event. We also have SCOM 2007 monitoring our systems, so I'll see if I can get historical data for some of those counters, if we're capturing them. Thank you for your help.
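
    Since the events are too short to catch live, one option is to list when they occurred and then pull tracking logs for those times. This is only a sketch: the 5000-entry cap is an arbitrary assumption, and the filter uses the event ID and source shown in the question.

    ```powershell
    # List past back pressure events (ID 15004) from the Application log.
    # -Newest 5000 is an arbitrary cap to keep the query quick.
    Get-EventLog -LogName Application -Newest 5000 |
        Where-Object { $_.EventID -eq 15004 -and $_.Source -eq "MSExchangeTransport" } |
        Select-Object TimeGenerated, EntryType, Message |
        Format-List
    ```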
    Thursday, November 01, 2012 8:25 PM
  • One thing to note: I haven't seen those types of issues in Exchange 2010 or 2013. I used to see them all the time with 2007.

    Thursday, November 01, 2012 9:26 PM
  • Hello Mike,

    Thank you for your reply. We are glad to hear that the information we provided was useful. Please let us know if you need any additional assistance or have any other questions related to this issue; we would be glad to help.

    Thanks,

    Andy

    Tuesday, November 06, 2012 2:46 AM
    Moderator