Back Pressure causing internal and external mail flow to stop!

  • Question

  • I have one Exchange Server 2007 SP3 x64 server in the following configuration.

    Here is the latest back pressure message (I have been dealing with this off and on for 3-4 weeks) and I need to resolve it. The server is a physical IBM x3650 with a DS3200 storage cage, two quad-core processors, and 48 GB of RAM. All volumes listed above run on their own 15k spindle drives in a RAID 1 configuration. The C and E partitions are in the x3650; F, G, and H are in the DS3200, which connects to the x3650 over SCSI. We have a 1-gigabit network connection. We use the Mimecast cloud service for spam filtering and email archiving, so inbound mail routes to Mimecast first and then down to our local server; outgoing mail routes to Mimecast and then out to its destination.

    Resource pressure increased from Normal to Medium.

    Resource utilization of the following resources exceed the normal level:
    Version buckets = 127 [Medium] [Normal=80 Medium=120 High=300]
    Physical memory load = 91% [limit is 94% before message dehydration occurs.]

    Back pressure caused the following components to be disabled:
    Inbound mail submission from the Internet
    Mail submission from the Pickup directory
    Mail submission from the Replay directory
    Mail delivery to remote domains

    The following resources are in the normal state:
    Queue database and disk space ("E:\Program Files\Microsoft\Exchange Server\TransportRoles\data\Queue\mail.que") = 7% [Normal] [Normal=95% Medium=97% High=99%]
    Queue database logging disk space ("E:\Program Files\Microsoft\Exchange Server\TransportRoles\data\Queue\") = 9% [Normal] [Normal=94% Medium=96% High=98%]
    Private bytes = 1% [Normal] [Normal=71% Medium=73% High=75%]



    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    I have been working with Microsoft Support and we have done the following:

    increased DatabaseMaxCacheSize

    default- 134217728
    updated to-  1073741824

    increased QueueDatabaseLoggingFileSize

    default - 524280
    updated to - 31457280

    increased DatabaseCheckPointDepthMax

    default - 20971520
    updated to - 31457280
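
    For reference, all three of these values are appSettings keys in EdgeTransport.exe.config (in the Exchange Server\Bin folder on the Hub Transport server), and the Microsoft Exchange Transport service has to be restarted after the file is edited. This is roughly what the changed entries look like with the values above; treat it as a sketch, since the real file contains other keys as well:

        <!-- EdgeTransport.exe.config, inside the <appSettings> section -->
        <add key="DatabaseMaxCacheSize" value="1073741824" />
        <add key="QueueDatabaseLoggingFileSize" value="31457280" />
        <add key="DatabaseCheckPointDepthMax" value="31457280" />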

    But these changes have not solved the problem permanently. No other software changes have occurred on this system.  Can anyone else assist me with some other places to look to resolve this?

    Thank you

    R


    UPDATE: I see the following in my Event Viewer: Event ID 15004

    Resource pressure increased from Medium to High.

    Resource utilization of the following resources exceed the normal level:
    Version buckets = 315 [High] [Normal=80 Medium=120 High=300]
    Physical memory load = 91% [limit is 94% before message dehydration occurs.]

    Back pressure caused the following components to be disabled:
    Inbound mail submission from Hub Transport servers
    Inbound mail submission from the Internet
    Mail submission from the Pickup directory
    Mail submission from the Replay directory
    Mail submission from Mailbox servers
    Mail delivery to remote domains

    The following resources are in the normal state:
    Queue database and disk space ("E:\Program Files\Microsoft\Exchange Server\TransportRoles\data\Queue\mail.que") = 8% [Normal] [Normal=95% Medium=97% High=99%]
    Queue database logging disk space ("E:\Program Files\Microsoft\Exchange Server\TransportRoles\data\Queue\") = 9% [Normal] [Normal=94% Medium=96% High=98%]
    Private bytes = 1% [Normal] [Normal=71% Medium=73% High=75%]



    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Monday, January 20, 2014 5:11 PM


All replies

  • Here is what the current process and memory utilization show:
    Monday, January 20, 2014 5:35 PM
  • The back pressure is attributed to "Version Buckets". What you haven't said is what your maximum message size is, what the average message size is, and the rate at which messages arrive.

    Big messages consume more version buckets. Big messages arriving at a high rate consume LOTS of version buckets!

    This might help to explain it:

    http://blogs.technet.com/b/exchange/archive/2009/07/07/3407776.aspx

    Do you have e-mail AV running on the HT server?

    On what disk is the queue database? Is that disk performing okay (sec/read and sec/write <10ms, disk queue length <2, etc.)?


    --- Rich Matheisen MCSE&I, Exchange MVP

    Tuesday, January 21, 2014 4:22 AM
  • Hi Rich,

    Thanks for the reply. We have reduced the attachment size limit in the global transport settings from 50 MB inbound/outbound down to 20 MB again. The next step is educating our users to use alternate methods for large attachments: internal links, FTP, or the cloud-based solutions we have in the office. We have Symantec Mail Security for Microsoft Exchange running on our server; during the email outage we disabled that service and stopped its processes. I also have ESET File Server Security running to protect the OS, and I have excluded the Exchange-related folders and files (Exchange, IIS, and all storage groups/mailbox databases) from it to prevent a performance issue.
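
    For reference, the equivalent change from the Exchange Management Shell looks roughly like this (I made the change in the GUI, so treat this as a sketch rather than exactly what I ran; Send and Receive connectors also have their own size limits that may need to match):

        Set-TransportConfig -MaxReceiveSize 20MB -MaxSendSize 20MB
        Get-TransportConfig | Format-List MaxReceiveSize, MaxSendSize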

    The queue database is now located on the E: drive, which is used for log files only. Here is a link to a brief one-minute video of the E: drive read/write performance and queue length:

    http://tinypic.com/r/2lk4snc/5
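
    If numbers are easier to work with than a video, the same counters can also be captured with the built-in typeperf tool; a quick sketch (the E: instance matches where the queue database lives here, and the sample interval and count are arbitrary):

        typeperf "\LogicalDisk(E:)\Avg. Disk sec/Read" "\LogicalDisk(E:)\Avg. Disk sec/Write" "\LogicalDisk(E:)\Current Disk Queue Length" -si 5 -sc 120 -f CSV -o e_drive_disk.csv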


    Thursday, January 23, 2014 4:10 PM
  • Did reducing the maximum message size help? Did uninstalling (not just stopping the service) Symantec help?

    While it's nice to think that your file-level A/V is faultless, that isn't always the case. You might want to uninstall that, too, while you're troubleshooting.

    I know that "running naked" is a problem, but it's one way of discovering if the problem is associated with that product. Maybe you can install another HT server role and use the Anti-Spam software on it (temporarily) and have your Internet e-mail delivered there. If the problem on the existing HT role disappears you'll have an idea that perhaps that software was the source of the problem.

    I didn't see anything really alarming in your video except for the brief spikes in I/O per second (from, say, 24 to more than 100).

    You might want to have a look at Exchange PerfWiz and PAL (Performance Analysis of Logs). They make performance data collection and analysis easier, at least while you're casting about for where the problem might be. After that you can home in on the area(s) of concern with more frequent snapshots of the relevant performance counters.


    --- Rich Matheisen MCSE&I, Exchange MVP

    • Marked as answer by cara chen Friday, January 31, 2014 6:15 AM
    Friday, January 24, 2014 2:26 AM
  • Reducing the attachment size seems to have helped. No more 15004 event IDs relating to back pressure. We did not completely uninstall the Symantec Mail Security application, but I will do that this week, as we don't really use it anymore now that our mail is scrubbed by Mimecast before it hits our Exchange server. I have had ESET File Security running on the server this whole time; I will go back over it and make sure the necessary exclusions are in place.

    I will definitely take a look at those to see what kind of answers they provide. Would having multiple HT server roles help if my ultimate goal is to increase the attachment size again and help with mail flow?

    Monday, January 27, 2014 7:58 PM
  • The underlying problem is the HT server's inability to handle the combination of large message sizes and the frequency of those messages.

    The 50 MB message size, while large, isn't that large. But if the server is constrained in terms of I/O per second (which may simply be more I/O than the volume can handle), then the answer is not to add more servers but to increase the system's ability to handle the workload. I'd run PerfWiz on the system and then let PAL analyze the collected data to see if (and where) there are problems. Without an objective look at the system, all anyone can do is guess at the real cause of the problem.

    In the interim, you might try increasing the thresholds on version buckets:

    http://msexchangeguru.com/2013/07/29/troubleshooting-backpressure/

    http://blogs.technet.com/b/exchange/archive/2009/07/07/3407776.aspx
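
    The version bucket thresholds are also appSettings keys in EdgeTransport.exe.config on the Hub Transport server (the articles above walk through this; double-check the key names there, and restart the transport service after editing). A minimal sketch with illustrative values only, not a recommendation:

        <add key="VersionBucketsNormalThreshold" value="120" />
        <add key="VersionBucketsMediumThreshold" value="180" />
        <add key="VersionBucketsHighThreshold" value="450" />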


    --- Rich Matheisen MCSE&I, Exchange MVP

    • Marked as answer by cara chen Friday, January 31, 2014 6:15 AM
    Monday, January 27, 2014 10:08 PM