UAG randomly hangs and crashes its IIS application pool

  • Question

  • For the past 6 months, UAG has been randomly unavailable on our servers: users are unable to connect to the UAG portal at all. When we examine the server, we find the IIS application pool that UAG runs under (DefaultAppPool) completely unresponsive. Recycling the app pool and restarting IIS don't bring UAG back, so we end up restarting the server. The outages themselves appear random, sometimes happening several times a day, other times as rarely as once a month or so.

     

    Looking through the event logs, we've found a number of events from the WAS service that seem to correspond with the outages.

     WAS event 5009: A process serving application pool 'DefaultAppPool' terminated unexpectedly. The process id was '6312'. The process exit code was '0xc000012d'.

     

    This is usually followed by one of the following events:

     WAS event 5011: A process serving application pool 'DefaultAppPool' suffered a fatal communication error with the Windows Process Activation Service. The process id was '4568'.  The data field contains the error number.

     WAS event 5022: The Windows Process Activation Service failed to create a worker process for the application pool 'DefaultAppPool'. The data field contains the error number.

     

    Windows Error Reporting for the crash shows the following:

     

     Event Name: APPCRASH
     Response: Not available
     Cab Id: 0

     Problem signature:
     P1: w3wp.exe
     P2: 7.5.7600.16385
     P3: 4a5bd0eb
     P4: ntdll.dll
     P5: 6.1.7600.16695
     P6: 4cc7b325
     P7: c0000005
     P8: 000000000004c8f4
     P9:
     P10:

     Attached files:
     C:\Windows\Temp\WERB061.tmp.appcompat.txt
     C:\Windows\Temp\WERCCE3.tmp.WERInternalMetadata.xml
     C:\Windows\Temp\WERCD12.tmp.hdmp
     C:\Windows\Temp\WER5E52.tmp.mdmp

     These files may be available here:
     C:\ProgramData\Microsoft\Windows\WER\ReportQueue\AppCrash_w3wp.exe_bacc1121e4b19011b4fc1a203d345983b1ae9c_cab_1179619b

     Analysis symbol:
     Rechecking for solution: 0
     Report Id: b2d9ca06-75e1-11e0-9658-00505600033d
     Report Status: 4

     

     

    There are also a number of these system events, but they don't seem to correspond to the outages:

    WAS event 5010: A process serving application pool 'DefaultAppPool' failed to respond to a ping. The process id was '3284'.

     

    Several months ago we set up a second UAG server that is load-balanced with the first. Shortly afterward, it started having the same problems. Both servers run Windows Server 2008 R2. We have not yet applied SP1 for Windows Server 2008 R2 on these servers. We've also gone back to the date the problem first appeared to see whether any changes or updates had been applied; nothing had changed on that server for at least a week before then.

     

    We're planning on opening a support incident with MS (there's some confusion with MS about whether we have SA and/or support incidents for this product; that's a separate issue). For now, I was wondering if anyone had seen this issue before or could offer any suggestions.

     


    • Edited by Josh Derr Monday, May 9, 2011 8:12 PM fixed typo in OS used
    Monday, May 9, 2011 3:50 PM
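The two status codes in these logs can be decoded from their bit fields. In the Windows NTSTATUS table, 0xC000012D (the exit code in WAS event 5009) is STATUS_COMMITMENT_LIMIT, "out of virtual memory", and 0xC0000005 (P7 in the WER report) is STATUS_ACCESS_VIOLATION. That would be consistent with the worker process exhausting memory before it crashes. The Python sketch below is only an illustration: the field layout follows the documented NTSTATUS format, the name table covers just these two codes, and decode_ntstatus is a hypothetical helper, not part of any Windows tool.

```python
# Decode the bit fields of a 32-bit NTSTATUS value:
# 2-bit severity | customer flag | reserved N bit | 12-bit facility | 16-bit code.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# Only the two codes that appear in this thread; extend as needed.
KNOWN = {
    0xC000012D: "STATUS_COMMITMENT_LIMIT (out of virtual memory)",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def decode_ntstatus(value: int) -> dict:
    """Split an NTSTATUS value into its fields (hypothetical helper)."""
    value &= 0xFFFFFFFF  # treat as an unsigned 32-bit value
    return {
        "hex": f"0x{value:08X}",
        "severity": SEVERITY[(value >> 30) & 0x3],
        "customer": bool((value >> 29) & 0x1),
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,
        "name": KNOWN.get(value, "unknown"),
    }


if __name__ == "__main__":
    # Exit code from WAS event 5009 and exception code P7 from the WER report.
    for raw in ("0xc000012d", "c0000005"):
        print(decode_ntstatus(int(raw, 16)))
```

Both codes decode to severity "error" with facility 0; only the low 16-bit code differs.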

All replies

  • Does the IIS process also consume 100% CPU?
    Jason Jones | Forefront MVP | Silversands Ltd | My Blogs: http://blog.msedge.org.uk and http://blog.msfirewall.org.uk
    Monday, May 9, 2011 4:32 PM
  • No. None of the IIS processes, or the system as a whole, show any resource spikes around the times of the outages.
    Monday, May 9, 2011 4:52 PM
  • The only situation where we've seen such behavior is where UAG has been customized, which may cause a resource leak. I would suggest you start by removing any customizations you may have made on the server (even simple ones, like look-and-feel). If that doesn't help, this should be investigated within a support case with CSS Security, so the next step would be to open a support case with Microsoft.
    Ben Ari
    Microsoft CSS UAG/IAG Support
    Sammamish, WA
    • Marked as answer by Erez Benari Monday, May 9, 2011 11:36 PM
    Monday, May 9, 2011 11:36 PM
  • Hi Josh,

    Can you share what the solution to the problem was? What kind of customization had you done that caused this issue? We are looking at the same problem at the moment, but didn't do any customization as far as we know.

    Thanx,

    Wim

    Monday, July 25, 2011 1:19 PM
  • Turns out it wasn't a customization issue at all. In our authentication settings for UAG, "Level of Nested Groups" was left blank, which means UAG does not limit how far it will drill down into nested groups when authorizing users. Also, apparently sometime around the date we started having problems, a change to a child group of a department's role group in our organization created a looping nested group (a child group of the role group also contained its parent as a member). We discovered this while investigating a seemingly unrelated problem: users in that department were denied access to UAG despite being in the correct group.

    Setting "Level of Nested Groups" to a fixed number and correcting the looping nested group fixed the problem. I suspect UAG was endlessly cycling through the looping nested groups, chewing up resources until they were exhausted, at which point UAG crashed.

    • Marked as answer by Josh Derr Monday, July 25, 2011 1:37 PM
    Monday, July 25, 2011 1:37 PM
  • Hi Josh,

    I will check with the customer whether they have a similar case. Thanx for your quick answer!

    Regards

    Wim

    Tuesday, July 26, 2011 7:48 AM
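UAG's own group-expansion logic isn't documented in this thread, so the following is only a hypothetical Python sketch of the failure mode Josh describes, assuming group memberships have been exported into a plain dictionary (the group names and all three function names are made up). count_visits_depth_limited walks memberships the way an unguarded lookup might, with no memory of groups already seen, so on a loop the only thing that stops it is a depth cap, analogous to setting "Level of Nested Groups". expand_nested_groups adds a visited set so a loop cannot cause endless work, and find_cycle reports the looping membership itself, which is what needed correcting in the directory.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical model: each group maps to the groups nested directly inside it.
Graph = Dict[str, List[str]]


def count_visits_depth_limited(root: str, graph: Graph, max_depth: int) -> int:
    """Walk memberships with NO visited set, stopping only at max_depth.

    On a looping graph the visit count keeps growing as max_depth grows;
    with no depth limit at all, this walk would never finish.
    """
    visits = 0
    stack = [(root, 0)]
    while stack:
        group, depth = stack.pop()
        visits += 1
        if depth < max_depth:
            for child in graph.get(group, []):
                stack.append((child, depth + 1))
    return visits


def expand_nested_groups(root: str, graph: Graph, max_depth: Optional[int] = None) -> Set[str]:
    """Collect every group reachable from root, breadth-first.

    The visited set guarantees termination even if the data contains a loop;
    max_depth (None means unlimited) additionally caps how deep the walk goes.
    """
    seen: Set[str] = {root}
    queue = deque([(root, 0)])
    while queue:
        group, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for child in graph.get(group, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return seen


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one membership loop as a list of group names, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(group: str) -> Optional[List[str]]:
        color[group] = GRAY
        path.append(group)
        for child in graph.get(group, []):
            state = color.get(child, WHITE)
            if state == GRAY:  # child is already on the current path: a loop
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        color[group] = BLACK
        return None

    for group in graph:
        if color.get(group, WHITE) == WHITE:
            found = visit(group)
            if found:
                return found
    return None


if __name__ == "__main__":
    groups: Graph = {
        "Dept-Role": ["Dept-Child", "Dept-Staff"],
        "Dept-Child": ["Dept-Role"],  # the looping membership
        "Dept-Staff": [],
    }
    print(find_cycle(groups))
    print(sorted(expand_nested_groups("Dept-Role", groups)))
    for limit in (4, 8, 16):
        print(limit, count_visits_depth_limited("Dept-Role", groups, limit))
```

With the sample data, find_cycle reports the loop Dept-Role, Dept-Child, Dept-Role, and the unguarded visit count rises with every increase in the depth limit, which is why leaving the limit blank is risky when the directory contains a loop.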