Tracking a memory leak in 2008r2...?

    Question

  • Hello --

    Recently we've encountered Event ID 2019 (source: srv): "The server was unable to allocate from the system nonpaged pool because the pool was empty."  The server was rebooted yesterday, and memory usage has already increased from 5 GB to 8 GB.  Based on the past couple of weeks, the server runs fine for about a week before it needs a reboot.

    Using poolmon, I grabbed the following (top 5 pool tags shown):

    At 10:00a:

     Memory:16766772K Avail: 9046292K  PageFlts:   107   InRam Krnl: 6764K P:830528K
     Commit:7649792K Limit:33531684K Peak:7751492K            Pool N:503820K P:839464K
     System pool information
     Tag  Type     Allocs            Frees            Diff       Bytes                  Per Alloc

     Proc Nonp      65099 (   0)         3 (   0)    65096    86447424 (          0)        1327
     File Nonp   73822364 (  77)  73685557 (  71)   136807    45098640 (       2080)         329
     Ntfx Nonp     837972 (   5)    704712 (   0)   133260    42202560 (       1760)         316
     KLsc Nonp    3439741 (   5)   3306608 (   0)   133133    40472432 (       1520)         304
     Irp  Nonp     378392 (   0)    333265 (   3)    45127    34859760 (       -912)         772

    And at 1:30p:

     Memory:16766772K Avail: 8335728K  PageFlts: 36023   InRam Krnl: 6812K P:868796K
     Commit:8558940K Limit:33531684K Peak:9405944K            Pool N:521428K P:877464K
     System pool information
     Tag  Type     Allocs            Frees            Diff       Bytes                  Per Alloc

     Proc Nonp      73554 (   3)         3 (   0)    73551    97675664 (       3984)        1327
     File Nonp  105171269 (17537) 105042282 (17575)   128987    42430320 (     -12704)         328
     Ntfx Nonp     933558 (  15)    810599 (  14)   122959    38608864 (        352)         313
     KLsc Nonp    3809366 (  29)   3686678 (  32)   122688    37297152 (       -912)         304
     Irp  Nonp     443612 (  19)    398610 (  16)    45002    35735200 (      -1904)         794

    What really concerns me is the Proc tag, which seems to belong to win32k.sys (found using findstr /m /l Proc *.sys).  Am I correct to assume that win32k.sys is the cause of the memory leak?  If so, how would I correct this?
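
    One way to approach the "which tag is growing" part of this is to diff the two poolmon snapshots instead of eyeballing them. The following is a minimal Python sketch, not a poolmon feature: parse_snapshot and growth are hypothetical helper names, and the regular expression assumes only the column layout shown in the output above.

```python
import re

# One poolmon tag row, laid out as in the snapshots above:
# Tag Type Allocs (delta) Frees (delta) Diff Bytes (delta) PerAlloc
ROW = re.compile(
    r"^\s*(?P<tag>\S+)\s+(?P<type>Nonp|Paged)\s+"
    r"(?P<allocs>\d+)\s*\(\s*-?\d+\)\s+"
    r"(?P<frees>\d+)\s*\(\s*-?\d+\)\s+"
    r"(?P<diff>-?\d+)\s+(?P<bytes>\d+)\s*\(\s*-?\d+\)\s+"
    r"(?P<per_alloc>\d+)\s*$"
)


def parse_snapshot(text):
    """Return {tag: {"allocs", "frees", "diff", "bytes"}} for each tag row."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and blank lines simply do not match
            rows[m["tag"]] = {
                key: int(m[key]) for key in ("allocs", "frees", "diff", "bytes")
            }
    return rows


def growth(before, after):
    """Byte growth per tag present in both snapshots, largest first."""
    common = before.keys() & after.keys()
    deltas = [(tag, after[tag]["bytes"] - before[tag]["bytes"]) for tag in common]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


# Tag rows copied from the 10:00a and 1:30p snapshots above.
SNAP_1000 = """
 Proc Nonp      65099 (   0)         3 (   0)    65096    86447424 (          0)        1327
 File Nonp   73822364 (  77)  73685557 (  71)   136807    45098640 (       2080)         329
 Ntfx Nonp     837972 (   5)    704712 (   0)   133260    42202560 (       1760)         316
 KLsc Nonp    3439741 (   5)   3306608 (   0)   133133    40472432 (       1520)         304
 Irp  Nonp     378392 (   0)    333265 (   3)    45127    34859760 (       -912)         772
"""
SNAP_1330 = """
 Proc Nonp      73554 (   3)         3 (   0)    73551    97675664 (       3984)        1327
 File Nonp  105171269 (17537) 105042282 (17575)   128987    42430320 (     -12704)         328
 Ntfx Nonp     933558 (  15)    810599 (  14)   122959    38608864 (        352)         313
 KLsc Nonp    3809366 (  29)   3686678 (  32)   122688    37297152 (       -912)         304
 Irp  Nonp     443612 (  19)    398610 (  16)    45002    35735200 (      -1904)         794
"""

for tag, delta in growth(parse_snapshot(SNAP_1000), parse_snapshot(SNAP_1330)):
    print(f"{tag:4} {delta:+12d} bytes")
```

    With these two snapshots, Proc is the only tag with sizable growth (roughly 11 MB in three and a half hours); Irp grows slightly, and File, Ntfx, and KLsc actually shrink.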

    Thanks -- michael~

    Wednesday, July 03, 2013 5:30 PM

Answers

  • Going back to the original problem, Windows ran out of the non-paged pool because something is not releasing its memory, and the server crashed (event 2019).  I appreciate the replies, but there is obviously something wrong and I'm just looking for suggestions on how to troubleshoot it.  Thanks

    As I said, check the applications your users run on the terminal server. I rarely see big memory faults in Windows services or roles. Use Process Explorer and check the page fault counters on each process's properties page to help identify the misbehaving process.

    win32k.sys is also used for kernel callbacks, so it could easily be an antivirus product that fails to release memory, since AV software often hooks itself into the kernel.


    MCP | MCTS 70-236: Exchange Server 2007, Configuring
    Microsoft Translator Widget - French moderator (Technet Wiki)

    Twitter - @yagmoth555
    Blog: http://www.jabea.net | http://blogs.technet.com/b/wikininjas/

    Tuesday, July 09, 2013 1:42 PM

All replies

  • Hi Michael, 

    We need to analyze this step by step to troubleshoot the issue.

    • Check whether you also received event 333 along with event 2019.
    • Check the file system status with the chkdsk/chkntfs utilities; file system corruption can sometimes cause this kind of issue.
    • Has any application been installed or upgraded on the server recently?
    • Have any modifications been made to the server's functionality?


    Regards, Ravikumar P

    Thursday, July 04, 2013 8:25 AM
  • Hello --

    - No, there were no event ids 333 in any log;

    - chkdsk reported that everything is fine on both the system and data volumes;

    - We had our win2003 file/data server crash recently, and I've had to restore everything to this 2008 box.  So it's doing everything for the time being - PDC, DNS, DHCP, CA, Terminal server, Advantage db, Sybase db, and file server;

    Here are the top 5 pool tags from poolmon after 3 days of uptime. I'm still wondering why the "Proc" tag shows such a large difference between Allocs and Frees. Thoughts?

     Memory:16766772K Avail: 5535404K  PageFlts:377186   InRam Krnl: 7008K P:787372K
     Commit:9815824K Limit:33531684K Peak:14866556K            Pool N:676828K P:794836K
     System pool information
     Tag  Type     Allocs            Frees            Diff       Bytes                  Per Alloc

     Proc Nonp     187096 (   0)         3 (   0)   187093   248459440 (          0)        1327
     BCM0 Nonp         80 (   0)        20 (   0)       60    40662912 (          0)      677715
     Irp  Nonp     633381 (  10)    589763 (   0)    43618    34509872 (       8000)         791
     SeOn Nonp     187095 (   0)         3 (   0)   187092    32873552 (          0)         175
     Mdl  Nonp    2884388 (   0)   2754062 (   0)   130326    27385136 (          0)         210

    Thanks -- michael~

    Monday, July 08, 2013 12:35 AM
  • Here are the top pool tags in poolmon after another day:

     Memory:16766772K Avail: 6068700K  PageFlts: 24304   InRam Krnl: 7144K P:742348K
     Commit:11021472K Limit:33531684K Peak:14866556K            Pool N:760784K P:742468K
     System pool information
     Tag  Type     Allocs            Frees            Diff       Bytes                  Per Alloc

     Proc Nonp     244995 (   3)         3 (   0)   244992   325349312 (       3984)        1327
     SeOn Nonp     244994 (   3)         3 (   0)   244991    43035840 (        528)         175
     Irp  Nonp    1038011 (   8)    993642 (   3)    44369    35731824 (       4496)         805
     SeTl Nonp   21270565 ( 193)  21023196 ( 198)   247369    31663232 (       -640)         128
     Mdl  Nonp    3286470 (   2)   3161366 (   3)   125104    26307024 (     -16464)         210

    The "Proc" thing really concerns me. If the problem is caused by a driver, how exactly would I figure out which one it is?  Disable hardware, one device at a time, until the memory is freed up?
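
    Before hunting for a driver, it may help to put a number on the growth. The sketch below is plain arithmetic on the two snapshots above (after 3 days of uptime and "after another day"). The 24-hour gap between them is an assumption based on the post times, and TagSample and delta are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagSample:
    """Allocs, Frees, and Bytes columns for one pool tag in one snapshot."""
    allocs: int
    frees: int
    nbytes: int

    @property
    def outstanding(self):
        # Same value as poolmon's Diff column: allocations not yet freed.
        return self.allocs - self.frees


# Values copied from the two snapshots above.
DAY3 = {
    "Proc": TagSample(187096, 3, 248459440),
    "SeOn": TagSample(187095, 3, 32873552),
}
DAY4 = {
    "Proc": TagSample(244995, 3, 325349312),
    "SeOn": TagSample(244994, 3, 43035840),
}
ELAPSED_HOURS = 24.0  # assumption: the snapshots were taken about a day apart


def delta(before, after):
    """Return (extra outstanding allocations, extra bytes) between samples."""
    return (after.outstanding - before.outstanding, after.nbytes - before.nbytes)


for tag in ("Proc", "SeOn"):
    objs, nbytes = delta(DAY3[tag], DAY4[tag])
    print(
        f"{tag}: +{objs} outstanding allocations, "
        f"{objs / ELAPSED_HOURS:.0f} per hour, "
        f"{nbytes / ELAPSED_HOURS / 2**20:.2f} MiB per hour"
    )
```

    Both tags gain exactly the same number of outstanding allocations over that interval, so whatever creates a Proc allocation appears to create one SeOn allocation alongside it and release neither.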

    Thanks

    Tuesday, July 09, 2013 12:41 AM
  • I see Terminal Server in that list. Is there any app that could generate a lot of memory faults and not free the memory it used?

    Also be aware that running TS on a DC is not a recommended setup.


    MCP | MCTS 70-236: Exchange Server 2007, Configuring
    Microsoft Translator Widget - French moderator (Technet Wiki)

    Twitter - @yagmoth555
    Blog: http://www.jabea.net | http://blogs.technet.com/b/wikininjas/

    Tuesday, July 09, 2013 2:14 AM
  • Going back to the original problem, Windows ran out of the non-paged pool because something is not releasing its memory, and the server crashed (event 2019).  I appreciate the replies, but there is obviously something wrong and I'm just looking for suggestions on how to troubleshoot it.  Thanks
    Tuesday, July 09, 2013 2:16 AM
  • Thank you yagmoth555 --

    Unfortunately, I'm coming in to a poorly designed environment and slowly trying to clean things up.  Currently our 2008r2 server hosts everything - DC, DNS, DHCP, CA, RDS, File Services, databases, etc..  Eventually, I'll split roles between several VMs.  

    You mentioned AV and I thank you for that.  I'm running Kaspersky Endpoint Security 10 on there now and I wonder if that is causing the problem.  I'll downgrade it to enterprise edition, as that seems to be preferred among most Kaspersky users.   Thanks for the suggestion.

    Tuesday, July 09, 2013 3:05 PM
  • It's not so bad; it's mostly the TS role that is problematic. For every other role you can limit processes and such, but you can never predict what a user will click on in a TS session.

    With DC and TS on the same box, the issue is that you have to give users local logon rights on the server for login to work. That's why it's a security hole: a user can then open a YouTube page and take all the CPU for themselves, for example, or load a Trojan onto your server, etc.


    MCP | MCTS 70-236: Exchange Server 2007, Configuring
    Microsoft Translator Widget - French moderator (Technet Wiki)

    Twitter - @yagmoth555
    Blog: http://www.jabea.net | http://blogs.technet.com/b/wikininjas/


    Tuesday, July 09, 2013 4:02 PM
  • So since this is a production server, I haven't been able to downgrade the AV during the week.. I'll need to do that over the weekend; however, I've been noticing that the PIDs are increasing by about 200,000 every day (after 2-1/2 days of uptime, the highest PID is currently 619508).

    To track this -- We have a program that runs every 30 seconds to poll a database and move some things around, if necessary.  The program runs, calls csc.exe and conhost.exe a couple times, then exits and relaunches 30 seconds later.  I've been tracking its PID for the past couple minutes and the values have been: 615904..615860..616400..617208..617432..618432..618424..616996..619508.

    Granted this is during the business day, but this is typical of how quickly PIDs increase.  Should this be a concern to go along with the possible memory leak issue?
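
    A rough way to judge whether that PID growth matters is to convert it into identifiers consumed per day. The sketch below assumes Windows hands out process and thread IDs in steps of 4 from a shared pool (the PIDs listed above are consistent with that, but it is an assumption here). It also borrows the Proc growth from the poolmon snapshots earlier in the thread, which came from a different uptime cycle, so the comparison is only indicative.

```python
# PIDs of the polling program, copied from the post above.
OBSERVED_PIDS = [
    615904, 615860, 616400, 617208, 617432,
    618432, 618424, 616996, 619508,
]

ID_STEP = 4                   # assumption: IDs are allocated in multiples of 4
SECONDS_PER_DAY = 24 * 60 * 60
POLL_INTERVAL_S = 30          # the program relaunches every 30 seconds
PID_GROWTH_PER_DAY = 200_000  # "about 200,000 every day"
HIGHEST_PID = 619_508         # highest PID after about 2.5 days of uptime

# Outstanding Proc allocations gained in about a day, from the earlier
# poolmon snapshots: 244992 - 187093.
PROC_GROWTH_PER_DAY = 244_992 - 187_093

all_step_aligned = all(pid % ID_STEP == 0 for pid in OBSERVED_PIDS)
runs_per_day = SECONDS_PER_DAY // POLL_INTERVAL_S
ids_per_day = PID_GROWTH_PER_DAY // ID_STEP
ids_per_run = ids_per_day / runs_per_day
id_slots_below_highest = HIGHEST_PID // ID_STEP

print(f"all observed PIDs are multiples of {ID_STEP}: {all_step_aligned}")
print(f"program runs per day:        {runs_per_day}")
print(f"new IDs per day:             {ids_per_day}")
print(f"new IDs per program run:     {ids_per_run:.1f}")
print(f"ID slots below highest PID:  {id_slots_below_highest}")
print(f"Proc allocations per day:    {PROC_GROWTH_PER_DAY}")
```

    Under these assumptions, about 50,000 new IDs per day is in the same range as the roughly 58,000 Proc allocations per day seen in poolmon, which suggests the two symptoms may be related, though it does not prove it.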

    Thanks

    Thursday, July 11, 2013 2:32 PM
  • I second Vegan; the count is really high. That alone makes me suspicious of that application. Is there any way to have it recoded so that it simply calls csc and conhost again every 30 seconds instead of exiting and relaunching?

    MCP | MCTS 70-236: Exchange Server 2007, Configuring
    Microsoft Translator Widget - French moderator (Technet Wiki)

    Twitter - @yagmoth555
    Blog: http://www.jabea.net | http://blogs.technet.com/b/wikininjas/

    Friday, July 12, 2013 1:17 AM
  • Ok, so I was finally able to downgrade the Kaspersky AV to the enterprise version, and while that has reduced memory use per TS user, after about two days of uptime the PIDs are still climbing past 450k and the available RAM is still mysteriously disappearing.  I'm told that there has never been a problem with the recurring program in the past, which leads me to assume something is not releasing its PID(s).  Is it possible for a process to claim a PID without it showing up in Task Manager?

    I'm still looking at this win32k.sys (the Proc tag in poolmon). Every second, it takes between 3-9 new Allocs and never any Frees (it's up to over 110k Allocs, and steady at 3 Frees). Already it's using more than 147mb of the non-paged pool.

    In reading, it looks like win32k.sys is used in graphics calls. Besides an outside software tech support person connecting into the console (thru TeamViewer) maybe once a week, no one logs on locally to the console. We have about 7 people connecting in thru RDS and running a couple 32-bit progs. Even so, I've had the standard Microsoft VGA driver in place for the past week.

    Where should I go from here? Any suggestions on how to further troubleshoot this? Thanks.

    Tuesday, July 16, 2013 6:09 PM