Memory Leak on Windows Server 2008 R2

  • Question

  • Hi,
    I have a memory leak problem that seems to be spreading. It only started occurring in the last two months and has gotten worse with time.


    Symptoms include the expected slowness as the physical memory approaches maximum available. What's strange is that I am not seeing a rising non-paged pool size. Because of this, I don't really know how to go about investigating this problem.

    Rebooting keeps memory usage down for a day or a few days. It doesn't seem to matter whether there are people in the building or not.

    I have these tools, but I don't know how to read their data to prove which application is leaking memory.

    https://docs.microsoft.com/en-us/sysinternals/downloads/vmmap

    https://docs.microsoft.com/en-us/sysinternals/downloads/rammap

    PoolMon.exe

    Performance monitor

    https://www.microsoft.com/en-us/download/details.aspx?id=49924

    Wednesday, August 21, 2019 8:05 AM

All replies

  • What OS are we talking about, and which roles does the server have installed?

    From Windows Server 2008 through 2012 there was a problem with the NTFS cache when the server runs as a file server: RAMMap would show a huge amount of memory of type Metafile. If that's the case, there is an old blog post of mine here: https://blogs.technet.microsoft.com/itasupport/2012/05/27/windows-2008-ridatemi-la-mia-memoria-storia-di-rammap-e-dyncache/

    If that's not the case, start by answering the questions above and send us a screenshot of RAMMap and of the Performance tab of Task Manager (memory), taken while you are having trouble.

    HTH
    -mario

    Wednesday, August 21, 2019 2:57 PM
  • Thank you for this article.

    We are using Windows Server 2008 R2. Some of the servers have the File Server role installed, but not all of them.

    A server without the File Server role still has the same issue as the ones that have it.

    Thursday, August 22, 2019 6:51 AM
  • Send a RAMMap screenshot: even if a server doesn't have the role installed, if it has shares that are used remotely it may have exactly the same problem, because in the end it is working as a file server.

    Screenshot, please.

    Thanks

    -mario

    Thursday, August 22, 2019 6:58 AM

    • Edited by lalaJee Thursday, August 22, 2019 7:18 AM
    Thursday, August 22, 2019 7:13 AM
  • From the RAMMap screenshot, the problem seems to be in the paged pool and not in the non-paged pool.

    Can you please save these lines in a .cmd file, run it as administrator every 5 minutes for an hour, and then post back the resulting text file with the values?

    time /T >> c:\temp\pool.txt
    poolmon -n c:\temp\pool.txt -e -u
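
    If re-running the .cmd file by hand every 5 minutes is inconvenient, the same two commands can be driven from a loop. What follows is only a rough Python sketch. The function names and the default of 13 samples (one every 5 minutes, spanning an hour) are illustrative assumptions. It also assumes poolmon is on the PATH and that the script runs from an elevated prompt. It passes poolmon exactly the switches shown above and makes no claim about how poolmon treats an existing output file.

```python
import subprocess
import time
from datetime import datetime

LOG = r"c:\temp\pool.txt"  # same path as in the two commands above


def take_sample(log_path=LOG, run=subprocess.run, now=datetime.now):
    """Append an HH:MM stamp to the log, then ask poolmon for one snapshot."""
    with open(log_path, "a") as f:
        f.write(now().strftime("%H:%M") + "\n")
    # Same switches as the manual command: poolmon -n <file> -e -u
    run(["poolmon", "-n", log_path, "-e", "-u"], check=True)


def collect(samples=13, interval_s=300, sample=take_sample, sleep=time.sleep):
    """Take `samples` snapshots, pausing `interval_s` seconds between them."""
    for i in range(samples):
        sample()
        if i < samples - 1:
            sleep(interval_s)
```

    Calling collect() with its defaults would take about an hour. The sample and sleep parameters exist only so that the loop can be dry-run without poolmon.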

    Thanks!
    -mario

    Thursday, August 22, 2019 7:59 AM
  • Hi, please find the pool log here:

    https://drive.google.com/drive/folders/13c0u1hYWJ74tjg8DA8K1OLT3sgycs3qr?usp=sharing

    Thursday, August 22, 2019 11:11 AM
  • Given the log, it looks like the leaked objects are Token objects held in LSASS. Your next steps are described in this article:

    https://blogs.technet.microsoft.com/askpfeplat/2014/03/09/another-troubleshooting-adventure-more-real-life-memory-pool-leaks/

    So, use Handle.exe from Sysinternals to confirm that LSASS has a huge number of Token handles open:

    Handle -p lsass

    Then follow the article.

    HTH
    -mario

     
    Thursday, August 22, 2019 2:24 PM
  • When I was running poolmon, I could see that LRfr was allocating, e.g., 1700b but freeing only 1000b; I couldn't see the remaining 700b coming back to the system.

    As far as I understand, SeTl keeps reporting errors for these services, which is why it might be showing high.

    Thursday, August 22, 2019 2:52 PM
  • You are looking at the wrong side of poolmon. The non-paged pool is OK; it's the paged pool that has trouble.

    Please follow the instructions in my answer above and look at the lsass process using Handle.exe.
    Then post back the result file.

    Handle -p lsass > c:\temp\handle.txt

    HTH
    -mario

    Thursday, August 22, 2019 5:29 PM
  • Please find the log file. I also included the handle output for some other processes.

    https://drive.google.com/drive/folders/13c0u1hYWJ74tjg8DA8K1OLT3sgycs3qr?usp=sharing

    Also, when I ran poolmon I pressed P and B.

    When I restarted the server and ran poolmon, SeTl was in 5th place; each day it moves further up.
    • Edited by lalaJee Friday, August 23, 2019 8:08 AM
    Friday, August 23, 2019 7:31 AM
  • Very little information is available with those parameters.

    Try it this way:

    Handle -a -p lsass.exe > c:\temp\handle_lsass.txt

    Thanks
    -mario

    Friday, August 23, 2019 8:09 AM
  • I have uploaded the log.

    https://drive.google.com/drive/folders/13c0u1hYWJ74tjg8DA8K1OLT3sgycs3qr?usp=sharing

    File name: handle_lsass.txt

    Friday, August 23, 2019 11:20 AM
  • There are only 113 Token occurrences in LSASS, so this is not the problem.

    This is the data from your poolmon log of yesterday:

    10:51

     Tag  Type     Allocs            Frees               Diff            Bytes                 Per Alloc

     Toke Paged           27296884           26135919           1160965     1236520032             1065        
     CM31 Paged             184338             136759             47579      214691840             4512        
     Sg01 Nonp               17320              15000              2320      151414064            65264        

     10:55

     Tag  Type     Allocs            Frees               Diff            Bytes                 Per Alloc

     Toke Paged           27349844           26186688           1163156     1238807200             1065        
     CM31 Paged             184338             136759             47579      214691840             4512        
     Sg01 Nonp               17331              15011              2320      151414064            65264        

     11:00

     Tag  Type     Allocs            Frees               Diff            Bytes                 Per Alloc

     Toke Paged           27434795           26267672           1167123     1243002528             1065        
     CM31 Paged             184338             136759             47579      214691840             4512        
     Sg01 Nonp               17355              15035              2320      151414064            65264        

     11:05


     ...

     11:55

     Tag  Type     Allocs            Frees               Diff            Bytes                 Per Alloc

     Toke Paged           28408198           27198656           1209542     1287888992             1064        
     CM31 Paged             184338             136759             47579      214691840             4512        
     SeTl Nonp            28408198           27198656           1209542      154821376              128        

    Stop               -           Start       =      Leak

    1,287,888,992 - 1,236,520,032  = 51,368,960

    In practice, your system is leaking roughly 50 MB per hour of Token objects.

    Tokens are security objects that are generally handled by LSASS, which is why I asked you to look at that process first. There are many ways to duplicate a token, so it is entirely possible that one of the programs installed on your system is duplicating tokens and leaving them open by mistake.
    You said that the problem started not long ago, so I would look at processes you installed recently, especially security-related ones.

    You can still run handle.exe -a -p against all the running processes until you find the offending one. But since we are talking about kernel objects, it is entirely possible that a driver is holding those tokens, in which case you will not see them from user mode. Then you will need to dig deeper with WPA, as in the article I posted earlier:

    https://blogs.technet.microsoft.com/askpfeplat/2014/03/09/another-troubleshooting-adventure-more-real-life-memory-pool-leaks/

    So, good luck with your search..

    HTH
    -mario

         


    Friday, August 23, 2019 2:29 PM
  • Thank you so much for your help.

    The application that I think is causing LSASS to leak memory is the log-collecting software that we have installed on all of our systems.

    Can you please explain how you read the handle log, so that I can do this myself in the future?


    • Edited by lalaJee Friday, August 23, 2019 3:24 PM
    Friday, August 23, 2019 2:51 PM
  • Where and how did you get this info?

    Stop               -           Start       =      Leak

    1,287,888,992 - 1,236,520,032  = 51,368,960

    In practice, your system is leaking roughly 50 MB per hour of Token objects.


    • Edited by lalaJee Friday, August 23, 2019 2:58 PM
    Friday, August 23, 2019 2:57 PM
  • OK, quick guide.

    Poolmon -e -u shows the pool tags ordered by the largest allocation.

    If we open your log file, we can see the first line:

    Tag   Type   Allocs    Frees     Diff     Bytes       Per Alloc

    Toke  Paged  27296884  26135919  1160965  1236520032  1065

    The important values are Tag=Toke, which means a Token security object, and Bytes=1,236,520,032, which is the total number of bytes allocated at the time you took the snapshot. The system is very dynamic, so 5 minutes later this may no longer be the largest allocation; the numbers go up and down quickly when the system is under stress.

    Your next step is to search further down the log for the same tag, to see whether the total number of bytes allocated increases or decreases.

    So, CTRL+F for Toke, and we get the second place where it appears in the log. It is still at the top of allocated memory:

    10:55
     Tag       Bytes
     Toke     1238807200

    and so on:

    11:00
     Toke     1243002528

    11:05
     Toke     1247086848

    As you can see, it is always at the top and the total allocated memory keeps increasing.

    Finally you reach the last snapshot in the log, at 11:55:

     Toke     1287888992

    You could put all the values in Excel and create a nice graph, but the important thing is that the last value is much higher than the first one. So you take the last value and subtract the first one to get the difference over the hour:

    1287888992 - 1236520032 = 51,368,960

    This is the memory leaked per hour on your system.

    If you look at the second allocation, CM31, you will notice that its first value, 214,691,840, is exactly equal to its last value, 214,691,840. So only the first tag is consuming more and more memory.
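
    The same first-versus-last comparison can be scripted instead of done with CTRL+F. Below is a minimal Python sketch, assuming the log is a series of time /T stamps (HH:MM), each followed by poolmon rows in the column layout shown in this thread. The names parse_log and growth are hypothetical helpers, tags that contain spaces are not handled, and a log that crosses midnight would need extra care. The sample rows are copied from the 10:51 and 11:55 snapshots.

```python
import re

STAMP = re.compile(r"^\s*(\d{1,2}):(\d{2})\b")  # line written by time /T
ROW = re.compile(
    r"^\s*(\S{1,4})\s+(Paged|Nonp)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def parse_log(text):
    """Return {tag: [(minutes_since_midnight, bytes), ...]} in log order."""
    series = {}
    minute = None
    for line in text.splitlines():
        m = STAMP.match(line)
        if m:
            minute = int(m.group(1)) * 60 + int(m.group(2))
            continue
        m = ROW.match(line)
        if m and minute is not None:
            # Group 6 is the Bytes column.
            series.setdefault(m.group(1), []).append((minute, int(m.group(6))))
    return series


def growth(series):
    """Return {tag: (bytes_gained, minutes_elapsed)} from first to last sample."""
    out = {}
    for tag, points in series.items():
        (t0, b0), (t1, b1) = points[0], points[-1]
        out[tag] = (b1 - b0, t1 - t0)
    return out


SAMPLE = """\
10:51
 Tag  Type     Allocs            Frees               Diff            Bytes                 Per Alloc
 Toke Paged           27296884           26135919           1160965     1236520032             1065
 CM31 Paged             184338             136759             47579      214691840             4512
11:55
 Tag  Type     Allocs            Frees               Diff            Bytes                 Per Alloc
 Toke Paged           28408198           27198656           1209542     1287888992             1064
 CM31 Paged             184338             136759             47579      214691840             4512
"""

for tag, (gained, minutes) in growth(parse_log(SAMPLE)).items():
    per_hour = gained * 60 / minutes if minutes else 0.0
    print(f"{tag}: +{gained} bytes in {minutes} min, about {per_hour:.0f} bytes/hour")
```

    For these two snapshots the sketch reports a gain of 51,368,960 bytes for Toke over 64 minutes and no change for CM31. Because the first and last samples are 64 minutes apart rather than exactly 60, the normalized hourly figure comes out slightly below the raw difference.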

    HTH
    -mario

    • Marked as answer by lalaJee Wednesday, August 28, 2019 2:51 PM
    Friday, August 23, 2019 5:38 PM
  • Thank you, that's really good.

    So how did you read the handle log, and why was it needed? Was there any important info in the handle log?

    Saturday, August 24, 2019 4:20 PM
  • Actually it wasn't helpful, because the hypothesis that the token objects were being leaked inside LSASS was not confirmed.

    You have to count the occurrences of Token objects in the log. In the lsass handle log the count was 113, which is far too low to account for 1.28 GB of allocation. You should try running handle against all the processes and, hopefully, find a process with thousands of open Token handles. That would be the offending process.
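
    Counting by hand gets tedious once the log covers many processes, so here is a minimal Python sketch of the count. It assumes the text output of Handle -a, where each handle line looks like the "268C: Token  domain\NAME$:id" lines seen in this thread, and it assumes each process section starts with a "name pid: N" header line. The helper name count_tokens is hypothetical, and the pid and the File line in the sample are made up for illustration.

```python
import re
from collections import Counter

PROC = re.compile(r"^(\S.*?) pid: (\d+)")            # assumed section header
HANDLE = re.compile(r"^\s*([0-9A-Fa-f]+):\s+(\S+)\s*(.*)$")


def count_tokens(text):
    """Return (per_process, per_name) Counters for handles of type Token."""
    per_process, per_name = Counter(), Counter()
    current = "unknown"
    for line in text.splitlines():
        m = PROC.match(line)
        if m:
            current = f"{m.group(1)} (pid {m.group(2)})"
            continue
        m = HANDLE.match(line)
        if m and m.group(2) == "Token":
            per_process[current] += 1
            # Names look like domain\ACCOUNT$:29b390; drop the trailing id.
            per_name[m.group(3).rsplit(":", 1)[0].strip()] += 1
    return per_process, per_name


SAMPLE = r"""
lsass.exe pid: 612 NT AUTHORITY\SYSTEM
  2688: Token         domain\DC_Name09$:32fa7fb
  268C: Token         domain\DC_Name10$:29b390
  2690: File          C:\Windows\System32
"""

per_process, per_name = count_tokens(SAMPLE)
for proc, n in per_process.most_common():
    print(proc, n)
for name, n in per_name.most_common():
    print(name, n)
```

    Sorting by count puts the process holding the most Token handles first. The second counter shows which accounts those tokens belong to.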

    Of course, the problem may be caused by a kernel-mode driver, in which case only a full memory dump would help to find the offending driver.

    HTH
    -mario  

    Saturday, August 24, 2019 5:13 PM
  • How do I get a full memory dump that might give me this information?

    Saturday, August 24, 2019 8:39 PM
  • https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/forcing-a-system-crash-from-the-keyboard

    But examining a full memory dump is not an easy thing.

    You said that you know which software is leaking. You can send the full memory dump to the vendor for examination.

    HTH
    -mario

    Saturday, August 24, 2019 10:20 PM
  • Do you know what CM31 is?

    We had an issue with a domain controller not allowing users to log in, so I thought I would run the handle command to see how many tokens are open.

    Handle -a -p lsass.exe > c:\temp\handle_lsass.txt

    It came back with 9000 tokens.

    https://drive.google.com/drive/folders/13c0u1hYWJ74tjg8DA8K1OLT3sgycs3qr?usp=sharing

    Tuesday, August 27, 2019 12:58 PM
  • That's the problem: 9000 token objects have been leaked somehow.

    CM31 probably has nothing to do with it, but to find out which driver uses that tag, open a command prompt as administrator and execute these two commands:

    cd c:\windows\system32\drivers

    strings * | findstr /i "CM31"
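
    If the Sysinternals strings tool is not at hand, a similar search can be approximated with a plain byte scan. This is only a sketch under stated assumptions: files_containing is a hypothetical helper, it looks for the tag as raw ASCII bytes in *.sys files, and unlike findstr /i it is case-sensitive. A tag used by a component that lives outside the scanned folder will not be found this way.

```python
from pathlib import Path


def files_containing(tag, folder, pattern="*.sys"):
    """Return names of files in `folder` whose raw bytes contain `tag` (ASCII)."""
    needle = tag.encode("ascii")
    hits = []
    for path in sorted(Path(folder).glob(pattern)):
        try:
            if needle in path.read_bytes():
                hits.append(path.name)
        except OSError:
            continue  # locked or unreadable file; skip it
    return hits
```

    From an elevated prompt, files_containing("CM31", r"c:\windows\system32\drivers") would list the candidate driver files.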

    HTH
    -mario

    • Marked as answer by lalaJee Wednesday, August 28, 2019 2:51 PM
    Tuesday, August 27, 2019 1:34 PM
  • In your log, half of the token objects are of this type:

     268C: Token         domain\DC_Name10$:29b390

    and half are of this type:

     2688: Token         domain\DC_Name09$:32fa7fb


    Do you have two DCs named DC_Name09 and DC_Name10?

    If yes, then these may be authentications from those two DCs.

    Are you having the problem right now?

    -mario

    Tuesday, August 27, 2019 1:47 PM
  • I have multiple DCs.

    When I took the handle log I was having the issue, but the leak is still happening on these 2008 servers.

    If a server hasn't been rebooted for more than, say, 2-3 days, we start seeing this issue on the DCs, mainly on DC7 and 09.


    • Edited by lalaJee Wednesday, August 28, 2019 7:21 AM
    Wednesday, August 28, 2019 7:21 AM
  • This sounds consistent with your problem description and with the log you provided.

    The leaked object is of type Token. This object is generally handled by the LSASS process, but there may be something else that duplicates those tokens and then leaves them open.

    So there may be some third-party library injected into LSASS causing the problem, or some other service or driver leaking this kind of object.

    Since you already know what was installed recently on your network and is causing the problem, the only option left is to work with the software vendor to find the root cause of the leak and fix it.

    Thanks!
    -mario

    • Proposed as answer by mariora_ Wednesday, August 28, 2019 7:49 AM
    • Marked as answer by lalaJee Wednesday, August 28, 2019 2:51 PM
    Wednesday, August 28, 2019 7:49 AM
  • Thank you so much for your help on this. It's much appreciated.
    Wednesday, August 28, 2019 2:51 PM