2008R2 memory caching

    Question

  • I have an issue with my 2008R2 Enterprise file server gradually using more and more physical memory until most of it is taken up, and only a reboot clears it.  This server is not currently in production, and I just happened to notice the memory creeping up over a couple of days as I was copying files to it.

    I have done some research on this problem and it seems it has to do with "Excessive Cached Read I/O".  This behavior is nicely explained in this msdn blog (http://blogs.msdn.com/ntdebugging/archive/2007/11/27/too-much-cache.aspx)

    The "unsupported" fix in the blog became the supported fix listed in the Performance Tuning Guidelines for Windows Server 2008 as the "Microsoft Windows Dynamic Cache Service" (http://www.microsoft.com/downloads/details.aspx?FamilyID=E24ADE0A-5EFE-43C8-B9C3-5D0ECB2F39AF&displaylang=en)

    Unfortunately, 2008R2 is not supported by this download, and the Performance Tuning Guidelines for Windows Server 2008R2 doc states:

    "Previous releases of Windows Server sometimes benefitted from tools that limit the working-set size of the Windows file cache. These tools are not necessary on most servers running Windows Server 2008 R2. You should reevaluate your use of such tools."

    I still seem to have the same problem, physical memory being used up by cached read I/O, but there is no supported method of controlling it in 2008R2.

    My concern is that there is no way to stop the cached read I/O from squeezing kernel processes out to virtual memory (disk) and killing performance.

    Is there another fix available for R2?  Is there an upgrade to the cache manager in R2 that fixes the problem?

    • Edited by rexif Monday, March 1, 2010 4:53 PM New title more appropriate for content
    Thursday, February 25, 2010 7:13 PM

All replies

  • I've done some additional research and it does look like Microsoft was aware this was an issue in Server 2008, but does not believe it is a problem in Server 2008R2.  They have said as much in the Performance Tuning Guidelines for Windows Server 2008R2 doc, mentioned above, and it's also mentioned in http://support.microsoft.com/kb/976618

    "The memory management algorithms were updated to address this problem in Windows 7 and Windows Server 2008 R2 operating systems. Therefore, we do not recommend that you use the provided functions or the Microsoft Windows Dynamic Cache Service in computers that are running Windows 7 or Windows Server 2008 R2."


    Ok, so the system caches files, and the memory manager is now updated to handle them (no details given on how).  However when I look at performance monitor to see where/how all the memory is allocated, things don't seem to add up.

    An MSDN article on memory performance information (http://msdn.microsoft.com/en-us/library/aa965225(VS.85).aspx#system_memory_performance_information) says that the following performance counters should correspond to the "Physical Memory (MB): Cached" displayed by Task Manager.

    Cache Bytes + Modified Page List Bytes + Standby Cache Reserve Bytes + Standby Cache Normal Priority Bytes + Standby Cache Code Bytes (I believe this last counter is actually "Standby Cache Core Bytes" as "Standby Cache Code Bytes" doesn't exist)

    The weird thing is that when I add all the values up for these counters and do a little math to get to MB, it does not match.  Task Manager displays 599 MB as the "Physical Memory (MB): Cached" value, yet the counters add up to 3,287 MB.  Big difference. 
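
    The arithmetic described above can be sketched in Python as follows. This is only an illustration: the function name is made up for this example, and the per-counter readings are hypothetical placeholders, not the values measured on this server.

```python
# Counters that, per the MSDN article, should add up to Task Manager's "Cached" value.
CACHED_COUNTERS = (
    "Cache Bytes",
    "Modified Page List Bytes",
    "Standby Cache Reserve Bytes",
    "Standby Cache Normal Priority Bytes",
    "Standby Cache Core Bytes",
)

BYTES_PER_MB = 1024 * 1024


def cached_mb(counters):
    """Sum the five counters (given in bytes) and return the total in MB."""
    total_bytes = sum(counters[name] for name in CACHED_COUNTERS)
    return total_bytes / BYTES_PER_MB


# Hypothetical readings in bytes, for illustration only.
sample = {
    "Cache Bytes": 200 * BYTES_PER_MB,
    "Modified Page List Bytes": 10 * BYTES_PER_MB,
    "Standby Cache Reserve Bytes": 50 * BYTES_PER_MB,
    "Standby Cache Normal Priority Bytes": 3000 * BYTES_PER_MB,
    "Standby Cache Core Bytes": 0,
}

print(f"Counters add up to {cached_mb(sample):,.0f} MB")
```

    The result can then be compared with the "Physical Memory (MB): Cached" figure read from Task Manager at the same moment.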

    Note: The larger number I get from the performance counters does seem to jibe with what I see in Task Manager for real time memory usage.

    So my confusion...  I'm seeing real time memory usage, in Task Manager, increase to the point of using up all the physical memory.  Microsoft says that this is system cache, and the memory manager is handling it better in R2.  Task manager says it isn't cached (in physical memory), but Perfmon counters say it is cached, and the MSDN article says that should be reflected in Task Manager, but it isn't.

    Is there anything out there that explains what R2 is doing, and what I'm looking at here?

    Monday, March 1, 2010 4:51 PM
  • Hi rexif,

    "Task Manager displays 599 MB as the "Physical Memory (MB): Cached" value, yet the counters add up to 3,287 MB.  Big difference."

    People often watch Task Manager and quickly conclude that they have a memory leak or an application leak. To be precise, Task Manager doesn't give you the actual memory in use. You should use performance counters to calculate the total number of bytes resident in memory. In particular, you need the working set of the system cache, which lets you decide whether there is a performance bottleneck.

    Yes, the memory manager is very different between Windows 2008 and Windows 2008 R2. The PFN lock mechanism has been redesigned.

    When you say the enterprise server is using more memory, can you be specific about the module? Are you seeing an increase in srv.sys or mrxsmb.sys? Microsoft uses the SMB protocol; earlier versions of the operating system used SMB 1.0, and newer operating systems use SMB 2.0, which handles more data per packet. So you have to drill down further and check the behavior. SMB is a complex protocol to understand because its implementation is spread across many drivers and protocols: it uses mrxsmb.sys, srv.sys, netbt.sys, mup.sys, etc.

    So you need to check whether there is excessive write I/O or excessive read I/O. Another important consideration is shell32.dll: when you access files using explorer.exe it goes through the shell, so there is additional overhead on the communication. Following are some things I would like you to check to better understand the behavior.


    a) which process is occupying more memory

    b) does the memory depletion happen locally (e.g. when accessing using \\localhost\share)

    c) check the behavior in safe mode with networking: if the problem does not occur in safe mode with networking, you can safely assume that no Microsoft component is involved in the communication, and hence in the memory bottleneck.

    d) try performing net use and access the share.

    e) does the memory depletion occur while accessing certain files (file type, file size) or with any file of any size

    f) monitor the process's memory activity, which gives you a clear understanding of whether the memory increases at regular intervals or spikes when a user establishes a session.
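
    For item (f), one rough way to tell steady growth from spikes once you have a series of memory samples is sketched below in Python. The function name, the labels and the spike threshold are all assumptions made for this illustration; they are a heuristic, not an established diagnostic rule.

```python
def classify_growth(samples_mb, spike_factor=3.0):
    """Label a series of memory samples taken at regular intervals.

    samples_mb   -- memory readings in MB, oldest first
    spike_factor -- hypothetical threshold: a single step more than this
                    many times the average step counts as a spike
    """
    if len(samples_mb) < 2:
        raise ValueError("need at least two samples")
    # Change between each pair of consecutive samples.
    steps = [b - a for a, b in zip(samples_mb, samples_mb[1:])]
    total = samples_mb[-1] - samples_mb[0]
    if total <= 0:
        return "flat"  # no net growth over the sampling window
    average_step = total / len(steps)
    if max(steps) > spike_factor * average_step:
        return "spiky"
    return "steady growth"


print(classify_growth([100, 110, 120, 130, 140]))
print(classify_growth([100, 100, 100, 400, 400]))
```

    The first series grows by the same amount at every interval, while the second jumps once, as it might when a user session is established.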

    I would then use ADPlus to dump the process memory and debug the process memory / VM activity.

    Tuesday, March 2, 2010 4:11 AM
    Moderator
  • Hi Sainath,

    Thanks for the reply.

    I'm a little confused at your statement that "To be precise, Task Manager doesn't give you the actual memory in use".  I thought that was exactly what it did.  Just that in 2008 and 2008R2, it displayed physical memory in use, not virtual memory.  Also note that the MSDN article does state quite clearly that the performance counters I'm looking at to ascertain cached memory should correspond directly with the "Physical Memory (MB): Cached" displayed by Task Manager.

    I think if I explain exactly what I'm doing with this server it might help.

    I'm migrating a large physical file server to a virtual server.  Our virtual environment is VMware 3.5 update 4

    I noticed the uptick in memory when I would either restore files via Backup Exec, or copy over files using ScriptLogic's SecureCopy software.  I have verified with Symantec we have the latest agent for 2008R2.

    I am copying / syncing 1.7 terabytes of data total to the new server.  The SecureCopy software is located on the local machine, and also on another 2008R2 server (using it as a data mover). 

    Apart from copies and syncs, nothing else touches this machine.  After the copies / syncs are done, physical memory usage is reported as high by the operating system (perfmon, Resource Monitor and Task Manager).  Some physical memory is released (approx. 300MB) after the data transfer, but that's it.  After 4 days, my physical memory is still 84% used according to Resource Monitor.

    However, none of the processes are using much memory.  explorer.exe is the largest user by far at 40MB, and total memory usage by all processes is less than 400MB.  Yet Resource Monitor is showing that 3247MB of physical memory is in use, and that 586MB of physical memory is cached (today's numbers).  This is the mystery: what is using all the physical memory?  So far, I believe it's the read cache I/O mentioned in the articles above.  I think I can see this in the perfmon counters, but it doesn't match up with Task Manager's "Physical Memory (MB): Cached", which according to MS it's supposed to.  I also cannot find anything on how the new memory manager handles read cache I/O and other caching. This would be nice to see.

    Networking seems fine.  I can access shares locally and across the network with no problem.

    I will monitor the memory activity of the processes at the next copy, but as mentioned above, they seem to be acting normally.

    I'm beginning to think that files are cached in memory in a sort of FIFO (first in first out) fashion, as part of the read cache I/O operation, and there is just some weird reporting problem in Task Manager.

    My main concern is around performance when this machine is in production.  It will have many files copied to it daily, and I'm just trying to get a handle on how this memory issue will impact it.  It just may be that what I'm seeing is by design; I just can't find anything that says so, and the monitors available to me seem to give conflicting info. 

    Thanks for your help. 
    Tuesday, March 2, 2010 3:00 PM
  • Hi Sainath,

    I have rebooted the virtual 2008R2 file server to clear the memory.  After the reboot Task Manager is reporting 498MB in use.

    I ran a copy using the SecureCopy software from another 2008R2 server (the data mover mentioned above)

    I also ran performance monitor with the following counters:

    Cache Bytes
    Modified Page List Bytes
    Standby Cache Reserve Bytes
    Standby Cache Normal Priority Bytes
    Standby Cache Core Bytes
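
    For anyone who wants to log the same counters from a script rather than the perfmon GUI, here is a minimal Python sketch built around the Windows typeperf command-line tool. The helper names, the default interval and sample count, and the assumption that the output is in typeperf's default CSV layout (one header row, then one row per sample with the timestamp in the first column) are all choices made for this example.

```python
import csv
import io

# The five counters listed above, under the Memory performance object.
COUNTERS = [
    r"\Memory\Cache Bytes",
    r"\Memory\Modified Page List Bytes",
    r"\Memory\Standby Cache Reserve Bytes",
    r"\Memory\Standby Cache Normal Priority Bytes",
    r"\Memory\Standby Cache Core Bytes",
]


def typeperf_command(interval_s=60, samples=10):
    """Build an argument list for typeperf (-si = interval in seconds, -sc = sample count)."""
    return ["typeperf", *COUNTERS, "-si", str(interval_s), "-sc", str(samples)]


def parse_typeperf_csv(text):
    """Parse typeperf CSV text into a list of {counter path: float value} rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for record in reader:
        if len(record) != len(header):
            continue  # skip blank lines and status messages
        try:
            # Column 0 is the timestamp; the rest are counter values.
            rows.append({name: float(value)
                         for name, value in zip(header[1:], record[1:])})
        except ValueError:
            continue  # skip rows with missing or non-numeric values
    return rows
```

    On the server itself, the command could be run with something like subprocess.run(typeperf_command(), capture_output=True, text=True) and the captured stdout passed to parse_typeperf_csv.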

    During the copy, memory in use (per Task Manager) climbed to 737MB.  This is close to the amount of data that was actually copied to the server.

    The "Cached" value in Task Manager also climbed to 3101MB

    All the performance counters went up (except for Standby Cache Core Bytes).  Notably, Standby Cache Normal Priority Bytes rose to 3,246,235,648 bytes, far more than the other counters.

    I also noticed that if you add the "Cached" value to the "In use" value you get the "Total Physical Memory (MB)" value
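
    As a quick check of these figures, the following Python sketch converts the Standby Cache Normal Priority Bytes reading to MB and sets it against the Task Manager values quoted in this post. The variable names are made up for the example; the numbers are the ones reported above.

```python
BYTES_PER_MB = 1024 * 1024

standby_normal_bytes = 3_246_235_648  # Standby Cache Normal Priority Bytes (perfmon)
task_manager_cached_mb = 3101         # "Cached" value in Task Manager
task_manager_in_use_mb = 737          # memory in use per Task Manager

standby_normal_mb = standby_normal_bytes / BYTES_PER_MB
gap_mb = task_manager_cached_mb - standby_normal_mb
in_use_plus_cached_mb = task_manager_in_use_mb + task_manager_cached_mb

print(f"Standby normal priority list: {standby_normal_mb:.0f} MB")
print(f"Gap to Task Manager 'Cached': {gap_mb:.0f} MB")
print(f"In use + Cached: {in_use_plus_cached_mb} MB")
```

    The standby counter works out to roughly 3,096 MB, within a few MB of the 3101MB that Task Manager shows as "Cached".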

    Could this be what is happening?

    When the copy starts, the memory manager starts assigning memory to the normal priority standby cache page lists (this is the Standby Cache Normal Priority counter).  This then shows up as the "Cached" value in Task Manager

    As files are copied over and are "read" by the OS, the memory manager moves more physical pages to the cache manager which then uses memory from the Standby Cache, thus putting it "In use" as far as Task Manager is concerned.

    This would explain what I was seeing, that as more files are copied over to the server, memory usage would go up and Cache would go down. 

    The only problem with this analysis is that the "Standby Cache Normal Priority Bytes" value always stays at about the same as the amount of physical RAM.  This could be explained if this is a count of total bytes put aside, not a dynamic count as bytes are used.  See the description of the counter below.

    Standby Cache Normal Priority Bytes is the amount of physical memory, in bytes, that is assigned to the normal priority standby cache page lists. This memory contains cached data and code that is not actively in use by processes, the system and the system cache. It is immediately available for allocation to a process or for system use. If the system runs out of available free and zero memory, memory on lower priority standby cache page lists will be repurposed before memory on higher priority standby cache page lists.
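
    The repurposing order in that description can be modeled with a small Python sketch. This is a toy model, not how the memory manager is implemented: the function name is hypothetical, and the ordering of the standby lists used in the example (reserve lowest, then normal, then core) is an assumption.

```python
def repurpose(request_mb, free_mb, standby_lists):
    """Satisfy an allocation from free/zero memory first, then standby lists.

    standby_lists -- list of (name, size_mb) ordered from lowest to highest
                     priority (the ordering is supplied by the caller)
    Returns a list of (source, mb_taken) in the order memory was taken.
    """
    taken = []
    remaining = request_mb
    for name, size in [("free/zero", free_mb)] + list(standby_lists):
        if remaining <= 0:
            break
        amount = min(size, remaining)
        if amount > 0:
            taken.append((name, amount))
            remaining -= amount
    if remaining > 0:
        # A real system would start trimming working sets at this point.
        raise ValueError(f"{remaining} MB cannot be satisfied from these lists")
    return taken


# Hypothetical list sizes in MB, lowest priority first (assumed ordering).
lists = [("reserve", 50), ("normal", 3000), ("core", 0)]
print(repurpose(500, 100, lists))
```

    In this model the normal priority list is only touched after free memory and the lower priority list are exhausted.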

    What do you think?

    Tuesday, March 2, 2010 9:00 PM
  • rexif,

    I am having similar problems and would like to know if you solved this problem.  I am trying to consolidate some older machines into one new Server 2008 R2 Enterprise box set up as AD, DNS, DHCP and a Hyper-V host.  I cannot copy a large file from C: to C:, or across drives, or across the network for that matter.  When copying, it will start out at 100mbps, then eat all physical RAM, gradually slow to 10mbps, then halt the system.

    I find it odd that I cannot even copy a file onto the same drive that it resides on....

    I have tried in safe mode with networking, turned off firewall, stopped hyper-v services, removed file server role and other various suggestions, but have not resolved the issue.  Anyone find a solution?

    UPDATE:  I wanted to update this information as I have tried many things and have not resolved my problem (described above). 
    Steps taken:  I wanted to rule out a HW / RAID5 / motherboard issue. Initial config was 5x1TB drives, H/W RAID5, with two volumes carved out, one for boot and one for data, initialized as MBR in Windows.  So, I blew away the machine, created a RAID1 on 2x500GB drives, set that as the boot volume, and threw 4x1TB drives into a RAID5 initialized in Windows as GPT.  This resolved the issue on my boot volume (C:\), but not on the data volume.  I re-initialized the disks as MBR (in doing so you lose anything over 2TB) just to rule out a GPT issue.  This did not resolve the problem.  Reconfigured the box so that all of the 1TB SATA drives were MBR with their own drive letters....still have the problem.  Lastly, I set the drives up into a RAID1 w/ MBR...still no good.   I tried turning write caching on and off on those volumes and that did not resolve the issue.  So, essentially, I ruled out a hardware issue.

    I did notice that as I copy, the memory (just from what is visible in Task Manager) peaks and plateaus...I let the file continue to copy until finished...the memory did not just get released as I would have first thought....it slowly trickled away....  Next I am going to look at GP and see if there are any issues there...noticed 200k events logged in my security event log.... 

    One last thing:  I have an exact H/W replica machine on which I placed Win7...and I do not have the file copy issues

    Best, Mike

    • Edited by mv_gibson Friday, April 16, 2010 3:22 PM Update
    Tuesday, March 9, 2010 5:07 AM
  • Hi rexif,

    The MSDN page you mentioned above hasn't been updated for win7. On win7, the Cached counter in task manager includes only modified and standby pages, and does not include the size of the system cache working set.

    It can be normal for the system cache to consume most of the system memory, even if the system is idle. If memory becomes in demand (for example, if you run some memory intensive application) the memory manager will trim unused pages from the system cache and give them to the application.

    Before win7, limiting the size of the system cache was sometimes necessary to prevent the cache from consuming memory so fast that process working sets would have to be constantly trimmed, resulting in poor system performance. On win7 the cache can still grow to consume most of the memory, but user processes and the system as a whole should remain responsive.

    That said, simply copying a very large file isn't supposed to cause the system cache to consume all available memory. This might indicate a problem in the application that is performing the copy. Can you check whether this issue occurs if you copy the file using one of the built in methods (Explorer/copy/xcopy/robocopy)?

    Thanks,
    Pavel
    Tuesday, March 9, 2010 6:30 AM
  • Hi Pavel,

    I have a second machine created from the same template as the one giving me the problem.  I have installed the File Server and File Server Resource Manager roles (to make it the same as the problem one), and it also has a copy of Secure Copy installed.

    When I do a regular copy/xcopy to the new machine, the memory behaves.  The only thing different now between the servers is the Symantec Backup Exec remote Agent for Windows Systems (Ver. 12.5.2213) that is installed on the server I'm having problems with.

    I'm having the agent pushed to my test box, and I'll see if that breaks it.

    That said, when beremote.exe gave me memory issues before (on a Windows 2003 64bit box), it 1. actually reported in Task Manager that it was using a ton of memory, and 2. released it if I restarted the service.  It does neither on the 2008R2 box, so I'm not too optimistic... but here's hoping.

    thx,
    Wednesday, March 17, 2010 8:03 PM
  • I am having this EXACT same issue. I have had Microsoft support working on this for almost 2 weeks. They have looked at perfmon logs, I've had to provide crash dumps, etc and they still can't figure out why the server keeps eating up ram.
    Thursday, April 8, 2010 2:44 PM
  • Certain services on Windows 2008 R2 machines are configured to use all available memory resources.  These include Exchange, Active Directory Domain Services, SQL, and others.  Also, Windows 2008 R2 and Windows 7, along with 2008 and Vista, precache frequently accessed materials.  Why?  If the RAM is otherwise unused, then it's better to fill it with things that will be used than to leave it completely empty.  This way, retrieval of objects will be faster than fetching them from disk.  It is for this reason that virtual machines should be "right-sized" by configuring the amount of memory appropriate for the server.  Too much memory will result in large amounts of caching and too little will result in large amounts of paging to disk.  Having too much memory is not as big of a problem as too little, so err on the safe side.  Also, please make sure you distinguish between Physical Memory Available and Physical Memory Free.  Free is memory that is not being used for cache, while Available also includes RAM that is cached but can be readily emptied for use by other applications.

    Friday, April 9, 2010 8:02 PM
  • @Rabid Squirrel
    I'm aware of certain services taking all the available memory resources; being an Exchange admin gets you up close and personal with the Store.exe process :)
    MS used to give you a way to throttle these processes (well, at least back in Exchange 5.5), and in fact did so with 2008 Server, but they seem to have taken it away in R2.  I understand your point about right-sizing a server, but I'm confident I should be OK with this config.  I have 4 GB of RAM on a Windows 2008R2 VM, and all it's providing is file services, plus a backup agent running.  I'm also cool with the idea that prefetching from memory is better than pulling from disk.  In fact, I think that's a great idea, and I wouldn't have a problem if that's what's happening.  My problem is that I can't seem to reproduce the behavior on other 2008R2 boxes, and I would have thought the prefetching behavior would be consistent.  On my test box, the memory usage drops after a short time; it doesn't on the production box.  It's also a little disconcerting that different tools give me different readings on what you'd think is the same counter, namely cached memory.

    According to Resource Monitor, the box currently has 305 MB of physical memory available, 305 MB of physical memory cached, 0 MB of physical memory free, and 3542 MB of physical memory In Use. However, Process Explorer reports that there's about 3.1 Gig of physical memory system cache, which is a ____ of a lot more than the 305 MB of physical memory reported as cached by Resource Monitor. 

    As of right now, I have had a suggestion to bump up the memory to 6 or 8 GB and see if it all goes to cache as well.  We'll see...

    @cyr0nk0r
    If you get anything useful back from MS, please post it here.

    Thursday, April 15, 2010 8:16 PM
  • "MS used to give you a way to throttle these processes, (well, at least back in Exchange 5.5) and in fact did so with 2008 Server, but they seemed to have taken it away in R2."

    Just on that point, what exactly do you mean? Doesn't Windows System Resource Manager allow you to do that under 2008 R2?

    Friday, April 16, 2010 12:44 AM
  • The system cache size reported by Process Explorer is different from "cached" memory in Resource Monitor.

    The System Cache size is the same thing as Memory\Cache Bytes in perfmon. This memory is used for caching files, but it can't be immediately reused for other purposes. Before a page from the system cache can be reused it needs to be removed from the system cache working set, and, if it happens to be dirty, written to disk. This is why resource monitor treats system cache as part of "in-use" memory.

    "Cached" memory in Resource Monitor is the sum of standby and modified lists (Memory\Standby Cache XXX Bytes and Memory\Modified Page List Bytes, respectively). Typically, standby pages comprise the majority of this. Standby pages can contain either file data or application private memory, and they can be immediately repurposed (because they already have an up-to-date copy of their data on disk), so they are considered part of "available" memory (together with free and zeroed pages).

    In your case, 3.1 GB is in the system cache because some application accessed all this memory using cached IO. None of this memory was prefetched, because server versions of win7 don't have Superfetch, and even on client systems, Superfetch wouldn't have put pages into the system cache. It would instead take free pages, populate them with data from disk and put them onto the standby list.

    You also have 300+ MB of standby/available memory. Generally, this is a pretty healthy state for a server machine, so the memory manager is not trying to create more available pages. However, if you run some memory intensive program on this system, the memory manager will start trimming old pages from working sets, and the system cache will eventually shrink.
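
    The relationship between these categories can be summarized in a short Python sketch. The class and field names are made up for illustration, the figures are the Resource Monitor values reported earlier in this thread, and the split of the 305 MB between standby and modified pages is an assumption (all standby, none modified).

```python
from dataclasses import dataclass


@dataclass
class MemorySnapshot:
    """Physical memory broken down by page state, all values in MB."""
    standby: int          # sum of the Standby Cache * Bytes counters
    modified: int         # Modified Page List Bytes
    free_and_zeroed: int  # free and zeroed pages
    in_use: int           # includes the system cache working set (Cache Bytes)

    @property
    def cached(self):
        # "Cached" in Resource Monitor: standby + modified lists
        return self.standby + self.modified

    @property
    def available(self):
        # "Available": pages that can be repurposed immediately
        return self.standby + self.free_and_zeroed


# Reported figures: 305 MB cached, 305 MB available, 0 MB free, 3542 MB in use.
snap = MemorySnapshot(standby=305, modified=0, free_and_zeroed=0, in_use=3542)
print(f"Cached: {snap.cached} MB, Available: {snap.available} MB, In use: {snap.in_use} MB")
```

    Under this breakdown the roughly 3.1 GB system cache reported by Process Explorer sits inside the in-use figure, which is why it does not show up as "Cached" in Resource Monitor.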

    Friday, April 16, 2010 6:03 AM
  • Ok then, look at these screen shots and tell me if this looks normal.

    http://www.lanschoolyard.com/memory.jpg (insane memory usage)

    http://www.lanschoolyard.com/memory1.jpg (1 minute after a server reboot)

    http://www.lanschoolyard.com/memory2.jpg (1 hour after a server reboot)

    Friday, April 16, 2010 2:25 PM
  • The first screenshot shows 257 MB of available memory and no hard faults for the entire duration of the graph. From the memory manager's perspective this situation is normal and there is no immediate need to attempt to reduce memory usage.

    Whether all this memory should have been consumed in the first place is a different question. For example, if this happened simply as a result of copying a lot of files, then it's not normal, and probably indicates a problem in the tool that was used to perform the copy. But without knowing more details about what caused this situation, and what exactly is consuming the memory (is it the system cache? or some user process with a large shared working set?), it's hard to say for sure.

    Saturday, April 17, 2010 5:35 AM
  • The memory usage is ENTIRELY from copying files using robocopy. Even when we stopped copying files and let the server just sit there and idle the ram usage sits there at 99% just like in the screenshot.

    Microsoft ESCALATED technicians continue to say things are normal. (which I think is bullshit)

    If you'd like to take a look at my currently open ticket which should give you access to the crashdumps and perfmon/poolmon logs I've already provided to Microsoft over the last several weeks it is : [REG:110032657811866]

    Saturday, April 17, 2010 6:17 AM
  • Hi, i have the same problem.

    I have 2 VMs on VMware ESX4, but only one has this issue with memory usage. Both VMs have 12 GB of RAM (Windows 2008 R2 x64). One of them works fine; I use this server for Terminal Services, Office and internet. But memory usage on the other increases throughout the day, and if I restart the server it works OK. This server has only Windows 2008 R2 x64 + updates, SQL 2008 x32 SP1 and a shared folder.

    I do not understand why it only happens on one of the two.

    thanks

     

    Saturday, April 17, 2010 7:00 AM
  • Hi, I'm answering my own post. SQL 2008 x32 SP1 has the AWE option enabled. When I stopped the SQL service, memory usage immediately dropped to 1.2 GB; when I started it again, memory usage began increasing again.

    Now I will try to configure a maximum value for AWE.

    thanks

    Saturday, April 17, 2010 7:39 AM
  • Sorry, I don't work in product support so I don't have access to customer support cases.

    How many files are you copying, and how big are they on average? Do you have any antivirus/disk encryption etc. products (basically, anything that might install a disk filter driver)? If yes, can you try uninstalling them to see if that reduces memory usage?

    Sunday, April 18, 2010 3:23 AM
  • Well, I think I have an answer to my problem, but I need a little more help.

    Just got back from tech-ed 2010.  Mark Russinovich had a great session on memory usage in windows, and has written a couple of new tools to help you see what the heck is going on (VMMap and RAMMap). 

    VMMap  http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx
    RAMMap  http://technet.microsoft.com/en-us/sysinternals/ff700229.aspx

     I'm running RAMMap on the 2008R2 system that is driving me nuts, and it's showing that most of my RAM (2.3 GB) is being used by the Metafile (under "Use Counts", Usage).  My only problem is that I'm not 100% sure what this Metafile counter is referring to. I know what a metafile is, but I'm not sure how that relates to my physical memory.  The help tells me to buy the Windows Internals book, 5th edition.

    Anyone know what this counter means?

    thanks.

    Tuesday, June 15, 2010 1:54 PM
  • Rexif,

    The Metafile category in RamMap is a reference to memory used by directories, NTFS metadata files (e.g. MFT), and paging files.

    In regards to the Cache behavior, you might consider applying the following kernel update.  This update contains some fixes for the Cache Manager that may help.

    979149 A computer that is running Windows 7 or Windows Server 2008 R2 becomes unresponsive when you run a large application
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;979149

    -Nagorg

    Wednesday, June 30, 2010 2:03 PM