Get-Content cmdlet memory leak?

    Question

  • I put together this function to search for text patterns in files in certain folders and subfolders. I ran it against a local folder containing a 1.5 GB file. The machine I'm using is a Server 2012 R2 VM on a Server 2012 R2 hypervisor, both fully patched. The VM is configured with 1 GB startup dynamic RAM and a 32 GB maximum. I watched the powershell_ise.exe process consume more and more memory for over 30 minutes.

    Is this a memory leak issue with the Get-Content cmdlet in PowerShell ISE?

    PowerShell ISE had used over 18 GB of RAM before I had to Ctrl-C to break out of the script. It never finished.

    The script works fine on smaller files.


    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)


    • Edited by Sam Boutros Sunday, August 31, 2014 1:41 PM
    Sunday, August 31, 2014 1:38 PM

Answers

  • Well, Windows is pretty aggressive about handing out available memory to processes that might need it. I wouldn't worry too much about the actual numbers you see in task manager.

    However, there are certainly enhancements you could make to your code to improve on this situation.  This line:

    if ((Get-Content $File.FullName -ErrorAction SilentlyContinue) -match $String)

    is extremely slow (and a memory hog) compared to this:

    if (Select-String -Pattern $String -Path $File.FullName -Quiet)
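
    For context, here is a minimal sketch of how the surrounding loop might look after that swap (the $Files collection and $Results array are hypothetical, not taken from the original script):

    # Hypothetical loop body, for illustration only
    foreach ($File in $Files) {
        # -Quiet returns $true on the first match instead of emitting MatchInfo objects
        if (Select-String -Pattern $String -Path $File.FullName -Quiet) {
            $Results += $File.FullName
        }
    }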

    • Marked as answer by Sam Boutros Sunday, August 31, 2014 2:13 PM
    Sunday, August 31, 2014 1:52 PM

All replies

  • Well, Windows is pretty aggressive about handing out available memory to processes that might need it. I wouldn't worry too much about the actual numbers you see in task manager.

    However, there are certainly enhancements you could make to your code to improve on this situation.  This line:

    if ((Get-Content $File.FullName -ErrorAction SilentlyContinue) -match $String)

    is extremely slow (and a memory hog) compared to this:

    if (Select-String -Pattern $String -Path $File.FullName -Quiet)

    • Marked as answer by Sam Boutros Sunday, August 31, 2014 2:13 PM
    Sunday, August 31, 2014 1:52 PM
  • Well, Windows is pretty aggressive about handing out available memory to processes that might need it. I wouldn't worry too much about the actual numbers you see in task manager.

    However, there are certainly enhancements you could make to your code to improve on this situation.  This line:

    if ((Get-Content $File.FullName -ErrorAction SilentlyContinue) -match $String)

    is extremely slow (and a memory hog) compared to this:

    if (Select-String -Pattern $String -Path $File.FullName -Quiet)

    Very nice Dave! I added Measure-Command around the foreach loop to show how long it takes to run on a given folder. Using your command reduced processing time by 50%!

    I ran it against the 1.5 GB file with your command.

    Not only did it finish, it did so in 54 seconds!

    The odd thing is that the Get-Content cmdlet took over 18 GB of RAM to process a 1.5 GB file, and still could not finish.


    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)


    • Edited by Sam Boutros Sunday, August 31, 2014 2:12 PM
    Sunday, August 31, 2014 2:08 PM
  • Also note that this line will be a killer if you have a lot of matching files.

    $FileCount = (Get-ChildItem $FolderName -Recurse -File -Include $FilePattern -ErrorAction SilentlyContinue).Count

    This will return all files and then get the count.

    I think the design of this script needs to be adjusted for anything other than a small number of matches. Also, using Select-String as David suggested will aid performance.


    ¯\_(ツ)_/¯

    Sunday, August 31, 2014 2:11 PM
  • Also note that this line will be a killer if you have a lot of matching files.

    $FileCount = (Get-ChildItem $FolderName -Recurse -File -Include $FilePattern -ErrorAction SilentlyContinue).Count

    This will return all files and then get the count.

    I think the design of this script needs to be adjusted for anything other than a small number of matches. Also, using Select-String as David suggested will aid performance.


    ¯\_(ツ)_/¯

    True, but I'm not sure it can be avoided. If someone is searching a million-file data set, they should expect it to take time.

    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)

    Sunday, August 31, 2014 2:25 PM
  • This does pretty much exactly what the gallery script does, but it will be much faster and will not hog memory while it runs, due to its use of the pipeline.

    We really don't need a separate function, but I wrapped it so you can see how the parameters map to the Select-String cmdlet.

    function Find-Text {
        Param(
            [Parameter(Mandatory=$true,
                       ValueFromPipeLine=$true,
                       ValueFromPipeLineByPropertyName=$true,
                       Position=0)]
                [ValidateNotNullorEmpty()]
                [String[]]$TextPattern, 
            [Parameter(Mandatory=$false,
                       ValueFromPipeLine=$true,
                       ValueFromPipeLineByPropertyName=$true,
                       Position=1)]
                [ValidateScript({ Test-Path $_ })]
                [String[]]$FolderName = ".\",
            [Parameter(Mandatory=$false,
                       ValueFromPipeLine=$true,
                       ValueFromPipeLineByPropertyName=$true,
                       Position=2)]
                [String[]]$FilePattern = "*"
        )
        Select-String -Path $FolderName -Pattern $TextPattern -Include $FilePattern -Exclude *.txt -SimpleMatch -ea 0 |
            Select-Object Path -Unique
    }

    We can really just fill in the CmdLet:

    Select-String -Path c:\f1\*,c:\f2\*,c:\f3\* -Pattern 'hello','world','not' -Include *.not,*.pdf,*.log -Exclude *.txt -SimpleMatch -ea 0 |
        Select-Object Path -Unique

    That is all you need.

    Notice we add a * to the file path to force the include and exclude to work as desired.
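
    To illustrate that wildcard detail (the folder path below is hypothetical): with a bare folder path, -Include and -Exclude may not be applied as you expect; appending \* makes Select-String enumerate the folder's files so the filters take effect.

    # Hypothetical folder; the trailing * is what lets -Include/-Exclude take effect
    Select-String -Path C:\logs -Pattern 'error' -Include *.log -ea 0     # filters may not apply as intended
    Select-String -Path C:\logs\* -Pattern 'error' -Include *.log -ea 0   # only *.log files are scanned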


    ¯\_(ツ)_/¯


    • Edited by jrv Sunday, August 31, 2014 2:27 PM
    Sunday, August 31, 2014 2:26 PM
  • One thing you need to realize is that the file on disk is stored as a string with embedded line breaks.  Get-Content will create an array of strings using the newlines as delimiters.

    A string array is a much more complex object than a single string.  It requires more memory to represent that file as a string array than the size of the file on disk.

    As an experiment, choose some text file and run this:

    get-content $file -raw | export-clixml testout_string.xml
    get-content $file | export-clixml testout_array.xml
    get-childitem $file,'testout_string.xml','testout_array.xml'

    The sizes of the exported clixml files will reflect the relative complexity of the objects in memory, and consequently the amount of memory required to store them. The effect will be more pronounced if the file has many relatively small records, as opposed to the same-size file consisting of relatively few large records.
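
    As a quick companion check, you can also see the type difference directly (file path hypothetical; assumes a multi-line text file):

    (Get-Content .\sample.txt -Raw).GetType().Name   # String   - one single string object
    (Get-Content .\sample.txt).GetType().Name        # Object[] - one string object per line
    (Get-Content .\sample.txt).Count                 # how many line objects were created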


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Sunday, August 31, 2014 2:35 PM
    Moderator
  • Also note that this line will be a killer if you have a lot of matching files.

    $FileCount = (Get-ChildItem $FolderName -Recurse -File -Include $FilePattern -ErrorAction SilentlyContinue).Count

    This will return all files and then get the count.

    I think the design of this script needs to be adjusted for anything other than a small number of matches. Also, using Select-String as David suggested will aid performance.


    ¯\_(ツ)_/¯

    True, but I'm not sure it can be avoided. If someone is searching a million-file data set, they should expect it to take time.

    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)

    The Select-String method using the pipeline is about 1000x faster over very large file sets and does not hog memory. Getting a count of matching files with Get-Content requires caching all results until they are collected. After the count is extracted you save it, then go back and do the same thing all over again.

    This is a common issue with new programmers who are not yet sensitive to how existing subsystems work and are not aware of common ways to avoid the issues. The pipeline is one method that avoids accumulating thousands of objects.

    Select-String scans a file and releases it, returning only match objects. You can set it to -SimpleMatch and it will return the first match and not try to match all of the file. Using 'select Path' returns only the file name, releasing the match objects; -Unique throws away duplicates as they are found.

    By leveraging the different aspects of PowerShell to accomplish a task, you can avoid huge memory footprints and slow execution. PowerShell was designed with this in mind, so think about each cmdlet and its syntax to understand what is happening.

    For extremely large tasks over many remote systems, we would jump to jobs and workflows. Access to all of these tools at a command prompt is awesome, as in the past they were only available to compiled languages.
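
    A minimal sketch of that streaming idea (paths and pattern are hypothetical):

    # Streaming: each MatchInfo object flows through the pipeline and is released as it goes
    Select-String -Path C:\data\* -Pattern 'needle' -SimpleMatch -ea 0 |
        Select-Object Path -Unique

    # Accumulating: every MatchInfo object is held in memory before anything downstream runs
    $all = Select-String -Path C:\data\* -Pattern 'needle' -SimpleMatch -ea 0
    $all | Select-Object Path -Unique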


    ¯\_(ツ)_/¯

    Sunday, August 31, 2014 2:39 PM
  • One thing you need to realize is that the file on disk is stored as a string with embedded line breaks.  Get-Content will create an array of strings using the newlines as delimiters.

    A string array is a much more complex object than a single string.  It requires more memory to represent that file as a string array than the size of the file on disk.

    As an experiment, choose some text file and run this:

    get-content $file -raw | export-clixml testout_string.xml
    get-content $file | export-clixml testout_array.xml
    get-childitem $file,'testout_string.xml','testout_array.xml'

    The sizes of the exported clixml files will reflect the relative complexity of the objects in memory, and consequently the amount of memory required to store them. The effect will be more pronounced if the file has many relatively small records, as opposed to the same-size file consisting of relatively few large records.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    To add to this, Get-Content reads by line. Select-String streams the file, which is much more efficient. Select-String -SimpleMatch stops reading on the first match, whereas Get-Content as used continues to match until the end of the file, and all of the file is loaded into memory first. Select-String does not need to load the whole file.


    ¯\_(ツ)_/¯

    Sunday, August 31, 2014 2:42 PM
  • Select-String -SimpleMatch stops reading on the first match.


    ¯\_(ツ)_/¯

    That's the -Quiet parameter.  -SimpleMatch changes how -Pattern is interpreted (as a literal string instead of a regular expression).
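
    A quick illustration of the two parameters (file name hypothetical):

    # -Quiet returns a single [bool] and can stop scanning at the first match
    Select-String -Path .\big.log -Pattern 'error' -Quiet

    # -SimpleMatch treats 'a.b' as the literal text "a.b" rather than the regex "a<any char>b";
    # it still emits a MatchInfo object for every matching line
    Select-String -Path .\big.log -Pattern 'a.b' -SimpleMatch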

    Sunday, August 31, 2014 2:52 PM
  • Excellent points, James. I have changed the main loop's if statement from Get-Content to Select-String.

    In the example:

    $foldername = ".\*","\\xhost15\install\scripts\*"
    $TextPattern = 'hello','boutros'
    $FilePattern = "*.not","*.pdf","*.log",'*.ps1'
    Select-String -Path $foldername -Pattern $TextPattern -Include $FilePattern -Exclude *.txt -SimpleMatch -ea 0 | select path -unique 

    I take it -ea 0 is short for -ErrorAction SilentlyContinue, right?

    What's missing here compared to the original function's intent is information identifying which pattern was found in which file, and I'm not sure if "select path -unique" is needed.


    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)

    Sunday, August 31, 2014 2:55 PM
  • One thing you need to realize is that the file on disk is stored as a string with embedded line breaks.  Get-Content will create an array of strings using the newlines as delimiters.

    A string array is a much more complex object than a single string.  It requires more memory to represent that file as a string array than the size of the file on disk.

    As an experiment, choose some text file and run this:

    get-content $file -raw | export-clixml testout_string.xml
    get-content $file | export-clixml testout_array.xml
    get-childitem $file,'testout_string.xml','testout_array.xml'

    The sizes of the exported clixml files will reflect the relative complexity of the objects in memory, and consequently the amount of memory required to store them. The effect will be more pronounced if the file has many relatively small records, as opposed to the same-size file consisting of relatively few large records.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tested:

    550-byte file on disk ==> 6,590 bytes as a string in memory ==> 24,052 bytes as an array in memory!?

    That's a 1:12:44 ratio!?

    WOW, thanks for the insight!


    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)

    Sunday, August 31, 2014 3:04 PM
  • Excellent points, James. I have changed the main loop's if statement from Get-Content to Select-String.

    In the example:

    $foldername = ".\*","\\xhost15\install\scripts\*"
    $TextPattern = 'hello','boutros'
    $FilePattern = "*.not","*.pdf","*.log",'*.ps1'
    Select-String -Path $foldername -Pattern $TextPattern -Include $FilePattern -Exclude *.txt -SimpleMatch -ea 0 | select path -unique 

    I take it -ea 0 is short for -ErrorAction SilentlyContinue, right?

    What's missing here compared to the original function's intent is information identifying which pattern was found in which file, and I'm not sure if "select path -unique" is needed.


    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)


    PS C:\scripts> select-string -Path c:\scripts\* -Pattern hello,test -Include *.txt |select path, pattern
    
    Path                                                        Pattern
    ----                                                        -------
    C:\scripts\alert.txt                                        hello
    C:\scripts\bios_sn.txt                                      test
    C:\scripts\certs.txt                                        test
    C:\scripts\certs.txt                                        test
    C:\scripts\certs.txt                                        test
    C:\scripts\certs.txt                                        test
    C:\scripts\certs.txt                                        test
    C:\scripts\certs.txt                                        test
    C:\scripts\computers.txt                                    test
    C:\scripts\computers.txt                                    test
    C:\scripts\computers.txt                                    test
    C:\scripts\copylog.txt                                      hello
    C:\scripts\copylog.txt                                      test
    C:\scripts\copylog.txt                                      test


    ¯\_(ツ)_/¯

    Sunday, August 31, 2014 3:15 PM
  • And the verdict is:

    Select-String -Pattern $String -Path $File.FullName -Quiet ==> 54 seconds

    Select-String -Pattern $String -Path $File.FullName -SimpleMatch ==> 76 seconds

    Select-String -Pattern $String -Path $File.FullName -SimpleMatch  | select path,pattern -unique ==> 77 seconds

    (same data set, same machine, all other factors the same)

    Again, thank you Dave, James, and Rob for your comments and insights. They've been most helpful.


    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)

    Sunday, August 31, 2014 3:25 PM
  • -Path should be the path to the folder and not to just one file.  Use it instead of GCI or Get-Content.

    Look at what I did above.  If you have extremely large files then scans of any kind will take time.


    ¯\_(ツ)_/¯

    Sunday, August 31, 2014 3:32 PM
  • -Path should be the path to the folder and not to just one file.  Use it instead of GCI or Get-Content.

    Look at what I did above.  If you have extremely large files then scans of any kind will take time.


    ¯\_(ツ)_/¯

    Yep, all testing is now without the Get-Content cmdlet.

    I'll do a rewrite to see if I can remove the Get-ChildItem as well.


    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)

    Sunday, August 31, 2014 3:35 PM
  • There is no need for GCI; file enumeration is built into Select-String. If you want a file count, just collect the objects and count them at the end.


    ¯\_(ツ)_/¯

    Sunday, August 31, 2014 3:37 PM
  • There is no need for GCI; file enumeration is built into Select-String. If you want a file count, just collect the objects and count them at the end.


    ¯\_(ツ)_/¯

    I need a file count before the main loop that does the actual finding of the text patterns in the search folders. I need the count for the progress bar...

    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)

    Sunday, August 31, 2014 3:41 PM
  • Forget the progress bar if you want performance. 


    ¯\_(ツ)_/¯

    Sunday, August 31, 2014 3:45 PM
  • There is no need for GCI; file enumeration is built into Select-String. If you want a file count, just collect the objects and count them at the end.


    ¯\_(ツ)_/¯

    I need a file count before the main loop that does the actual finding of the text patterns in the search folders. I need the count for the progress bar...

    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)

    Use something like:

    $FileCount = (Get-Childitem <filespec>).count

    Rather than:

    $Files = Get-Childitem <filespec>
    and then testing $Files.Count. That way you're not storing FileInfo objects for the entire file collection in memory.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Sunday, August 31, 2014 3:46 PM
    Moderator
  • There is no need for GCI; file enumeration is built into Select-String. If you want a file count, just collect the objects and count them at the end.


    ¯\_(ツ)_/¯

    I need a file count before the main loop that does the actual finding of the text patterns in the search folders. I need the count for the progress bar...

    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)

    Use something like:

    $FileCount = (Get-Childitem <filespec>).count

    Rather than:

    $Files = Get-Childitem <filespec>
    and then testing $Files.Count. That way you're not storing FileInfo objects for the entire file collection in memory.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    That still stores all of the file objects in an "anonymous" variable and then extracts the count. This doesn't:

    $count=0
    GCI <filespec> -file |%{$count++}

    How this measures depends much on the optimizer, but on a first run it will be faster with very large numbers of files.


    ¯\_(ツ)_/¯


    • Edited by jrv Sunday, August 31, 2014 4:13 PM
    Sunday, August 31, 2014 4:09 PM
  • Yes Rob, that's what I had in the code to start with - line 95 gets the count.

    Four lines later, I do another GCI:

    Get-ChildItem -Path $FolderName -Recurse -File -Include $FilePattern -ErrorAction SilentlyContinue

    for the main loop through the file list.

    Here's something odd. I tested your hypothesis above, which was exactly what I thought before:

    Measure-Command { $FileCount = (Get-ChildItem d:\ -Recurse).Count; $FileCount }   # ==> this shows 229 milliseconds
    Measure-Command { $Files = Get-ChildItem d:\ -Recurse; $Files.Count }             # ==> this shows 203 milliseconds

    However, these results suggest that 

    $Files = Get-Childitem d:\ -Recurse;  $Files.count 

    is a bit faster than

    $FileCount = (Get-Childitem d:\ -Recurse).count; $FileCount 

    On the other hand, the former comes with a memory penalty of 12 MB (for 1,500 files).


    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)

    Sunday, August 31, 2014 4:15 PM
  • Memory penalty:

    (gci 'c:\program files*\*' -rec).Count   ==> 330 MB after one pass; accumulates a memory penalty on repeated usage, and GC adjusts only after free memory is low.

    gci 'c:\program files*\*' -rec | %{$i++} ==> 21 MB after one pass; stable memory at 49 MB after multiple passes.
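
    One way to take readings like these yourself, sketched on the assumption that you check the session's working set before and after each pass:

    [GC]::Collect()                              # request a collection so the numbers settle
    (Get-Process -Id $PID).WorkingSet64 / 1MB    # current session's working set, in MB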


    ¯\_(ツ)_/¯

    Sunday, August 31, 2014 4:22 PM
  • Yes Rob, that's what I had in the code to start with - line 95 gets the count.

    Four lines later, I do another GCI:

    Get-ChildItem -Path $FolderName -Recurse -File -Include $FilePattern -ErrorAction SilentlyContinue

    for the main loop through the file list.

    Here's something odd. I tested your hypothesis above, which was exactly what I thought before:

    Measure-Command { $FileCount = (Get-ChildItem d:\ -Recurse).Count; $FileCount }   # ==> this shows 229 milliseconds
    Measure-Command { $Files = Get-ChildItem d:\ -Recurse; $Files.Count }             # ==> this shows 203 milliseconds

    However, these results suggest that 

    $Files = Get-Childitem d:\ -Recurse;  $Files.count 

    is a bit faster than

    $FileCount = (Get-Childitem d:\ -Recurse).count; $FileCount 

    On the other hand, the former comes with a memory penalty of 12 MB (for 1,500 files).


    Sam Boutros, Senior Consultant, Software Logic, KOP, PA http://superwidgets.wordpress.com (Please take a moment to Vote as Helpful and/or Mark as Answer, where applicable)

    If you're really wanting to squeeze it, try these:

    Filter Count-Files {$script:FileCount++}
    
    
    $FileCount = 0
    measure-command { gci d:\ -Recurse -ea 0 |% {$filecount++} }
    $filecount
    
    $FileCount = 0
    Measure-Command { gci d:\ -Recurse -ea 0 | Count-Files }
    $FileCount
    
    $FileCount = 0
    Measure-Command { cmd /c dir d:\ /b /s | Count-Files }
    $FileCount


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Sunday, August 31, 2014 5:57 PM
    Moderator
  • Ah yes... the old DOS "dir" command to the rescue. The Win32 API is still faster than .NET.


    ¯\_(ツ)_/¯

    Sunday, August 31, 2014 6:08 PM
  • And a filter is marginally faster than foreach-object.  Not much, but for a large collection it adds up.
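
    A rough way to compare the two yourself (timings will vary by machine; the Double filter is only an illustration):

    filter Double { $_ * 2 }                                    # a filter: a function with only a process block

    Measure-Command { 1..100000 | Double }                      # typically a bit faster...
    Measure-Command { 1..100000 | ForEach-Object { $_ * 2 } }   # ...than the equivalent ForEach-Object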

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Sunday, August 31, 2014 6:28 PM
    Moderator
  • And a filter is marginally faster than foreach-object.  Not much, but for a large collection it adds up.

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Curious, but good to know. It's a function, which has lower overhead than a cmdlet or advanced function, I suppose.


    ¯\_(ツ)_/¯

    Sunday, August 31, 2014 6:32 PM