Maximum size of CSV/Text file as an input to PowerShell

  • Question

  • Dear friends,

    I need to analyze the ACL/ACE migration log generated by SubInACL to get the count of successful migrations. With regex in PowerShell I am able to find the successful changes made by SubInACL. The problem is that the generated logs are very large, around 150 GB to 200 GB, and I need to feed them to PowerShell as input for finding the successful changes. I need to know the maximum size of a text file/log that PowerShell will be able to take as input for this analysis, and how much system resource will be required to use a 200 GB log as input to PowerShell. Please help me in this regard. Thank you.

    Regards

    Bedanta Mishra


    Sunday, December 14, 2014 12:52 PM


All replies

  • For this kind of application, I use a strategy of reading in the data using Get-Content, usually with a ReadCount of between 1000 and 5000 records.

    The pattern will look something like this:

    Get-Content <filename> -ReadCount 2000 |
    foreach {
     $_ -match <regex> -replace <regex> |
     Add-Content <outputfile>
     }

    You cannot read the entire file into memory - it's simply too big.  You have to use a pipeline solution.

    The default ReadCount for Get-Content is 1.  If you use that, you keep your memory consumption under control, but you're only working with one record at a time and it's too slow to be practical.

    You can use the -match and -replace operators to filter and extract data, and they will work as array operators, so by using the appropriate -ReadCount setting you can work the file in 'chunks' of data rather than one record at a time.
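
    As a small illustration of how those array operators behave (made-up sample lines, not your actual log):

    $chunk = 'item one : 1 change(s)', 'informational line', 'item two : 3 change(s)'
    # -match on the array keeps only the matching elements;
    # -replace then rewrites each survivor down to the captured number
    $chunk -match 'change' -replace '.+ : (\d+) change.+','$1'    # -> 1, 3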

    If the input is in multiple files you can multi-thread it so that it's working on multiple files simultaneously, and then merge the data when they're done.
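
    If the input does end up split across multiple files, a rough sketch of that idea using background jobs (hypothetical paths and patterns, not a finished solution) could look like this:

    $jobs = Get-ChildItem .\logs\*.log | ForEach-Object {
        Start-Job -ArgumentList $_.FullName -ScriptBlock {
            param($path)
            $sum = 0
            # Read each file in chunks, filter the lines of interest,
            # extract the numeric part and keep a running total
            Get-Content $path -ReadCount 2000 | ForEach-Object {
                ($_ -match 'change\(s\)' -replace '.+ : (\d+) change.+','$1') |
                    ForEach-Object { $sum += [int]$_ }
            }
            $sum
        }
    }
    # Merge the per-file totals once every job has finished
    ($jobs | Wait-Job | Receive-Job | Measure-Object -Sum).Sum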




    • Edited by mjolinor Sunday, December 14, 2014 1:30 PM
    Sunday, December 14, 2014 1:27 PM
  • Hi mjolinor,

    Thank you for the reply. I got your point that I have to multi-thread the read operation on the input file. Could you please help me set up "-ReadCount" in my script? That would be a really great help.

    PS Script:

    Set-ExecutionPolicy -Force Unrestricted
    $input_path = '.\infoblox_log.txt'
    $output_file = '.\info.txt'
    $regex = '\d{1,}\s+\w+\W\w\W'
    select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $output_file
    $newinput_path = '.\info.txt'
    $final_output = 'finalextract.txt'
    $regex = '\d{1,}'
    select-string -Path $newinput_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $final_output
    $countfile = Get-Content -Path $final_output | select -Property @{name='COUNT';expression={$_.Substring(0).Trim()}}
    $countfile | Measure-Object COUNT -Sum | Select-Object Count,Sum | Format-List > SuccessfulMigration_Count.txt
    

    Generated log file:

    \\server.com\Infoblox : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox : 1 change(s)
    
    \\server.com\Infoblox\12-11-2012.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\12-11-2012.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\12-11-2012.bak : 1 change(s)
    \\server.com\Infoblox\05-09-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\05-09-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\05-09-2013.bak : 1 change(s)
    \\server.com\Infoblox\05-18-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\05-18-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\05-18-2013.bak : 1 change(s)
    \\server.com\Infoblox\05-28-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\05-28-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\05-28-2013.bak : 1 change(s)
    \\server.com\Infoblox\06-06-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\06-06-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\06-06-2013.bak : 1 change(s)
    \\server.com\Infoblox\06-12-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\06-12-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\06-12-2013.bak : 1 change(s)
    \\server.com\Infoblox\06-13-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\06-13-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\06-13-2013.bak : 1 change(s)
    \\server.com\Infoblox\01-07-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\01-07-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\01-07-2013.bak : 1 change(s)
    \\server.com\Infoblox\Mazu Backup : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\Mazu Backup replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\Mazu Backup : 1 change(s)
    \\server.com\Infoblox\02-01-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\02-01-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\02-01-2013.bak : 1 change(s)
    \\server.com\Infoblox\02-07-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\02-07-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\02-07-2013.bak : 1 change(s)
    \\server.com\Infoblox\03-12-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\03-12-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\03-12-2013.bak : 1 change(s)
    \\server.com\Infoblox\04-10-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\04-10-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\04-10-2013.bak : 1 change(s)
    \\server.com\Infoblox\04-11-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\04-11-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\04-11-2013.bak : 1 change(s)
    \\server.com\Infoblox\06-28-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\06-28-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\06-28-2013.bak : 1 change(s)
    \\server.com\Infoblox\07-01-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\07-01-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\07-01-2013.bak : 1 change(s)
    \\server.com\Infoblox\07-09-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\07-09-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\07-09-2013.bak : 1 change(s)
    \\server.com\Infoblox\09-19-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\09-19-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\09-19-2013.bak : 1 change(s)
    \\server.com\Infoblox\07-24-2013.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\07-24-2013.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\07-24-2013.bak : 1 change(s)
    \\server.com\Infoblox\license_1384446190.txt : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\license_1384446190.txt replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\license_1384446190.txt : 1 change(s)
    \\server.com\Infoblox\02-07-2014.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\02-07-2014.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\02-07-2014.bak : 1 change(s)
    \\server.com\Infoblox\08-22-2014.bak : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\08-22-2014.bak replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\08-22-2014.bak : 1 change(s)
    \\server.com\Infoblox\Mazu Backup\mazu_bkp_20121130_C15570.tar.gz : newdomain\domain users is the new Primary Group 
    \\server.com\Infoblox\Mazu Backup\mazu_bkp_20121130_C15570.tar.gz replace Primary Group olddomain\domain users with newdomain\domain users
    \\server.com\Infoblox\Mazu Backup\mazu_bkp_20121130_C15570.tar.gz : 1 change(s)
    
    
    
    


    Thanks & Regards Bedanta S Mishra

    Sunday, December 14, 2014 1:44 PM
  • If there's only one file, it's not "multi-threading".  It's a single thread, but it's working with multiple records at a time.
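
    One quick way to see what that means in practice (using the log name from your script):

    # With -ReadCount each object coming down the pipeline is an array of up to 2000 lines,
    # so this prints the sizes of the first few chunks rather than individual lines
    Get-Content .\infoblox_log.txt -ReadCount 2000 |
     Select-Object -First 3 |
     ForEach-Object { $_.Count }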

    I'm leaving for work right now, but I'll have a look at this later and come back with some recommendations.  



    Sunday, December 14, 2014 1:51 PM
  • Sure, I will wait for that. Thanks.

    Thanks & Regards Bedanta S Mishra

    Sunday, December 14, 2014 1:53 PM
  • Can you tell me exactly what you're trying to accomplish there?


    Sunday, December 14, 2014 4:04 PM
  • From the log, I need to extract the number of changes done by SubInACL for each instance, which appears as the string "1 change(s)". In the given log there are 24 such instances.

    From this string I need to extract the integer part and make a total of those integers. 

    The output from the regex operation is as below:

    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    1 change(s)
    

    After extracting the integer part:

    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    1
    

    Then I need to calculate the Sum and Count of changes through Measure-Object. Output is as below.

    Count : 24
    Sum   : 24
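
    On its own, that Count/Sum step is just (with 24 ones standing in for the extracted integers):

    $values = @(1) * 24
    $values | Measure-Object -Sum | Format-List Count,Sum
    # Count : 24
    # Sum   : 24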
    
    
    
    

    I have successfully processed a small log file of 12 KB with the above process. But the concern is for logs of 150 GB or larger, which may bring the system down. I hope you understand my requirement.



    Thanks & Regards Bedanta S Mishra

    Sunday, December 14, 2014 4:17 PM
  • Will there ever be more than 1 change per instance?


    Sunday, December 14, 2014 4:45 PM
  • Yes, there may be more than 1 change, as that depends on the number of PACE values. That is why I need the Sum of these values, to confirm exactly how many changes have been made.

    Thanks & Regards Bedanta S Mishra

    Sunday, December 14, 2014 4:55 PM
  • OK. 

    I've got this:

    $Changes = @{Count=0}
    Filter Get-Changes { $_ -like '*change(s)*' -replace '.+ : (\d+) change.+','$1' }
    Filter Count-Changes {$Changes.Count += $_}

    Get-Content .\infoblox_log.txt -ReadCount 7000 |
     Get-Changes |
     Count-Changes

    $Changes.Count

    Using a 100MB test file of about 1.2 million lines, extrapolated from your posted test data, it counted the changes in about 10 seconds.

    If my calculations are correct, it's still going to take about 5.5 hours for a 200GB file, but I suspect that's considerably faster than what you're using now.
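
    A quick back-of-the-envelope check of that estimate (assuming the throughput stays roughly linear with file size):

    $chunks  = 200GB / 100MB               # = 2048 hundred-MB chunks in a 200 GB file
    $seconds = $chunks * 10                # ~10 seconds per chunk, from the test above
    '{0:N1} hours' -f ($seconds / 3600)    # works out to roughly 5.7 hours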

    Edit: Just noticed you have separate counts for Count and Sum

    Updated code:

    $Changes = @{Count=0;Sum=0}
    # Keep only the lines containing 'change(s)', then strip each one down to the numeric change count
    Filter Get-Changes { $_ -like '*change(s)*' -replace '.+ : (\d+) change.+','$1' }
    # Count the matching lines and keep a running total of the extracted numbers
    Filter Count-Changes {$Changes.Count++;$Changes.Sum += $_}

    Get-Content .\infoblox_log.txt -ReadCount 7000 |
     Get-Changes |
     Count-Changes

    $Changes.Count
    $Changes.Sum

    Doesn't seem to have much impact on the run time.





    • Edited by mjolinor Sunday, December 14, 2014 5:55 PM
    • Marked as answer by Bedanta Mishra Sunday, December 14, 2014 7:52 PM
    • Unmarked as answer by Bill_Stewart Tuesday, February 10, 2015 6:54 PM
    • Marked as answer by Bill_Stewart Tuesday, February 10, 2015 6:54 PM
    Sunday, December 14, 2014 5:42 PM
  • Thanks a lot mjolinor, it works like a charm. As for the concern about time, that is not an issue. Could you please help me a little more, so that I can get both final outputs for Count and Sum in a proper format, like this:

    Count = 24

    Successful Migration = 24

    So that I can store the above result into a text file.

    Also, it would be a great help if you could explain how these two lines (the regex filters) work:

    Filter Get-Changes { $_ -like '*change(s)*' -replace '.+ : (\d+) change.+','$1' }
    Filter Count-Changes {$Changes.Count++;$Changes.Sum += $_}

    Awaiting your valuable reply :)


    Thanks & Regards Bedanta S Mishra

    Sunday, December 14, 2014 6:38 PM