Fastest way to filter files using a single GCI call

  • Question

  • cls
    $p='E:\gcp'
    $t=GCI -path $p  | % { if($_.LastWriteTime -lt (Get-Date).Adddays(-30)){
    return $_.FullName}
     }
     echo $t
     echo $t.count
     #-------------------------------------------------------------
     $r=$t  | sls -Pattern ".pdf"
     echo $r.count
     $y=$t  | sls -Pattern "artifact"
    
     echo $y
     echo $y.count
    
     $tt=GCI $y
     echo $tt
     echo $tt.count

    I am trying to filter files by the last 30 days with the code above. Is this approach right, or is there a better way to manage it?

    I also tried Where-Object with a descending sort, including only PDF files or only the folder (artifacts), but it takes too much time.

    I have 1.2 TB of data to filter.


    Thursday, October 15, 2020 5:38 PM

Answers

  • I'm not sure there's a fast way to do the filtering (like there is with getting AD objects), but there's a faster way to build the lists.

    $path='c:\junk'
    
    [System.Collections.Generic.List[object]]$artifacts = @()
    [System.Collections.Generic.List[object]]$pdfs = @()
    
    $targetdate = (Get-Date).Adddays(-30)   # find files written to more than 30 days ago
    $allcount = 0
    Get-ChildItem -path $path |          # maybe add '-Recurse'?
        Where-Object LastWriteTime -lt $targetdate |
            ForEach-Object{
                $allcount++
                if ($_.FullName -like "*artifact*"){
                    $artifacts.Add($_.FullName)
                }
                elseif ($_.Extension -like ".pdf"){
                    $pdfs.Add($_.FullName)
                }
            }
    Write-Host "Count of all in date range: $allcount"
    Write-Host "Pdf count $($pdfs.count)"
    # Un-comment the lines below to see what PDF files were found
    # $pdfs |
    #     ForEach-Object{
    #         Write-Host "`t$_"
    #     }
    Write-Host "Artifacts count: $($artifacts.count)"
    Write-Host "Artifacts:"
    $artifacts |
        ForEach-Object{
            Write-Host "`t$_"
        }
    if ($artifacts.count -gt 0){
        $artifactchildren = $artifacts | Get-ChildItem
        Write-Host "Artifact children count: $($artifactchildren.count)"
        Write-Host "Artifact children:"
        $artifactchildren |
            ForEach-Object{
                Write-Host "`t$($_.FullName)"
            }
    }
    else{
        Write-Host "No Artifacts found"
    }



    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Friday, October 16, 2020 2:49 AM

All replies

  • Your variable names are poorly chosen.

    Aliases are fine for one-liners that you will not show to anyone, but not here. Please correct your code.

    Mistakes in your script:

    1 gci -path $p will not go deeper than the first level, because the -Recurse switch was not used.

    2 $t | sls -pattern '.pdf': do you really need to read every file to find .pdf inside the file contents? If you are trying to get all PDF files, you can do this in many other ways.

    3 gci $y will do nothing, because $y will contain not paths but strings from files that include 'artifact' in their content.
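
One of those other ways to collect PDF files is to compare each file's Extension property, which avoids pattern matching altogether. The following is a minimal sketch, not code from this thread: the function name Get-OldPdfPath and its parameters are hypothetical, and it assumes PowerShell 3.0 or later for the -File switch.

```powershell
# Hypothetical helper (not from the thread): returns the full paths of PDF
# files under $Path that were last written more than $Days days ago.
function Get-OldPdfPath {
    param(
        [Parameter(Mandatory)][string]$Path,
        [int]$Days = 30
    )
    $cutoff = (Get-Date).AddDays(-$Days)
    Get-ChildItem -Path $Path -File -Recurse |
        # -eq on strings is case-insensitive, so .PDF and .pdf both match
        Where-Object { $_.Extension -eq '.pdf' -and $_.LastWriteTime -lt $cutoff } |
        ForEach-Object FullName
}
```

A call such as Get-OldPdfPath -Path 'E:\gcp' would return only path strings, so the result can be counted or piped onward without opening any file.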


    The opinion expressed by me is not an official position of Microsoft

    • Edited by Vector BCO Friday, October 16, 2020 7:12 AM
    Thursday, October 15, 2020 7:31 PM
  • You could be a bit clearer about the "artifacts". Is that a directory name (or part of a directory name)? If it is, the work of building the $artifacts list can be drastically reduced by making a minor change to one line:

    if ($_.PSIsContainer -and $_.FullName -like "*artifact*"){

    With that change, only directories whose FullName property contains the string "artifact" will be placed into the list, and this line won't have to needlessly deal with files:

    $artifactchildren = $artifacts | Get-ChildItem
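
The same container-only idea can also be expressed with the -Directory switch of Get-ChildItem (PowerShell 3.0 and later), which skips files during enumeration. This is a rough sketch under assumptions: the function name Get-ArtifactChild is hypothetical, and it matches on the directory's Name rather than its FullName.

```powershell
# Hypothetical helper (not from the thread): returns the immediate children
# of every directory under $Path whose name contains "artifact".
function Get-ArtifactChild {
    param([Parameter(Mandatory)][string]$Path)
    Get-ChildItem -Path $Path -Directory -Recurse |   # -Directory: containers only
        Where-Object Name -like '*artifact*' |
        Get-ChildItem                                  # list each matching directory
}
```

Because only directories reach the final Get-ChildItem, a file that merely has "artifact" in its name is never passed along.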

    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Friday, October 16, 2020 3:14 PM
  • Maybe the issue is that a FullName match returns every file inside the artifacts folder instead of just the artifacts folder itself. That is only my guess, because I can't tell from the description what needs to be done :(

    If I'm right, then $_.Name should be used instead of $_.FullName.
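
To see the difference between the two properties, both matches can be run over the same tree. This is only an illustrative sketch: the function name Compare-ArtifactMatch is hypothetical and not part of the thread.

```powershell
# Hypothetical helper (not from the thread): runs both match styles over the
# same recursive listing so the results can be compared side by side.
function Compare-ArtifactMatch {
    param([Parameter(Mandatory)][string]$Path)
    $items = Get-ChildItem -Path $Path -Recurse
    [pscustomobject]@{
        # FullName contains every parent folder, so items below an
        # "artifact" directory match as well
        ByFullName = @($items | Where-Object FullName -like '*artifact*')
        # Name is only the last path segment
        ByName     = @($items | Where-Object Name -like '*artifact*')
    }
}
```

On a tree with an artifact folder that holds files, ByFullName includes the folder and its contents, while ByName includes only items whose own name contains "artifact".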


    The opinion expressed by me is not an official position of Microsoft

    Friday, October 16, 2020 3:49 PM
  • # I need the -Recurse switch for the PDFs only, not for the artifact folder inside $path.
    # Can these two loops be combined into one?
    
    Get-ChildItem -path $path -Recurse -file|
        Where-Object LastWriteTime -lt $targetdate |
            ForEach-Object{            
                if ($_.Extension -like ".pdf"){
                    $pdfs.Add($_.FullName)
                }
            }
    
    
    Get-ChildItem -path "$path\artifact" -file|
        Where-Object LastWriteTime -lt $targetdate |
                if ($_.FullName -like "*artifact*"){
                    $artifacts.Add($_.FullName)
                }
            }


    Saturday, October 17, 2020 3:47 PM
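  • Every file from this loop will be returned, because each FullName under "$path\artifact" already contains "artifact":

    Get-ChildItem -path "$path\artifact" -file|
        Where-Object LastWriteTime -lt $targetdate | # ForEach-Object{ is missing here
                if ($_.FullName -like "*artifact*"){
                    $artifacts.Add($_.FullName)
                }
            }

    Once the missing ForEach-Object is restored, the code above does essentially the same as: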

    $artifacts = Get-ChildItem -path "$path\artifact" -file | Where-Object LastWriteTime -lt $targetdate 
            

    If for some reason you need to place two if statements inside one ForEach-Object loop, you certainly can. If needed, you could place three or more if/else constructs there.

    Get-ChildItem -path $path -Recurse -file |
        Where-Object LastWriteTime -lt $targetdate |
            ForEach-Object{
                if ($_.Extension -like ".pdf"){
                    $pdfs.Add($_.FullName)
                }
                # compare the DirectoryName string rather than the Directory object
                if ($_.DirectoryName -eq "$path\artifact"){
                    $artifacts.Add($_.FullName)
                }
            }

    But once again, you need to look at the logical structure of this code: it is not optimized and could produce "random issues" if you do not understand what is going on there.


    The opinion expressed by me is not an official position of Microsoft

    • Edited by Vector BCO Saturday, October 17, 2020 9:23 PM
    Saturday, October 17, 2020 4:31 PM
  • By using the -Recurse switch you're going to include the "artifact" directory and its contents in your example's first loop.
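
One way to account for that overlap is to make a single recursive pass and decide per file which list it belongs to. The sketch below is an assumption about the intent, not code from this thread: the function name Get-OldFileReport and the ArtifactFolderName parameter are hypothetical, files sitting directly in the artifact folder are kept out of the PDF list, and the ::new() syntax requires PowerShell 5.0 or later.

```powershell
# Hypothetical helper (not from the thread): one recursive pass, two lists.
function Get-OldFileReport {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string]$ArtifactFolderName = 'artifact',   # assumed folder name
        [int]$Days = 30
    )
    $cutoff      = (Get-Date).AddDays(-$Days)
    $artifactDir = Join-Path (Convert-Path -LiteralPath $Path) $ArtifactFolderName
    $artifacts   = [System.Collections.Generic.List[string]]::new()
    $pdfs        = [System.Collections.Generic.List[string]]::new()

    foreach ($file in Get-ChildItem -Path $Path -File -Recurse) {
        if ($file.LastWriteTime -ge $cutoff) { continue }   # too recent
        if ($file.DirectoryName -eq $artifactDir) {
            $artifacts.Add($file.FullName)   # sits directly in the artifact folder
        }
        elseif ($file.Extension -eq '.pdf') {
            $pdfs.Add($file.FullName)        # old PDF anywhere else in the tree
        }
    }
    [pscustomobject]@{ Artifacts = $artifacts; Pdfs = $pdfs }
}
```

The tree is enumerated only once, which matters more than the loop body when the data set is in the terabyte range.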

    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Saturday, October 17, 2020 6:39 PM