Readcount 0 not working?

  • Question

  • I've recently been playing with large sets of text data.  While doing this my first inclination to get a text file into a single string was to do 

    get-content -readcount 0 test.txt

    For some reason that wasn't working for me.  I didn't think anything of it b/c I quickly did something like this:

    $text = ""; get-content test.txt | %{$text += $_}

    Tonight I decided to play with it a bit more, and I believe it's definitely not working.  If I set readcount to the number of lines minus 1, I get two elements in my array: the first holds all of the lines except the last, and the second holds the last line.  However, if I set it to 0 or to the number of lines in the file, I get each line as an element of the collection:

    23:50:47 PS C:\Dropbox\My Dropbox\scripts> $t = Get-Content .\dict.txt -ReadCount 0;$t.count
    234726
    
    ____________________________________________________________________________________________________________________________________________________________________________________
    23:54:37 PS C:\Dropbox\My Dropbox\scripts> $t = Get-Content .\dict.txt -ReadCount 234726;$t.count
    234726
    
    ____________________________________________________________________________________________________________________________________________________________________________________
    23:54:46 PS C:\Dropbox\My Dropbox\scripts> $t = Get-Content .\dict.txt -ReadCount 234725;$t.count
    2

    What's the story here?  According to the parameter info:

    -ReadCount <Int64>

    Specifies how many lines of content are sent through the pipeline at a time. The default value is 1. A value of 0 (zero) sends all of the content at one time. 

     


    http://twitter.com/toenuff
    write-host ((0..56)|%{if (($_+1)%3 -eq 0){[char][int]("116111101110117102102064103109097105108046099111109"[($_-2)..$_] -join "")}}) -separator ""
    Wednesday, October 27, 2010 3:57 AM

All replies

  • So, I tested on a 7 line file (seemed easier ;).

    $x = get-content -readcount <val> .\test.txt ; $x.count

    Results:
    <val> = 0, $x.count = 7
    <val> = 1, $x.count = 7
    <val> = 2, $x.count = 4
    <val> = 3, $x.count = 3
    <val> = 4, $x.count = 2
    <val> = 5, $x.count = 2
    <val> = 6, $x.count = 2
    <val> = 7, $x.count = 7
    <val> = 8, $x.count = 7

    So it appears to be grouping the lines by <val>, except that 0 and any value at or above the number of lines in the file (7 in this case) give the same count as grouping by 1.

    Is this correct behavior? Hard to tell with very little documentation :O


    GregM
    Wednesday, October 27, 2010 12:37 PM
  • I think it's the typecasting.  When you do gc with a -readcount you get back an array of arrays (call GetType() on one of the returned elements to check).  When you do -readcount 0, it reads everything in at once, so it only returns a single object (an array). 

    It's the equivalent of doing @(@(1,2,3)).  The end result is an array of 3 integers, not a single-element array containing an array of 3 integers.

    Hope that makes sense.
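
    A minimal sketch of the behaviour described above, assuming a throwaway 7-line file in the temp directory (the path and file name are made up for illustration). It compares a plain assignment, an explicit @() wrapper, and a chunked read:

```powershell
# Hypothetical sample file: 7 short lines written to the temp directory.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $path -Value (1..7 | ForEach-Object { "line $_" })

# -ReadCount 0 emits ONE pipeline object (an array of all lines).
# A single pipeline object is assigned as-is, so $all is the 7-element array.
$all = Get-Content $path -ReadCount 0

# Wrapping the call in @() collects the pipeline output into an outer array,
# so the single emitted object becomes its only element.
$wrapped = @(Get-Content $path -ReadCount 0)

# -ReadCount 3 emits several pipeline objects, each an array of up to 3 lines.
$chunks = Get-Content $path -ReadCount 3

"all:     $($all.Count)"
"wrapped: $($wrapped.Count)"
"chunks:  $($chunks.Count)"
```

    If that holds on your version, it would also explain why a readcount equal to or above the line count looks like a readcount of 1: only one chunk comes out, and it is assigned directly.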


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "
    Wednesday, October 27, 2010 12:59 PM
  • Of course!  Bah!  thx.

    I could get it as a single object like this:

    $file = @(gc t.txt -readcount 0); $file.count

    But, that's not going to work for my purposes.  I want to read the whole thing and have it be a single string so I can do multiline regexes.  I can do it with .Net:

    $text = [System.IO.File]::OpenText("C:\test.csv").ReadToEnd()

    I guess that will do when I need it, unless anyone can think of a better way to do it in PowerShell that isn't line by line.  In other words, something that isn't: $text = ""; get-content test.txt -ReadCount 0 | %{$text += $_}


    http://twitter.com/toenuff
    write-host ((0..56)|%{if (($_+1)%3 -eq 0){[char][int]("116111101110117102102064103109097105108046099111109"[($_-2)..$_] -join "")}}) -separator ""
    Wednesday, October 27, 2010 2:13 PM
  •  

    [string](gc test.txt)

    or

    (gc test.txt) -join ""
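
    A small sketch of how the two approaches differ, assuming a made-up three-line file in the temp directory and the default $OFS. Casting to [string] joins the lines with $OFS (a space unless it has been changed), so the line breaks are gone; joining on a newline keeps them, which is probably what a multiline regex needs:

```powershell
# Hypothetical sample file with three lines.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'join-demo.txt'
Set-Content -Path $path -Value 'alpha', 'beta', 'gamma'

# The [string] cast joins array elements with $OFS (a space by default).
$spaced = [string](Get-Content $path)

# The -join operator with an explicit LF keeps one line break between lines.
$joined = (Get-Content $path) -join "`n"

$spaced
# With (?m), ^ and $ match at line boundaries inside the single string.
$joined -match '(?m)^beta$'
```

    On PowerShell 3.0 and later, Get-Content also has a -Raw switch that returns the file as one string; it did not exist in the version current when this thread was written.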

     


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "
    Wednesday, October 27, 2010 2:23 PM
  • Try this; it should perform faster and return all of the content as a single string:


    [io.file]::ReadAllText($path)


    Shay Levy [MVP]
    PowerShay.com
    PowerShell Toolbar
    Wednesday, October 27, 2010 2:53 PM
  • Thanks Shay.  Both that and readtoend() work, but the new problem is they die on large files.  I'm in the process of writing my own function that takes a filestream and reads it in 10 MB chunks into a string.

    Still, for a quick utility the System.IO.File functions will do the trick and will probably be the one I turn to going forward.
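
    For anyone following along, here is a rough sketch of what such a chunked reader could look like. This is not the function mentioned above: the name Read-TextInChunks and its parameters are hypothetical, and since the result is still one string held in memory, it may not help with files that are too large for that.

```powershell
function Read-TextInChunks {
    param(
        # Use a full path: .NET resolves relative paths against the process
        # working directory, which can differ from the PowerShell location.
        [Parameter(Mandatory = $true)][string]$Path,
        # Chunk size in characters (not bytes); 10MB is just a starting guess.
        [int]$ChunkChars = 10MB
    )
    $reader = New-Object System.IO.StreamReader($Path)
    try {
        $buffer = New-Object char[] $ChunkChars
        $sb = New-Object System.Text.StringBuilder
        # Read returns the number of characters actually read, 0 at end of file.
        while (($read = $reader.Read($buffer, 0, $buffer.Length)) -gt 0) {
            [void]$sb.Append($buffer, 0, $read)
        }
        $sb.ToString()
    }
    finally {
        $reader.Close()
    }
}

# Usage with a made-up temp file and a tiny chunk size to exercise the loop.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'chunk-demo.txt'
[System.IO.File]::WriteAllText($demo, "first line`nsecond line`nthird line")
$text = Read-TextInChunks -Path $demo -ChunkChars 4
$text.Length
```

    The StringBuilder avoids the repeated copying that $text += ... does on every chunk.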


    http://twitter.com/toenuff
    write-host ((0..56)|%{if (($_+1)%3 -eq 0){[char][int]("116111101110117102102064103109097105108046099111109"[($_-2)..$_] -join "")}}) -separator ""
    Wednesday, October 27, 2010 5:58 PM
  • Would love to see that function when you are done.

    Justin

    Thursday, October 28, 2010 12:47 PM