Readcount 0 not working?

Question
I've recently been playing with large sets of text data. While doing this my first inclination to get a text file into a single string was to do
get-content -readcount 0 test.txt
For some reason that wasn't working for me. I didn't think anything of it b/c I quickly did something like this:
$text = ""; get-content test.txt | %{$text += $_}
Tonight I decided to play with it a bit more. I believe it's definitely not working. If I do a readcount of the number of lines minus 1 I get two strings in my array: The first is all of the lines except the last line - the second is the last line. However if I set it to 0 or to the number of lines in the file I get each line as an element of the collection:
23:50:47 PS C:\Dropbox\My Dropbox\scripts> $t = Get-Content .\dict.txt -ReadCount 0;$t.count
234726
23:54:37 PS C:\Dropbox\My Dropbox\scripts> $t = Get-Content .\dict.txt -ReadCount 234726;$t.count
234726
23:54:46 PS C:\Dropbox\My Dropbox\scripts> $t = Get-Content .\dict.txt -ReadCount 234725;$t.count
2
What's the story here? According to the parameter info:
-ReadCount <Int64>
Specifies how many lines of content are sent through the pipeline at a time. The default value is 1. A value of 0 (zero) sends all of the content at one time.
http://twitter.com/toenuff
write-host ((0..56)|%{if (($_+1)%3 -eq 0){[char][int]("116111101110117102102064103109097105108046099111109"[($_-2)..$_] -join "")}}) -separator ""
Wednesday, October 27, 2010 3:57 AM
Answers
I think it's the way PowerShell unrolls collections. When you do gc with a -readcount greater than 1 you get back an array of arrays (call .GetType() on one of the returned elements). When you do -readcount 0, it reads all of it in at once, so it's only going to return a single object (an array), and assigning that to a variable gives you the inner array itself.
It's the equivalent of doing @(@(1,2,3)). The end result is an array of 3 integers, not a single-element array containing an array of 3 integers.
Hope that makes sense.
[string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "
Marked as answer by Tome Tanasovski Wednesday, October 27, 2010 2:13 PM
Wednesday, October 27, 2010 12:59 PM
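The unrolling described in the answer can be checked with a small throwaway file. This is only a sketch: the file name readcount-demo.txt and the variable names are made up for illustration, and the counts in the comments assume a 7-line file.

```powershell
# Build a 7-line sample file in the temp directory (hypothetical file name).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $path -Value (1..7 | ForEach-Object { "line $_" })

# Plain arrays: @() around something that is already an array adds no nesting.
$flat   = @(@(1, 2, 3))
$nested = , @(1, 2, 3)       # the unary comma does wrap the array
$flat.Count                  # 3
$nested.Count                # 1

# -ReadCount 3 sends several chunks; each chunk is itself an array of lines.
$chunks = Get-Content $path -ReadCount 3
$chunks.Count                # 3 chunks (3 + 3 + 1 lines)
$chunks[0] -is [array]       # True

# -ReadCount 0 sends one chunk; a plain assignment hands back that chunk itself.
$all = Get-Content $path -ReadCount 0
$all.Count                   # 7

# Wrapping the call in @() keeps the single chunk as one element.
$wrapped = @(Get-Content $path -ReadCount 0)
$wrapped.Count               # 1

Remove-Item $path
```

So the 7 seen with -ReadCount 0 is the number of lines inside the one chunk, not the number of objects sent down the pipeline.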
All replies
So, I tested on a 7 line file (seemed easier ;).
$x = get-content -readcount <val> .\test.txt ; $x.count
Results:
<val> = 0, $x.count = 7
<val> = 1, $x.count = 7
<val> = 2, $x.count = 4
<val> = 3, $x.count = 3
<val> = 4, $x.count = 2
<val> = 5, $x.count = 2
<val> = 6, $x.count = 2
<val> = 7, $x.count = 7
<val> = 8, $x.count = 7
So, it appears that it's 'grouping' the lines into chunks of <val>, except that 0 and any value of NumberOfLinesInFile or more (7 in this case) look as if they were grouping by 1.
Is this correct behavior? Hard to tell with very little documentation :O
GregM
Wednesday, October 27, 2010 12:37 PM
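The table above can be reproduced next to the number of objects the pipeline actually carried, which is what @() captures. This is a rough sketch: the file name and the column names AssignedCount and PipelineObjects are invented for this example.

```powershell
# Hypothetical 7-line test file, matching the size used in the table above.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-table.txt'
Set-Content -Path $path -Value (1..7 | ForEach-Object { "line $_" })

$table = foreach ($n in 0..8) {
    # Plain assignment: a single chunk is handed back as the chunk itself.
    $x = Get-Content $path -ReadCount $n
    # @() keeps every pipeline object, so a single chunk counts as 1.
    $chunks = @(Get-Content $path -ReadCount $n)

    New-Object PSObject -Property @{
        ReadCount       = $n
        AssignedCount   = $x.Count
        PipelineObjects = $chunks.Count
    }
}

$table | Format-Table ReadCount, AssignedCount, PipelineObjects -AutoSize
Remove-Item $path
```

For 0, 7 and 8 the PipelineObjects column should show 1 while AssignedCount shows 7, which is why those rows look like "grouping by 1" when only $x.count is inspected.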
Of course! Bah! thx.
I could get it as a single object like this:
$file = @(gc t.txt -readcount 0); $file.count
But, that's not going to work for my purposes. I want to read the whole thing and have it be a single string so I can do multiline regexes. I can do it with .Net:
$text = [System.IO.File]::OpenText("C:\test.csv").ReadToEnd()
I guess that will do when I need it, unless anyone can think of a better way to do it via PowerShell that is not line by line. In other words, something that isn't: $text = ""; get-content test.txt -ReadCount 0 | %{$text += $_}
Wednesday, October 27, 2010 2:13 PM
[string](gc test.txt)
or
(gc test.txt) -join ""
Wednesday, October 27, 2010 2:23 PM
Try this; it should perform faster and return all of the content as a single string:
[io.file]::ReadAllText($path)
Shay Levy [MVP]
PowerShay.com
PowerShell Toolbar
Wednesday, October 27, 2010 2:53 PM
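For comparison, here is a small sketch of the single-string options mentioned in this thread, run against a throwaway file (the file name and variable names are hypothetical). Two points are worth noting: [string] joins the lines with $OFS, which is a space unless it has been changed, and the .NET file methods resolve relative paths against the process working directory rather than the current PowerShell location, so a full path is the safer input. The -Raw switch shown at the end exists only in PowerShell 3.0 and later, so it was not an option when this thread was written.

```powershell
# Hypothetical three-line sample file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'single-string-demo.txt'
Set-Content -Path $path -Value 'alpha', 'beta', 'gamma'

# [string] joins with $OFS (a space by default), so the line breaks are lost.
$spaced = [string](Get-Content $path)

# -join lets you pick the separator; "`n" keeps the line structure.
$joined = (Get-Content $path) -join "`n"

# .NET call: pass a fully qualified path, not one relative to the PS location.
$full    = (Resolve-Path $path).Path
$allText = [System.IO.File]::ReadAllText($full)

# PowerShell 3.0+ only: -Raw returns the file as one string.
$raw = Get-Content $path -Raw

# A regex that spans lines now has something to work on.
$joined -match '(?s)alpha.*gamma'     # True

Remove-Item $path
```

Which of these is fastest on a large file is not something this sketch measures.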
Thanks Shay. Both that and readtoend() work, but the new problem is they die on large files. I'm in the process of writing my own function that takes a filestream and reads it in 10 MB chunks into a string.
Still, for a quick utility the System.IO.File functions will do the trick and will probably be the one I turn to going forward.
Wednesday, October 27, 2010 5:58 PM
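A chunked reader along the lines described above might look like the sketch below. It is not the poster's function: the name Read-TextInChunks, its parameters and the use of StreamReader with a StringBuilder are all assumptions made for illustration. The buffer is sized in characters rather than bytes, and because the result is still one string, the whole file has to fit in memory, so this may not help with very large files.

```powershell
# Hypothetical helper: read a text file in fixed-size character chunks
# and accumulate the pieces into a single string.
function Read-TextInChunks {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path,               # should be a full path (see note below)
        [int]$ChunkChars = 10MB      # buffer size in characters, not bytes
    )

    $reader = New-Object System.IO.StreamReader($Path)
    $sb     = New-Object System.Text.StringBuilder
    $buffer = New-Object char[] $ChunkChars
    try {
        # Read() returns the number of characters placed in the buffer,
        # and 0 once the end of the file has been reached.
        while (($read = $reader.Read($buffer, 0, $buffer.Length)) -gt 0) {
            [void]$sb.Append($buffer, 0, $read)
        }
    }
    finally {
        $reader.Dispose()
    }
    $sb.ToString()
}

# Usage: .NET resolves relative paths against the process directory,
# so resolve the path first.
# $text = Read-TextInChunks -Path (Resolve-Path .\dict.txt).Path
```

If the goal is to search a file that is too large to hold as one string, processing each chunk inside the loop instead of appending it would be the variation to try, at the cost of handling matches that straddle a chunk boundary.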