none
Very large text that needs to be re ordered.. RRS feed

  • Question

  • Hi all, I am currently working with a MASSIVE text file, it is 944mb and has 2 million or so rows. The data in the file is in 2 columns 

    Column1: title

    Column2: data

    Column1 contains the following fields one after another, Name, Address, Address1, City, State, Zip 

    Column 2 contains the relevant information based on the above data. 

    I want my output text file to have 6 columns with the headers stated in column 1 and underneath each header the relevant information thats in column 2. I dont really know where to begin with this so any help would be great...

    Friday, January 9, 2015 4:08 PM

Answers

  • Try this:

    $InputFile = 'c:\somedir\inputfile.txt'

    $OutputFile = 'c:\somedir\outputfile.csv' $LineNumber = 0 Get-Content $InputFile -ReadCount 1000 | foreach { foreach ($line in $_) { $LineNumber++ $line = $line.split("`t") if ($line[0] -eq 'Name') { $obj $obj = [PSCustomObject]@{ name = $line[1] address = $Null address1 = $Null city = $Null state = $Null zip = $Null phone = $Null fax = $Null email = $Null } } else { Try{ $obj.$($line[0]) = $line[1] } Catch {Write-Warning " Error parsing line number $LineNumber - $Line"} }

    } } | Export-CSV $OutputFile



    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "







    Friday, January 9, 2015 5:29 PM
    Moderator

All replies

  • 


    Friday, January 9, 2015 4:13 PM
  • As in my screenshot above, columns A and B are currently how the data is stored on the text file. The other issue is that for some people we have more information than others so as you see in the above screen shot i have highlighted the cell labeled phone because that person has a phone number on file. There will be others with more information such as fax number, email etc.... so i will need a new column/header created PER new occurrence based on colA.
    Friday, January 9, 2015 4:15 PM
  • Can you post a few rows of sample data?  This is going to have to be parsed, and the data format will dictate what parsing methods will be appropriate.

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Friday, January 9, 2015 4:15 PM
    Moderator
  • https://www.dropbox.com/s/5070v3z07n34if3/sample.txt?dl=0

    here you go.. you'll notice in there some people have email, and others done so basically the start of a new person data will ALWAYS begin with the field 'name'

    Friday, January 9, 2015 4:28 PM
  • Can you post it with the question?  Lots of us are behind corp firewalls that don't allow access to file sharing apps.

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Friday, January 9, 2015 4:35 PM
    Moderator
  • name	********
    address	********
    address1********
    city	********
    state	********
    zip	********
    phone	********
    name	********
    address	********
    address1********
    city	********
    state	********
    zip	********
    phone	********
    fax	********
    name	********
    address	********
    address1********
    city	********
    state	********
    zip	********
    name	********
    address	********
    address1********
    city	********
    state	********
    zip	********
    phone	********
    email	********
    name	********
    address	********
    address1********
    city	********
    state	********
    zip	********
    phone	********
    name	********
    address	********
    address1********
    city	********
    state	********
    zip	********
    phone	********
    fax	********
    email	********
    name	********
    address	********
    address1********
    city	********
    state	********
    zip	********
    name	********
    address	********
    address1********
    city	********
    state	********
    zip	********
    phone	********
    email	********
    Here you go, note that the columns are TAB delimited
    Friday, January 9, 2015 4:49 PM
  • Where is the data coming from?

    Wouldn't it be easier to configure that application to output in CSV format instead of munging it after the fact?


    -- Bill Stewart [Bill_Stewart]

    Friday, January 9, 2015 5:17 PM
    Moderator
  • The data is coming from a 3rd party and they have been a nightmare to work with, this is unfortunately the best they can do...
    Friday, January 9, 2015 5:22 PM
  • Keep in mind that this isn't a "someone please design a custom data parsing script for me" forum.

    What does your script look like so far?


    -- Bill Stewart [Bill_Stewart]

    Friday, January 9, 2015 5:28 PM
    Moderator
  • Try this:

    $InputFile = 'c:\somedir\inputfile.txt'

    $OutputFile = 'c:\somedir\outputfile.csv' $LineNumber = 0 Get-Content $InputFile -ReadCount 1000 | foreach { foreach ($line in $_) { $LineNumber++ $line = $line.split("`t") if ($line[0] -eq 'Name') { $obj $obj = [PSCustomObject]@{ name = $line[1] address = $Null address1 = $Null city = $Null state = $Null zip = $Null phone = $Null fax = $Null email = $Null } } else { Try{ $obj.$($line[0]) = $line[1] } Catch {Write-Warning " Error parsing line number $LineNumber - $Line"} }

    } } | Export-CSV $OutputFile



    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "







    Friday, January 9, 2015 5:29 PM
    Moderator
  • mjolinor, thank you. I tried the code however i need to save transposition of the data to a new file. I tried 

        Set-Content c:\temp.txt

    and it isn't working, it asks to "Supply values for the following parameters:" Value[0]

    Friday, January 9, 2015 5:58 PM
  • That's creating objects.  You'll want to use Export-CSV, not Set-Content.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Friday, January 9, 2015 6:04 PM
    Moderator
  • Sorry for the dumb q question but where do i use the command? i tried putting it right after the IF statement and it keeps asking to Supply values for the following parameters:
    InputObject:

    UPDATE: 

       else { 
              Try{
                  $obj.$($line[0]) = $line[1]
                  export-csv -InputObject $obj c:\test.txt
    
                 }
              Catch {Write-Warning " Error parsing line number $LineNumber - $Line"}
            }
    
      }
    }
    That will only print the headers and 1 row of information in the CSV, what am i doing wrong?


    Friday, January 9, 2015 6:10 PM
  • See updated answer.



    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Friday, January 9, 2015 6:16 PM
    Moderator
  • Sir, if you drink then I owe you a beer.

    Cheers bud...Thanks  a lot!!!

    Friday, January 9, 2015 6:22 PM
  • Welcome!

    Powershell 1

    Stupid Vendor 0


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Friday, January 9, 2015 6:27 PM
    Moderator