How to import a legacy text file (having two rows) using Power Query

  • Question

  • I want to import a legacy text file using Power Query, but the text file has two rows for each record.

    The first row has 111 characters and the second row has 123 characters. Also, the headers are repeated on every page.

    Sample Data

    Thank you


    • Edited by PSRE Sunday, January 15, 2017 6:59 AM
    Sunday, January 15, 2017 6:58 AM

Answers

All replies

  • Hi PSRE,

    that's quite a complex transformation ahead. You need to identify the pattern of this file in order to define the repeatable steps, which can then be turned into a function and applied repeatedly (for every page).

    This book will take you through the necessary steps: http://www.excelguru.ca/content.php?293-M-is-for-(Data)-Monkey

    Edit: Actually, this might be a good start: https://www.powerpivotpro.com/2016/03/power-query-fixed-width-magic/ but the recommendation for the book above remains :-)
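
    As a rough illustration of that pattern (a self-contained sketch with made-up sample data, not code for your actual file): record the cleaning steps for one page, wrap them in a function, and apply that function to every page:

    let
        // hypothetical: the file has already been split into one table per page
        Pages = {#table({"Column1"}, {{"Header line 1"}, {"Header line 2"}, {"data for page 1"}}),
                 #table({"Column1"}, {{"Header line 1"}, {"Header line 2"}, {"data for page 2"}})},
        // the recorded cleaning steps, wrapped into a function (here: drop the repeated headers)
        CleanOnePage = (page as table) as table => Table.Skip(page, 2),
        // apply the function to every page and stack the results
        Combined = Table.Combine(List.Transform(Pages, CleanOnePage))
    in
        Combined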


    Imke Feldmann TheBIccountant.com


    Sunday, January 15, 2017 7:39 AM
    Moderator
  • Hi Imke Feldmann

    Thank you for suggesting the book and the link. I tried the following.

     The format of the data is such that the first row has 111 characters and the second row has 123 characters. I can manage the headers later. I am herewith attaching the image.

    Could you please help me to join the rows in pairs (1&2, 3&4, 5&6, etc.)?

    I didn't find any option to merge alternate rows in Power Query. I also tried to split the original column using the special characters "carriage return and line feed", but it merged all the rows into a single row.

    Also, I opened the text file using Notepad, and whenever I pressed the Delete key after the 111th character, the next row moved up to the first line. (Is it a carriage return or a line feed?) Is there any other way to convert such text files before importing them into Power Query? How do I identify the patterns of text files?

    M code


    let
        Source = Csv.Document(File.Contents("C:\Users\Windows-10\Desktop\Sample.out"),[Delimiter=":", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
        #"Removed Top Rows" = Table.Skip(#"Changed Type",17),
        #"Filtered Rows" = Table.SelectRows(#"Removed Top Rows", each ([Column1] <> " " and [Column1] <> " Product  Sub Totals                    38,533.54                  0.00                0.00                0.00" and [Column1] <> " Total no. of accounts  4                                   0.00               0.00          19,662.03            18,871.51" and [Column1] <> " ____________________________________________________________________________________________________________________________________________________"))
        
    in
        #"Filtered Rows"


    • Edited by PSRE Sunday, January 15, 2017 8:49 AM
    Sunday, January 15, 2017 8:47 AM
  • There are multiple ways to do this. This first version might be more approachable if you come from an Excel background:

    let
        Source = #table({"MyColumn"}, {{"Some Text that belongs to the 1st item"}, {"More Text for the 1st item that shall be merged to the previous row"}, {"Some Text that belongs to the 2nd item"}, {"More Text for the 2nd item that shall be merged to the previous row"}}),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"MyColumn", type text}}),
        AddIndex = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
        CreatePairsIndex = Table.AddColumn(AddIndex, "PairsIndex", each Number.IntegerDivide([Index], 2), Int64.Type),
        CombineWithPreviousRow = Table.AddColumn(CreatePairsIndex, "Merge", each if [Index] > 0 and [PairsIndex] = CreatePairsIndex[PairsIndex]{[Index]-1} then CreatePairsIndex[MyColumn]{[Index]-1} & [MyColumn] else "")
    in
        CombineWithPreviousRow

    This is the "cell reference" approach, which is beautifully described here: http://excel-inside.pro/blog/2015/11/05/absolute-and-relative-references-in-power-query/

    If you run into performance problems here, this one might be faster:

    let
        Source = #table({"MyColumn"}, {{"Some Text that belongs to the 1st item"}, {"More Text for the 1st item that shall be merged to the previous row"}, {"Some Text that belongs to the 2nd item"}, {"More Text for the 2nd item that shall be merged to the previous row"}}),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"MyColumn", type text}}),
        AddIndex = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
        CreatePairsIndex = Table.Buffer(Table.AddColumn(AddIndex, "PairsIndex", each Number.IntegerDivide([Index], 2), Int64.Type)),
        MagicGrouping = Table.Group(CreatePairsIndex, {"PairsIndex"}, {{"MagicGrouping", each Text.Combine(_[MyColumn], "")}})
    in
        MagicGrouping

    You'll find some explanation of how this works here: https://blog.crossjoin.co.uk/2015/05/11/nested-calculations-in-power-query/


    Imke Feldmann TheBIccountant.com



    Sunday, January 15, 2017 9:16 AM
    Moderator
  • Hello,

    Thank you very much for your kind help.

    I am new to the M language. I don't know how to integrate the M code you posted into my example. Could you explain it in simple steps?

    Sunday, January 15, 2017 11:55 AM
  • Dear Imke,

    Interesting to see our two completely different solutions.

    (By the way: it looks like your second solution has a small mistake in the last step: "type table" should be "type text". It looks fine in the Query Editor, but it won't load into a table). 

    I wonder which one would be faster:
    yours using grouping, or
    mine, in which 2 tables are created, combined (Table.FromColumns(tables turned into records)), and the record columns expanded.
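
    A rough sketch of that idea (my own reconstruction with sample data, not Marcel's actual code): split the source into the first and second lines of each item, turn both tables into lists of records, put them side by side with Table.FromColumns, and expand the record columns:

    let
        Source = #table({"MyColumn"}, {{"1st item, line 1"}, {"1st item, line 2"}, {"2nd item, line 1"}, {"2nd item, line 2"}}),
        // split the source into the odd rows (indices 0, 2, ...) and even rows (indices 1, 3, ...)
        FirstLines = Table.AlternateRows(Source, 1, 1, 1),
        SecondLines = Table.AlternateRows(Source, 0, 1, 1),
        // put the two tables side by side as record columns
        Paired = Table.FromColumns({Table.ToRecords(FirstLines), Table.ToRecords(SecondLines)}, {"Part1", "Part2"}),
        // expand the record columns to get one row per item
        Expanded = Table.ExpandRecordColumn(Table.ExpandRecordColumn(Paired, "Part1", {"MyColumn"}, {"FirstRow"}), "Part2", {"MyColumn"}, {"SecondRow"})
    in
        Expanded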

    Warm Regards,

    Marcel


    • Edited by MarcelBeug Sunday, January 15, 2017 1:26 PM
    Sunday, January 15, 2017 1:25 PM
  • Hi Marcel,

    Thanks, good point - I've just deleted this argument, as it isn't necessary (it just sneaks in by default if one uses the UI for this command).

    Regarding performance, I'd expect yours to be faster (as the grouping also has a cost, probably due to some iterative process - but it will be faster than the first cell-reference approach) :-)


    Imke Feldmann TheBIccountant.com

    Sunday, January 15, 2017 1:33 PM
    Moderator
  • I tested in Excel and both solutions have comparable runtimes.

    An alternative solution which is probably even much faster, is duplicating the source into 2 table columns, with the 2nd column shifted 1 row.

    See cross posted topic for details.
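
    My own rough reconstruction of that idea (sample data only; the actual code is in the cross post): pair each row with the next one by shifting a copy of the column up one row, then keep every other row:

    let
        Source = #table({"MyColumn"}, {{"1st item, line 1"}, {"1st item, line 2"}, {"2nd item, line 1"}, {"2nd item, line 2"}}),
        // a copy of the column shifted up by one row (padded with null at the end)
        Shifted = Table.RemoveFirstN(Source, 1)[MyColumn] & {null},
        // put the original column next to the shifted copy
        Paired = Table.FromColumns({Source[MyColumn], Shifted}, {"FirstRow", "SecondRow"}),
        // keep every other row: each kept row now holds a complete item
        AddIndex = Table.AddIndexColumn(Paired, "Index", 0, 1),
        KeptRows = Table.SelectRows(AddIndex, each Number.Mod([Index], 2) = 0),
        Result = Table.RemoveColumns(KeptRows, {"Index"})
    in
        Result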

    Sunday, January 15, 2017 2:56 PM