locked
How to parse a custom text file (with custom separators) to a list inside a table RRS feed

  • Question

  • Hi,

    i'm trying to parse +/-50 product detail html web pages to a combined PQ table showing each product and all sub-products inside their package :

    let
        Source = Folder.Files("N:\sample\Product_Details"),
        TransformedColumn = Table.TransformColumns(Source,{{"Content", Lines.FromBinary}}),
        RemovedOtherColumns = Table.SelectColumns(TransformedColumn,{"Content", "Name"}),
        DuplicatedColumn = Table.DuplicateColumn(RemovedOtherColumns, "Content", "Copy of Content"),
        #"Expand Content1" = Table.ExpandListColumn(DuplicatedColumn, "Content"),
        FilteredRows = Table.SelectRows(#"Expand Content1", each Text.Contains([Content], "/h1")),
        #"Expand Copy of Content" = Table.ExpandListColumn(FilteredRows, "Copy of Content"),
        FilteredRows1 = Table.SelectRows(#"Expand Copy of Content", each Text.Contains([Copy of Content],">• ")),
    ---> try outs :
        TransformedColumn2 = Table.TransformColumns(FilteredRows1,{{"Copy of Content",Lines.FromText}})
         TransformedColumn3 = Table.TransformColumns(TransformedColumn2,{{{"Copy of Content",">• "},Text.splitAny}}),
         #"Expand Copy of Content1" = Table.ExpandListColumn(TransformedColumn2,{"Copy of Content",">• "})
    in

        #"Expand Copy of Content1"

    So the code here above...

    _ list all HTML Files of the folder

    _ create, for each file of the table, 1 row of data per line from inside the related HTML page

    _ filter the lines to retrieve the Product Package name of each HTML page

    _ create, for each "package entry" of the table, 1 row of data per line from inside the related HTML page

    _ filter the lines to retrieve the sub-product Package details of each HTML page (one single line without carriage return)

    ---> stuck

    So now for each "Package" entry row I've a text cell containing a list of sub-products separated by ">• " characters and I would like to convert this text to a list separated at each >• so I could afterward expand it to 1 row per sub-product (with package name as first cell of the row)

    normally Lines.FromText( sould provide the option to define a custom separator but when nested inside Table.TransformColumns( I cannot find where to put this optionnal field !

    I've search for some explanations on http://office.microsoft.com/en-us/excel-help/power-query-formula-categories-HA104122363.aspx but syntax transformation due to nesting isn't explained and samples doesn't cover much cases of usage they only cover obvious usage with no option !

    Can somebody help me on this ?



    • Edited by opesch Friday, May 9, 2014 3:29 PM
    Friday, May 9, 2014 3:25 PM

Answers

  • Oh I see. I missed the part about it being separate rows when I read your post the first time.

    How about this... first do the Table.SplitColumn operation and then use the Unpivot operation. Doing this through the UI is pretty simple, but you can go straight through the formula language if you want to. The formula is Table.UnpivotOtherColumns.

    Here's a simplified example:

    let
        Source = #table({"Column1","Column2"},{{1,"a,b,c,d"},{2,"e,f,g"},{3,"h"}}),
        SplitColumnDelimiter = Table.SplitColumn(Source,"Column2",Splitter.SplitTextByDelimiter(","),{"Column2.1", "Column2.2", "Column2.3", "Column2.4"}),
        Unpivot = Table.UnpivotOtherColumns(SplitColumnDelimiter,{"Column1"},"Attribute","Value"),
        RemovedColumns = Table.RemoveColumns(Unpivot,{"Attribute"})
    in
        RemovedColumns

    There's probably a way to do it with Lines.FromText, but I think this is a bit simpler. It can all be done with clicks in the UI.

    Friday, May 9, 2014 6:11 PM

All replies

  • Does Table.SplitColumn give you what you want?

    = Table.SplitColumn(FilteredRows1,"MySubProductsColumn",Splitter.SplitTextByDelimiter(">•"),{"MySubProductsColumn.1", "MySubProductsColumn.2"})

    If that's not helpful, let me know and we can dig into this more.

    Friday, May 9, 2014 3:51 PM
  • Thanks but no,

    This split the text into columns... not rows and as they are unconstant amout of sub-products it will be a mess event after an unpivot

    That's why I would like to transform the text to a list....

    If it is not possible to define the list separator for the Lines.FromText( , than the solution could be to replace the actual list separator string by a "carriage return" list separator that Lines.FromText( will interprets by default as a row maker !

    Do you know if its possible by a replace.string like function ?

    Friday, May 9, 2014 5:55 PM
  • Oh I see. I missed the part about it being separate rows when I read your post the first time.

    How about this... first do the Table.SplitColumn operation and then use the Unpivot operation. Doing this through the UI is pretty simple, but you can go straight through the formula language if you want to. The formula is Table.UnpivotOtherColumns.

    Here's a simplified example:

    let
        Source = #table({"Column1","Column2"},{{1,"a,b,c,d"},{2,"e,f,g"},{3,"h"}}),
        SplitColumnDelimiter = Table.SplitColumn(Source,"Column2",Splitter.SplitTextByDelimiter(","),{"Column2.1", "Column2.2", "Column2.3", "Column2.4"}),
        Unpivot = Table.UnpivotOtherColumns(SplitColumnDelimiter,{"Column1"},"Attribute","Value"),
        RemovedColumns = Table.RemoveColumns(Unpivot,{"Attribute"})
    in
        RemovedColumns

    There's probably a way to do it with Lines.FromText, but I think this is a bit simpler. It can all be done with clicks in the UI.

    Friday, May 9, 2014 6:11 PM