Power Query Merge queries using contains

Answers

  • I would try creating 2 Queries, one from "Ref Path" and one from "Data Table", adding a "Dummy Merge Key" column (e.g. static value of "= 1") to both queries.

    Then I would add a Merge step to the "Ref Path" query to Merge with "Data Table" on the "Dummy Merge Key".  Then expand the "Data Table" columns.  At this point you have a massive "cross join" result of every combination (e.g. 90 x 600,000 rows).  Don't panic (but do be concerned).

    Next I would add a Calculated column (e.g. "FilePath Matched") using the Text.Contains function to compare the Ref Path with the FilePath column.  This will return TRUE or FALSE.

    Next I would filter on that "FilePath Matched" column to only keep the TRUE rows.

    Finally I would Group By Ref Path, and Sum FileSize.

    The advantage of this approach is that the subpaths can vary in depth. You can also add to the "FilePath Matched" logic if other requirements come up.

    I've done some operations at similar scale successfully - PQ seems quite good at chugging through the rows without completely choking.  Obviously you would build/test on a subset of "Data Table".
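    The five steps above can be sketched in M roughly as follows. This is only a sketch: the query names "RefPath" and "DataTable" and the column names "Ref Path", "FilePath" and "FileSize" are assumptions here, so adjust them to match your workbook.

    let
        // Add a constant key to both tables so the merge becomes a cross join
        RefWithKey  = Table.AddColumn(RefPath, "DummyKey", each 1, Int64.Type),
        DataWithKey = Table.AddColumn(DataTable, "DummyKey", each 1, Int64.Type),
        // Merge on the dummy key and expand: every Ref Path row is paired
        // with every Data Table row (e.g. 90 x 600,000 rows)
        Merged   = Table.NestedJoin(RefWithKey, {"DummyKey"}, DataWithKey, {"DummyKey"}, "Data", JoinKind.Inner),
        Expanded = Table.ExpandTableColumn(Merged, "Data", {"FilePath", "FileSize"}),
        // Flag rows where the file path contains the reference path
        Matched  = Table.AddColumn(Expanded, "FilePath Matched", each Text.Contains([FilePath], [#"Ref Path"]), type logical),
        // Keep only the TRUE rows, then total FileSize per Ref Path
        Filtered = Table.SelectRows(Matched, each [FilePath Matched]),
        Grouped  = Table.Group(Filtered, {"Ref Path"}, {{"Total FileSize", each List.Sum([FileSize]), type number}})
    in
        Grouped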

    Thursday, October 29, 2015 12:52 AM

All replies

  • Hi Alex,

    you can use the RefTable as a lookup table, provided you split the column in your data table so that the filepath is extracted accordingly. This can even be done using the UI: Split Column By Delimiter - at the right-most delimiter ("\"). An inner join will then make it act as a filter:

    let
        // Load the data table (the German sheet/column names are from the sample workbook)
        Source = Excel.CurrentWorkbook(){[Name="Tabelle2"]}[Content],
        // Split "Spalte1" into path and file size at the comma
        #"Split Column by Delimiter" = Table.SplitColumn(Source,"Spalte1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Spalte1.1", "Spalte1.2"}),
        // Split the path at the right-most "\" (the final argument "true" splits from the end)
        #"Split Column by Delimiter1" = Table.SplitColumn(#"Split Column by Delimiter","Spalte1.1",Splitter.SplitTextByEachDelimiter({"\"}, QuoteStyle.Csv, true),{"Spalte1.1.1", "Spalte1.1.2"}),
        // The inner join against RefTable acts as the filter
        #"Merged Queries" = Table.NestedJoin(#"Split Column by Delimiter1",{"Spalte1.1.1"},RefTable,{"Spalte1"},"NewColumn",JoinKind.Inner),
        #"Changed Type" = Table.TransformColumnTypes(#"Merged Queries",{{"Spalte1.2", type number}}),
        // Sum the file sizes per folder path
        #"Grouped Rows" = Table.Group(#"Changed Type", {"Spalte1.1.1"}, {{"Sum", each List.Sum([Spalte1.2]), type number}})
    in
        #"Grouped Rows"



    Imke Feldmann TheBIccountant.com


    Wednesday, October 28, 2015 9:46 PM
    Moderator
  • Hi Mike and Imke,

    Really appreciate the suggestions and examples.  I realised that the data table paths are all of different lengths in terms of number of folders, so I went with Mike's solution, but I will also test Imke's version as well.  It looks like a really versatile solution for different scenarios.

    Thanks again

    Alex


    Thursday, October 29, 2015 10:25 AM