locked
How to filter results by format? RRS feed

  • Question

  • I'm new to Power Query and am doing an analysis project which requires a large number of values from many, multi-tabbed workbooks. The data I want is in outline format but there is a LOT of other material surrounding it.  I quickly figured out how to get all the tabs sucked in to my query and easily deleted the extraneous columns.  My problem is finding the rows that are of interest.  The outline number is in a consistent column across all tabs (e.g., 1.1, 1.1.1, 1.1.2 and so forth).  I want only the 3rd-level outline entries (e.g., "2.3.4").  In my unix days, regular-expression processing made this sort of filtering a snap (I want to take all the query results where column D has the format "[1-9]\.[1-9]\.[1-9]"). Ideally, I would like only PART of the outline ([1-8]\.[1-7]\.[1-4]) but this subset is close enough to the total level-3 entries that I can work with grabbing everything. I can't seem to find any way to do this in Power Query (or, to be honest, in much of ANY so-called "modern" tools...(and yes, I'm a bit of a geezer)).  

    THe only thing I can think of is to build a very messy compound set of filters (something like "length=5 and substr(2,1)="." and substr(4,1)="." and isnum(substr(1,1)), etc.).  but is there an easier way to do this?

    FWIW, my only options for doing this (imposed by my organization) are to do this in Excel or Access in a Windows environment.

    I'm very grateful for any guidance or suggestions.

    Wednesday, December 5, 2018 6:11 PM

Answers

  • Hi David,

    Unfortunately, Power Query does not support regex type matching, which is one of the most egregious omissions in the M language.

    I've created many custom functions to partially manage this gross oversight - one being a function that matches a fixed-length text pattern that I call Text_MatchesPattern:

    (text as text, matchPattern as list) =>
    let
       textList = Text.ToList(text),
       matchedPattern = if List.Count(textList) <> List.Count(matchPattern) then {false}
                        else List.Transform(List.Positions(textList), (current)=> List.Contains(matchPattern{current}, textList{current})),
       allTrue = List.AllTrue(matchedPattern)
    in
       allTrue

    You can then apply this function as follows (using your sample table as an example):

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        ChangedType = Table.TransformColumnTypes(Source,{{"Row#", Int64.Type}, {"Column D", type text}}),
        SelectedRows = Table.SelectRows(ChangedType, each Text_MatchesPattern([Column D], {{"1".."8"}, {"."}, {"1".."7"}, {"."}, {"1".."4"}}))
    in
        SelectedRows


    Thursday, December 6, 2018 4:11 PM

All replies

  • "The outline number is in a consistent column across all tabs (e.g., 1.1, 1.1.1, 1.1.2 and so forth)."

    Hi David, I'm not sure that I understand the above statement. In Power Query, are all of the outline numbers in one column, and you want to filter the values that represent the 3rd level?


    Thursday, December 6, 2018 1:01 AM
  • Pretty much, Colin.  The outline numbers are consistently in the same column (although there's additional text above and below the outline section of each sheet) and I want only those rows that are on the 3rd level of the outline (I should also mention that the sheets are NOT set up as a formal Excel outline...they simply have the strings  1.1, 1.1.2, etc. in Column D) so I basically want to skip over rows that have no outline number or that have a level 1 or level 2 number.  For instance

    Row# ... Column D

    1            name of sheet

    2            explanation

    3            1.

    4             1.1

    5.            1.1.1

    6.            1.1.2

    7.           supplemental information

    8.          signature line

    so, in the above, I'd want to select rows 5 & 6.  The actual spreadsheets are quite a bit more involved but they all have the consistency that any row which is part of the outline has a string representing the outline level in Column D

    Thursday, December 6, 2018 2:45 PM
  • Hi David,

    Unfortunately, Power Query does not support regex type matching, which is one of the most egregious omissions in the M language.

    I've created many custom functions to partially manage this gross oversight - one being a function that matches a fixed-length text pattern that I call Text_MatchesPattern:

    (text as text, matchPattern as list) =>
    let
       textList = Text.ToList(text),
       matchedPattern = if List.Count(textList) <> List.Count(matchPattern) then {false}
                        else List.Transform(List.Positions(textList), (current)=> List.Contains(matchPattern{current}, textList{current})),
       allTrue = List.AllTrue(matchedPattern)
    in
       allTrue

    You can then apply this function as follows (using your sample table as an example):

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        ChangedType = Table.TransformColumnTypes(Source,{{"Row#", Int64.Type}, {"Column D", type text}}),
        SelectedRows = Table.SelectRows(ChangedType, each Text_MatchesPattern([Column D], {{"1".."8"}, {"."}, {"1".."7"}, {"."}, {"1".."4"}}))
    in
        SelectedRows


    Thursday, December 6, 2018 4:11 PM
  • Wow, thank you so much for sharing that, Colin!  I'll give it a try.
    Friday, December 7, 2018 12:40 PM