none
Power Query - Insert Index by subcategory ie clustered index RRS feed

  • Question

  • Looking for some help re Power Query "M" code that would insert a 'clustered index' or index by subcategory - sample below


    aabc


    • Edited by drewbbc Thursday, September 4, 2014 4:32 PM
    Thursday, September 4, 2014 4:22 PM

Answers

  • ClusteredIndex is a separate Query. Start with a blank query, in the Advanced Editor, paste in the code I provided for this function, and name the query ClusteredIndex.

    Next create another query from your Excel table. Then, in the Power Query Query Editor, select Add Column-->Insert Custom Column. In the Insert Custom Column dialog box, type a name for the custom column (I called it Index). Type the custom column formula as:

    ClusteredIndex(Source,"Name",[Name])

    The new Index column displays "List" in each row. On the Index column header, click the icon (two arrows) on the right to expand the lists . After expansion, you will have many duplicate rows. Select all of the columns and select Home-->Remove Duplicates. You should now have the results that you're after. All of these actions in the UI, starting from creating a query from your Excel table, generates the second script I provided (no need to paste into the advanced editor).

    Friday, September 5, 2014 2:53 AM

All replies

  • One approach would be to create a custom "ClusteredIndex" function and add as a custom column in the table e.g.

    ClusteredIndex function:

    (table as table, groupColumnName as text, groupNameValue as any)=>
    let
        GroupNameList = Table.Column(table,groupColumnName),
        PositionList = List.Positions(List.Select(GroupNameList, each _=groupNameValue))
    in
        PositionList

    Your table script would be (assuming an Excel table name of Table1):

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Inserted Custom" = Table.AddColumn(Source, "Index", each ClusteredIndex(Source,"Name",[Name])),
        #"Expand ClusteredIndex" = Table.ExpandListColumn(#"Inserted Custom", "Index"),
        #"Removed Duplicates" = Table.Distinct(#"Expand ClusteredIndex")
    in
        #"Removed Duplicates"

    Note that the entire table script was generated through the UI.
    • Edited by Colin Banfield Thursday, September 4, 2014 8:45 PM Added detail
    Thursday, September 4, 2014 8:35 PM
  • Colin -- appreciate the function code -- for some reason getting a "Token Eof Expected' error at line 12 - not obvious why - syntax is verbatim from your post and consistent with standard PQ syntax guidelines. Any thoughts? 


    aabc

    Friday, September 5, 2014 1:52 AM
  • ClusteredIndex is a separate Query. Start with a blank query, in the Advanced Editor, paste in the code I provided for this function, and name the query ClusteredIndex.

    Next create another query from your Excel table. Then, in the Power Query Query Editor, select Add Column-->Insert Custom Column. In the Insert Custom Column dialog box, type a name for the custom column (I called it Index). Type the custom column formula as:

    ClusteredIndex(Source,"Name",[Name])

    The new Index column displays "List" in each row. On the Index column header, click the icon (two arrows) on the right to expand the lists . After expansion, you will have many duplicate rows. Select all of the columns and select Home-->Remove Duplicates. You should now have the results that you're after. All of these actions in the UI, starting from creating a query from your Excel table, generates the second script I provided (no need to paste into the advanced editor).

    Friday, September 5, 2014 2:53 AM
  • Colin.  I too had a problem getting a query reduced down to just a unique set to be able to put into PowerPivot as a dimension.  In my case, my data was formatted like:

    ---------------

    Name - Active (Y/N) - Last Updated Date

    Bob - Y - 1/1/2015

    Bob - N- 2/1/2015

    Sally - Y - 1/1/2015

    Lou - N - 2/1/2015

    ----------------

    I haven't fully mastered the ClusteredIndex function above - but my plea is that you/MS might want to include a good UI for reducing a query down to a unique set.

    Perhaps the UI could prompt users to identify the columns that should drive the resultant uniqueness.  E.g. In my case.  I want a unique list of Names, where Active=Y and Last Updated Date is MAX or most recent for that person..

    -IMHO

    Eric.

    Tuesday, March 17, 2015 4:51 PM
  • Hi Eric,

    You can get the results you want through the UI. In the Power Query Editor:

    1) Filter the "Active (Y/N)" column to select "Y" only, using the arrow on the column name.

    2) Select the "Name" and "Active (Y/N)" columns (Shift+).

    3) On the Home tab, click Group By. In the Group By dialog box that appears, the selected columns should appear under "Group by."

    4) Under "New column name," enter "Last Updated Date."

    5) Under "Operation," choose Max from the drop-down list.

    6) Under "Column," choose the name of the date column for which "Max" will apply.

    7) Click OK.

    Tuesday, March 17, 2015 6:07 PM
  • All - solved it by:

    • establishing a double index -
    • then an internal merge to reference the previous row (ken puls article)
    • then some logic to parse the duplicates.

    the code looks like this

    let

        Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],

        AddedIndex = Table.AddIndexColumn(Source, "Index", 1, 1),

        AddedIndex1 = Table.AddIndexColumn(AddedIndex, "Index.1", 2, 1),

        Merge = Table.NestedJoin(AddedIndex1,{"Index"},AddedIndex1,{"Index.1"},"NewColumn"),

        ExpandNewColumn1 = Table.ExpandTableColumn(Merge, "NewColumn", {"CGI"}, {"CGI.1"}),

        SortedRows = Table.Sort(ExpandNewColumn1,{{"Index", Order.Ascending}}),

        AddedCustom = Table.AddColumn(SortedRows, "Custom", each if [CGI] = [CGI.1] then 0 else 1),

        AddedCustom1 = Table.AddColumn(AddedCustom, "Custom.1", each [Rev] * [Custom])

    in

        AddedCustom1

    ________________________________

    For more complex structures, there's another approach using using Mod/Integer functions (suggested by Chris Webb) that's simple, elegant & very cool.


    • Edited by drewbbc Tuesday, March 24, 2015 2:47 PM
    Tuesday, March 24, 2015 12:56 PM
  • I am having a really hard time understanding your solution, because my circumstances are a bit different. This is the only resource I can find related to clustered indexes, so I am in great need of your help.

    In my case, I have queried an sql database. I have no tables in my workbook. I wish to create a clustered index for a column called "Case_Number". How might I accommodate these differences?

    (I have tried altering your function with no success. Please Please Please help. This is extremely important/ urgent) Thank you in advance.

    Monday, November 23, 2015 5:10 PM
  • Hi Braden,

    Is the problem that the custom function does not return the expected results? If so, could you provide a table that includes a column with the expected index values?

    Monday, November 23, 2015 7:22 PM
  • Thanks for the ClusteredIndex function, it's very helpful. Is there a way to start the index with 1 instead of 0?
    Monday, December 21, 2015 4:15 PM
  • When you're adding the index column from the UI, click the arrow on the right of the Add Index Column button and choose 1 from the menu.

    The third parameter of the Table.AddIndexColumn function controls the start number of the index.

    Monday, December 21, 2015 5:27 PM
  • Sorry I meant using the ClusteredIndex function and starting with 1.

    After testing the function, I realised it didn't work as I needed.

    Here is the data I have :

    What I want :

    And what I get once I get rid of the duplicates :

    Thanks for the help!




    • Edited by PLRD Monday, December 21, 2015 6:20 PM
    Monday, December 21, 2015 5:43 PM
  • I just looked at the function in my example workbook, and it appears that at some point in time, I modified the original function. :) Anyway, the code for the modified Clustered Index function is:

    (ColumnToIndexList as list, ColumnToIndexCurrentValue)=>
        List.Positions(
           List.PositionOf(
               ColumnToIndexList,
               ColumnToIndexCurrentValue,
               Occurrence.All
           )
        )

    The above eliminates the need to pass in a table. You just need a list of the column to be indexed in the form table[<ColumnName>] e.g. Source[Name], and the current value of the same column i.e. [Name]

    The modification to start at 1 adds an additional step:

    (columnToIndexList as list, columnToIndexCurrentValue)=>
    let  
        ClusteredIndex0 = List.Positions(
                                        List.PositionOf(
                                            columnToIndexList,
                                            columnToIndexCurrentValue,
                                            Occurrence.All
                                       )
                                    ),
        ClusteredIndex1 = List.Transform(
                                        ClusteredIndex0,
                                        (current)=>current+1
                                     )
    in
        ClusteredIndex1

    Code has been reformatted for better readability.

    Monday, December 21, 2015 6:45 PM
  • I came up a with a technique you can build without writing any custom code - you just click around in the Query Editor window and add one very simple Custom Column.

    The key steps are:

    • Add an Index column
    • Group By the column(s) that define your groups, and add an "All Rows" column
    • Duplicate that "All Rows" column
    • Expand both the original and copy of the "All Rows" column (at this point you have a "cross-product" result of every combination of rows, within each group)
    • Add a Custom Column "Cumulative" to determine which of the combination rows to include in the Running Total, e.g. [Index] >= [#"All Rows - Copy.Index"]
    • Filter on the Custom Column "Cumulative" = TRUE
    • Group By the original columns and Count Rows

    I built a working solution which you can download from my OneDrive and try out:

    http://1drv.ms/1AzPAZp

    It's the file: Power Query demo - Running Total. 

    That example calculates a Running Sum.  To get a Running Count, just change the Grouped Rows Details step (2nd last) to Count Rows instead of Sum.

    Tuesday, December 22, 2015 3:50 AM
  • Here's an alternative to Mike's suggestion, and to my own custom function (which was specific to a single column cluster and thus a poor general solution).

    1. Add an index column - this column will be used subsequently for sorting
    2. Group by the columns for which you will create the clustered index. The grouping operation will generate a column with table values.
    3. Add an index column to each of the table values. This will be the clustered index
    4. Expand the table column, omitting the group-by columns
    5. Sort the table by the index created in step 1. This operation will return the table to it's original row order
    6. Remove the sort index column

    Using PLRD's table as an example, the scripts steps will be as follows:

    let
        Source = Excel.CurrentWorkbook(){[Name="ClusteredIndex"]}[Content],
        ChangedType = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Factory", type text}, {"ReplacementOf", type text}}),
        AddedSortIndex = Table.AddIndexColumn(ChangedType, "SortIndex", 0, 1),
        GroupedRows = Table.Group(AddedSortIndex, {"Name", "Factory"}, {{"FilteredTable", each _, type table}}),
        AddedClusteredIndex = Table.TransformColumns(GroupedRows, {"FilteredTable", each Table.AddIndexColumn(_,"ClusteredIndex",1,1)}),
        ExpandedTable = Table.ExpandTableColumn(AddedClusteredIndex, "FilteredTable", {"ReplacementOf", "SortIndex", "ClusteredIndex"}),
        SortedRows = Table.Sort(ExpandedTable,{{"SortIndex", Order.Ascending}}),
        RemovedColumns = Table.RemoveColumns(SortedRows,{"SortIndex"})
    in
        RemovedColumns


    Monday, December 28, 2015 1:52 PM
  • may be you can try this alternative function:

    let
        Suoyin = (table as table, text as text) as table =>
         let AddedSortIndex  = Table.AddIndexColumn(table, text, 1, 1),
             RemovedColumns = Table.ReorderColumns(AddedSortIndex,List.Combine({{text},List.RemoveLastN(Table.ColumnNames(AddedSortIndex))}))
         in  RemovedColumns,
        Clsuoyin = (table as table, list as list, text as text) as table =>
         let AddedSortIndex = Table.AddIndexColumn(table, "Orindex", 1, 1),
             GroupedRows = Table.Group(AddedSortIndex, list, {"FilteredTable", each Suoyin(_,text)}),
             Colnames = List.Combine({{text},Table.ColumnNames(AddedSortIndex)}),
             ExpandedTable = Table.ExpandTableColumn(Table.SelectColumns(GroupedRows,"FilteredTable"),"FilteredTable",Colnames,Colnames),
             SortRemove = Table.RemoveColumns(Table.Sort(ExpandedTable,{"Orindex", text}),{"Orindex"})
         in  SortRemove
    in
        Clsuoyin

    There are several advantages for this approach:

    1. the order of the original table columns will be left unchanged;

    2. the index column will be inserted at the first column of the table, which is a more common situation;

    3. the list parameter make the function applicable for the multiple columns supposed be grouped.

    copy the code above to a blank query in the anvanced editor, name it whatever names you like, such as Clsuoyin, and the final operation code should be something like this:

    let
        Source = Excel.CurrentWorkbook(){[Name="ClusteredIndex"]}[Content],
        addsuoyin = Clsuoyin(Source, {"Name","Factory"}, "Index")
    in
        addsuoyin

    Monday, May 16, 2016 8:50 AM
  • Thanks to your idea of duplicating that AllRows, I have found an easy way to do that, but without a cross product.

    The idea here is to add a normal tempIndex, then group by category (here the name) , and then add a column with the minimum value of this tempIndex for each category to subtract this minimum to the tempIndex, and have an index starting from 0 for each category ;)

    I even add 1 to have the index starting at 1...

    let
        Source = MyTable,
        #"Filtered Rows" = Table.SelectRows(Source, ... do here your filtering ...),
        #"Sorted Rows" = Table.Sort(Source,{{"Name", Order.Ascending}, ... you may put here other criterias ...}),
        #"Added TempIndex" = Table.AddIndexColumn(#"Sorted Rows", "TempIndex", 1, 1),
        #"Grouped Rows" = Table.Group(#"Added Index", {"Name"}, {{"All Rows", each _, type table}}),
        #"Duplicated Column" = Table.DuplicateColumn(#"Grouped Rows", "All Rows", "All Rows - Copy"),
        #"Expanded All Rows" = Table.ExpandTableColumn(#"Duplicated Column", "All Rows", {"value", ... you may have other columns, "TempIndex"}, {"value", ... you may have other columns, "TempIndex"}),
        #"Aggregated All Rows - Copy" = Table.AggregateTableColumn(#"Expanded All Rows", "All Rows - Copy", {{"TempIndex", List.Min, "Min of All Rows - Copy.TempIndex"}}),
        #"Added Index" = Table.AddColumn(#"Aggregated All Rows - Copy", "Priority", each [TempIndex]-[#"Min of All Rows - Copy.Index"]+1),
        #"Removed Columns1" = Table.RemoveColumns(#"Added Priority",{"TempIndex", "Min of All Rows - Copy.TempIndex"})
    in
        #"Removed Columns1"

    Here it is done with the AdvanceEditing, but I did it only with GUI actions without pb ;)

    Add Column -> Index Column -> From 1

    Select group column

    Home -> Group By -> Operation=All Rows

    Select new column

    Add Column -> Duplicate Column

    Expand first "All Rows" keeping all columns except the group criteria

    Aggregate "All Rows - Copy" with only the index column with Minumum operation instead of SUM

    Add Column -> Custum Column

    In Custum colum formula: substract the Minimum index to the global index and that will be your index for each group (name it as you want)

    Select the temporary global quota and the Minimum quota columns

    Home -> Remove Columns


    • Proposed as answer by Tof1A Wednesday, December 6, 2017 5:13 PM
    • Edited by Tof1A Wednesday, December 6, 2017 5:30 PM
    Wednesday, December 6, 2017 5:13 PM
  • Thank you, Tof1A!

    Your solution is exactly what I was looking for.

    Sunday, January 28, 2018 9:12 AM
  • All - special thank you to Colin & Imke for suggesting the 'Table.AddIndexColumn" approach - both here and in other settings.

    Using the original table provided above - the 'List.Sum / List.Range' function - (below) is an  elegant way to compute the running total of the sub index. A new index column is needed. Note how the second argument of the Table. Range function is a combination of both index columns.

    ****************************************************************

    let
        Source = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
        GroupRows = Table.Group(Source, {"Name"}, {{"Count", each Table.AddIndexColumn(_, "SubIndex",1,1) }}),
        Expand = Table.ExpandTableColumn(GroupRows, "Count", {"Value", "SubIndex"}, {"Value", "SubIndex"}),
        AddIndex = Table.AddIndexColumn(Expand, "Index", 1, 1),
        AddCol = Table.AddColumn(AddIndex, "CumSubTotal", each List.Sum (List.Range(Expand[Value],[Index]-[SubIndex],[SubIndex]) )),
        RemCol = Table.RemoveColumns(AddCol,{"Index", "Name", "Value"})
    in
        RemCol

    ************************************************************

    Thanks - DrewBbc.


    aabc

    Thursday, February 22, 2018 2:44 PM
  • All the code for multiple sub index running totals is provided below - essentially requiring one additional Groupby statement.  A graphic is also provided.

    Thanks - Drewbbc

    let

        Source = Excel.CurrentWorkbook(){[Name="Source_Tbl"]}[Content],

        AddIndex = Table.AddIndexColumn(Source, "Index", 1, 1),

        GroupRows = Table.Group(AddIndex, {"Grp.1", "Grp.2"}, {{"Count", each Table.AddIndexColumn( (_), "SubIndex.Grp.2", 1,1) }}),

        ExpandCount = Table.ExpandTableColumn(GroupRows, "Count", {"Value", "Index", "SubIndex.Grp.2"}, {"Value", "Index", "SubIndex.Grp.2"}),

        GroupRows2 = Table.Group(ExpandCount, {"Grp.1"}, {{"Count", each Table.AddIndexColumn( (_), "SubIndex.Grp.1", 1,1) }}),

        Expand2 = Table.ExpandTableColumn(GroupRows2, "Count", {"Grp.2", "Index", "SubIndex.Grp.1", "SubIndex.Grp.2", "Value"}, {"Grp.2", "Index", "SubIndex.Grp.1", "SubIndex.Grp.2", "Value"}),

        AddCol1 = Table.AddColumn(Expand2, "Sum.SubIndex.Grp.1", each List.Sum(List.Range(Expand2[Value], [Index]-[SubIndex.Grp.1], [SubIndex.Grp.1]))),

        AddCol2 = Table.AddColumn(AddCol1, "Sum.SubIndex.Grp.2", each List.Sum(List.Range(Expand2[Value],[Index]-[SubIndex.Grp.2], [SubIndex.Grp.2])))

    in

        AddCol2

    


    aabc


    • Edited by drewbbc Friday, June 8, 2018 8:23 PM
    • Proposed as answer by Julie Koesmarno Saturday, January 5, 2019 12:00 AM
    Friday, June 8, 2018 7:49 PM
  • Colin,

    I cant seem to make this work. I added the code:

    (columnToIndexList as list, columnToIndexCurrentValue)=>
    let   
        ClusteredIndex0 = List.Positions(
                                        List.PositionOf(
                                            columnToIndexList,
                                            columnToIndexCurrentValue, 
                                            Occurrence.All
                                       )
                                    ),
        ClusteredIndex1 = List.Transform(
                                        ClusteredIndex0, 
                                        (current)=>current+1
                                     ) 
    in
        ClusteredIndex1

    in a new Query. When I call the fuction though I recive the following Error:

    Expression.Error: The Value ""(columnToIndexList a..."" can not be converted to the type "Function".

    What am I doing wrong??

    Thanks in advance


    • Edited by Dan798 Tuesday, April 9, 2019 8:31 PM
    Tuesday, April 9, 2019 8:30 PM