Guest Post: Using List.Accumulate for Input/Output Genealogy

Foreword from Imke:
Pat Mahoney is a very active fellow super user in the Power BI community and has a very strong interest in all things Power Query. He also has a YouTube-channel with interesting videos about Power BI. Here he is sharing a nifty solution for a specific kind or parent-child-challenge which he solves with the List.Accumulate-function:

The need to know which inputs are related to which outputs (and vice versa) is a common business use case, whether it’s lots used in a multi-step manufacturing process, reporting relationships in an organization, or other scenarios. As organizational relationships are usually Many:1 (employees:supervisor), in Power BI a string showing the org hierarchy can be generated using the PATH function in DAX. However, supply chains (and many other scenarios) frequently have M:M relationships and a different solution is required. In this article, an M function that leverages List.Accumulate is described that generates such relationships from an input table that has input/output columns. This is not the first function/code to do this of course (see one of Imke’s posts on this subject), but the one shown here uses a different approach, gives an option of outputs, and provides another tool for the toolbox.

The simple table below will be used to illustrate, which shows the relationships between lots in a three-step manufacturing process where lots of Type A are converted to lots of Types B and C on their way to becoming Type D material. For each OutputLot there is a row for each of its InputLots. The use of the DAX PATH function with the InputLot and OutputLot columns would result in an error (both because of the Many:Many relationship, and because the D lots are not found in the InputLot column). One way to visualize these relationships without any DAX or M is simply to use the Sankey visual as shown below. For one lot at a time, one can visually determine that A1 was used in B1, C1, and D1 and that A2 was used in all B, C, and D lots. However, to enable further analysis/visualization, a table or column that contains this information is needed.

                               

 

To find the relationships across each generation/step of this process, the table above can be merged with itself multiple times.   For example, for lot D3, merging its InputColumn values (C2 and C3) with the OutputLot column of this same table would establish that B2 and B3 were used. Doing the same again for the B values (merge that new column with B values with the original OutputLot column) would show that A2 and A3 were used on the way to make D3. This process of multiple self merges can be automated with an M function, and the List.Accumulate function can be used to do so. List.Accumulate is unique among M functions as it is the only one that offers recursion (i.e., perform multiple cycles of changing an input to an output, using the output of one cycle as the input to the next).   There are multiple good posts/videos about this function (here are two – 1, 2). A great feature of this function is that, while a list input is required to control the number of cycles to repeat (and usually to provide the input values to be used), the things being transformed can be of any type (number, date, list, record, table, etc.). A proposed function is shown and described below, which accepts 5 inputs – the table, the name of the columns in that table with the input and output values, the number of steps of the process, and the desired output format.

Here is a brief walkthrough of the M code. After renaming and selecting the input/output columns and buffering that two-column table with Table.Buffer (to greatly improve performance), the magic happens in the Output_Pivoted step. List.Accumulate has three inputs – the list to iterate over, a seed to start things out, and a function to describe the desired transformation/calculation during each cycle.   In this case, a list of numbers is generated from 2 to the number of steps provided as an input (since the input table already shows the first generation of relationship between inputs and outputs, the inputcolumn is renamed to “1” and we start at 2). The buffered two-column table of inputs/outputs is provided as the seed, and the transformation function takes the output table of the previous cycle (or the seed in the first cycle) and merge the last column with the buffered inputs/outputs table. This is repeated for the number of steps provided as input (minus 1). The rest of the code is simpler from there and generates multiple output formats from which the user can select, depending on which might work better in your model. The different outputs are selected with an integer of 1-4 provided in the “format” input to the function. I wasn’t sure the approach shown in the “in” step to select an output from a list of previous steps would work but was glad that it did. Examples of the 4 outputs are shown below.

 

(  inputtable as table,  inputcolumn as text,  outputcolumn as text,  maxsteps as number,  optional format as number ) =>
  let
    Source = inputtable,
    #"Renamed Columns" = Table.RenameColumns(Source, {{inputcolumn, "1"}, {outputcolumn, "Output"}}),
    Buffered = Table.Buffer(Table.SelectColumns(#"Renamed Columns", {"Output", "1"})),
    //Output #1 - Column for each Generation
    Output_Pivoted = Table.Distinct(
      List.Accumulate(
        {2 .. maxsteps},
        Buffered,
        (state, current) =>
          Table.ExpandTableColumn(
            Table.NestedJoin(
              state,
              {Text.From(current - 1)},
              Buffered,
              {"Output"},
              "Buffered",
              JoinKind.LeftOuter
            ),
            "Buffered",
            {"1"},
            {Text.From(current)}
          )
      )
    ),
    //Output #2 - Row for each Path combination
    AddPath = Table.AddColumn(
      Output_Pivoted,
      "Path",
      each Text.Combine(
        List.Reverse(
          Record.FieldValues(Record.SelectFields(_, List.Transform({1 .. maxsteps}, Text.From)))
        ),
        "|"
      ),
      type text
    ),
    #"Removed Other Columns" = Table.SelectColumns(AddPath, {"Output", "Path"}),
    Output_PathEachRow = Table.Distinct(#"Removed Other Columns"),
    //Output #3 - Full unpivot with Generation column
    preOutput_Unpivoted = Table.UnpivotOtherColumns(Output_Pivoted, {"Output"}, "Gen", "Input"),
    Output_Unpivoted = Table.Distinct(preOutput_Unpivoted),
    //Output #4 - Nested Path
    #"Grouped Rows" = Table.Group(
      Output_Unpivoted,
      {"Output", "Gen"},
      {"Inputs", each Text.Combine(List.Distinct([Input]), "-")}
    ),
    Output_Nested = Table.Group(
      #"Grouped Rows",
      {"Output"},
      {{"Path", each Text.Combine(List.Reverse(List.Distinct([Inputs])), "|"), type text}}
    )
  in
    if format = null or format > 4 then
      Output_Unpivoted
    else
      {Output_Pivoted, Output_PathEachRow, Output_Unpivoted, Output_Nested}{format - 1}

To use the function for your data, just copy the M code above into a blank query (replacing the default text). Once the function is created, you can invoke it with one of your table queries, providing the names of the columns with the input/output values, the max number of steps/generations, and a value from 1-4 for the desired output format. You can then load the table and add relationship(s) or merge it into one of your existing queries.

Output Formats

Based on your model, one of multiple output formats may be useful. The four available formats are shown below based on the same input table used above.

1 – A column for each generation (number of steps away)

2 – A row for each distinct path to get to the output lot with the pipe (“|”) delimiter

3 – Unpivoted version of #1 that has a generation column (number of steps away)

4 – A nested path with lots from the same step delimited by “-“ with a “|” delimiter between steps

I tend to use the Output_Nested format as all the information is contained in the least number of rows and it is similar to the DAX PATH output. With that and the SEARCH function in your DAX measure/column (to see if a given lot is contained in the nested path string), much is possible. This function of course works well for 1:M and M:1 scenarios too (e.g., org hierarchy) and doesn’t have the limitation of the PATH function where all output values have to be among the input values.

As there is a lot going on in this function, it does slow down as the input table gets large and/or if the maxsteps input gets big (>10 step process), but test it out on your data to see if it works for your use case. I tried to further improve performance with an approach using only list functions and List.Buffer, and to split the work into halves/thirds but neither approach was any faster.

Bonus Function

If you do have a simple 1:M or M:1 scenario (e.g, an org hierarchy), here is a variation on this function to simply add a column with the Path to an existing query. Just add this function to a blank query, and call it “fnAddPath”. Then, in a table query that has a single row for each output (e.g., Employee table with Employee and Supervisor columns), right click on the last step and “Insert Step After”. Then update the formula bar to wrap that previous step name with the function and input parameters.

 

(previous_step_table as table, inputcolumn as text, outputcolumn as text, maxsteps as number) =>
  let
    Source = previous_step_table,
    #"Renamed Columns" = Table.RenameColumns(
      Source,
      {{inputcolumn, "1"}, {outputcolumn, "OutputX"}}
    ),
    Buffered = Table.Buffer(Table.SelectColumns(#"Renamed Columns", {"OutputX", "1"})),
    Output_Pivoted = List.Accumulate(
      {2 .. maxsteps},
      Buffered,
      (state, current) =>
        Table.Distinct(
          Table.ExpandTableColumn(
            Table.NestedJoin(
              state,
              {Text.From(current - 1)},
              Buffered,
              {"OutputX"},
              "Buffered",
              JoinKind.LeftOuter
            ),
            "Buffered",
            {"1"},
            {Text.From(current)}
          )
        )
    ),
    AddPath = Table.AddColumn(
      Output_Pivoted,
      "Path",
      each Text.Combine(
        List.Reverse(
          Record.FieldValues(Record.SelectFields(_, List.Transform({1 .. maxsteps}, Text.From)))
        ),
        "|"
      ),
      type text
    ),
    #"Removed Other Columns" = Table.SelectColumns(AddPath, {"OutputX", "Path"}),
    Output_PathEachRow = Table.Distinct(#"Removed Other Columns"),
    Merged = Table.Join(previous_step_table, outputcolumn, Output_PathEachRow, "OutputX"),
    Result = Table.RemoveColumns(Merged, {"OutputX"})
  in
    Result

For example, the last step called #”Added Custom” of a query with Employee data is shown on the left below. If I right click on that last step in the Applied Steps window and choose “Insert Step After”, I see = #”Added Custom” in the Formula Bar. I update the FormulaBar to read = fnAddPath(#”Added Custom”, “Employee”, “Supervisor”, 4) to get the table on the right. Note that “Supervisor” is the inputcolumn in this case.   This example table has >11,000 rows and refreshes in a few seconds. Once you have the column with the “|” delimiters, you can then use PATHLENGTH, PATHITEM in your DAX columns after load. This approach doesn’t have the PATH limitation that the top person (CEO in this case) must have a Supervisor that is also present in the Employee column.

 

 fnAddPath =>

 

Thank you to Imke for all the M I’ve picked up from her articles, letting me post on this site, and for her feedback on this approach/function. The reason I posted here is that I hope this function will be useful to others and this site gets way more views than my YouTube channel (please subscribe).

Leave a Reply

Die Datenschutzbestimmungen finden Sie hier: / Please find the privacy policy here: https://wp.me/P6lgsG-Rz