Performance tip to speed up slow pivot operations in Power Query and Power BI

Pivot operations in are a very handy feature in  Power Query but they can slow down refresh performance. So with some bittersweet pleasure I can tell that I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. But the bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize at the time when I first worked with the code.

Trick to speed up a slow pivot table

This might not work everywhere, but for my tests, it worked really well: Don’t use an aggregation function when you want fast pivoting:

slow pivot

Don’t aggregate when you want a fast pivot in Power Query

But if your data isn’t aggregated on the row- & column values already, you’ll get this error message:

Error when the values are not adequately aggregated

So to make this work, you have to aggregate the values on the axis’ values before.

Let’s walk through the steps:

Walkthrough

Start is this table:

Start

The pivot shall bring the values from the “Column”-column into the column-area and sum the values from column AB like so:

Result

If I pivot without an aggregation like mentioned above I will get the dreaded error-message like above, because there are multiple rows for each Row- and Column-combination:

slow pivot power query performance

Values are not aggregated on the row- and column axis

The step to success is a grouping operation beforehand:

slow pivot power query performance

Group on all columns that shall define the row- & column values of the pivot

This returns a table with unique row- & column – combinations:

Aggregated table with just as many rows as the number of fields in the desired pivot table

9 rows for a desired 3×3-matrix looks just about right. So if I pivot here, there will be no further aggregation needed and the desired result will be shown.

Who found it?

Genius Bill Szysz used this method to speed up a slow pivot in his code to speed up matrix multiplication:

slow pivot power query performance

Code from Bill Szysz for a fast matrix multiplication (posted by DataChant)

This article is almost 4 years old, and I’ve even played around with the code at that time. Sight.. 4 years wasted time where I didn’t realize that the key for the performance improvement lied in a technique that would significantly improve the refresh speed of so many other applications as well…

Why does it work?

Here is my guess:

It looks as if the group operation creates some primary keys that create partitions for every row (or even every cell?) of the pivot table to be. I tested this guess by adding a primary key on those 2 columns (instead of grouping) and the refresh time sped up just like with the group operation. So if your data is already aggregated to the right level you can just add a key (or remove duplicates – as long as you don’t loose any rows), no need to do the group.

This means that the pivot operation doesn’t have to work on the full table, but just on the partitioned parts. (In this article I have described the performance improvements through partitions the first time).

But this also reminds me of the performance improvement for aggregations after joins, that I’ve blogged about here. Let’s see if there will be more use cases to be found.

Enjoy and stay queryious 😉

Comments (6) Write a comment

  1. Lovely explanation – thanks Imke. On reflection I’ve also seen the same effect many times – slow performance for some pivots, fast for others. Now it makes more sense.

    Do you think the pre-Group By is worthwhile (for performance and more robust code), even if it’s necessary to avoid that error?

    Reply

  2. Hi Imke,

    can you provide an example file having such a performance difference?

    I could not reproduce this behaviour. For me pivoting with aggregation tends to be faster than pre-grouping and then pivoting without aggregation (Version: 2.80.5803.1061 64-bit April 2020).

    Reply

    • Hm, that’s interesting. Could you please share a sample?

      This is one of my tests where the query without the grouping is significantly slower:

      let
      Source = {"A".."Z"},
      #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
      #"Added Custom" = Table.AddColumn(#"Converted to Table", "Number1", each {1..1000}),
      #"Expanded Number1" = Table.ExpandListColumn(#"Added Custom", "Number1"),
      #"Added Custom1" = Table.AddColumn(#"Expanded Number1", "Number2", each {1..100}),
      #"Expanded Number2" = Table.ExpandListColumn(#"Added Custom1", "Number2"),
      #"Inserted Addition" = Table.AddColumn(#"Expanded Number2", "Addition", each [Number1] + [Number2], type number),
      #"Removed Columns" = Table.RemoveColumns(#"Inserted Addition",{"Number2"}),
      #"Grouped Rows" = Table.Group(#"Removed Columns", {"Column1", "Number1"}, {{"Addition", each List.Sum([Addition]), type number}}),
      #"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Column1]), "Column1", "Addition", List.Sum)
      in
      #"Pivoted Column"

      Reply

      • Thanks, for providing the sample code.

        I was lazy and used different Excel files (each about 200.000 rows) as data source.
        After reconstructing the data within Power BI I can confirm your findings.
        That said, (as usual) it depends on the data source.

        Reply

Leave a Reply