Performance tip to speed up slow pivot operations in Power Query and Power BI

Pivot operations in are a very handy feature in Power Query but they can slow down refresh performance. So with some bittersweet pleasure I can tell that I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. But the bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize at the time when I first worked with the code.

Trick to speed up a slow pivot table

This might not work everywhere, but for my tests, it worked really well: Don’t use an aggregation function when you want fast pivoting:

Don’t aggregate when you want a fast pivot in Power Query

But if your data isn’t aggregated on the row- & column values already, you’ll get this error message:

Error when the values are not adequately aggregated

So to make this work, you have to aggregate the values on the axis’ values before.

Let’s walk through the steps:

Walkthrough

Start is this table:

Start

The pivot shall bring the values from the “Column”-column into the column-area and sum the values from column AB like so:

Result

If I pivot without an aggregation like mentioned above I will get the dreaded error-message like above, because there are multiple rows for each Row- and Column-combination:

Values are not aggregated on the row- and column axis

The step to success is a grouping operation beforehand:

Group on all columns that shall define the row- & column values of the pivot

This returns a table with unique row- & column – combinations:

Aggregated table with just as many rows as the number of fields in the desired pivot table

9 rows for a desired 3×3-matrix looks just about right. So if I pivot here, there will be no further aggregation needed and the desired result will be shown.

Who found it?

Genius Bill Szysz used this method to speed up a slow pivot in his code to speed up matrix multiplication:

Code from Bill Szysz for a fast matrix multiplication (posted by DataChant)

This article is almost 4 years old, and I’ve even played around with the code at that time. Sight.. 4 years wasted time where I didn’t realize that the key for the performance improvement lied in a technique that would significantly improve the refresh speed of so many other applications as well…

Why does it work?

Here is my guess:

It looks as if the group operation creates some primary keys that create partitions for every row (or even every cell?) of the pivot table to be. I tested this guess by adding a primary key on those 2 columns (instead of grouping) and the refresh time sped up just like with the group operation. So if your data is already aggregated to the right level you can just add a key (or remove duplicates – as long as you don’t loose any rows), no need to do the group.

This means that the pivot operation doesn’t have to work on the full table, but just on the partitioned parts. (In this article I have described the performance improvements through partitions the first time).

But this also reminds me of the performance improvement for aggregations after joins, that I’ve blogged about here. Let’s see if there will be more use cases to be found.

Enjoy and stay queryious 😉

Comments (6) Write a comment

- Mike Honey
- 18.05.20 @ 1:46 AM
Lovely explanation – thanks Imke. On reflection I’ve also seen the same effect many times – slow performance for some pivots, fast for others. Now it makes more sense.

Do you think the pre-Group By is worthwhile (for performance and more robust code), even if it’s necessary to avoid that error?

- - Imke Feldmann
  - 18.05.20 @ 7:19 AM
  Yes, that might be. I’d definitely prefer it to removing duplicates.
  
- Frank Tonsen
- 18.05.20 @ 12:05 PM
Hi Imke,

can you provide an example file having such a performance difference?

I could not reproduce this behaviour. For me pivoting with aggregation tends to be faster than pre-grouping and then pivoting without aggregation (Version: 2.80.5803.1061 64-bit April 2020).

- - Imke Feldmann
  - 18.05.20 @ 12:31 PM
  Hm, that’s interesting. Could you please share a sample?
  
  This is one of my tests where the query without the grouping is significantly slower:
  
  let Source = {"A".."Z"}, #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error), #"Added Custom" = Table.AddColumn(#"Converted to Table", "Number1", each {1..1000}), #"Expanded Number1" = Table.ExpandListColumn(#"Added Custom", "Number1"), #"Added Custom1" = Table.AddColumn(#"Expanded Number1", "Number2", each {1..100}), #"Expanded Number2" = Table.ExpandListColumn(#"Added Custom1", "Number2"), #"Inserted Addition" = Table.AddColumn(#"Expanded Number2", "Addition", each [Number1] + [Number2], type number), #"Removed Columns" = Table.RemoveColumns(#"Inserted Addition",{"Number2"}), #"Grouped Rows" = Table.Group(#"Removed Columns", {"Column1", "Number1"}, {{"Addition", each List.Sum([Addition]), type number}}), #"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Column1]), "Column1", "Addition", List.Sum) in #"Pivoted Column"
  
  - Frank Tonsen
    
    19.05.20 @ 12:40 PM
    
    Thanks, for providing the sample code.
    
    I was lazy and used different Excel files (each about 200.000 rows) as data source.
    After reconstructing the data within Power BI I can confirm your findings.
    That said, (as usual) it depends on the data source.
    
    - Imke Feldmann
      
      19.05.20 @ 1:51 PM
      
      Thanks for confirming and the hint re data sources, Frank!
      /Imke

New Generation Finance – Accounting – Controlling using Microsoft BI stack

Performance tip to speed up slow pivot operations in Power Query and Power BI

Trick to speed up a slow pivot table

Walkthrough

Who found it?

Why does it work?

Like this:

Comments (6) Write a comment

Leave a ReplyCancel reply

Trick to speed up a slow pivot table

Walkthrough

Who found it?

Why does it work?

Share this:

Like this:

Comments (6) Write a comment

Leave a ReplyCancel reply