Pivot operations in are a very handy feature in Power Query but they can slow down refresh performance. So with some bittersweet pleasure I can tell that I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. But the bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize at the time when I first worked with the code.
Trick to speed up a slow pivot table
This might not work everywhere, but for my tests, it worked really well: Don’t use an aggregation function when you want fast pivoting:
But if your data isn’t aggregated on the row- & column values already, you’ll get this error message:
So to make this work, you have to aggregate the values on the axis’ values before.
Let’s walk through the steps:
Start is this table:
The pivot shall bring the values from the “Column”-column into the column-area and sum the values from column AB like so:
If I pivot without an aggregation like mentioned above I will get the dreaded error-message like above, because there are multiple rows for each Row- and Column-combination:
The step to success is a grouping operation beforehand:
This returns a table with unique row- & column – combinations:
9 rows for a desired 3×3-matrix looks just about right. So if I pivot here, there will be no further aggregation needed and the desired result will be shown.
Who found it?
This article is almost 4 years old, and I’ve even played around with the code at that time. Sight.. 4 years wasted time where I didn’t realize that the key for the performance improvement lied in a technique that would significantly improve the refresh speed of so many other applications as well…
Why does it work?
Here is my guess:
It looks as if the group operation creates some primary keys that create partitions for every row (or even every cell?) of the pivot table to be. I tested this guess by adding a primary key on those 2 columns (instead of grouping) and the refresh time sped up just like with the group operation. So if your data is already aggregated to the right level you can just add a key (or remove duplicates – as long as you don’t loose any rows), no need to do the group.
This means that the pivot operation doesn’t have to work on the full table, but just on the partitioned parts. (In this article I have described the performance improvements through partitions the first time).
But this also reminds me of the performance improvement for aggregations after joins, that I’ve blogged about here. Let’s see if there will be more use cases to be found.
Enjoy and stay queryious 😉