Pivot operations in are a very handy feature in Power Query but they can slow down refresh performance. So with some bittersweet pleasure I can tell that I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. But the bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize at the time when I first worked with the code.
Trick to speed up a slow pivot table
This might not work everywhere, but for my tests, it worked really well: Don’t use an aggregation function when you want fast pivoting:
Don’t aggregate when you want a fast pivot in Power Query
But if your data isn’t aggregated on the row- & column values already, you’ll get this error message:
Error when the values are not adequately aggregated
So to make this work, you have to aggregate the values on the axis’ values before.
Let’s walk through the steps:
This article was edited on 2nd Nov 2019. JoinKind.Local has been removed to avoid problems with merges on tables with primary keys:
In this article you’ll learn how to speed up the aggregation of joined/merged tables by orders of magnitude (I recorded up to 30 times faster execution times). This method works for merges where both table have multiple rows for each keys. If one of your tables has a primary key, the method Chris Webb describes here works just as good: Chris Webb’s article on how to improve performance on aggregations after joins using primary keys .
You can follow along the different methods in this file:
When you join a table to another table in Power Query, the UI gives you the option to either expand the columns (default) or aggregate the contents of the joint tables. That’s useful if multiple rows are returned for the rows of the table that has been joined to (left table):
Today I want to share a scenario where a running total calculation in the query editor saved a model that run out of memory when done with DAX:
The model couldn’t be refreshed and returned out of memory error with a calculated column in the fact table of over 20 Mio rows (from a csv-file). A running total should be calculated for each “JourneyID”, of which there were over 1 Mio in the table itself. This rose memory consumption during refresh by over 300 % – until it finally errored out:
SUM ( Fact[Entries] )
– SUM ( Fact[Exits] );
ALLEXCEPT ( Fact; Fact[JourneyID] );
<= EARLIER ( Fact[StopId] )
Lately I was working on a fairly advanced allocation algorithm on large data which forced me to search for different tricks to improve performance than those that you can find on my site here already.
I was using List.Generate to check for every month in my table, if there was enough free capacity on a platform to start new wells. As every well had a certain production scheme (producing different amounts for a certain length of time), I first had to check the total production amount of active wells before I could determine the spare capacity for a new month. So I had to look into every active well, grab the capacity of the new month and add it up.
Therefore I’ve stored the active production schemes in one table in my List.Generate-record. That lead to an exponentially decreasing performance unfortunately.
Solution to improve performance of List.Generate
Buffering my tables in the “next”-function reduced the query duration by almost 70% !
Although a Table.Buffer or List.Buffer is always high on my list when it comes to performance issues, I was fairly surprised to see that behaviour here: As List.Generate returns the last element of its list as an argument for the next step, I was always assuming that this would be cached (and that was the reason because List.Generate performs recursive operations faster than the native recursion in M). Also, I had just referenced that table once ane in such a case, a buffer would normally not have come into my mind. (But desperation sometimes leads to unexpected actions …)
I also buffered a table that had just been referenced within the current record (and not recursively) and this improved performance as well. (Although in that case, the tables has been referenced multiple times within the current record). But this buffer didn’t have such a big impact on performance than the one on the table that was referenced by the recursive action.
Here is some pseudo-code illustrating the general principle:
Solution with buffers:
How to improve performance of List.Generate: Use Table.Buffer
Is that new to you or have you made the same experience? Which grades of performance improvements did you achieve with this method? Please let me know in the comments!
Enjoy & stay queryious 😉
When you import Excel sheets who have empty leading or trailing columns and rows (showing null-values), you can substantially improve the complexity and speed of your import process with a simple trick:
Remove the reasons for the empty trailing rows and columns 😉
Usually, when you import data from an Excel sheet, Power Query will automatically detect the used range in a sheet and will just return those rows and columns who have content in it. So how can it come that in some cases, additional rows or columns are returned who have nothing but empty values in them?
The reason for it can be cell formatting of empty cells. They often occur in old workbooks where cells have been deleted. These cells will be returned with a null-value during the import process with Power Query. See this blogpost for more details of potential pitfalls that come with it.
The “Inquire” Excel Add-On lets you clean any excess cell formatting. After you’ve executed this command, Power Query will not import any of those leading or trailing empty rows or columns any more. Often this will reduce the file size of the Excel files dramatically as well.
You will benefit from:
- simpler query logic
- potentially huge improved import speed, due to the reduced file size
Enjoy and stay queryious 😉
Recently I’ve distributed some techniques for partial matches or relative joins between tables using PowerQuery or the query editor in PowerBI. They are very flexible and powerful – yet slow.
To improve performance you can check if there is a chance to “partition” your table using a Table.Group. If you have an equality expression in your statement like we had in our rolling-12-months-exercise here for example:
You can boost performance into a different dimension by grouping your table on the “Associate”-table instead like this: