Performance tip to speed up slow pivot operations in Power Query and Power BI

Pivot operations in are a very handy feature in  Power Query but they can slow down refresh performance. So with some bittersweet pleasure I can tell that I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. But the bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize at the time when I first worked with the code.

Trick to speed up a slow pivot table

This might not work everywhere, but for my tests, it worked really well: Don’t use an aggregation function when you want fast pivoting:

slow pivot

Don’t aggregate when you want a fast pivot in Power Query

But if your data isn’t aggregated on the row- & column values already, you’ll get this error message:

Error when the values are not adequately aggregated

So to make this work, you have to aggregate the values on the axis’ values before.

Let’s walk through the steps:

Walkthrough

Read more

Performance tip for aggregations after joins in Power Query and Power BI

This article was edited on 2nd Nov 2019. JoinKind.Local has been removed to avoid problems with merges on tables with primary keys:

In this article you’ll learn how to speed up the aggregation of joined/merged tables by orders of magnitude (I recorded up to 30 times faster execution times). This method works for merges where both table have multiple rows for each keys. If one of your tables has a primary key, the method Chris Webb describes here works just as good: Chris Webb’s article on how to improve performance on aggregations after joins using primary keys .

You can follow along the different methods in this file:  PerformanceAggregationsAfterMerges1_Upload.zip

Background

When you join a table to another table in Power Query, the UI gives you the option to either expand the columns (default) or aggregate the contents of the joint tables. That’s useful if multiple rows are returned for the rows of the table that has been joined to (left table):

Read more

Memory efficient clustered running total in Power BI

Today I want to share a scenario where a running total calculation in the query editor saved a model that run out of memory when done with DAX:

Problem

The model couldn’t be refreshed and returned out of memory error with a calculated column in the fact table of over 20 Mio rows (from a csv-file). A running total should be calculated for each “JourneyID”, of which there were over 1 Mio in the table itself. This rose memory consumption during refresh by over 300 % – until it finally errored out:

Besetzung =
CALCULATE (
    SUM ( Fact[Entries] )
– SUM ( Fact[Exits] );
    FILTER (
        ALLEXCEPT ( Fact; Fact[JourneyID] );
Fact[StopId]
<= EARLIER ( Fact[StopId] )
    )
)

Solution

Read more

Performance tip for List.Generate (1): Buffer your tables in Power BI and Power Query

Lately I was working on a fairly advanced allocation algorithm on large data which forced me to search for different tricks to improve performance than those that you can find on my site here already.

Background

I was using List.Generate to check for every month in my table, if there was enough free capacity on a platform to start new wells. As every well had a certain production scheme (producing different amounts for a certain length of time), I first had to check the total production amount of active wells before I could determine the spare capacity for a new month. So I had to look into every active well, grab the capacity of the new month and add it up.

Therefore I’ve stored the active production schemes in one table in my List.Generate-record. That lead to an exponentially decreasing performance unfortunately.

Solution to improve performance of List.Generate

Buffering my tables in the “next”-function reduced the query duration by almost 70% !

Although a Table.Buffer or List.Buffer is always high on my list when it comes to performance issues, I was fairly surprised to see that behaviour here: As List.Generate returns the last element of its list as an argument for the next step, I was always assuming that this would be cached (and that was the reason because List.Generate performs recursive operations faster than the native recursion in M). Also, I had just referenced that table once ane in such a case, a buffer would normally not have come into my mind. (But desperation sometimes leads to unexpected actions …)

I also buffered a table that had just been referenced within the current record (and not recursively) and this improved performance as well. (Although in that case, the tables has been referenced multiple times within the current record). But this buffer didn’t have such a big impact on performance than the one on the table that was referenced by the recursive action.

Code

Here is some pseudo-code illustrating the general principle:

Solution with buffers:

How to improve performance of List.Generate: Use Table.Buffer

 

Is that new to you or have you made the same experience? Which grades of performance improvements did you achieve with this method? Please let me know in the comments!

Enjoy & stay queryious 😉

Improve import of Excel sheets with empty rows and columns in Power Query and Power BI

When you import Excel sheets who have empty leading or trailing columns and rows (showing null-values), you can substantially improve the complexity and speed of your import process with a simple trick:

Remove the reasons for the empty trailing rows and columns 😉

Background

Usually, when you import data from an Excel sheet, Power Query will automatically detect the used range in a sheet and will just return those rows and columns who have content in it. So how can it come that in some cases, additional rows or columns are returned who have nothing but empty values in them?

Reason

The reason for it can be cell formatting of empty cells. They often occur in old workbooks where cells have been deleted. These cells will be returned with a null-value during the import process with Power Query. See this blogpost for more details of potential pitfalls that come with it.

Solution

The “Inquire” Excel Add-On lets you clean any excess cell formatting. After you’ve executed this command, Power Query will not import any of those leading or trailing empty rows or columns any more. Often this will reduce the file size of the Excel files dramatically as well.

Effects

You will benefit from:

  • simpler query logic
  • potentially huge improved import speed, due to the reduced file size

Enjoy and stay queryious 😉

Performance Tip: Partition your tables at crossjoins where possible – PowerQuery PowerBI

Recently I’ve distributed some techniques for partial matches or relative joins between tables using PowerQuery or the query editor in PowerBI. They are very flexible and powerful – yet slow.

To improve performance you can check if there is a chance to “partition” your table using a Table.Group. If you have an equality expression in your statement like we had in our rolling-12-months-exercise here for example:

You can boost performance into a different dimension by grouping your table on the “Associate”-table instead like this:

Read more