Performance tip to speed up slow pivot operations in Power Query and Power BI

Pivot operations in are a very handy feature in  Power Query but they can slow down refresh performance. So with some bittersweet pleasure I can tell that I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. But the bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize at the time when I first worked with the code.

Trick to speed up a slow pivot table

This might not work everywhere, but for my tests, it worked really well: Don’t use an aggregation function when you want fast pivoting:

slow pivot

Don’t aggregate when you want a fast pivot in Power Query

But if your data isn’t aggregated on the row- & column values already, you’ll get this error message:

Error when the values are not adequately aggregated

So to make this work, you have to aggregate the values on the axis’ values before.

Let’s walk through the steps:

Walkthrough

Start is this table:

Start

The pivot shall bring the values from the “Column”-column into the column-area and sum the values from column AB like so:

Result

If I pivot without an aggregation like mentioned above I will get the dreaded error-message like above, because there are multiple rows for each Row- and Column-combination:

slow pivot power query performance

Values are not aggregated on the row- and column axis

The step to success is a grouping operation beforehand:

slow pivot power query performance

Group on all columns that shall define the row- & column values of the pivot

This returns a table with unique row- & column – combinations:

Aggregated table with just as many rows as the number of fields in the desired pivot table

9 rows for a desired 3×3-matrix looks just about right. So if I pivot here, there will be no further aggregation needed and the desired result will be shown.

Who found it?

Genius Bill Szysz used this method to speed up a slow pivot in his code to speed up matrix multiplication:

slow pivot power query performance

Code from Bill Szysz for a fast matrix multiplication (posted by DataChant)

This article is almost 4 years old, and I’ve even played around with the code at that time. Sight.. 4 years wasted time where I didn’t realize that the key for the performance improvement lied in a technique that would significantly improve the refresh speed of so many other applications as well…

Why does it work?

Here is my guess:

It looks as if the group operation creates some primary keys that create partitions for every row (or even every cell?) of the pivot table to be. I tested this guess by adding a primary key on those 2 columns (instead of grouping) and the refresh time sped up just like with the group operation. So if your data is already aggregated to the right level you can just add a key (or remove duplicates – as long as you don’t loose any rows), no need to do the group.

This means that the pivot operation doesn’t have to work on the full table, but just on the partitioned parts. (In this article I have described the performance improvements through partitions the first time).

But this also reminds me of the performance improvement for aggregations after joins, that I’ve blogged about here. Let’s see if there will be more use cases to be found.

Enjoy and stay queryious 😉

Automatically detect and change the types of all columns at once in Power Query

Today I want to share quick tip on how to automatically detect and change all column types at once in Power Query.

Background

Very often, when you expand a column in Power Query that contains a table or some records, the expanded columns will lose their types (like Chris Webb has described here for example). Or you might just have accidently deleted a “Changed Type”-step.

change all column types at once

No types on columns

Did you know there is actually a superfast and easy way to do it?

  1. Click the mouse anywhere in the table
  2. Press Ctrl + a (check all)
change the types of all columns

Check the whole table with Ctrl + a

  1. Go to the Transform-tab ad choose: “Detect Data Type”
change all column types at once

Transform with 1 click

Voila: All your columns should have types on them.

They have been automatically been detected by checking the first 100 rows of your table. So if you know that you’re having columns with inconsistent values in them, make sure to check the automatically assigned values.

Enjoy & stay queryious 😉

Tips to download files from webpages in Power Query and Power BI

When downloading data from the web, it’s often best to grab the data from APIs that are designed for machine-to-machine communication than from the site that’s actually visible on the screen. Not only is the download usually faster, but you also often get more additional parameters that can be very useful. In this article I’m going to show you how to retrieve the relevant URLs for downloading files from webpages (without resorting to external tools like Fiddler) and how to tweak them to your needs.

Retrieving the URL to download files from webpages

Say I want to download historical stock prices from this webpage:

https://finance.yahoo.com/quote/AAPL/history?p=AAPL

The screen will show a link to a download: Read more

Performance tip for aggregations after joins in Power Query and Power BI

This article was edited on 2nd Nov 2019. JoinKind.Local has been removed to avoid problems with merges on tables with primary keys:

In this article you’ll learn how to speed up the aggregation of joined/merged tables by orders of magnitude (I recorded up to 30 times faster execution times). This method works for merges where both table have multiple rows for each keys. If one of your tables has a primary key, the method Chris Webb describes here works just as good: Chris Webb’s article on how to improve performance on aggregations after joins using primary keys .

You can follow along the different methods in this file:  PerformanceAggregationsAfterMerges1_Upload.zip

Background

When you join a table to another table in Power Query, the UI gives you the option to either expand the columns (default) or aggregate the contents of the joint tables. That’s useful if multiple rows are returned for the rows of the table that has been joined to (left table):

Read more

Query folding on JoinKind.Inner gotcha for Power BI and Power Query

If you query databases who support query folding, you’re probably very aware of every step you take and check if folding happens with every new step like so:

“View Native Query” shows folding query that’s send back to the server (if not greyed out)

Folding will (usually) happen as long as “View Native Query” isn’t greyed out.

So when doing an inner join on tables whithin the same database, I was a bit surprised to see this greyed out actually. As according to the literature, it should fold.

But guess what? After I expanded a column from it, the folding was back again: Read more