Performance tip to speed up slow pivot operations in Power Query and Power BI

Pivot operations in are a very handy feature in  Power Query but they can slow down refresh performance. So with some bittersweet pleasure I can tell that I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. But the bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize at the time when I first worked with the code.

Trick to speed up a slow pivot table

This might not work everywhere, but for my tests, it worked really well: Don’t use an aggregation function when you want fast pivoting:

slow pivot

Don’t aggregate when you want a fast pivot in Power Query

But if your data isn’t aggregated on the row- & column values already, you’ll get this error message:

Error when the values are not adequately aggregated

So to make this work, you have to aggregate the values on the axis’ values before.

Let’s walk through the steps:

Walkthrough

Start is this table:

Start

The pivot shall bring the values from the “Column”-column into the column-area and sum the values from column AB like so:

Result

If I pivot without an aggregation like mentioned above I will get the dreaded error-message like above, because there are multiple rows for each Row- and Column-combination:

slow pivot power query performance

Values are not aggregated on the row- and column axis

The step to success is a grouping operation beforehand:

slow pivot power query performance

Group on all columns that shall define the row- & column values of the pivot

This returns a table with unique row- & column – combinations:

Aggregated table with just as many rows as the number of fields in the desired pivot table

9 rows for a desired 3×3-matrix looks just about right. So if I pivot here, there will be no further aggregation needed and the desired result will be shown.

Who found it?

Genius Bill Szysz used this method to speed up a slow pivot in his code to speed up matrix multiplication:

slow pivot power query performance

Code from Bill Szysz for a fast matrix multiplication (posted by DataChant)

This article is almost 4 years old, and I’ve even played around with the code at that time. Sight.. 4 years wasted time where I didn’t realize that the key for the performance improvement lied in a technique that would significantly improve the refresh speed of so many other applications as well…

Why does it work?

Here is my guess:

It looks as if the group operation creates some primary keys that create partitions for every row (or even every cell?) of the pivot table to be. I tested this guess by adding a primary key on those 2 columns (instead of grouping) and the refresh time sped up just like with the group operation. So if your data is already aggregated to the right level you can just add a key (or remove duplicates – as long as you don’t loose any rows), no need to do the group.

This means that the pivot operation doesn’t have to work on the full table, but just on the partitioned parts. (In this article I have described the performance improvements through partitions the first time).

But this also reminds me of the performance improvement for aggregations after joins, that I’ve blogged about here. Let’s see if there will be more use cases to be found.

Enjoy and stay queryious 😉

paging pagination Power Query

How not to miss the last page when paging with Power BI and Power Query

When you use an API with a paging mechanism (like the example from this blogpost), you’ll might work with a field that contains the address for the next page. You can use this to walk through the available chunks until you reach the last element. That last element in the pagination will not contain a next-field or that field will be null.

Paging in Power Query

In Power Query you can use the function List.Generate for it. According the latest function documentation it:

Generates a list of values given four functions that generate the initial value initial, test against a condition condition, and if successful select the result and generate the next value next.

So an intuitive implementation would look like so:

paging pagination Power Query

Initial code for paging: Will miss the last element

In the initial step (row 2) the API will be called and returns this record:

paging pagination Power Query

Examining the result of the first call

So for the upcoming iterations (next in row 4), a reference to the field next will be made and this URL will be called.

In the condition (row 3) I say that this process shall be repeated until the next-field of my previous result ([Result]) is empty.

However, this will only return 14 list with 20 elements each, missing the last element with 13 items to retrieve the full 293 items.

Let’s check it out:

Last Element (13 rows) is missing

Solution

Honestly, I still find it difficult to understand, why this last element is missing. But fortunately there is an easy solution:

paging pagination Power Query

Split into 2 steps and reference previous URL instead

The trick lies in the adjusted condition (row 4): Instead of checking if there is a next-field in the previous record, I check if the previous record had a URL to call. That basically reaches 1 level further back and will deliver the full results.

Alternative

Actually, you can also use some “brute force” using a try – otherwise – statement like so:

Simple alternative

But this will not deliver any items for debugging if something in the calls goes wrong. So I prefer not to use try statements for looping or pagination.

Enjoy and stay queryious 😉

Tips to download files from webpages in Power Query and Power BI

When downloading data from the web, it’s often best to grab the data from APIs that are designed for machine-to-machine communication than from the site that’s actually visible on the screen. Not only is the download usually faster, but you also often get more additional parameters that can be very useful. In this article I’m going to show you how to retrieve the relevant URLs for downloading files from webpages (without resorting to external tools like Fiddler) and how to tweak them to your needs.

Retrieving the URL to download files from webpages

Say I want to download historical stock prices from this webpage:

https://finance.yahoo.com/quote/AAPL/history?p=AAPL

The screen will show a link to a download: Read more

Trimming text with custom characters in Power BI and Power Query

When cleaning dirty data, you might have used the Trim-function (Text.TrimStart or Text.TrimEnd) to delete leading or trailing whitespace from your strings. However, did you know that you could use these functions as well to delete any other characters as well from the start or end of a string? Trimming text with custom characters is pretty straightforward:

Task: Trimming text with custom characters

Say you have a column with values like so

Trimming text with custom characters

and want to delete every number at the end and also every “-” that is directly connected with a number. So that the final output shall look like so:

Trim custom characters at the end of a string.

Optional parameter

By default, you feed just one argument into the Text.TrimStart or Text.TrimEnd function: The string whose whitespace characters shall be removed.

Text.TrimEnd(text as nullable text, optional trim as any) as nullable text

But the second argument lets you define a list of your own characters to be removed respectively. So I can create a list with all the characters that shall be removed from the end like so:

{"0".."9"} & {"-"}

This concatenates 2 lists: The first list contains 10 elements: All numbers as strings. The second list has just one element in it: “-“. I have to put this element into a list as well for being able to use the ampersand (“&”) as an easy concatenator here.

So the full expression for the “Add custom column” dialogue looks like so:

Text.TrimEnd( [MyColumnName], {"0".."9"} & {"-"} )

To see this in action, you can simply paste this code into the advanced editor and follow the steps:

Enjoy and stay queryious 😉

Date.Networkdays function for Power Query and Power BI

Today I’m going to share my custom NETWORKDAYS function for Power Query with you that uses the same syntax than its Excel-equivalent.

NETWORKDAYS function

This function’s first 3 parameters work just like the Excel function and there is a 4th parameter that allows adjusting the day on which the week shall start:

  1. Start as date
  2. End as date
  3. optional holidays as list of dates
  4. optional number to change the start of the week from Monday (default: 1) to any other day (2 would mean that the week starts on Tuesday instead)

The function comes with a UI that lets you first choose a table containing the holidays and then choose the column with the holiday dates. Read more

Tidy up Power BI models with the Power BI Cleaner tool

The VertiPaq-Analyzer tool is one of the great community tools that I really cannot live without. It gives me a great overview of all elements in my model and identifies potential performance problems by showing the storage requirements of each column. So when seeing expensive columns, the first question that arises is: “Do I really need this column or could I delete it?”. Luckily, this can now be answered with my new Power BI Cleaner tool. This tool shows the usage of all columns (and measures) within the tables of the VertiPaq Analyzer.

Power BI Cleaner shows unused columns in the VertiPaq-tables

Power BI Cleaner tool

So whenever there is no entry in the column “Where Used” you can go ahead and eliminate the column (or measure) from the model. Well – with one exception actually: Fields used in the definition of incremental load policies are currently not identified. So make sure to consider this before running wild 😉 Read more