Performance tip for aggregations after joins in Power Query and Power BI

This article was edited on 2nd Nov 2019. JoinKind.Local has been removed to avoid problems with merges on tables with primary keys:

In this article you’ll learn how to speed up the aggregation of joined/merged tables by orders of magnitude (I recorded up to 30 times faster execution times). This method works for merges where both table have multiple rows for each keys. If one of your tables has a primary key, the method Chris Webb describes here works just as good: Chris Webb’s article on how to improve performance on aggregations after joins using primary keys .

You can follow along the different methods in this file:  PerformanceAggregationsAfterMerges1_Upload.zip

Background

When you join a table to another table in Power Query, the UI gives you the option to either expand the columns (default) or aggregate the contents of the joint tables. That’s useful if multiple rows are returned for the rows of the table that has been joined to (left table):

Performance of native aggregation after join is very slow

But this method is extremely slow. Compared to “simply” expanding all values to new rows (which took around 5 seconds), the aggregation took around 50 seconds. The automatically generated code uses the “Table.AggregateTableColumn”-function. (see Query1_NativeAggregate)

Table.AggregateTableColumn(#"Merged Queries", "Expanded Custom", {{"Animal", each Text.Combine(_, ", "), "CombinedValues"}})

My first attempt to speed up performance was not to expand the column that contains the merged table at all, but to add a column instead with a manual aggregation function. (see Query2_AddManualAggregation)

 Table.AddColumn(#"Merged Queries", "CombinedValues", each Text.Combine([Expanded Custom][Animal], ", "))

This improved speed by 50-60%, but still, way slower than expanding all rows.

The unexpected solution

What turned out to be by far the fastest was to expand the columns to all the new rows and then “group back”. (see Query3_ReGroupIntegrated)

Table.Group(#"Expanded Expanded Custom", {"Column1", "Custom"}, {{"CombinedValues", each Text.Combine(_[Animal], ". ") }})

To my surprise, this was even faster than skipping this step (around 2 seconds, instead of 5). Means: This aggregation shortened the table from 676k rows to 26k rows. Of course, loading a shorter table should take less time. But I expected the computation of this aggregation also to take a fair amount of time. So at the end, this was actually less than the time gained by the shorting of the table.

But the surprise didn’t stop here. As I know many beginners aren’t so comfortable with editing existing code, I tried a different method (see Query4_ReGroupAddColumn): I kept the native “All rows”-operation and added the same column than in Query2_AddManualAggregation. And it was just as fast/even slightly faster than the fast Query3_ReGroupIntegrated.

So just by adding 2 steps: Expansion of the merged column and Re-Grouping I sped up the query significantly: Another mystery that the M-engine holds for me…

Please share your thoughts about and experiences with this method in the comments below. Maybe MS will change the code behind the “default-Aggregate”-function, if there is enough evidence that the alternative method proves to be superior and stable in many other cases as well.

Thanks and stay queryious 😉

Parent-Child Hierarchies with multiple parents in Power BI with Power Query

I’ve written about a method to dynamically flatten parent-child-hierarchies also with multiple parents some while ago here. I’ve actually used this approach for Bill-of-materials cases and refined that approach in a series starting here. There, the quantities are aggregated in M already, as they are not supposed to change. But if one wants to use the hierarchical structure to report on transaction tables where several filters shall be applied, one has to adjust the data model a bit:

DimNodes

If you have parent-child-hierarchy with multiple parents, my function will a table like below, where the children with multiple parents still reside in different rows:

Read more

Power BI administration made easy with Power BI REST API custom connector

Today I read the (as always) great article by Matthew Roche “Governing Power BI just got a little easier” and couldn’t find a description on how to get to this promising window with all the admin goodness. So here it comes how to build your Power BI REST API custom connector then 😉 :

Create a custom connector for the Power BI REST API

Miguel Escobar has done a fantastic job to make it super-easy for you here.

Edit 30th September 2019: This repo has just been updated and includes a version with API secret. So if you’ve downloaded that content before and got an authorization error, please get the new files.

Get data from your connector

After you’ve stored the .mez-file in the correct folder (C:\Users\<YourUserNameGoesHere>\Documents\Power BI Desktop\Custom Connectors) and open Power BI Desktop again, you will be greeted with this warning message: Read more

Export large amount of data from Power BI desktop visuals

I’m going to show how to export data from visuals in Power BI Desktop that’s too big to be downloaded by the native functionality and therefore returns this error-message:

Export data from visuals

Check if you really need this

Although the method is fairly simple, there are simpler methods if you just need the raw data from your data model (and not the specific aggregations or measures that the visual contains): Read more

Debug DAX variables in Power BI and Power Pivot

When you’re dealing with a beast like DAX you can use any help there is, right? So here I show you how you can debug DAX variables who contain tables or show the result of multiple variables at once. So you can easily compare them with each other to spot the reason for problems fast.

Please note, that currently only comma separated DAX code is supported.

Example

Watch this measure from Gerhard Brueckl’s brilliant solution for dynamic TopN clustering with others. It contains 5 variables who return tables and one variable with a scalar:

Measure with variables who contain tables and scalars

If you want to follow along how this calculation is evolving for each value in a matrix, my VarDebugMeasure will show details of every variable like so:

Measure to debug DAX variables

Method

Read more

DAX CALCULATE Debugger

CALCULATE is the most powerful function in DAX, as it allows you to change the filter context under which its expression is evaluated to your hearts content. But with big number of options to choose from, often comes big frustration when the results don’t match expectations. Often this is because your syntax to modify the filter context doesn’t do what you’ve intended. Unfortunately CALCULATE only displays its result and not how it achieved it, so debugging becomes a challenge. This is where my CALCULATE Debugger measure can help out:

DAX CALCULATE Debugger

This is a measure that returns a text-value, showing the number of rows of the adjusted filter context table, the MIN and MAX value of the selected column as well as up to the first 10 values. Just place this measure beneath the CALCULATE-measure in question and try to find the error 😉

Read more