Performance Tip: Partition your tables at crossjoins where possible – PowerQuery PowerBI

Recently I’ve distributed some techniques for partial matches or relative joins between tables using PowerQuery or the query editor in PowerBI. They are very flexible and powerful – yet slow.

To improve performance you can check if there is a chance to “partition” your table using a Table.Group. If you have an equality expression in your statement like we had in our rolling-12-months-exercise here for example:

You can boost performance into a different dimension by grouping your table on the “Associate”-table instead like this:

The performance difference is huge, so thx to Colin Banfield for pointing this out (if you’re interested in M-code, you’ll also find an awesome alternative technique from Bill Szysz in that post). But have in mind that the performance also depends on your source data, so the performance gains my vary.

Also check out this post by Chris Webb which holds some more background information about this topic.

Enjoy & stay queryious 🙂

Comments (7) Write a comment

  1. Hi Imke

    I love your blog, keep up the good work!

    One request, if possible, are you able to put in your code snippets as text instead of pictures? It makes it easier to copy/paste 🙂

    Thanks,
    Simon

    Reply

    • Thanks Simon!
      I understand your request, but this conflicts a bit with highlighting and formatting. Lately I’ve started to enclose txt-files to download with the pure code in it.
      Here I thought that opening the link to the forum-post was also acceptable – will keep this in mind though 😉

      Reply

  2. Pingback: Tabellen zusammenführen mit relativer Bedingung - PowerBI Pro

  3. Pingback: #Excel Super Links #53 – shared by David Hager | Excel For You

  4. Pingback: Find overlapping time periods using #PowerQuery - Part 3 (Fast & Furious) - DataChant

  5. Hello! I came across this recently and have a question, hoping you can help. I’ve used the technique above and it works great. I am trying to add two separate columns to the same grouped table – one is the sum of revenue, the other is the sum of a count column. Is there a way to do this in a single function? Right now I have it split out, but I feel like it is repeating many of the same steps, wondering if there was a way to combine to make the query more efficient and faster. Thanks so much, love your blog!

    Reply

    • Hi Nicole,
      yes, you could create a record with as many fields/columns as you want like so:

      Table.AddColumn(
      TableRow,
      “Custom”,
      (FilterTable) =>
      let
      Partition = Table.SelectRows(Source, each [Date]>Date.AddMonths(FilterTable[Date],-12) and [Date]=FilterTable[Date] and [Associate]=FilterTable[Associate]),
      FirstResult = List.Sum(Partition[Sales]),
      SecondResult = Table.RowCount(Partition),
      myRecord = [FirstResult=FirstResult, SecondResult=SecondResult]
      in
      myRecord)

      Reply

Leave a Reply