Memory efficient clustered running total in Power BI

Today I want to share a scenario where a running total calculation in the query editor saved a model that run out of memory when done with DAX:

Problem

The model couldn’t be refreshed and returned out of memory error with a calculated column in the fact table of over 20 Mio rows (from a csv-file). A running total should be calculated for each “JourneyID”, of which there were over 1 Mio in the table itself. This rose memory consumption during refresh by over 300 % – until it finally errored out:

Besetzung =
CALCULATE (
    SUM ( Fact[Entries] )
– SUM ( Fact[Exits] );
    FILTER (
        ALLEXCEPT ( Fact; Fact[JourneyID] );
Fact[StopId]
<= EARLIER ( Fact[StopId] )
    )
)

Solution

In the query editor, I grouped the fact-table by “JourneyID” and choose “All Rows”:

Group in the query editor to effectively partition your Fact-Table

Due to the fact that the fact table was sorted by the “JourneyID”, I could use GroupKind.Local and manually tweaked the resulting code like so :

Table.Group(#"Changed Type2", {"JourneyID"}, {{"All", each _, type table}}, GroupKind.Local)

Without this option, the calculation would have also errored out.

This effectively created table partitions, with one row for each “JourneyID”:

One table per JourneyID

I then added a custom column where I used my superfast M-code for running totals, referencing the column “All” with the partition in it:

Table.AddColumn(#"Grouped Rows", "Custom", each fnRT([All], "JourneyID", “Movements”))

Function Code

This M-code for running totals is amazingly fast (within M-dimensions 😉 ) and particularly plays its strengths on large sections for aggregation.

Performance

Memory consumption rose from around 1,700 MB to 2,000 MB, all very stable. Of course, refresh duration rose (by 150 % for that table). But for that use case, it was an acceptable price to pay.

Credits to the core-function (using List.Generate) go to Bill Szysz.

Enjoy & stay queryious 😉

Comments (12) Write a comment

  1. Could you please provide your “big” source file for me to observe the efficiency improvement as mentioned.

    Reply

  2. Pingback: Running Totals In Power BI With M – Curated SQL

  3. Hi,

    a note about your DAX formula:

    Unless, they changed it, there is a difference between
    CALCULATE( …
    ; FILTER ( ALLEXCEPT ( Fact; Fact[JourneyID] )
    ; TRUE
    )
    )

    and

    CALCULATE( …
    ; ALLEXCEPT ( Fact; Fact[JourneyID] )
    )

    The first one returns the whole Fact table (minus the [JourneyID] column), filters it and then adds it to the context filter.

    The second one removes any filter on the Fact table, except those applied to [JourneyID].

    I suspect the following should run much faster than your original calculation:
    Besetzung =
    CALCULATE ( SUM ( Fact[Entries] ) – SUM ( Fact[Exits] );
    ALLEXCEPT ( Fact; Fact[JourneyID] );
    Fact[StopId]<= EARLIER ( Fact[StopId] )
    )

    Cheers

    Reply

    • Thanks for this code!!

      Although it runs a bit faster, the problem with the memory consumption still continues.
      So at least for my model, I have to stick to the M-solution, as this doesn’t spike the RAM-consumption.

      Reply

  4. Pingback: Optimizing M Function Calls With Function.ScaleVector() – Curated SQL

  5. Pingback: Writing data to GitHub using Power Query only – The BIccountant

  6. Hello Imke,

    I’m new in power query / power bi. I tried to recreate your code but stuck to create your function. Because I didn’t understand it, could it possible to share a sample file (not with the real data of course)?

    have nice day
    Cem

    Reply

  7. This is wonderful and so helpful! Thank you! I was able to execute using your code and it is generating a running total, but it’s not sorting correctly before calculating the running total. Do you have any suggestions as to what I may have missed in my use of your code?

    Reply

    • There is a Table.Buffer-command in the sort-step. Missing that might cause such a thing.
      And of course referencing the wrong column in the 2nd argument of the function.

      Reply

Leave a Reply

Die Datenschutzbestimmungen finden Sie hier: / Please find the privacy policy here: https://wp.me/P6lgsG-Rz