Memory efficient clustered running total in Power BI

Today I want to share a scenario where a running total calculation in the query editor saved a model that run out of memory when done with DAX:

Problem

The model couldn’t be refreshed and returned out of memory error with a calculated column in the fact table of over 20 Mio rows (from a csv-file). A running total should be calculated for each “JourneyID”, of which there were over 1 Mio in the table itself. This rose memory consumption during refresh by over 300 % – until it finally errored out:

Besetzung =
CALCULATE (
    SUM ( Fact[Entries] )
– SUM ( Fact[Exits] );
    FILTER (
        ALLEXCEPT ( Fact; Fact[JourneyID] );
Fact[StopId]
<= EARLIER ( Fact[StopId] )
    )
)

Solution

In the query editor, I grouped the fact-table by “JourneyID” and choose “All Rows”:

Group in the query editor to effectively partition your Fact-Table

Due to the fact that the fact table was sorted by the “JourneyID”, I could use GroupKind.Local and manually tweaked the resulting code like so :

Table.Group(#"Changed Type2", {"JourneyID"}, {{"All", each _, type table}}, GroupKind.Local)

Without this option, the calculation would have also errored out.

This effectively created table partitions, with one row for each “JourneyID”:

One table per JourneyID

I then added a custom column where I used my superfast M-code for running totals, referencing the column “All” with the partition in it:

Table.AddColumn(#"Grouped Rows", "Custom", each fnRT([All], "JourneyID", “Movements”))

Function Code

	let
	func = (Table as table, SortColumn as text, AmountColumn as text) =>

	let
	/* Debug parameters
	Table = #"Price Paid",
	SortColumn = "Date",
	AmountColumn = "Price paid",
	*/

	// Sort table and buffer it
	Sorted = Table.Buffer(Table.AddIndexColumn(Table.Sort(Table,{{SortColumn, Order.Ascending}}), "Index",1,1)),
	// Select the Columns
	SelectColumns = Table.SelectColumns(Sorted, {SortColumn, AmountColumn, "Index"}),
	// Extract Amount column and buffer it
	ExtractAmountColumn = List.Buffer(Table.Column(SelectColumns, AmountColumn)),
	// Calculate a list with all running Totals
	RunningTotal = List.Skip(List.Generate( ()=> [ListItem=0, Counter=0]
	,each [Counter] <= List.Count(ExtractAmountColumn)
	,each [ ListItem = ExtractAmountColumn{[Counter]}+[ListItem],
	Counter = [Counter]+1
	]
	),1),
	ConvertedTable = Table.FromList(RunningTotal, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
	ExpandedColumn = Table.ExpandRecordColumn( ConvertedTable, "Column1", {"ListItem", "Counter"}, {"ListItem", "Counter"}),
	MergedQueries = Table.NestedJoin(Sorted,{"Index"}, ExpandedColumn,{"Counter"},"Expanded Column1",JoinKind.LeftOuter),
	Expand = Table.ExpandTableColumn( MergedQueries, "Expanded Column1", {"ListItem"}, {"RunningTotal"}),
	#"Removed Columns" = Table.RemoveColumns(Expand,{"Index"}),
	#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"RunningTotal", type number}})
	in
	#"Changed Type"
	//*
	,
	documentation = [ Documentation.Name = " Table.ColumnRunningTotal"
	,Documentation.Description = " Fast way to add running total to a table"
	,Documentation.LongDescription = " Fast way to add running total to a table"
	,Documentation.Category = " Table"
	,Documentation.Source = " local"
	,Documentation.Author = " Imke Feldmann: www.TheBIccountant.com"
	,Documentation.Examples = {[Description = " ", Code = " ", Result = " "]}]
	in
	Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))
	//*/

view raw

Table.ColumnRunningTotal.pq

hosted with ❤ by GitHub

This M-code for running totals is amazingly fast (within M-dimensions 😉 ) and particularly plays its strengths on large sections for aggregation.

Performance

Memory consumption rose from around 1,700 MB to 2,000 MB, all very stable. Of course, refresh duration rose (by 150 % for that table). But for that use case, it was an acceptable price to pay.

Credits to the core-function (using List.Generate) go to Bill Szysz.

If you are interested in a comparison between different approaches to create running totals, please check out Rick de Groots article where my solution here is compared against different versions.

Enjoy & stay queryious 😉

Comments (27) Write a comment

- Julian
- 01.10.18 @ 4:04 AM
Could you please provide your “big” source file for me to observe the efficiency improvement as mentioned.

Reply
- - Admin
  - 01.10.18 @ 8:29 AM
  Unfortunately I cannot share the file, as it was customers data.
  
  Reply
Pingback: Running Totals In Power BI With M – Curated SQL
- Julian
- 02.10.18 @ 3:06 PM
Ok, I got it.

Reply
Hi,

a note about your DAX formula:

Unless, they changed it, there is a difference between
CALCULATE( …
; FILTER ( ALLEXCEPT ( Fact; Fact[JourneyID] )
; TRUE
)
)

and

CALCULATE( …
; ALLEXCEPT ( Fact; Fact[JourneyID] )
)

The first one returns the whole Fact table (minus the [JourneyID] column), filters it and then adds it to the context filter.

The second one removes any filter on the Fact table, except those applied to [JourneyID].

I suspect the following should run much faster than your original calculation:
Besetzung =
CALCULATE ( SUM ( Fact[Entries] ) – SUM ( Fact[Exits] );
ALLEXCEPT ( Fact; Fact[JourneyID] );
Fact[StopId]<= EARLIER ( Fact[StopId] )
)

Cheers

Reply
- - Admin
  - 27.10.18 @ 5:32 AM
  Thanks for this code!!
  
  Although it runs a bit faster, the problem with the memory consumption still continues.
  So at least for my model, I have to stick to the M-solution, as this doesn’t spike the RAM-consumption.
  
  Reply
Pingback: Optimizing M Function Calls With Function.ScaleVector() – Curated SQL
Pingback: Writing data to GitHub using Power Query only – The BIccountant
- CemK
- 12.08.19 @ 1:10 PM
Hello Imke,

I’m new in power query / power bi. I tried to recreate your code but stuck to create your function. Because I didn’t understand it, could it possible to share a sample file (not with the real data of course)?

have nice day
Cem

Reply
- - Admin
  - 13.08.19 @ 5:29 PM
  Hi Cem,
  please check this video on how to apply M-code to your solution: http://community.powerbi.com/t5/Webinars-and-Video-Gallery/Power-BI-Forum-Help-How-to-integrate-M-code-into-your-existing/m-p/179314
  Otherwise I provide consulting services and am happy to send a quote (info at thebiccountant.com)
  /Imke
  
  Reply
- Kelly Porter
- 26.09.19 @ 9:50 PM
This is wonderful and so helpful! Thank you! I was able to execute using your code and it is generating a running total, but it’s not sorting correctly before calculating the running total. Do you have any suggestions as to what I may have missed in my use of your code?

Reply
- - Admin
  - 27.09.19 @ 8:52 AM
  There is a Table.Buffer-command in the sort-step. Missing that might cause such a thing.
  And of course referencing the wrong column in the 2nd argument of the function.
  
  Reply
- ferrisstate
- 26.11.19 @ 11:48 AM
Hello Imke,

I really love what you have done. Unfortunately I could not find the Table.Buffer-command and could fix this myself. Could you share this with the community?

Many thanks in advance,

ferrisstate

Reply
- - Imke Feldmann
  - 26.11.19 @ 3:27 PM
  Hi ferrisstate,
  the buffer is in row 16 of the function code above.
  Cheers, Imke
  
  Reply
- Jan Pedersen
- 23.04.20 @ 2:04 PM
Dear imke

I would like to use tyour superfast M-code for running total, but I need know if your code modfied with more columns
So I get running total per group I have the following columns

Col1 = Site ( in total 3 sites)
Col2 = Warehouse (in total 12 warehouses)
Col4 -= Item ID (About 1,000 items)
Col5 = Week number (3 months horizon )
Col5 = Index ( Transaction Index is create by group Site, Warehouse, Item ID, weeknumber)

Reply
- - Jan Pedersen
  - 23.04.20 @ 4:02 PM
  Sorry I was typing to fast
  I would like to use your superfast M-code for running total, but I need to know if your code can be modfied with additional columns, so I get running total per group. If yes, could you please provide an code example of how to do?
  
  I have the following columns and want a running total per Site, Warehouse, Item ID and weeknumber
  
  Col1 = Site (In total 3 sites)
  Col2 = Warehouse (In total 12 warehouses)
  Col4 = Item ID (About 1,000 items)
  Col5 = Week number (3 months horizon )
  Col5 = Index ( Transaction Index is created by grouping: Site, Warehouse, Item ID and weeknumber)
  Col6 = Quantities
  
  Reply
  - Imke Feldmann
    
    23.04.20 @ 8:34 PM
    
    Hi Jan,
    you should group on all these columns with Operation “All Rows”.
    That will create one table per group combinations. Run the function on those “partitions”.
    That’s the fastest method there is.
    /Imke
    
    Reply
- - Imke Feldmann
  - 07.05.20 @ 7:41 PM
  Hi Jan,
  sorry, I missed your comment.
  You can include multiple columns as group columns: Just switch from “Basic” to “Advanced” in the UI and add Col1-4 to it.
  Week number and Index columns should show up, although you’d have to rename the “Index” column because it will conflict with the default column names of what the function returns.
  /Imke
  
  Reply
- Jan
- 12.08.20 @ 3:35 PM
Hi Imke

Now I have tried to replicate your case and grouped all rows so I end up with a table with 2 column
Column 1 = WeekNumber (Sorted in ascending Order)
Column 2 = All (partition/nested tables)

fnRT:
Table= [All]
SortColumn=”Weeknumber”
AmountColumn= “Qty” ( Qty Column inside the [All] partitions table)

Your code cant find “AmountColumn” with the below M-code

// Select the Columns
SelectColumns = Table.SelectColumns(Sorted, {SortColumn, AmountColumn, “Index”}),

Reply
- - Imke Feldmann
  - 14.08.20 @ 9:22 PM
  Hi Daniel,
  unfortunately I’m having difficulties to follow.
  /Imke
  
  Reply
- - john.b.thomas@gmail.com
  - 06.03.24 @ 11:18 PM
  Don’t forget this step:
  
  Table.Group(#”Changed Type2″, {“JourneyID”}, {{“All”, each _, type table}}, GroupKind.Local)
  
  I had the same issue and if you don’t have the “GroupKind.Local” it will not work.
  
  Reply
- Jan Pedersen
- 29.11.20 @ 6:51 PM
Dear Imke

After long time of investigation of above function I found a your 2nd. solution of RT from this website

https://community.powerbi.com/t5/Desktop/DAX-formula-to-calculate-cumulative-running-total-across-two/td-p/147327/page/3

Your function below is supporting table partitions (grouped table) 🙂

/let
func = (Table as table, AmountColumn as text, optional SortColumn as text, optional GroupColumns) =>
*/
let
/ Debug parameters*/
Table = Grouped,
SortColumn = “Country”,
AmountColumn = “Value”,
GroupColumns = {“Country”},

GroupedTable = Table.Group(Table, GroupColumns, {{"Partition", each _, type table}}), fnRunningTotal = (Table as table, AmountColumn as text, optional SortColumn as text, optional GroupColumns) => let Source = if SortColumn = null then Table else Table.Buffer(Table.Sort(Table,{{SortColumn, Order.Ascending}})), // Sort table and buffer it Buffered = Table.Buffer(Table.AddIndexColumn(Source, "Index",1,1)), // Select the Columns SelectColumns = Table.SelectColumns(Buffered, if SortColumn = null then {AmountColumn, "Index"} else {SortColumn, AmountColumn, "Index"}), // Extract Amount column and buffer it ExtractAmountColumn = List.Buffer(Table.Column(SelectColumns, AmountColumn)), // Calculate a list with all running Totals RunningTotal = List.Skip(List.Generate( ()=> [ListItem=0, Counter=0] ,each [Counter] <= List.Count(ExtractAmountColumn) ,each [ ListItem = ExtractAmountColumn{[Counter]}+[ListItem], Counter = [Counter]+1 ] ),1), ConvertedTable = Table.FromList(RunningTotal, Splitter.SplitByNothing(), null, null, ExtraValues.Error), ExpandedColumn = Table.ExpandRecordColumn( ConvertedTable, "Column1", {"ListItem", "Counter"}, {"ListItem", "Counter"}), MergedQueries = Table.NestedJoin(Buffered,{"Index"}, ExpandedColumn,{"Counter"},"Expanded Column1",JoinKind.LeftOuter), Expand = Table.ExpandTableColumn( MergedQueries, "Expanded Column1", {"ListItem"}, {"RunningTotal"}), #"Removed Columns" = Table.RemoveColumns(Expand,{"Index"}), #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"RunningTotal", type number}}) in #"Changed Type", AddRTToGroup = Table.AddColumn(GroupedTable, "Custom", each fnRunningTotal([Partition], AmountColumn)), #"Removed Other Columns" = Table.SelectColumns(AddRTToGroup,{"Custom"}), ExpandedGroup = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", Table.ColumnNames(Table) & {"RunningTotal"}), WithoutGroup = fnRunningTotal(Table, "Value"), Result = if GroupColumns = null then WithoutGroup else ExpandedGroup

in
Result
/*
,
documentation = [ Documentation.Name = ” Table.ColumnRunningTotal”
,Documentation.Description = ” Fast way to add running total to a table”
,Documentation.LongDescription = ” Fast way to add running total to a table”
,Documentation.Category = ” Table”
,Documentation.Source = ” local”
,Documentation.Author = ” Imke Feldmann: http://www.TheBIccountant.com”
,Documentation.Examples = {[Description = ” “, Code = ” “, Result = ” “]}]
in
Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))*/

For your table, you would call it like so:

MyFunction(YourTable, “Value”, null, {“Name”})

Reply
- - Imke Feldmann
  - 30.11.20 @ 7:35 AM
  Hi Jan,
  glad to hear you found it useful and thanks for sharing your adoption.
  /Imke
  
  Reply
- Olly
- 29.07.22 @ 11:22 AM
This is EXCELLENT. Thank you.

Reply
- Rick de Groot
- 06.10.23 @ 10:34 AM
Hi Imke,

I’ve had the chance to review your method in detail and learned something here. It differs from the approach I’ve taken in that in I:

‘unpack’ the original table into lists,
generate a list of running totals,
ascribe the data type
reassemble everything using Table.FromColumns.

Upon reviewing your method, I see you:

return both the running total and the index number in a record.
you then join this output list back to the original table using the index number.

It sounds like that approach could offer performance advantages. Especially since M is often more efficient performing joins, compared to other methods of comparing.

Sounds like an interesting topic to test how different methods compare in terms of performance.

Best regards,
Rick

Reply
Pingback: Chris Webb's BI Blog: Using Function.ScalarVector() To Optimise Calls To M Functions
- john.b.thomas@gmail.com
- 06.03.24 @ 11:13 PM
I know this is an older post, but I used this to calculate a RunningTotal on employees hours by Week Ending date. Worked out great. Thanks for sharing Imke…

Reply