Performance tip for List.Generate (1): Buffer your tables in Power BI and Power Query

Lately I was working on a fairly advanced allocation algorithm on large data which forced me to search for different tricks to improve performance than those that you can find on my site here already.

Background

I was using List.Generate to check for every month in my table, if there was enough free capacity on a platform to start new wells. As every well had a certain production scheme (producing different amounts for a certain length of time), I first had to check the total production amount of active wells before I could determine the spare capacity for a new month. So I had to look into every active well, grab the capacity of the new month and add it up.

Therefore I’ve stored the active production schemes in one table in my List.Generate-record. That lead to an exponentially decreasing performance unfortunately.

Solution to improve performance of List.Generate

Buffering my tables in the “next”-function reduced the query duration by almost 70% !

Although a Table.Buffer or List.Buffer is always high on my list when it comes to performance issues, I was fairly surprised to see that behaviour here: As List.Generate returns the last element of its list as an argument for the next step, I was always assuming that this would be cached (and that was the reason because List.Generate performs recursive operations faster than the native recursion in M). Also, I had just referenced that table once ane in such a case, a buffer would normally not have come into my mind. (But desperation sometimes leads to unexpected actions …)

I also buffered a table that had just been referenced within the current record (and not recursively) and this improved performance as well. (Although in that case, the tables has been referenced multiple times within the current record). But this buffer didn’t have such a big impact on performance than the one on the table that was referenced by the recursive action.

Code

Here is some pseudo-code illustrating the general principle:

Solution with buffers:

How to improve performance of List.Generate: Use Table.Buffer

 

Is that new to you or have you made the same experience? Which grades of performance improvements did you achieve with this method? Please let me know in the comments!

Enjoy & stay queryious 😉

Comments (5) Write a comment

  1. Hi Imke! I think this performance boost is because the ‘promises’ that were generated inside List.Generate or List.Accumulate, something like Gil Raviv described in his post about CamelCase. Buffering forces evaluation and stops stacking of promises, which frees available memory.
    But, of course, I could be very wrong.

    Reply

      • It is hidden for me too 🙂 as far as I can see, comparing to null forces this evaluation, but… too much for me.
        “let’ expression is also a record, so I suppose that it is the same case as yours. But who knows, we need to cast Curt here 🙂

        Reply

  2. Hi Imke

    I’m struggling with the performance in one of my PowerPivot workbooks and I’m wondering if table.buffer could decrease the time it takes to update.

    So I have a very large csv-file (roughly 330 MB) that contains my fact table (about 4 million rows). One of the columns in that table is a date column. After I’ve loaded this table I want to create a dynamic date (dimension) table that starts at the earliest date in the fact table and ends at the last date in the fact table. The problem is that my current queries seems to load the large csv-file at least twice.

    So this is my current set up

    FactTable:

    Let
    Source = Csv.Document(“…\LargeCSV.txt”)…,
    SomeTransformations = Table.Transform…
    FilterRows =Table.SelectRows(SomeTransformations,…)
    FinalFactTable = Table.AddColumns(FilterRows…)
    in
    FinalFactTable

    DateTable

    Let
    Source = CreateDateTableFunction()
    in
    Source

    CreateDateTableFunction

    let CreateDateTable = () as table =>
    let
    StartDate = List.Min(Table.Column(FactTable,”Date”),
    EndDate = List.Max(Table.Column(FactTable, “Date”),
    DayCount = Duration.Day(Duration.From(EndDate – StartDate)),
    ListDays = List.Dates(StartDate,DayCount,#duration(1,0,0,0)),
    TableFromList = Table.FromList(ListDays…),
    FinalDateTable = Table.AddColumn(…),
    in
    FinalDateTable
    in
    CreateDateTable

    Could I use table.buffer to speed up the update process and if so how would i do that? Do you see any other areas that could improve the performance?

    Reply

Leave a Reply

Die Datenschutzbestimmungen finden Sie hier: / Please find the privacy policy here: https://wp.me/P6lgsG-Rz