Incremental Load in Power BI using DAX UNION

This is an old post. In the meantime, proper incremental refresh can be implemented with Power BI licenses: https://docs.microsoft.com/en-us/power-bi/admin/service-premium-incremental-refresh

Warning – edit 03-Jan-18: This method doesn’t work in the service unfortunately. Please upvote the bugfix here: https://community.powerbi.com/t5/Issues/quot-Include-in-report-refresh-quot-disabling-doesn-t-work-in/idi-p/331982#M19945

This article describes the latest workaround for incremental load in Power BI (thanks to Taylor Clark for pointing this out!). It’s not very dynamic, as it doesn’t automatically load just the difference against the existing data. Instead, you have one query that contains your old data (which will be kept) and another query that grabs all data that comes after the last item of your old data. But at least it’s a technique that works without a hack:

Incremental Load Process

 

1 Create “Old Data”: “DontRefresh”

So it’s up to you to split up your long table or web-load activities: Load your “old” data into one query (“DontRefresh”), perform your transformations and load it into the data model once. Then go back to the query editor and disable the option “Include in Report Refresh”.

2 Create “New Data”: “Refresh”

Then take the cut-off filter criterion and use it to define the load of the new data that will subsequently be refreshed (“Refresh”). Turn the transformations from the first query into a function to make sure both tables get the same structure, and load this query to the data model (leaving the default load option, which refreshes it).

3 Create table using UNION in DAX

In the data model, you create a new table that appends both tables to each other (and hide both input tables from the client view):

Create table in DAX using UNION
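A minimal sketch of such a calculated table, assuming the two queries are loaded as “DontRefresh” and “Refresh” with identical column structure (the table name “AllData” is just an illustrative choice):

    -- stacks the frozen history and the refreshed data on top of each other
    -- 'DontRefresh' has "Include in Report Refresh" disabled, 'Refresh' refreshes normally
    AllData =
    UNION ( 'DontRefresh', 'Refresh' )

All measures and relationships then point to this combined table only, while the two input tables stay hidden.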

Another reason why this is not ideal is the fact that you cannot perform data transformations in the query editor that iterate over the whole table. So I really hope that incremental load will one day become a native functionality in Power BI. Please vote for it here, as Microsoft prioritizes many of its activities based on customer feedback: I vote for incremental load

Just picked up a useful tip from Mimoune Djouallah, to use a syntax like this:

UNION ( SUMMARIZE ( table_current, field1, field2 ), SUMMARIZE ( table_history, field1, field2 ) )

This highlights a difference between the Append command in M and UNION in DAX: UNION combines tables by column position, so the columns must have the same order in both tables.
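If the column order (or the number of columns) ever differs between the two queries, SELECTCOLUMNS can be used to bring both tables into a matching shape before the UNION – a minimal sketch with illustrative column names:

    -- project both tables onto the same columns in the same order before stacking them
    AllData =
    UNION (
        SELECTCOLUMNS ( 'DontRefresh', "Product", 'DontRefresh'[Product], "Amount", 'DontRefresh'[Amount] ),
        SELECTCOLUMNS ( 'Refresh', "Product", 'Refresh'[Product], "Amount", 'Refresh'[Amount] )
    )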

4 Why not use Append in the query-editor instead?

Another drawback of the current implementation is a somewhat unintuitive behaviour of queries whose “Include in Report Refresh” option has been disabled: As standalone queries they behave as expected and don’t refresh. But once you reference them from a separate query or within an append operation, they will refresh their results. So beware of this potential trap:

Unexpected Behaviour Warning

 

Link to file: IncrementalLoadDaxUnionV2.zip

csv-sample data: https://www.dropbox.com/s/fovsg2xzclfqrzh/ImportData.csv?dl=0

So don’t forget to vote and stay queryious 😉

Comments (36)

  1. I like this idea, however it bloats the size of the Power BI data model, so it could be an issue if the tables get larger. I’m wondering if there is a way to achieve the same thing in Power Query (M) using smart logic around filter settings.


  2. Imke

    SSAS already supports table partitions, but the SSAS engine in Power BI supports only one partition per table, so I would say it is a deliberate decision by Microsoft. They could easily have an option in Power BI Desktop to say: load the result to table 1, partition 1, 2 etc.


  3. Please consider adding a hyperlink in the sample files, whether it be Excel or Power BI Desktop, so that readers can get to the related blog posts immediately.


  4. Almost every one of your blog posts has a sample file that can be downloaded. I meant that if you could add a hyperlink in each file, it would be very easy for users to get back to your blog posts.


    • That’s a good idea – thank you for the suggestion!
      You can check it out in the updated file above 🙂


      • Yes, I saw it, but I don’t know why the hyperlink is not clickable. I’ve raised this issue in the Power BI forum.


        • Thank you 🙂
          Yes, it’s a bit strange: When I click on the link, a pop-up window opens that shows the link again. When I click the link in there, it works.


          • In my case the “hyperlink” seems to be just plain text and is not clickable.

  5. Pingback: Incremental load in PowerBI – The BIccountant

  6. Believe it or not, I struggled many times on both Power BI Desktop and the Power BI service but failed. However, with the new version 2.42.4611.701 that I updated to just hours ago, it works smoothly now.


  7. Hey,

    I have spent some time testing this; maybe I have misunderstood the functionality.
    The behaviour I get is that the “first” / “old” data remains, and the new data is updated with every refresh?

    The functionality I am looking for is that the new data is added to the old data upon each refresh.
    I want to take a snapshot of, for example, the stock balance from a SQL table like this:

    Refresh 1 date: 1-17-2017 time: 08:00
    product / bal / date&time
    car / 2 / 1-17-2017 08:00

    Refresh 2 date: 1-18-2017 time: 08:00
    product / bal / date&time
    car / 2 / 1-17-2017 08:00 (this line is from the first refresh, and is no longer in the SQL table at the time of the second refresh)
    car / 4 / 1-18-2017 08:00

    Is this possible with the functionality you are describing on this site?


  8. Hello,
    thanks a lot for your answer!

    I’m using the “On-premises data gateway” and want to keep as much as possible running without any files stored locally.

    As I understand the two mentioned solutions, they require a server / PC, and the files and scripting will be handled locally, right?

    I’m guessing there is no solution at the moment that meets my wishes.
    I think I’m going to create an additional table on the SQL server and accumulate the rows there.

    thanks a lot for your quick reply.

    best regards – os


  9. Hi! Thank you for this post. But it seems to me that it doesn’t work with scheduled dataset updates on app.powerbi.com.
    Even if I click the “update” button there (on the web, I mean), the dataset starts a full update, ignoring the preset “Include in report refresh = false” option in the dataset query.

    Maybe I’ve done something wrong?


      • Yikes. Just one question: is it possible to implement this when the query that has the new data only has 50 rows (it’s a web service call that fetches the last 50 played songs)?
        The historical data query would retain every distinct record, and not only the 50 records that get updated every hour.


  10. Hi! Would it be possible to combine old data and new data using DAX if the old data is Import and the new data is DirectQuery?

    Many thanks

    Regards,

    ramon


  11. Hi Imke,
    Is it viable to implement this in a Bridge Table?

    My Use Case:
    Old Sales (non-refreshable) + CurrentYearSales (refreshable) = FACT_AllSales, implementing this workaround so it doesn’t make a lot of requests to the API server every time.

    DIM_SalesDocs is N:N to Sales, so I created a Bridge Table with unique values on Key, from OldSales, CurrentYearSales and SalesDocs.
    When the Bridge gets refreshed it requests all pages from FACT_OldSales, which makes it a problem again.

    So I thought about creating the Bridge using the same workaround.
    What can you tell me about this?


  12. Hi Imke – Thanks for sharing – I was getting caught in the append trap, which forces refreshes on the ‘don’t include in report refresh’ queries. I just wanted to add to the conversation that this technique does NOT work in Power BI Report Server. Even with the report saved with the refresh checkbox cleared, it still refreshes the data. Quite disappointing…

