Incremental load in PowerBI

Edit 2017-Jan-13: You’ll find a more recent version that doesn’t use hacks for incremental load here (but that also won’t work in the service): http://www.thebiccountant.com/2017/01/11/incremental-load-in-powerbi-using-dax-union/

If you want to perform incremental load of your data in PowerBI, you should vote for it here.

Please vote before you read on to see which hacky workaround to use until this is implemented.

This workaround is a compromise that’s mainly suited for use cases like:
– performing extensive transformations on imported data only once
– saving data in the model that will not be available at the source later on (like many web-imports)

It works on the principle of self-referencing queries like I’ve described here, in combination with the hack of your current PBI data model which I’ve described here (that’s fully automated now). So it will only work in the Desktop version and not in the service (so no scheduled refresh is possible)!

So the data that should “sit in the model” and just be merged with new incoming data will actually be imported into the model again. Just not from its original source, but from itself. As this is often slower than importing from the original source, you will have to look closely into your specific needs and decide if this makes sense for you.

These are the steps to perform (a minimal sketch of the resulting queries follows the list):

1) Create a 1st query that imports and transforms the first chunk of data that shall not be refreshed afterwards, and load it to the data model
2) Go back to the query designer and create a 2nd query that re-imports the 1st query using the function below
3) Create a 3rd query that appends query 2 to query 1, and load it to the model
4) Go back to the query designer and add to the 3rd query a filter or remove-duplicates step that avoids importing data that has already been imported
5) Change the source in the 2nd query from the 1st to the 3rd query (as you now want to reference everything that has been imported so far)
6) Disable load of the first two queries
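
To make this more concrete, here is a minimal sketch of what the three queries could look like, assuming a CSV web source, a key column named “Key” and the PBI.Model function defined below (the URL and all names are placeholders):

// Query1 (load disabled in step 6): fetches the current data from the source
let
    Source = Csv.Document(Web.Contents("https://example.com/data.csv")),
    WebImport = Table.PromoteHeaders(Source)
in
    WebImport

// Query2 (load disabled): reads the accumulated data back from the model.
// Created against Query1 in step 2, then pointed at Query3 in step 5.
let
    Source = PBI.Model("EVALUATE 'Query3'")
in
    Source

// Query3 (loaded to the model): appends the new data to the accumulated data
// and removes rows that have already been imported
let
    Appended = Table.Combine({Query2, Query1}),
    Result = Table.Distinct(Appended, {"Key"})
in
    Result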

Function PBI.Model: It connects to your current Power BI model fully automatically; you will just be prompted to confirm your connection to the SSAS instance every time you reopen the file.

(optional Dax_Query as text) =>

let
 // Scan the user folder for the file in which Power BI Desktop stores
 // the port number of its local SSAS instance
 Source_Port = Folder.Files("C:\Users"),
 msmdsrv = Table.SelectRows(Source_Port, each [Name] = "msmdsrv.port.txt"),
 Select_Last_Instance_Active = Table.FirstN(msmdsrv, 1),
 #"Combined Binaries" = Binary.Combine(Select_Last_Instance_Active[Content]),
 #"Imported Text" = Table.FromColumns({Lines.FromBinary(#"Combined Binaries", null, null, 1252)}),
 // Text.Clean strips the control characters that result from reading the Unicode file with codepage 1252
 PortID = Table.TransformColumns(#"Imported Text", {}, Text.Clean){0}[Column1],
 // Retrieve the name of the database in the local instance via a DMV query
 Database = AnalysisServices.Database("localhost:" & PortID, "", [Query="select * from $SYSTEM.DISCOVER_PROPERTIES"]){0}[Value],
 // Run the DAX query if one was passed in, otherwise drill into the model navigation
 PBI.Model = if Dax_Query <> "" and Dax_Query <> null then AnalysisServices.Database("localhost:" & PortID, Database, [Query=Dax_Query]) else AnalysisServices.Database("localhost:" & PortID, Database)[Data]{0}[Data]{0}
in
 PBI.Model
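
For example, reading the accumulated table back from the model could look like this (the table name “Query3” is just a placeholder):

let
    // Run a DAX query against the local model via the function above
    Source = PBI.Model("EVALUATE 'Query3'")
in
    Source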

The underlying technique on how to dynamically retrieve the PortID via a file on your local drive is described here.

Actually, I thought that one could also use a different method, taking advantage of disabling “Include in Report Refresh”. But while this option will stop the re-load of queries that are directly loaded to the data model, the data will actually be refreshed once you reference such a query from another query (which would normally be your append operation).

RealtimeCharts-1.zip


Yes, this is pretty hacky, so please vote & stay queryious 🙂

Short video:

Comments (19)

  1. Hi Imke. From the steps given to achieve the incremental data load, I’m unable to figure out which query imports the delta load from the database. You have mentioned that the 2nd query just re-imports the 1st query’s data and the 3rd query just appends query 2 to query 1.


    • Yes, the 2nd query starts with importing the 1st query. But in step 5 you need to change that source to the 3rd query. That’s the crux.
      You cannot start by defining it like this, because at the time of creating query 2, there is no query 3. On top of that: you need to load query 3 to the model once – otherwise the self-reference to the model will not find it.
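
      In M terms, the Source step of the 2nd query changes like this (a sketch, assuming the query names from the steps above):

        // At creation time (step 2): re-import the 1st query from the model
        Source = PBI.Model("EVALUATE 'Query1'")
        // After step 5: reference everything imported so far instead
        Source = PBI.Model("EVALUATE 'Query3'")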

      Might put together a video in the evening showing the steps. But you can check the attached file. It’s now updated with figures from this morning. So every time the URL releases new figures, you can catch them in PBI and append that data.


      • A video would be immensely helpful to a lot of people who deal with big corporate databases. Thanks a lot.


  2. To give a standard corporate database example, let us say I’m querying SQL Server tables which are huge in size, so importing the entire data is virtually impossible. Therefore, a date filter needs to be applied. Let us assume that the data requirement is for the last 5 years and refreshing the entire 5 years of data is highly time consuming. So, as per your earlier posts, the first 4 years of data can be imported once and saved locally as CSV using an R script, and for the 5th year a new query extracts the latest data, followed by an append operation. Now, the question arises how we can make this fully automated, so that the latest data load query is aware of the existing CSV data (let’s say its last date) and extracts only the data which is not available in the CSV file.
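
     A sketch of that idea in M – read the last date from the CSV and fetch only newer rows from the database (server, database, table and column names are all assumptions):

     let
         // Accumulated history that was exported to CSV earlier
         History = Table.PromoteHeaders(Csv.Document(File.Contents("C:\Data\History.csv"))),
         Typed = Table.TransformColumnTypes(History, {{"Date", type date}}),
         LastDate = List.Max(Typed[Date]),
         // Fetch only rows that are newer than what the CSV already holds
         NewData = Table.SelectRows(
             Sql.Database("myServer", "myDatabase"){[Schema="dbo", Item="FactSales"]}[Data],
             each [Date] > LastDate),
         Combined = Table.Combine({Typed, NewData})
     in
         Combined

     (Combining a file source and a database source in one query like this can trigger the formula firewall, depending on privacy settings.)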


  3. Hello,

    I’m totally new to Power BI. I’m working on a small experiment in the lab. I have a data source from a Google Sheet connected to Power BI. This source is updated daily; unfortunately Power BI doesn’t do incremental data load, as it seems to show only the latest data available at the source. I’ve tried your video, but it does not seem to work. Honestly, I have no idea, I just copied what you did. Can you elaborate more on step 2 above? Which function do you refer to?


    • Hello John,
      you’ve picked quite a demanding topic for a beginner 😉
      The function I’m referring to in step 2 is the code that you see at the bottom of this blogpost, beginning with “(optional Dax_Query as text) =>”
      You should be able to see it in the download-file as well.


      • Hello Admin,

        I did the second step, but it does not load from the Google Sheet. Did I miss anything? 🙁


        • I’m sorry, but this can be a bit demanding for beginners.
          Hard to tell from a distance what’s actually going wrong.


  4. I’m confused…

    I guess when you say:

    1) Create 1st query that imports and transforms the first chunk of data that **shall not be refreshed afterwards** and load it to the data model

    You actually mean:

    Create 1st query that **initially** imports and transforms the first chunk of data that shall not be refreshed afterwards and load it to the data model and will subsequently be used to import new data incrementally.

    No?

    Also, why do you say:

    6) Disable load of the first two queries

    Surely, the first query should not be disabled, otherwise, where does the incremental data come from? Isn’t the second query the only one to be disabled because it is holding the first chunk, loaded by the initial run of the first query?


    • Hi CB,
      this is a method to retrieve data from the web where old data will be overwritten by new data. So if one refreshed the query, the already imported data would be overwritten in Power BI as well. If you follow my setup above (and the screencast):
      – the first query will fetch the current data from the web (but will not be loaded to the data model, because its contents will go through the 3rd query into the data model)
      – the second query will retrieve the accumulated old data with the hack to the data model (3rd query) (and will also not be loaded to the data model, like the 1st query)
      – the third query will combine the first and second and remove duplicates, in case the current web query still holds data that has already been downloaded previously.

      The trick here is the hack in the second query, where the result of the previous load (that is stored in the pbix) is retrieved “from outside”. So actually, none of the queries can hold any data in Power BI. Just the data model can “hold” data in Power BI, and from there we fetch it via the hack. Disabling load or even disabling refresh (which we haven’t done here) wouldn’t actually work – the available features here are a bit misleading unfortunately: https://community.powerbi.com/t5/Issues/Bug-in-quot-Don-t-include-in-Report-Refresh-quot/idi-p/111353

      Hope this helps? Cheers – Imke


  5. Hi Imke,

    Thanks a lot for making this.

    Unfortunately I’m running into an issue when trying to schedule refresh on the report server, as the source is not supported. I changed the web import to SAP HANA, but the scheduled refresh still does not support your function. Am I missing something or is this not possible?


    • Yes, this method only works for Power BI Desktop unfortunately.
      Let me see if I can get some help to get this going for the Report Server as well.
      /Imke


  6. Hello,

    I did something similar (same steps) but I want to publish my report and schedule refresh.

    When I tried to do it I got the following message “You cannot schedule refresh for this dataset because it gets data from sources that currently don’t support refresh.”

    Is there any workaround for this issue?

    Thanks a lot!


    • Unfortunately this only works in the Desktop version and not in the service, as it hacks the currently open pbix file. (Sorry, I should have made that clearer – I’ve added it to the post now.)
      Microsoft will offer proper incremental load in its Power BI Premium offer.
      If your bottleneck is the sheer amount of data, the DAX-Union-method could be an alternative: http://www.thebiccountant.com/2017/01/11/incremental-load-in-powerbi-using-dax-union/
      But if it is complex transformations that eat up performance, then creating intermediate tables and exporting them (via R) as a sort of staging solution would be possible. But R scripts are currently also not supported in the service, so no help for you at the moment.


  7. Hey Imke,
    I tried to reproduce this because it is exactly what I need.
    My face when I remove duplicates based on DateTime values, but my duplicates aren’t exactly duplicates apparently…
    Look:
    2019-10-29T15:01:42.2980000 and
    2019-10-29T15:01:42.2966670

    This is really weird, they should be duplicates… my WebImport query fetches the last 50 played songs…
    What should I do? DateTime is the only differing column for me, I think.

    Regards,
    André P.
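
     One way around this would be to truncate the timestamps to whole seconds before the remove-duplicates step; a minimal M sketch, assuming the column is named "DateTime" and Source is the appended table:

         // Rebuild each datetime without its fractional seconds, so that
         // near-identical timestamps compare as equal
         Truncated = Table.TransformColumns(Source,
             {{"DateTime", (dt) => #datetime(
                 Date.Year(dt), Date.Month(dt), Date.Day(dt),
                 Time.Hour(dt), Time.Minute(dt), Number.RoundDown(Time.Second(dt))),
               type datetime}}),
         Deduplicated = Table.Distinct(Truncated, {"DateTime"})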


  8. Hello,

    I am following the process, but when I add this function
    = Table.FromRows(Table.ToRows(Source), Table.ColumnNames(WebImport))

    I have this Error :
    Formula.Firewall: Query ‘Query1’ (step ‘Added Custom’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.

    I did it the first time as a test with a small file and it was OK.
    But this time, with a big file, I get this error.
    Can you help me please?

    Thx
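
     A common way out of that firewall error is to perform the source access and the rebuild inside one single query, instead of referencing a separate query that touches the data source; a sketch (the web source and all names are assumptions):

     let
         // Inline the source access so that the rebuild step below does not
         // reference a separate query that accesses a data source
         Raw = Csv.Document(Web.Contents("https://example.com/data.csv")),
         WebImport = Table.PromoteHeaders(Raw),
         Rebuilt = Table.FromRows(Table.ToRows(WebImport), Table.ColumnNames(WebImport))
     in
         Rebuilt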

