If you want to collect your regularly loaded data without overwriting it, or create a load log that writes the load activities with a timestamp into a table, you need to create a query that appends new rows to its own latest output.
In both cases you add a column that returns the current date and time using DateTime.LocalNow().
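In M, that timestamp column can be sketched like this (the step name "AddedDateTime" and the source step "ImportData" are illustrative, not from a specific workbook):

```
// add a column holding the date and time of the current refresh
AddedDateTime = Table.AddColumn(ImportData, "DateTime", each DateTime.LocalNow(), type datetime)
```

Because DateTime.LocalNow() is evaluated at refresh time, every appended batch of rows carries the moment it was loaded.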
Load Log the Power BI-way
In Power BI it's fairly easy: use the R extension to perform an append to the external txt or csv file, as described in this post (incremental load):
write.table(dataset, file = "your filepath & filename.txt", sep = "\t", row.names = FALSE, col.names = FALSE, append = TRUE)
Load Log the Power Query-way
In Excel's Power Query it's a bit trickier, because you cannot export your queries and therefore need to create the consolidated table in the workbook itself. And it must be a self-referencing query, or more precisely a self-sourcing query: a query that takes its latest output as its new input, appends the new data, and overwrites the old result with the new one.
How do we deal with this, when at the time we write the query there is no result yet to reference?
To solve this chicken-and-egg challenge we simply take a dummy query as the first input (here: a duplicate of our ImportData (1)) and rename it "Result" (2). Then we append the new figures (Append (3) – ImportData (4)) and delete the "Added Custom" step (5) before loading it to a table.
With this we've created the table that we're going to reference from now on. To do so, we edit this query and replace the reference to "ImportData" with a reference to our new "Result":
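Put together, the self-sourcing query might look like this minimal M sketch (the table name "Result" and the query name "ImportData" follow the steps above; adjust them to your workbook):

```
let
    // read the previous output back in from the worksheet table named "Result"
    PreviousResult = Excel.CurrentWorkbook(){[Name = "Result"]}[Content],
    // append the freshly imported (and timestamped) data to the previous result
    Appended = Table.Combine({PreviousResult, ImportData})
in
    Appended
```

Note that the query reads the loaded worksheet table, not its own query output, which is what keeps Power Query from complaining about circular references.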
If you load this query now, the data from your original source will appear three times. My suggestion is to delete all rows except the headers and restart your import procedure fresh from there, instead of manually selecting the rows you want to keep.
Error-proofing collecting queries
If you're not aiming for a log but for a daily collection routine, you might want to guard against unintended double-refreshes. Yes, you could delete the extra data manually, but this is error-prone. Instead, reformat your "DateTime" column as "date" only from the start. Then select all your columns and apply "Remove Duplicates". This will keep only one load per day (provided the loads were identical).
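That safeguard can be sketched in M like this (assuming the timestamp column is called "DateTime" and the appended table comes from a step named "Appended", as in the steps above):

```
// truncate the timestamp to a date, so two refreshes on the same day
// produce identical rows for identical source data
DateOnly = Table.TransformColumnTypes(Appended, {{"DateTime", type date}}),
// remove duplicate rows across all columns, i.e. keep one load per day
Deduped = Table.Distinct(DateOnly)
```

Table.Distinct compares entire rows, so a second refresh on the same day only collapses into the first one if the source data has not changed in between.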
This technique will not throw a circular-dependency error, because it references the table in the sheet and not the query itself.

ME_PQCreateLoadHistoryV2.xlsx
Enjoy & stay queryious 🙂
Edit 25-June-2016: Just came across Hilmar Buchta's blog post from last year that describes exactly the same technique!
Edit 19-Nov-2016: For Power BI, you can find an improved version without the need to use R here.
Please vote for the Power BI feature request for incremental load here!
Edit 03-Jan-2017: If you struggle with applying the self-referencing-table technique to your model, here is a much better explanation: http://exceleratorbi.com.au/cleansing-data-power-query/