How to store tables longer than 1.1 million rows in an Excel sheet using Power Query and JSON

If you are working with Power Query, you might come into a situation where you would like to make tables accessible from outside that file which are longer than the 1,048,576 rows that fit into an Excel sheet. You can do that using the JSON format and compression: You create a table in Excel that contains a JSON text string, which can then be read and decoded again by Excel (or Power BI), using Power Query (or Get & Transform in Excel 2016).

You can download the file, in which a table with 2 million rows is compressed into a table with just 229 rows, here: JSON_BreakRefreshChain4.xlsx

Simple idea:

Convert your table into JSON using the Json.FromValue function. If we could export that JSON file from here, we would be done already. But I haven't figured out yet if or how one could export or copy it from within the query editor. So if you have any idea about this, please share.
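As a minimal sketch of the round trip (table contents and step names are made up for illustration): Json.FromValue serializes the table, Binary.Compress shrinks it, and Binary.ToText turns it into a Base64 string that can be stored in worksheet cells; decoding simply reverses those steps:

let
    Source = #table({"ID", "Value"}, {{1, "A"}, {2, "B"}}),
    // encode: table -> JSON binary -> compressed binary -> Base64 text for the sheet
    TextForSheet = Binary.ToText(Binary.Compress(Json.FromValue(Source), Compression.Deflate), BinaryEncoding.Base64),
    // decode: reverse the steps to restore the original table
    Decoded = Binary.Decompress(Binary.FromText(TextForSheet, BinaryEncoding.Base64), Compression.Deflate),
    Restored = Table.FromRecords(Json.Document(Decoded))
in
    Restored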

Workaround:

Read more


Dynamic duration calculation using DAX in Power BI and Power Pivot

While it is fairly easy to calculate the difference between 2 dates in DAX using DATEDIFF, it is a bit more demanding if you want to exclude weekends and holidays, or filter the duration on certain date intervals so you only get a part of it, or return results on a date-time level instead of only counting net workdays. This is where this new technique for dynamic duration calculation can come in handy.

We can use the basic technique that I’ve described here and modify it by adding 2 columns to the calculated table:

  1. Duration per day on a date-time level
  2. A marker column indicating whether the day is a weekday or not (this assumes that you have a column in your date table which indicates whether the day shall be considered a weekday)

[Image: duration calculation]

The duration calculation needs to handle the cases where only parts of a day are to be counted: If the event starts and ends on the same day, the difference between those two figures has to be taken. If, on the other hand, the event spans multiple days, then for the start day the time until the end of that day has to be calculated, while for the end day the time from the beginning of the day is the right one. The days in between count as full days with 1. Hence these 4 cases.
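As a hedged sketch of that logic in DAX (assuming a calculated table EventDays with hypothetical columns [Start], [End] and one row per [Date]; INT() truncates a datetime to its day):

DurationPerDay =
SWITCH (
    TRUE (),
    -- case 1: event starts and ends on the same day
    INT ( EventDays[Start] ) = INT ( EventDays[End] ), EventDays[End] - EventDays[Start],
    -- case 2: start day, count from the start until midnight
    INT ( EventDays[Date] ) = INT ( EventDays[Start] ), INT ( EventDays[Start] ) + 1 - EventDays[Start],
    -- case 3: end day, count from midnight until the end
    INT ( EventDays[Date] ) = INT ( EventDays[End] ), EventDays[End] - INT ( EventDays[End] ),
    -- case 4: days in between count as full days
    1
)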

Let’s have a final look at our simple measures:

Read more


Analyzing events with a duration in DAX – further simplification

Alberto Ferrari has recently published a very smart concept for analyzing events with a duration in DAX, which you should read here if you haven't done so yet. It simplifies the necessary DAX syntax and speeds up the calculations as well. My following approach simplifies the DAX syntax even more, but it comes at a (very tiny) cost in performance and will also increase the file size a bit. So you have the choice 🙂

I'm transforming the calculated table into a "real" fact table, which enables me to use simple 1:n relations to the other (now) dimension tables:

[Image: data model]

The formula starts from Alberto's first version, but uses the Date instead of the DateKey (yellow). Then some columns are added which we need for the following calculations (green). Then you see that the DailyProductionValue is calculated in a different place and also has a much simpler syntax. Lastly, there are some further columns for additional calculations: "Shipped" and "Ordered" will create the bridge for the "missing" connections to the date table:
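This is not Alberto's original formula, just a rough sketch of the expansion idea under made-up names (a table Orders with date columns [OrderDate] and [ShipDate]): each event is expanded into one row per day, and the "Ordered"/"Shipped" markers flag the days that bridge to the date table:

EventDays =
GENERATE (
    Orders,
    VAR StartDay = Orders[OrderDate]
    VAR EndDay = Orders[ShipDate]
    RETURN
        SELECTCOLUMNS (
            CALENDAR ( StartDay, EndDay ),
            "Date", [Date],
            "Ordered", IF ( [Date] = StartDay, 1 ),  -- marks the day that bridges to the order date
            "Shipped", IF ( [Date] = EndDay, 1 )     -- marks the day that bridges to the ship date
        )
)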

Read more


Import Measures from different Tabular Models using DAX Editor

Marco Russo has created a great tool for SSAS Tabular that lets you edit measure definitions (you should read about it here first if you haven't done so yet).

In this article I’ll show you how you can use it to import multiple measures from different tabular models into your current model.

The way the DAX Editor works is that it exports the existing measures from your model as a text file and imports them back after you've done your transformations. My technique automatically adds the measures from the other model to the existing measures, so that both can be loaded back into your current model. In addition to that, you get a UI with a process that guides you through the necessary steps that come with a task like that (see the example after this list), which are:

  • select only those measures that you actually need
  • check references to existing measures and columns from the import model and manage their handling
  • allocate the tables into which these measures are going to be imported
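For orientation, the measure definitions in such an exported text file look roughly like this (table and measure names are made up for illustration; see the DAX Editor documentation for the exact syntax):

CREATE MEASURE 'Sales'[Total Amount] = SUM ( Sales[Amount] );
CREATE MEASURE 'Sales'[Margin %] = DIVIDE ( [Margin], [Total Amount] );

Appending the selected measures from the other model to this file is what makes the combined import into the current model possible.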

This will all be done in a Power Query-powered Excel workbook that you can download here: DaxEditorEditor_Final2.xlsm

These are the steps:

Read more

Incremental load in PowerBI

If you want to perform incremental load of your data in PowerBI, you should vote for it here.

Please vote before you read on to see which hacky workaround can be used until this is implemented.

This workaround is a compromise that’s mainly suited for use cases like:
– performing extensive transformations on imported data only once
– saving data in the model that will not be available at the source later on (like many web-imports)

It works on the principle of self-referencing queries, as I've described here, in combination with the hack of your current PBI data model which I've described here (that's fully automated now).

So the data that should "sit in the model" and just be merged with new incoming data will actually be imported into the model again. Just not from its original source, but from itself. As this is often slower than importing from the original source, you will have to look closely at your specific needs and decide whether this makes sense for you.

These are the steps to perform:

1) Create a 1st query that imports and transforms the first chunk of data that shall not be refreshed afterwards, and load it to the data model
2) Go back to the query designer and create a 2nd query that re-imports the 1st query using the function below
3) Create a 3rd query that appends the 2nd query to the 1st query, and load it to the model
4) Go back to the query designer and add a filter or remove-duplicates step to the 1st or 3rd query that avoids importing data that has already been imported
5) Change the source in the 2nd query from the 1st query to the 3rd query (as you now want to reference everything that has been imported so far)
6) Disable the load of the first two queries
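The actual function follows after the break. As a rough illustration of the self-referencing idea only: the 2nd query reads the previously loaded table back from the running Power BI model via its local Analysis Services instance (port and catalog name are placeholders that change with every session):

let
    // hypothetical sketch, not the original function: re-import Query1's result from the local model
    Source = AnalysisServices.Database(
        "localhost:12345",            // placeholder: port of the local AS instance
        "LocalModelCatalogName",      // placeholder: catalog (database) name of the model
        [Query = "EVALUATE Query1"]   // DAX query returning the table that was loaded before
    )
in
    Source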

Read more

Speed up the PowerBI and Power Query design process

When querying large databases with Power BI or Power Query, the design process itself sometimes suffers from long waiting periods because of background loading activities. In order to speed up this process, I'm adding a filter to my tables as one of the first steps, which I can toggle on or off using a central switch. The switch is a query that will be referenced by an additional line of code:

FilterToggle = if DBFilterToggle=1 then LastQueryStep else FilteredRowStep

[Image: filter code]

Where DBFilterToggle is itself a query with just one value: 0 if I want to have the filter active, or 1 if I want to deactivate the filter and see all the data instead.

The cool thing about this method is that the following code will work seamlessly and we can use this toggle in multiple queries at the same time: Just filter your other long tables as desired and point their toggle step to DBFilterToggle.
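A minimal sketch of the pattern (connection, table and step names are placeholders):

// DBFilterToggle: a separate query returning a single value; 0 = filter active, 1 = show all data
0

// inside each large query:
let
    Source = Sql.Database("myServer", "myDatabase"),
    SalesTable = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    FilteredRows = Table.FirstN(SalesTable, 1000),  // small slice that still folds back to the server
    FilterToggle = if DBFilterToggle = 1 then SalesTable else FilteredRows
in
    FilterToggle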

This enables me to work with very little data, which will be delivered blazingly fast (due to query folding), so I can move on with my design very quickly. This is often even more important during debugging, where you need to move through multiple steps quickly in order to find the errors.

I'm not sure if this is my "invention" or whether I have seen it somewhere before, but so far I couldn't find a source. So please post the source in the comments if you find it.

Enjoy & stay queryious 🙂

How to tame Case Sensitivity in Power Query and PowerBI

Have you heard of “Comparer.OrdinalIgnoreCase”?

If the answer is "No" (or something similar), you will be delighted to hear that this is your key to disabling case sensitivity in comparison operations.

So if you want to check if a substring like “car” is contained somewhere, you can write:

Text.Contains(MyText, "car", Comparer.OrdinalIgnoreCase)

and the expression will return true for all casings of car, like CAR, Car, caR and so on.
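The same comparer also plugs into other comparison functions that accept an optional comparer argument, for example:

Text.StartsWith("CARPOOL", "car", Comparer.OrdinalIgnoreCase)      // true
Text.EndsWith("racecar", "CAR", Comparer.OrdinalIgnoreCase)        // true
List.Contains({"CAR", "Bike"}, "car", Comparer.OrdinalIgnoreCase)  // true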

The official documentation looks like this:

Read more

Easy Profit and Loss and other (account) scheme reports in Power BI and Power Pivot using DAX

This is about an easy way to create typical finance reports like Profit and Loss using DAX that (unlike all other solutions I've come across so far) can be handled with very basic knowledge of this language, like this:

[Image: example Profit and Loss report]

The trick

The trick that makes my solution so easy lies in the fact that it requires no aggregation functions of the output mediums like:

  • pivot tables: which struggle with asymmetric logic and are not available in Power BI so far
  • cube functions: which are not available in Power BI so far

So we have to build the details as well as all aggregations into the solution itself and not rely on or use any aggregation functions. (This means for Excel: we have to turn off subtotals as well as totals in our pivot tables. It means for Power BI: Hurray! Finally a solution where the lack of pivot tables doesn't matter.)

How to

The aim is to create a table/matrix with (account) details and aggregations in the rows, and different slices of time intervals or comparisons in the column sections. As for the columns, this will be covered by measures like [Actuals], [Budget], [PreviousPeriods], [Differences in all shapes…]. And, as the values within a column should be of the same kind, I'd prefer to use only one measure per column, one that is fully sliceable and works on all (sub-)totals of course. … OK, so some dreams later I found it:

MyMagicMeasure := CALCULATE([StandardMeasure], AccountsAllocation)

So you just wrap simple measures like Act = SUM(Fact[Amount]), Plan = SUM(Plan[Amount]) or DiffPlan_Act = [Plan] - [Act] into the CALCULATE, together with the bridge table as the filter argument, like this:
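A minimal sketch, reusing the measure names from above (the wrapped report-measure names are made up):

Act := SUM ( Fact[Amount] )
Plan := SUM ( 'Plan'[Amount] )
DiffPlan_Act := [Plan] - [Act]

-- the sliceable report versions that work on every report row, including the (sub-)totals:
ActReport := CALCULATE ( [Act], AccountsAllocation )
PlanReport := CALCULATE ( [Plan], AccountsAllocation )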

This is the many-to-many technique in its simplest form (see this post from Marco Russo). It all goes via simple aggregation on all accounts found in the filter context:

[Image: aggregation in the filter context]

Our bridge table "AccountsAllocation" consists of one account number per simple account, and has multiple rows for the (sub-)totals: one row for each account that belongs to them:

[Image: bridge table AccountsAllocation]

The ConsKey stands for the row in our report (1) and the AccountKey_ holds the account numbers that are going to be aggregated (many for the sub-totals, 1 for the simple account rows). So all we need is this simple aggregation on AccountKey for every row in the report, with a filter from the Reports table via our bridge table to DimAccounts, which then filters our fact tables: 1 -> many -> 1 -> many.
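For illustration (account numbers are made up), a subtotal row in the bridge table maps to many accounts, while a simple account row maps only to itself:

ConsKey   AccountKey_
4000      4100        // 4000 = subtotal spanning accounts 4100 and 4200
4000      4200
4100      4100        // simple accounts map 1:1
4200      4200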

Read more

Advanced type detection in Power BI and Power Query: Table.ClassifyMixedColumnTypes

This is not a proper blog post, just a quick share of a function I've created today, which I think will be very useful for others as well:

Automatic type detection will assign only one type to a column, but sometimes there are multiple types in it, and then the type "any" will be applied, which means no type at all. In these cases, the types of the individual elements/rows in those columns are often the key to the necessary ETL transformation operations to be performed. So you would like to be able to write statements like this:

Table.AddColumn(PreviousStep, "Event", each if Type.Is(Value.Type([MyColumn]), type text) then ThisEvent else AnotherEvent)

(read: add a column with a new event that depends on whether the type of the current row in MyColumn is text: then do ThisEvent, else do AnotherEvent)

My advanced type detection function will identify and allocate the different data types. It tries a couple of type conversion operations: in case of success, it applies the type to the individual cell/record field; in case of failure, it moves on with the next try. If none of the type conversions succeeds, it returns the original state (any). It takes your table and a list of the column names on which it should be applied as input parameters.

(Table, ListOfColumnNames) =>
let
    // for every listed column, build a {columnName, transformation} pair:
    // the outer "each" refers to the column name, the inner "each" to the cell value,
    // and the chained "try ... otherwise" keeps the first conversion that succeeds
    // (date, then number, then text; otherwise the value stays as it is)
    Table.ClassifyMixedColumnTypes.m =
        Table.TransformColumns(
            Table,
            List.Transform(
                ListOfColumnNames,
                each {_, each try Date.From(_) otherwise try Number.From(_) otherwise try Text.From(_) otherwise _}
            )
        )
in
    Table.ClassifyMixedColumnTypes.m
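A hypothetical invocation, assuming the function above has been saved as a query named Table.ClassifyMixedColumnTypes:

Classified = Table.ClassifyMixedColumnTypes(Source, {"MixedColumn1", "MixedColumn2"})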

So this function will apply different data types (expand the conversions as you like/need) within one column on a row-by-row basis, which is what I've been looking for for quite a while 🙂

Thank you so much Bill Szysz for showing me how to use the List.Transform-trick in order to bulk-apply transformations!

If you're as lazy as me, you could be tempted to pass Table.ColumnNames to the ListOfColumnNames parameter, but this might slow your query down! (Guessing here; not much practical experience gained yet.)

A warning at the end: this is of course error-prone, as some strings like "3.2" are ambiguous and can get converted to a date although, in the specific context, they should be numbers (and vice versa). So you should check properly and, if needed, perform some additional transformations beforehand in order to guarantee the correct outcome. But at least in my case it did exactly what was needed.

Please share your thoughts/experiences with this function & stay queryious 🙂


How to analyse M-functions step-by-step?

This is a quick tutorial on how to analyse M-functions step-by-step.

Queries that have been transformed into a function (1) can be invoked in the query editor (2), and they will then return their output for you to check (3). But they will not show you how they did it. In the query editor, where you normally see the single steps and their individual output when you click on them, you will just see one line for the function and one for the invocation (4):

[Image: Step 1]

So how can we make them reveal all their secrets in the quickest way?

Yes, we could edit the code and replace every parameter that has been passed. But this can get a bit tedious, especially if you haven't written the code yourself and don't know where and how often the parameters are used. A very quick and safe way is to replicate the parameters within the let-expression by copy-and-pasting, like this:

[Image: replicating the parameters within the let-expression]

That way you just define them once and don't have to care about how often they are used within the following query. And it is very quick to do, as you can just copy the parameter expression and comma-separate your inputs.
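A minimal sketch of the pattern (function and values are made up):

// original function:
(StartDate, NumberOfDays) =>
let
    Result = List.Dates(StartDate, NumberOfDays, #duration(1, 0, 0, 0))
in
    Result

// debugging copy: the parameters are replicated as ordinary steps inside the let-expression,
// so every following step can be inspected in the query editor as usual
let
    StartDate = #date(2017, 1, 1),
    NumberOfDays = 10,
    Result = List.Dates(StartDate, NumberOfDays, #duration(1, 0, 0, 0))
in
    Result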

[Image: Step 2]

I bet you want to check this out on Chris Webb's latest blog post here: DynamicDateSelections.zip

Enjoy & stay queryious 🙂