Transform text with exceptions in Power BI and Power Query

Recently I picked up an interesting request to transform text with exceptions: transform the words in a column to proper case, but keep certain keywords exactly as they are written in a defined list.

Problem: Transform text with exceptions

Say you have a list with specific terms that shouldn’t be proper cased like so:

And you want to proper case the following column:

Convert to proper case with exceptions

So the goal is to proper case each word that is not contained in the “KeepTable”, to identify the entries of the “KeepTable” in a case-insensitive way, and to write them exactly as they are spelled in the “KeepTable”.

Solution

The overall strategy is to convert everything to proper case first and then use a translation table to convert the keywords from the table back to their desired values. The following steps show how to do it:

First I split the column with the values to be proper cased into nested lists:

Split text of each row into a list

Then I proper case each element in the list (including the ones that should actually be excluded!):

Proper case every element within the list

Now I just have to translate the proper cased keywords from my “KeepTable” back to their original values. For that I need a translation table like the one in my multiple replacements solution and use the technique from this blogpost to achieve the desired result:

Starting from the “KeepTable” I add the proper cased “From”-column like so:

Then reorder the columns, so that the “From”-column comes first:

Reorder columns, so that “From” comes first

Then I transform this to a list of lists so that it can be used by the replacement function:

Transform to list of lists

This list of lists can now be used in the translation operation. Therefore I reference the step where I have proper cased the original column (green) and perform the translation (yellow):

Replacement function translates proper cased word back to the original values

The last step is to stitch back the list into a text-string:

Stitch the list back together into a text-string
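
The steps above can also be condensed into a single query. The following is only a minimal sketch under assumptions: the sample data, the column names “Keep” and “Text” and the new column name “ProperWithExceptions” are made up for illustration and are not taken from the downloadable file.

```powerquery
let
    // Hypothetical sample data; the original KeepTable and text column are not shown in the post
    KeepTable = #table(type table [Keep = text], {{"BI"}, {"of"}, {"SQL"}}),
    Source = #table(type table [Text = text], {{"power bi desktop"}, {"the art OF sql"}}),

    // Translation table as a list of {From, To} pairs:
    // proper cased keyword -> keyword as spelled in the KeepTable
    Translation = List.Transform(KeepTable[Keep], each {Text.Proper(_), _}),

    Result = Table.AddColumn(
        Source,
        "ProperWithExceptions",
        each
            let
                // 1. Split the text into a list of words
                Words = Text.Split([Text], " "),
                // 2. Proper case every word (including the keywords)
                Proper = List.Transform(Words, Text.Proper),
                // 3. Translate the proper cased keywords back to their original spelling
                Translated = List.ReplaceMatchingItems(Proper, Translation)
            in
                // 4. Stitch the list back together into one text string
                Text.Combine(Translated, " "),
        type text
    )
in
    Result
```

With this sample data, the new column should contain “Power BI Desktop” and “The Art of SQL”. Note that the sketch only splits on spaces, so keywords that are directly followed by punctuation would not be matched.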

Please check out this file to follow the solution along:  CapitalizeWithExeptionsUpload.zip

You will see that this is a solution that is mostly achieved by using the UI and adding columns to the table.

Enjoy and stay queryious 😉

Memory efficient clustered running total in Power BI

Today I want to share a scenario where a running total calculation in the query editor saved a model that ran out of memory when the calculation was done with DAX:

Problem

The model couldn’t be refreshed and returned an out-of-memory error, caused by a calculated column in the fact table of over 20 million rows (from a csv-file). A running total had to be calculated for each “JourneyID”, of which there were over 1 million in the table. This increased memory consumption during refresh by over 300 % until it finally errored out:

Besetzung =
CALCULATE (
    SUM ( Fact[Entries] ) - SUM ( Fact[Exits] );
    FILTER (
        ALLEXCEPT ( Fact; Fact[JourneyID] );
        Fact[StopId] <= EARLIER ( Fact[StopId] )
    )
)

Solution

In the query editor, I grouped the fact-table by “JourneyID” and chose “All Rows”:

Group in the query editor to effectively partition your Fact-Table

Because the fact table was sorted by “JourneyID”, I could use GroupKind.Local, so I manually tweaked the resulting code like so:

Table.Group(#"Changed Type2", {"JourneyID"}, {{"All", each _, type table}}, GroupKind.Local)

Without this option, the calculation would also have errored out.

This effectively created table partitions, with one row for each “JourneyID”:

One table per JourneyID
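
To see what GroupKind.Local does, here is a small sketch on made-up sample data (the values are assumptions, only the column names follow the post). With GroupKind.Local a new group starts whenever the key changes from one row to the next, which is why the table has to be sorted by the key.

```powerquery
let
    // Hypothetical sample, already sorted by JourneyID
    Source = #table(
        type table [JourneyID = Int64.Type, StopId = Int64.Type, Movements = Int64.Type],
        {{1, 1, 5}, {1, 2, -2}, {2, 1, 3}, {2, 2, 4}, {2, 3, -1}}
    ),
    // One row per JourneyID, with the matching rows as a nested table in column "All"
    Grouped = Table.Group(
        Source,
        {"JourneyID"},
        {{"All", each _, type table}},
        GroupKind.Local
    )
in
    Grouped
```

If the table were not sorted, the same “JourneyID” could end up in several groups, so this option is only safe when the sort order is known.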

I then added a custom column where I used my superfast M-code for running totals, referencing the column “All” with the partition in it:

Table.AddColumn(#"Grouped Rows", "Custom", each fnRT([All], "JourneyID", "Movements"))

Function Code

This M-code for running totals is amazingly fast (within M-dimensions 😉 ) and shows its strengths particularly when large sections have to be aggregated.
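
The function code itself is not reproduced here. As a rough idea of how such a running total per partition could look, here is a sketch under assumptions: “fnRunningTotalSketch” is a hypothetical stand-in with a simplified signature, not the original fnRT, and the sample data is made up.

```powerquery
let
    // Hypothetical stand-in for a running total function (not the original fnRT)
    fnRunningTotalSketch = (tbl as table, valueColumn as text) as table =>
        let
            // Buffer the values once, so that each lookup by position is cheap
            Values = List.Buffer(Table.Column(tbl, valueColumn)),
            Count = List.Count(Values),
            Running = List.Generate(
                () => [i = 0, rt = Values{0}],
                each [i] < Count,
                // Record fields are evaluated lazily, so the lookup past the end never runs
                each [i = [i] + 1, rt = [rt] + Values{[i] + 1}],
                each [rt]
            ),
            // Append the running total as a new column (column types are not preserved here)
            Result = Table.FromColumns(
                Table.ToColumns(tbl) & {Running},
                Table.ColumnNames(tbl) & {"RunningTotal"}
            )
        in
            Result,

    // Hypothetical sample, sorted by JourneyID
    Source = #table(
        type table [JourneyID = Int64.Type, StopId = Int64.Type, Movements = Int64.Type],
        {{1, 1, 5}, {1, 2, -2}, {2, 1, 3}, {2, 2, 4}, {2, 3, -1}}
    ),
    Grouped = Table.Group(Source, {"JourneyID"}, {{"All", each _, type table}}, GroupKind.Local),
    // Calculate the running total within each partition
    WithRT = Table.AddColumn(Grouped, "Custom", each fnRunningTotalSketch([All], "Movements")),
    // Stack the partitions back into one table
    Combined = Table.Combine(WithRT[Custom])
in
    Combined
```

For the sample above, the running totals should restart for each journey: 5 and 3 for the first one, then 3, 7 and 6 for the second.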

Performance

Memory consumption rose from around 1,700 MB to 2,000 MB, all very stable. Of course, refresh duration rose (by 150 % for that table). But for that use case, it was an acceptable price to pay.

Enjoy & stay queryious 😉

Unravel cumulative totals to their initial elements in Power BI and Power Query

Recently I came across an interesting request where someone wanted to un-cumulate their quarterly YTD-figures (green) into their single quarters values (red) like so (“Unravel cumulative totals”):

Task

Retrieve every quarter’s amount from the year-to-date values (“YAmount”)

Method

To retrieve these values, one has to start with the first value in the year, which is also the value of the first quarter. For the 2nd quarter, one has to deduct the value of the first quarter from the cumulative value of the 2nd quarter. So basically the task is to retrieve the previous cumulative row and deduct it from the current cumulative row. Do this for every row, unless it’s the start of the year or belongs to a different account code in this example:

Grab previous cumulative values, but only within the valid ranges

(For the data given in the sample, it would be sufficient to just take the year as a discriminator, but to be on the safe side, I would suggest including the different accounts as well.)

Solution

Fortunately I’ve already written a function to grab the previous rows with lots of bells and whistles, that also includes the option to include grouping parameters. So if you copy the function code to the advanced editor of an empty query and name this “fnGetPreviousRow”, you just have to add a new step with the following code:

fnGetPreviousRow(#"Changed Type", null, {"YAmount"}, {"Account code", "Year"}, null, null)

Add a step to call this function (don’t go via “Add a column” here !!)

Call function (Previous stepname: red, Amount column: yellow, Grouping columns: green)

This will retrieve the previous row from the cumulative “YAmount” within every combination of “Account code” & “Year” and fill in nulls in the respective first rows. So when you then add another column that subtracts the new value from the cumulative total, you will get nulls for the first rows. This is not the desired outcome, so I suggest going back to the previous step, selecting the “YAmount.Prev”-column and replacing “null” by “0”. After that the calculation returns the correct result:

Result with single quarterly values (“Unravel cumulative totals”)
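
The logic can also be sketched without the helper function. The following is a self-contained illustration under assumptions: the sample figures, the “Quarter” column and the result column name “QAmount” are made up, and the small “AddPrev” function is only a simplified stand-in for what fnGetPreviousRow returns within each group.

```powerquery
let
    // Hypothetical sample data, sorted by quarter within each account and year
    Source = #table(
        type table [#"Account code" = text, Year = Int64.Type, Quarter = Int64.Type, YAmount = number],
        {
            {"A", 2018, 1, 100}, {"A", 2018, 2, 250}, {"A", 2018, 3, 400}, {"A", 2018, 4, 450},
            {"A", 2019, 1, 80}, {"A", 2019, 2, 200}
        }
    ),
    // Simplified stand-in: shift the cumulative column down by one row within a group
    AddPrev = (t as table) as table =>
        let
            Cum = List.Buffer(t[YAmount]),
            Prev = {null} & List.RemoveLastN(Cum, 1)
        in
            Table.FromColumns(Table.ToColumns(t) & {Prev}, Table.ColumnNames(t) & {"YAmount.Prev"}),
    // Previous values are only taken from within the same account code and year
    Grouped = Table.Group(Source, {"Account code", "Year"}, {{"All", AddPrev, type table}}),
    Combined = Table.Combine(Grouped[All]),
    // First rows of each group have no previous value: replace null by 0
    ReplacedNull = Table.ReplaceValue(Combined, null, 0, Replacer.ReplaceValue, {"YAmount.Prev"}),
    // Single quarter value = cumulative value minus previous cumulative value
    Result = Table.AddColumn(ReplacedNull, "QAmount", each [YAmount] - [YAmount.Prev], type number)
in
    Result
```

With these sample figures, “QAmount” should come out as 100, 150, 150 and 50 for 2018, and 80 and 120 for 2019.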

File to download

You can download the file to follow the steps:  Unravel cumulative or running totals

Enjoy & stay queryious 😉

Comparing Table.AlternateRows with List.Alternate in Power BI and Power Query

I must admit that I needed more than one attempt to fully understand how the List.Alternate-function works. What helped me in the end was the function Table.AlternateRows. It looks similar to List.Alternate, but holds some surprises that I will uncover in this blogpost:

How Table.AlternateRows works

Say I have the table below and want to retrieve just the letters that appear in every 2nd row:

Table.AlternateRows – Remove every other row

I find the dialogue that appears very helpful and intuitive:

It clearly is a removal operation: here I want to remove the 1st row from my table (“1”), and just one row at a time. I also want to keep just one row (“A”) before the next one is removed (“2”) and so on.

In the formula bar, this step will be translated into this M-code:

Table.AlternateRows(Source,0,1,1)

If you expected it to be translated to Table.AlternateRows(Source,1,1,1) instead, you might have forgotten that the M-language in Power Query starts to count at 0, so the first row to remove is expressed by the 0 here.
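
Here is a small sketch to try this out. The sample table and the column name “Column1” are assumptions, as the original table is only shown as a picture.

```powerquery
let
    // Hypothetical sample: numbers and letters in alternating rows
    Source = #table(type table [Column1 = text], {{"1"}, {"A"}, {"2"}, {"B"}, {"3"}, {"C"}}),
    // offset = 0: start removing at the first row
    // skip   = 1: remove one row at a time
    // take   = 1: keep one row before removing the next
    EveryOther = Table.AlternateRows(Source, 0, 1, 1)
in
    EveryOther   // keeps the rows "A", "B" and "C"
```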

List.Alternate should work similarly

So if my input is a list instead of a table like below, I should expect a result similar to the sample above if I tweak the code a bit, shouldn’t I?

List.Alternate – produces a different result

But hey: What’s wrong here? Not a single element has been removed from the list !!

So let’s have a look into the documentation:

List.Alternate – Function Documentation

and compare it with the Table.AlternateRows documentation:

Table.AlternateRows – Function Documentation

Hm – at least we have one match here: The “offset” parameter is included in both functions. But it is the first (number) parameter in the Table-function and is at the last position in the List-function. So let’s move it around then like this:

List.Alternate – same result with different parameter order

There we are 🙂

So the order of the function parameters is different here. The names and descriptions of the other parameters differ as well. I find them much easier to understand in the Table function and of course, the function dialogue there helps to understand what will happen as well.
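
Both variants can be compared side by side in one query. This is a sketch on an assumed sample list. The parameters of List.Alternate are count (how many items to skip), repeatInterval (how many items to keep in between) and offset (where to start skipping).

```powerquery
let
    // Hypothetical sample list: numbers and letters alternating
    Source = {"1", "A", "2", "B", "3", "C"},
    // Same argument order as in Table.AlternateRows(Source, 0, 1, 1):
    // count = 0 means that nothing is skipped, so the full list comes back
    Naive = List.Alternate(Source, 0, 1, 1),
    // Offset moved to the last position:
    // count = 1 (skip one), repeatInterval = 1 (keep one), offset = 0 (start at the first item)
    Reordered = List.Alternate(Source, 1, 1, 0)
in
    [Naive = Naive, Reordered = Reordered]   // Reordered is {"A", "B", "C"}
```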

Enjoy & stay queryious 😉

Performance tip for List.Generate (1): Buffer your tables in Power BI and Power Query

Lately I was working on a fairly advanced allocation algorithm on large data which forced me to search for different tricks to improve performance than those that you can find on my site here already.

Background

I was using List.Generate to check for every month in my table, if there was enough free capacity on a platform to start new wells. As every well had a certain production scheme (producing different amounts for a certain length of time), I first had to check the total production amount of active wells before I could determine the spare capacity for a new month. So I had to look into every active well, grab the capacity of the new month and add it up.

Therefore I stored the active production schemes in one table in my List.Generate-record. Unfortunately, that led to exponentially decreasing performance.

Solution to improve performance of List.Generate

Buffering my tables in the “next”-function reduced the query duration by almost 70%!

Although a Table.Buffer or List.Buffer is always high on my list when it comes to performance issues, I was fairly surprised to see that behaviour here: As List.Generate passes the last element of its list as an argument to the next step, I had always assumed that this would be cached (and that this was the reason why List.Generate performs recursive operations faster than the native recursion in M). Also, I had referenced that table just once, and in such a case a buffer would normally not have come to my mind. (But desperation sometimes leads to unexpected actions …)

I also buffered a table that had just been referenced within the current record (and not recursively) and this improved performance as well. (In that case, though, the table had been referenced multiple times within the current record.) But this buffer didn’t have as big an impact on performance as the one on the table that was referenced by the recursive action.

Code

Here is some pseudo-code illustrating the general principle:

Solution with buffers:

How to improve performance of List.Generate: Use Table.Buffer
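
The general principle can be sketched like this. It is a simplified illustration under assumptions and not the original allocation algorithm: the table “Months”, its columns and the logic inside the record are made up, and only the placement of Table.Buffer in the “next”-function reflects the tip.

```powerquery
let
    // Hypothetical input: one row per month with the production of the wells started in that month
    Months = Table.Buffer(
        #table(type table [Month = Int64.Type, Production = number], {{1, 10}, {2, 15}, {3, 5}})
    ),
    MonthCount = Table.RowCount(Months),
    Steps = List.Generate(
        // Initial state: no active wells yet (empty table with the same columns)
        () => [i = 0, Active = Table.FirstN(Months, 0)],
        each [i] <= MonthCount,
        // "next"-function: [i] and [Active] refer to the state of the previous iteration
        each [
            i = [i] + 1,
            // Buffer the table that is handed over to the next iteration
            Active = Table.Buffer(Table.Combine({[Active], Table.Range(Months, [i], 1)}))
        ],
        // Return the total production of all active wells for each iteration
        each List.Sum([Active][Production])
    )
in
    Steps
```

Without the Table.Buffer around the “Active” table, each iteration may re-evaluate the chain of previous steps, which is where the slowdown described above can come from. How big the effect is will depend on your data and your logic.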

Is that new to you or have you had the same experience? How much of a performance improvement did you achieve with this method? Please let me know in the comments!

Enjoy & stay queryious 😉

How to do a real VLOOKUP (false) in Power Query or Power BI

When you merge tables with distinct keys in Power Query you will get the same result as the VLOOKUP-function in Excel returns (if this is new to you, check out this article for example: https://www.myonlinetraininghub.com/excel-power-query-vlookup).

But how to retrieve only the result of the first row, if the lookup-table has multiple rows with the same key?

Background

Say you have a dimension table for products:

Product table with one row per Product

and a transaction table with multiple entries per product:

Transactions table with multiple rows per Product

The task is to create 2 additional columns in your dimension table. One to show the first price at which the product has been sold and the other one the corresponding first date:

Select only first rows per Product

If you merge the transactions to the dimension table and expand it, you will end up with as many rows in the dimension table as there are in the transaction table.

Problem

So how to retrieve only the elements of the first row of the matching tables? I’ll show you 2 different methods:

Solution 1 – Tweak the aggregation code

This is very quick to implement if you just want to return one or a few columns from the lookup-table: In the dialogue where you usually expand the columns, I check “Aggregate” instead and click on one of the suggested aggregations for each column that I’m interested in (I simply ignore for a moment that these are not the aggregations that I actually need):

Choose one (false) aggregate per column

Now I tweak the code in the formula bar like so:

Tweaking Code for real VLOOKUP

Here I replace the default aggregations with what I need (in red: List.First) and adjust the column names directly in that command (in green: just to save one manual step later).
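
As the tweaked code is only shown as a picture, here is a sketch of what such a step could look like. The sample tables, their values and the new column names “First Price” and “First Date” are assumptions. The sketch also assumes that the transactions are already in the desired order, so that the first row per product is the earliest one.

```powerquery
let
    // Hypothetical sample tables
    Products = #table(type table [Product = text], {{"Apple"}, {"Pear"}}),
    Transactions = #table(
        type table [Product = text, Date = date, Price = number],
        {
            {"Apple", #date(2018, 1, 5), 1.20},
            {"Apple", #date(2018, 2, 1), 1.35},
            {"Pear", #date(2018, 1, 9), 0.90},
            {"Pear", #date(2018, 3, 2), 1.10}
        }
    ),
    // Merge: one nested table of matching transactions per product
    Merged = Table.NestedJoin(
        Products, {"Product"}, Transactions, {"Product"}, "Transactions", JoinKind.LeftOuter
    ),
    // Instead of a default aggregation like List.Sum, take the first value of each column
    // and set the new column names in the same step
    FirstValues = Table.AggregateTableColumn(
        Merged,
        "Transactions",
        {{"Price", List.First, "First Price"}, {"Date", List.First, "First Date"}}
    )
in
    FirstValues
```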

To avoid long query durations on large tables, you can transform the key column of the dimension table to a real key column, like Chris Webb has described here: https://blog.crossjoin.co.uk/2018/03/16/improving-the-performance-of-aggregation-after-a-merge-in-power-bi-and-excel-power-query-gettransform/

Solution 2 – Add a column that selects the whole desired row

If you want to retrieve many more columns from your lookup table, the method above can become a bit tedious. Then it might be easier to add a column, that grabs the whole first row instead: Table.First would do that job:

Add a column to retrieve the full first (or last) row

Then simply expand out all fields that you need.
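
A sketch of this second approach, again on made-up sample tables (values and new column names are assumptions). Table.First returns the first row of each nested table as a record, which can then be expanded like any other record column.

```powerquery
let
    // Hypothetical sample tables
    Products = #table(type table [Product = text], {{"Apple"}, {"Pear"}}),
    Transactions = #table(
        type table [Product = text, Date = date, Price = number],
        {
            {"Apple", #date(2018, 1, 5), 1.20},
            {"Apple", #date(2018, 2, 1), 1.35},
            {"Pear", #date(2018, 1, 9), 0.90},
            {"Pear", #date(2018, 3, 2), 1.10}
        }
    ),
    Merged = Table.NestedJoin(
        Products, {"Product"}, Transactions, {"Product"}, "Transactions", JoinKind.LeftOuter
    ),
    // Grab the whole first row of the matching transactions as a record
    AddedFirstRow = Table.AddColumn(Merged, "FirstRow", each Table.First([Transactions])),
    // Expand only the fields that are needed and drop the nested table
    Expanded = Table.ExpandRecordColumn(
        AddedFirstRow, "FirstRow", {"Price", "Date"}, {"First Price", "First Date"}
    ),
    Result = Table.RemoveColumns(Expanded, {"Transactions"})
in
    Result
```

Swapping Table.First for Table.Last in this sketch would return the last matching row instead.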

Bonus

You can use many different selection operations with this technique: List.Last or Table.Last would give you the latest prices, for example. This would actually be a more realistic use case here … and is the reason why I didn’t solve the original problem by just removing duplicates 😉 .

Enjoy and stay queryious 😉