List.SelectPositions in Power BI and Power Query

With this new custom function “List.SelectPositions” you can easily select items from a list by just passing a list of their positions within it as the parameter.

What it does

Say you have a list with numbers {1..5} and want to select the 1st, 4th and 5th element from it. Then you can pass these positions to the function as another list: {0, 3, 4}.

ListSelectPositions({1..5}, {0, 3, 4}) will return: {1,4,5}

You see that I’ve decided to follow the zero-based counting principle here, that you find throughout M in the query editor. If you don’t like that, you can use the optional 3rd parameter to let it start to count from 1 instead:

ListSelectPositions({1..5}, {1, 4, 5}, 1) will return {1, 4, 5}

But if you have entered positions that don’t exist, the function will return an error in their positions by default:

ListSelectPositions({1..5}, {1, 4, 5}) will return {2, 5, Error}

because there is no 6th element (you’ve omitted the 3rd parameter that allows you to start counting with 1).

But you can change this behaviour as well through the last optional 4th parameter: Setting it to 0 will fill the missing positions with null like this:

ListSelectPositions({1..5}, {1, 4, 5}, null, 0) will return {2, 5, null}

and setting it to 1 will eliminate it and shorten the list like this:

ListSelectPositions({1..5}, {1, 4, 5}, null, 1) will return {2, 5}

These additional error-handling-options of the 4th parameters are useful for dealing with badly formatted data and if you want to learn more about it, just let me know in the comments so that I can prioritize it.

Function code

Read more

Table.Group: Exploring the 5th element in Power BI and Power Query

In this post I’ll show you the magic stuff you can do with the 5th parameter (the optional comparer function) of the Table.Group M-function in Power BI and Power Query:

Table.Group parameters

  1. table as table,
  2. key as any,
  3. aggregatedColumns as list,
  4. optional groupKind as nullable number,
  5. optional comparer as nullable function

If you’re not familiar with the 4th parameter (groupKind) already, I strongly recommend to read Chris Webb’s article, as we will build on its knowledge here.

Another aspect worth mentioning for the modus in GroupKind.local is the performance aspect: It runs MUCH faster for large datasets than the default-setting. So if you are sure that your data will always be sorted accordingly, you can speed up your grouping-operations considerably. That means: Your data has to be sorted correctly by default. At least for my tests, you would loose the performance-gain once you’d sort your table by an explicit step before.

You can find an overview of comparer functions here.

Case insensitive grouping

Imagine there was a twist in Chris’ dataset and it would look like so:

Table.Group – Modified Source Data

We would probably not be happy with these results then:

Table.Group Problem with Case Sensitivity

Because M is by default case sensitive, we get more groups than we want. Let’s try Comparer.OrdinalIgnoreCase to the rescue then:

Pretty neat, isn’t it 😉  (You can use that comparer in other functions as well, see here for text- and list operations)

Something like this was what I’ve showed Huang Caiguang the other day, who asked me what the 5th parameter of this function was about (or so I understood). He then sent me a link to one of his articles, which demanded a good 2 hours for me to digest and understand: We can also use custom functions to create all different sorts of grouping behaviours here. These are my 2 favourites:

Read more

Import text from pdf files in Power BI

While Power BI will soon provide functions to import tables from pdf-files, there might be occasions when you actually need to import text from pdf files (in unstructured form). With a little help from R in Power BI you can do exactly that. (And don’t worry: No need to learn R here: The necessary R-code is already included in my function below. All you need is to have R installed your machine). Please also note that at the time of writing the refresh of these queries in the service is only supported with the personal gateway and not with the enterprise version.

Prerequisites

You can use the function below just like a normal M-function, just pass the (URL- or file-) path to it. All you have to take care of is that a instance of R is running on your machine. If this is new to you, check out Ruth Pozuelo’s video showing all the necessary steps: How to install R for Power BI

There is one package required: pdftools. The video above also shows how to install it.

Function

Import text from PDF files:

You can try calling this function for a pdf-file from the internet like the M formula language specification like this:

ImportPdfText("http://download.microsoft.com/download/8/1/A/81A62C9B-04D5-4B6D-B162-D28E4D848552/Power%20Query%20Formula%20Language%20Specification%20(October%202016).pdf")

If you want to import local files from your computer, just paste the full file-path instead of the URL. You don’t have to care about the direction of the slashes, both versions (forward and backwards) are accepted.

How to use

The script will return a table with one row for each page in the pdf-file by default. But it has an optional 2nd parameter that will return one row per pdf-text-line instead, if you put 1 into it. A page index and a row index will help navigating the result.

The 3rd parameter is an optional owner password for the pdf and the 4th the optional user password. If you’re using them, you have to enter null for the previous optional parameters. The following example shows how to use a user password while leaving the others “empty”:

ImportPdfText("MyPdfPath",null, null, "MyPassword")

Enjoy & stay queryious 🙂

Remove repeating characters from a string in Power BI and Power Query

Repeating spaces often cause problems when cleaning up your data. My new function “Text.RemoveRepeatingCharacters” can come to the rescue here.

Imagine you have a table like this:

Challenge

To further work with this data, it would often be best if there was just one space between the words and not many.

The following function will do this for you:

Function Text.RemoveRepeatingCharacters

How to use

It takes 2 arguments: The Text/String and the Delimiter. The delimiter is an optional argument and by default set to space ” “. So you can leave it blank if that’s fine for you or enter a different value (like “,” for a comma) if needed.

How it works

It splits the text up into a list using the delimiter from the 2nd parameter (4: TextToList). Where one delimiter directly follows another, the element in the list will be empty. The next step (5: FilterList) then filters the list and removes these empty fields. In the last step (6: Result) the remaining (non-empty) fields will be reassembled, using the delimiter again. That way, just one delimiter will be left.

Edit 28-Jan-2018: While searching the web to see if one of my next blogpost-topics have already been published somewhere else already, I came across Ivan Bond’s blogpost who used this same technique over 2 years ago here: https://bondarenkoivan.wordpress.com/2015/10/11/transform-table-column-using-own-function-in-power-query/ . It’s a very good read and you will also learn how to use a function like this to transform an existing column instead of adding a new one to perform the operation like in my example above, so don’t miss it.

Enjoy & stay queryious 🙂