Extract pattern string and numbers from text using List.Accumulate in Power Query

A typical task when cleaning data is to extract substrings from string that follow a certain pattern. In this post I’m going to describe a method that uses the List.Accumulate function for it. Extract a pattern string.

Task

I have to extract a payroll key from a description field that starts with 8 number, followed by an “-” and another number.

aölsfdk0125-fds  da12345678-0asdf

So I’m after the 12345678-0.

Plan

I plan to approach this by

  1. stepping through the string and check each character if it is valid.
  2. If so, store it
  3. and if not, forget it. And in that case, also forget the stored values so far so the collection starts from scratch.
  4. Then if a series of matches builds up, I will have to check the count of the stored values to not exceed the length of my target pattern.
  5. Once the length is reached, no further checks shall be performed and the found values be returned.

My aim is to find a generic way so that this solution can be adapted to many other similar use cases as well. Therefore the pattern must be described in a scalable way and the identification of the pattern elements should be easy to compute.

Execution

First comes defining the pattern structure. Therefore I create a placeholder string, that holds one placeholder symbol for each kind of valid values:

xxxxxxxxyx

This reads as: For an x, any number between 0 and 9 is valid and for a y, only a “-” is allowed.

Therefore I create a table (“ValidValues”) with valid values for each kind of position:

Extract a pattern string

Valid values are organized as placeholders

Then I create another table (“Position”) where I define the pattern and match these values to the actual pattern:

Extract a pattern string

Pattern definition with placeholders from valid values

So for each position in the target string, I define the placeholder that identifies the valid value(s) from the first table. This is the key that I now use to merge that ValidValues-table to my positions table:

Merge the allowed values to the pattern table via placeholder key

When I expand out the “ValidValues”-column I get a long table with all valid values for each position in the pattern:

Expanding returns all valid values for each position in the pattern

This is a structure that lets me easily grab the valid values for each position with the following syntax:

Table.SelectRows(Positions, each [Position] = <NumberI’mAfter>) [ValidValue]

The blue part selects all rows which contain the number that I’m after and the green part selects the last column and transforms it to a list that is easy digestible for further computation. I’m going to use this formula in the List.Accumulate operation later on.

Now I start preparations to walk through the string that contains my pattern. Therefore I turn the text into a list:

Text.ToList(“aölsfdk0125-fds  da12345678-0asdf aölsdfj”)

Tranform text to list

List.Accumulate

Then I use List.Accumulate to step through this list, check each element and perform the task outlined in my plan above:

List.Accumulate to step through the values and create a list of matching characters

BTW: This code has been formatted with the great new Power Query Formatter tool .

The following picture contains the description of what each step does:

Commented code

After the ListAccumulate, I select the field “Result” (in step “Result”). It contains the list with matching strings which I then combine in the last step.

Please check the enclosed file to see it in action:  RegexDummy1_.zip

If you’ve enjoyed this tutorial so far, please stay tuned for the next episode. There I will transform the query to a function that can be applied to all rows of a table and adjust it to make the last 2 characters optional, so that also strings with just 8 numbers in them can be found.

Enjoy & stay queryious 😉

paging pagination Power Query

How not to miss the last page when paging with Power BI and Power Query

When you use an API with a paging mechanism (like the example from this blogpost), you’ll might work with a field that contains the address for the next page. You can use this to walk through the available chunks until you reach the last element. That last element in the pagination will not contain a next-field or that field will be null.

Paging in Power Query

In Power Query you can use the function List.Generate for it. According the latest function documentation it:

Generates a list of values given four functions that generate the initial value initial, test against a condition condition, and if successful select the result and generate the next value next.

So an intuitive implementation would look like so:

paging pagination Power Query

Initial code for paging: Will miss the last element

In the initial step (row 2) the API will be called and returns this record:

Read more

Trimming text with custom characters in Power BI and Power Query

When cleaning dirty data, you might have used the Trim-function (Text.TrimStart or Text.TrimEnd) to delete leading or trailing whitespace from your strings. However, did you know that you could use these functions as well to delete any other characters as well from the start or end of a string? Trimming text with custom characters is pretty straightforward:

Task: Trimming text with custom characters

Say you have a column with values like so

Trimming text with custom characters

and want to delete every number at the end and also every “-” that is directly connected with a number. So that the final output shall look like so:

Trim custom characters at the end of a string.

Optional parameter

By default, you feed just one argument into the Text.TrimStart or Text.TrimEnd function: The string whose whitespace characters shall be removed.

Text.TrimEnd(text as nullable text, optional trim as any) as nullable text

But the second argument lets you define a list of your own characters to be removed respectively. So I can create a list with all the characters that shall be removed from the end like so:

{"0".."9"} & {"-"}

This concatenates 2 lists: The first list contains 10 elements: All numbers as strings. The second list has just one element in it: “-“. I have to put this element into a list as well for being able to use the ampersand (“&”) as an easy concatenator here.

So the full expression for the “Add custom column” dialogue looks like so:

Text.TrimEnd( [MyColumnName], {"0".."9"} & {"-"} )

To see this in action, you can simply paste this code into the advanced editor and follow the steps:

Enjoy and stay queryious 😉

Date.Networkdays function for Power Query and Power BI

Today I’m going to share my custom NETWORKDAYS function for Power Query with you that uses the same syntax than its Excel-equivalent.

NETWORKDAYS function

This function’s first 3 parameters work just like the Excel function and there is a 4th parameter that allows adjusting the day on which the week shall start:

  1. Start as date
  2. End as date
  3. optional holidays as list of dates
  4. optional number to change the start of the week from Monday (default: 1) to any other day (2 would mean that the week starts on Tuesday instead)

The function comes with a UI that lets you first choose a table containing the holidays and then choose the column with the holiday dates. Read more

Dynamically create types from text with Type.FromText in Power Query and Power BI

In this article I’ll show you how to create types from text in Power Query, enabling you to dynamically change types via functions for example. It’ll come out as a custom Type.FromText function which has been asked for in the comments of this article: https://www.thebiccountant.com/2017/01/09/dynamic-bulk-type-transformation-in-power-query-power-bi-and-m.

Problem

To transform a column to type text can be done like so:

Table.TransformColumnTypes(Source,{{"Column1", type text}})

This transforms the “Column1” from table “Source” to type text.  Now, if you want to make the type dynamic and move it to a function parameter like so:

(VariableType as type) =>

Table.TransformColumnTypes(Source,{{"Column1", VariableType}})

This returns a function dialogue as follows:

Read more

Performance tip for aggregations after joins in Power Query and Power BI

This article was edited on 2nd Nov 2019. JoinKind.Local has been removed to avoid problems with merges on tables with primary keys:

In this article you’ll learn how to speed up the aggregation of joined/merged tables by orders of magnitude (I recorded up to 30 times faster execution times). This method works for merges where both table have multiple rows for each keys. If one of your tables has a primary key, the method Chris Webb describes here works just as good: Chris Webb’s article on how to improve performance on aggregations after joins using primary keys .

You can follow along the different methods in this file:  PerformanceAggregationsAfterMerges1_Upload.zip

Background

When you join a table to another table in Power Query, the UI gives you the option to either expand the columns (default) or aggregate the contents of the joint tables. That’s useful if multiple rows are returned for the rows of the table that has been joined to (left table):

Read more