Multiple replacements or translations in Power BI and Power Query

A common task when cleaning data is to perform multiple replacements at once. When dealing with “translation tables” you might have come across Chris Webb’s solution using List.Generate to deal with this. This works fine – but might deliver unexpected results if you don’t consider it’s implications properly. Say we add lines like shown below, then we might expect the result as highlighted in yellow:

Image2

Multiple replacements using 2 different techniques

So hey – what happened here?: Our catamaran turns out as a bearamaran and the chair turns to chwater.

How it works:

Because there could be more than one word per line that needs to be replaced, the function Text.Replace alone wouldn’t be able to do this job as it takes only one replacement element. This is where List.Generate jumps in: Taking the result of the previous replacement step as input and performs the next replacement – as often as there are elements in the replacement list.

And Text.Replace is a function that searches for substrings in the text. That way “catamaran” turns out as a “bearamaran”.

But what happened to the mat where the cat and the dog sat on: “chwater”?

Firstly “mat” has been replaced by “chair”. But due to the recursive nature of the List.Generate-function, this result is not safe. So when Text.Replace kicks in on: Replace “air” with “water”, searching again also for substrings, a part of “chair” with be replaced by water: “chwater“.

So while Chris’ article is an excellent explanation of how List.Generate works, most multiple text replacement requirements in the real world might be better solved with a different approach:

Use List.ReplaceMatchingItems instead

This is a function that is designed to work on lists and performing replacements for multiple items (that also need to be fed in as list):

But we want to replace within a text, not within a list!

No problem for Power Query as the master of transformations: We just temporarily transform our text into a list, perform the replacements and then transform our list back into text – as if nothing has happened 🙂

Text.Split(text, ” “) will split out text strings into a list of single words because we choose blank as the delimiter

Text.Combine(list, ” “) will then reassemble.

This technique is useful in many other operations as well. Sometimes the delimiters will change, but as long as you don’t get too mad with your transformations in between, it is often a easy and safe way to take a useful function from the List-category to perform on Text-fields.

But there’s another hurdle to take here: How to get our translation table into the desired format for this function? Have a look at the documentation:

List.ReplaceMatchingItems ( {1, 2, 3, 4, 5}, { {2, -2}, {3, -3} } ) equals { 1, -2, -3, 4, 5}

So we’re fine with lists: Identifying them by their curly brackets like in the result: { 1, -2, -3, 4, 5}

But what the x is this?: { {2, -2}, {3, -3} } a list containing lists as it’s elements.

List of Lists

Have you ever worked with a table with one column only? Maybe also have been confronted with a surprising effect that this had automatically turned itself into list-format? So if our table just had one column, it would almost be a list already. So let’s try to get there: Merge the columns of the translation table into one so that the contents of the rows (each cell effectively) will be returned as a list (as we want to maintain the separation that the columns gave, but not their format).

Table.AddColumn(Source, “Liste”, each ( { [Word To Replace], [Replace With] } ) )

This is an added column where in each row of the table, the specific rows contents of [Word To Replace] and [Replace With] will be passed into the list. So for every single row one list will be created.

If you’re interested in the techniques behind this or have guessed that instead one list with 2 full column would be returned: [Word To Replace] and [Replace With]: The unimposing little word “each” actually creates a sort of a row context. Actually the expression at the 3rd place in the “Table.AddColumn”-expression is expected to be a function and that’s what “each” stands for. It is basically syntax sugar for a full function expression like described here.

So: Goodbye columns “WordsToReplace” and “ReplaceWith”, but we respect your destiny and maintain your separating character by keeping your contents separated in the list-format instead.

Nice side-effect: It’s also way much faster.

And here is the full code:

let
    Source = ReplacementsTable,
    CreateListOfLists = Table.AddColumn(Source, "Liste", each ({[Word To Replace], [Replace With]})),
    Text = Text,
    TurnTextToList = Table.AddColumn(Text, "Custom", each Text.Split([Text], " ")),
    Replacements = Table.AddColumn(TurnTextToList, "Changed Text Expected", 
                each Text.Combine(List.ReplaceMatchingItems([Custom],CreateListOfLists[Liste])," ")),
    Cleanup = Table.RemoveColumns(Replacements,{"Custom", "Text"})
in
    Cleanup

& the file: M_MultipleReplacementsV3.xlsx

Enjoy & stay queryious 🙂

Comments (4) Write a comment

  1. Hi Imke 🙂
    Very elegant solution 🙂
    If you want to speed it up more, you may use List.Buffer function. I mean something like this
    CreateListOfLists = List.Buffer(Table.AddColumn(Source, “Liste”, each ({[Word To Replace], [Replace With]}))[Liste]),
    and then you have to change a little Replacements step to this below
    Replacements = Table.AddColumn(TurnTextToList, “Changed Text Expected”,
    each Text.Combine(List.ReplaceMatchingItems([Custom],CreateListOfLists),” “)),
    Of course this is a proposal only, because your solution should works fine without these changes 🙂

    From me, another approach (just different)

    let
    Repl = Table.Buffer(ReplacementsTable),
    Text = Text,
    Result = Table.AddColumn(Text, “Changed Text Expected”,
    each Text.Combine(
    List.Transform(
    Text.Split([Text], ” “),
    each
    try Repl[Replace With]{List.PositionOf(Repl[Word To Replace], _)}
    otherwise _
    ),
    ” ” )
    )
    [[Changed Text Expected]]
    in
    Result

    sq 🙂

    Reply

    • Hi Bill,
      thanks for the accolade. Actually, I found my solution quite elegant as well – until you came around …
      But yours is stellar, so I can comfort myself a bit in thinking that within earthly scale it still shines a little 🙂

      Note to my readers: This cosmic M-agician shows us how to make multiple replacements without a dedicated replacement-operation but with the most elegant lookup-operation I’ve come across in M instead: No join-expand-if-then but simple try lookup otherwise old value. Also he returns a table with one column with this statement: [[Changed Text Expected]] which I still need my head to turn around, as it looks as if it turns a list into a table, but actually only seems to work on table columns and not on native lists. @Bill: Any idea why that is?

      Apart from that: Thank you so much Bill and applause, this was a real experience!

      And yes: List.Buffer speeds it up considerably! But I might actually prefer this version then:

      let
      Source = ReplacementsTable,
      CreateListOfLists = Table.Buffer(Table.AddColumn(Source, “Liste”, each ({[Word To Replace], [Replace With]}))[[Liste]]),
      TurnTextToList = Table.AddColumn(Text, “Custom”, each Text.Split([Text], ” “)),
      Replacements = Table.AddColumn(TurnTextToList, “Changed Text Expected”, each Text.Combine(List.ReplaceMatchingItems([Custom],CreateListOfLists[Liste]),” “))[[Changed Text Expected]]
      in
      Replacements

      Fan of double-brackets already :-))

      Reply

  2. Thanks for your kind words, Imke. 🙂
    To understand this syntax (double brackets), look at “6.4.2 Field Access” in Formula Language Specification (page 61)

    Regards and sq 🙂

    Reply

  3. Thanks Bill – looks like I need to learn some basics finally… (a list is missing the identifier and therefore cannot be selected or projected?)
    sq 🙂

    Reply

Leave a Reply