Multiple replacements or translations in Power BI and Power Query

A common task when cleaning data is to perform multiple replacements at once. When dealing with “translation tables” you might have come across Chris Webb’s solution using List.Generate to deal with this. This works fine – but might deliver unexpected results if you don’t consider it’s implications properly. Say we add lines like shown below, then we might expect the result as highlighted in yellow:

Image2

Multiple replacements using 2 different techniques

So hey – what happened here?: Our catamaran turns out as a bearamaran and the chair turns to chwater.

How it works:

Because there could be more than one word per line that needs to be replaced, the function Text.Replace alone wouldn’t be able to do this job as it takes only one replacement element. This is where List.Generate jumps in: Taking the result of the previous replacement step as input and performs the next replacement – as often as there are elements in the replacement list.

And Text.Replace is a function that searches for substrings in the text. That way “catamaran” turns out as a “bearamaran”.

But what happened to the mat where the cat and the dog sat on: “chwater”?

Firstly “mat” has been replaced by “chair”. But due to the recursive nature of the List.Generate-function, this result is not safe. So when Text.Replace kicks in on: Replace “air” with “water”, searching again also for substrings, a part of “chair” with be replaced by water: “chwater“.

So while Chris’ article is an excellent explanation of how List.Generate works, most multiple text replacement requirements in the real world might be better solved with a different approach:

Use List.ReplaceMatchingItems instead

This is a function that is designed to work on lists and performing replacements for multiple items (that also need to be fed in as list):

But we want to replace within a text, not within a list!

No problem for Power Query as the master of transformations: We just temporarily transform our text into a list, perform the replacements and then transform our list back into text – as if nothing has happened 🙂

Text.Split(text, ” “) will split out text strings into a list of single words because we choose blank as the delimiter

Text.Combine(list, ” “) will then reassemble.

This technique is useful in many other operations as well. Sometimes the delimiters will change, but as long as you don’t get too mad with your transformations in between, it is often a easy and safe way to take a useful function from the List-category to perform on Text-fields.

But there’s another hurdle to take here: How to get our translation table into the desired format for this function? Have a look at the documentation:

List.ReplaceMatchingItems ( {1, 2, 3, 4, 5}, { {2, -2}, {3, -3} } ) equals { 1, -2, -3, 4, 5}

So we’re fine with lists: Identifying them by their curly brackets like in the result: { 1, -2, -3, 4, 5}

But what the x is this?: { {2, -2}, {3, -3} } a list containing lists as it’s elements.

List of Lists

Have you ever worked with a table with one column only? Maybe also have been confronted with a surprising effect that this had automatically turned itself into list-format? So if our table just had one column, it would almost be a list already. So let’s try to get there: Merge the columns of the translation table into one so that the contents of the rows (each cell effectively) will be returned as a list (as we want to maintain the separation that the columns gave, but not their format).

Table.AddColumn(Source, “Liste”, each ( { [Word To Replace], [Replace With] } ) )

This is an added column where in each row of the table, the specific rows contents of [Word To Replace] and [Replace With] will be passed into the list. So for every single row one list will be created.

If you’re interested in the techniques behind this or have guessed that instead one list with 2 full column would be returned: [Word To Replace] and [Replace With]: The unimposing little word “each” actually creates a sort of a row context. Actually the expression at the 3rd place in the “Table.AddColumn”-expression is expected to be a function and that’s what “each” stands for. It is basically syntax sugar for a full function expression like described here.

So: Goodbye columns “WordsToReplace” and “ReplaceWith”, but we respect your destiny and maintain your separating character by keeping your contents separated in the list-format instead.

Nice side-effect: It’s also way much faster.

And here is the full code:

let
    Source = ReplacementsTable,
    CreateListOfLists = Table.AddColumn(Source, "Liste", each ({[Word To Replace], [Replace With]})),
    Text = Text,
    TurnTextToList = Table.AddColumn(Text, "Custom", each Text.Split([Text], " ")),
    Replacements = Table.AddColumn(TurnTextToList, "Changed Text Expected", 
                each Text.Combine(List.ReplaceMatchingItems([Custom],CreateListOfLists[Liste])," ")),
    Cleanup = Table.RemoveColumns(Replacements,{"Custom", "Text"})
in
    Cleanup

& the file: M_MultipleReplacementsV3.xlsx

Enjoy & stay queryious 🙂

Comments (18) Write a comment

  1. Hi Imke 🙂
    Very elegant solution 🙂
    If you want to speed it up more, you may use List.Buffer function. I mean something like this
    CreateListOfLists = List.Buffer(Table.AddColumn(Source, “Liste”, each ({[Word To Replace], [Replace With]}))[Liste]),
    and then you have to change a little Replacements step to this below
    Replacements = Table.AddColumn(TurnTextToList, “Changed Text Expected”,
    each Text.Combine(List.ReplaceMatchingItems([Custom],CreateListOfLists),” “)),
    Of course this is a proposal only, because your solution should works fine without these changes 🙂

    From me, another approach (just different)

    let
    Repl = Table.Buffer(ReplacementsTable),
    Text = Text,
    Result = Table.AddColumn(Text, “Changed Text Expected”,
    each Text.Combine(
    List.Transform(
    Text.Split([Text], ” “),
    each
    try Repl[Replace With]{List.PositionOf(Repl[Word To Replace], _)}
    otherwise _
    ),
    ” ” )
    )
    [[Changed Text Expected]]
    in
    Result

    sq 🙂

    Reply

    • Hi Bill,
      thanks for the accolade. Actually, I found my solution quite elegant as well – until you came around …
      But yours is stellar, so I can comfort myself a bit in thinking that within earthly scale it still shines a little 🙂

      Note to my readers: This cosmic M-agician shows us how to make multiple replacements without a dedicated replacement-operation but with the most elegant lookup-operation I’ve come across in M instead: No join-expand-if-then but simple try lookup otherwise old value. Also he returns a table with one column with this statement: [[Changed Text Expected]] which I still need my head to turn around, as it looks as if it turns a list into a table, but actually only seems to work on table columns and not on native lists. @Bill: Any idea why that is?

      Apart from that: Thank you so much Bill and applause, this was a real experience!

      And yes: List.Buffer speeds it up considerably! But I might actually prefer this version then:

      let
      Source = ReplacementsTable,
      CreateListOfLists = Table.Buffer(Table.AddColumn(Source, “Liste”, each ({[Word To Replace], [Replace With]}))[[Liste]]),
      TurnTextToList = Table.AddColumn(Text, “Custom”, each Text.Split([Text], ” “)),
      Replacements = Table.AddColumn(TurnTextToList, “Changed Text Expected”, each Text.Combine(List.ReplaceMatchingItems([Custom],CreateListOfLists[Liste]),” “))[[Changed Text Expected]]
      in
      Replacements

      Fan of double-brackets already :-))

      Reply

  2. Thanks for your kind words, Imke. 🙂
    To understand this syntax (double brackets), look at “6.4.2 Field Access” in Formula Language Specification (page 61)

    Regards and sq 🙂

    Reply

    • Hi Bill,
      The Formula Language Specification (“6.4.2 Field Access”) is currently right in front my eyes.
      Even knowing what you showed here about [[ ]], I would have never understood it from the documentation !!!
      The only hint is the world “Table” briefly written down on the second line. I wonder how you figured it out.

      Reply

  3. Thanks Bill – looks like I need to learn some basics finally… (a list is missing the identifier and therefore cannot be selected or projected?)
    sq 🙂

    Reply

  4. Simply Genius, Thank you for your post.

    How do I go about getting the “Replace With” as my final result, i.e instead of particular ‘word’ replace, I would like the replace word to be the column result.

    Text Word To Replace Replace With Expected Result
    the cat sat on the mat cat bear bear

    Thank you

    Reply

  5. Text = the cat sat on the mat,
    Word To Replace = cat,
    Replace With = bear,
    Expected Result = bear

    Reply

    • Just to make sure to understand your requirement correctly: You want to replace multiple words of whom one is “cat” with just one word “bear”?

      Reply

  6. That is Correct. Instead of the whole sentence- I would like the result to display just one word “bear”.

    Thank you for looking into it.

    Reply

    • This can be achieved by modifying the last step “Replacements” by this: Table.AddColumn(TurnTextToList, "Changed Text Expected", each List.First(List.Select(CreateListOfLists[Liste], (Match) => List.Contains([Custom],Match{0}))){1})

      It will return the first match that is found in the list. If you want the last match instead, you simply replace “List.First” by “List.Last”.

      Hope this helps and cheers, Imke

      Reply

  7. Hi! This is was super helpful – thank you so much for posting this. Is there is a way to replace the text directly in the original column instead of adding the new column “Changed Text Expected”? I’ve been trying to modify the code to do this with no such luck. Any insight would be greatly appreciated! Thank you.

    Reply

    • Hi Ayla,
      yes, this can be done like this: = Table.TransformColumns(Text,{{“Text”, each Text.Combine(List.ReplaceMatchingItems(Text.Split(_, ” “), CreateListOfLists[Liste]),” “)}})
      Works with the data in the downloadable file.
      Cheers, Imke

      Reply

  8. Hi there,

    First, thanks a lot for this post and many others that have been super useful.
    What I’m gonna ask may be really simple to extrapolate from your query but I haven’t been successful so far. Could I replace the values in my already-existing column, without using the split function so I replace full strings by another full string.

    For example, I want to replace all cases where the value “Two words” is found by the value “Other three words”. There are obviously more values to replace on the same column.

    Many thanks for your time,

    Reply

    • Currently, I see only Chris’ method for this use case.
      Do you see the risk that your “Two words” phrases will be included in other phrases as will and you’ll actually run into the problems I’ve described?

      Reply

      • Hi!
        Thanks for quick reply. I did understand the problems of Chris´method, that’s why I opted to follow this solution. I made a few changes to your query to make it more robust and it works fine.
        There is a couple of cases I have to check first and, if it works for all of them, I’ll let you know!

        Regards,

        NAOS

        Reply

  9. Pingback: Transform text with exceptions in Power BI and Power Query – The BIccountant

Leave a Reply

Die Datenschutzbestimmungen finden Sie hier: / Please find the privacy policy here: https://wp.me/P6lgsG-Rz