Multiple replacements or translations in Power BI and Power Query

A common task when cleaning data is to perform multiple replacements at once. When dealing with “translation tables” you might have come across Chris Webb’s solution using List.Generate to deal with this. This works fine – but might deliver unexpected results if you don’t consider it’s implications properly. Say we add lines like shown below, then we might expect the result as highlighted in yellow:

Image2

Multiple replacements using 2 different techniques

So hey – what happened here?: Our catamaran turns out as a bearamaran and the chair turns to chwater.

How it works:

Because there could be more than one word per line that needs to be replaced, the function Text.Replace alone wouldn’t be able to do this job as it takes only one replacement element. This is where List.Generate jumps in: Taking the result of the previous replacement step as input and performs the next replacement – as often as there are elements in the replacement list.

And Text.Replace is a function that searches for substrings in the text. That way “catamaran” turns out as a “bearamaran”.

But what happened to the mat where the cat and the dog sat on: “chwater”?

Firstly “mat” has been replaced by “chair”. But due to the recursive nature of the List.Generate-function, this result is not safe. So when Text.Replace kicks in on: Replace “air” with “water”, searching again also for substrings, a part of “chair” with be replaced by water: “chwater“.

So while Chris’ article is an excellent explanation of how List.Generate works, most multiple text replacement requirements in the real world might be better solved with a different approach:

Use List.ReplaceMatchingItems instead

This is a function that is designed to work on lists and performing replacements for multiple items (that also need to be fed in as list):

But we want to replace within a text, not within a list!

No problem for Power Query as the master of transformations: We just temporarily transform our text into a list, perform the replacements and then transform our list back into text – as if nothing has happened 🙂

Text.Split(text, ” “) will split out text strings into a list of single words because we choose blank as the delimiter

Text.Combine(list, ” “) will then reassemble.

This technique is useful in many other operations as well. Sometimes the delimiters will change, but as long as you don’t get too mad with your transformations in between, it is often a easy and safe way to take a useful function from the List-category to perform on Text-fields.

But there’s another hurdle to take here: How to get our translation table into the desired format for this function? Have a look at the documentation:

List.ReplaceMatchingItems ( {1, 2, 3, 4, 5}, { {2, -2}, {3, -3} } ) equals { 1, -2, -3, 4, 5}

So we’re fine with lists: Identifying them by their curly brackets like in the result: { 1, -2, -3, 4, 5}

But what the x is this?: { {2, -2}, {3, -3} } a list containing lists as it’s elements.

List of Lists

Have you ever worked with a table with one column only? Maybe also have been confronted with a surprising effect that this had automatically turned itself into list-format? So if our table just had one column, it would almost be a list already. So let’s try to get there: Merge the columns of the translation table into one so that the contents of the rows (each cell effectively) will be returned as a list (as we want to maintain the separation that the columns gave, but not their format).

Table.AddColumn(Source, “Liste”, each ( { [Word To Replace], [Replace With] } ) )

This is an added column where in each row of the table, the specific rows contents of [Word To Replace] and [Replace With] will be passed into the list. So for every single row one list will be created.

If you’re interested in the techniques behind this or have guessed that instead one list with 2 full column would be returned: [Word To Replace] and [Replace With]: The unimposing little word “each” actually creates a sort of a row context. Actually the expression at the 3rd place in the “Table.AddColumn”-expression is expected to be a function and that’s what “each” stands for. It is basically syntax sugar for a full function expression like described here.

So: Goodbye columns “WordsToReplace” and “ReplaceWith”, but we respect your destiny and maintain your separating character by keeping your contents separated in the list-format instead.

Nice side-effect: It’s also way much faster.

And here is the full code:

let
    Source = ReplacementsTable,
    CreateListOfLists = Table.AddColumn(Source, "Liste", each ({[Word To Replace], [Replace With]})),
    Text = Text,
    TurnTextToList = Table.AddColumn(Text, "Custom", each Text.Split([Text], " ")),
    Replacements = Table.AddColumn(TurnTextToList, "Changed Text Expected", 
                each Text.Combine(List.ReplaceMatchingItems([Custom],CreateListOfLists[Liste])," ")),
    Cleanup = Table.RemoveColumns(Replacements,{"Custom", "Text"})
in
    Cleanup

& the file: M_MultipleReplacementsV3.xlsx

.. and a “functionized version”:

let func =
(
  TableWithTextColumn as table, 
  ColumnNameTextToReplace as text, 
  ReplacementsTable as table, 
  ColumnNameWordToReplace as text, 
  ColumnNameReplaceWith as text
) => 
  let
    Source = ReplacementsTable,
    CreateListOfLists = Table.AddColumn(
        Source, 
        "Liste", 
        each ({Record.Field(_, ColumnNameWordToReplace), Record.Field(_, ColumnNameReplaceWith)})
      ),
    Text = TableWithTextColumn,
    TurnTextToList = Table.AddColumn(
        Text, 
        "Custom", 
        each Text.Split(Record.Field(_, ColumnNameTextToReplace), " ")
      ),
    Replacements = Table.AddColumn(
        TurnTextToList, 
        "Changed Text Expected", 
        each Text.Combine(List.ReplaceMatchingItems([Custom], CreateListOfLists[Liste]), " ")
      ),
    Cleanup = Table.RemoveColumns(Replacements, {"Custom", "Text"})
  in
    Cleanup ,
documentation = [
Documentation.Name = " Table.ReplaceMultiple.pq ",
Documentation.Description = " Replaces multiple values at a time in a table column without recursion. ",
Documentation.LongDescription = " Replaces multiple values at a time in a table column without recursion: http://www.thebiccountant.com/2016/05/22/multiple-replacements-in-power-bi-and-power-query/ The <code>ReplacementsTable </code> must have the values to be replaced in the 1st column and the new value in the 2nd. ",
Documentation.Category = " Table.ColumnOperations ",
Documentation.Source = " http://www.thebiccountant.com/2016/05/22/multiple-replacements-in-power-bi-and-power-query/ . ",
Documentation.Version = " 2.0 ",
Documentation.Author = " Imke Feldmann ",
Documentation.Examples = {[Description = " ",
Code = " let
TableWithText = #table( {""Text""}, List.Zip( { {""the cat sat on the mat"" ,""the cat sat next to the dog"" ,""the dog chased the cat"" ,""the dog sat on the mat"" ,""the catamaran sails through the air""} } ) ),
ReplacementsTable = #table( {""Word To Replace"", ""Replace With""}, List.Zip( { {""cat"" ,""mat"" ,""dog"" ,""the"" ,""air""}, {""bear"" ,""chair"" ,""dragon"" ,""THE"" ,""water""} } ) ),
// The query containing the function has to be called ""fnTable_ReplaceMultiple_pq""
Result = fnTable_ReplaceMultiple_pq ( TableWithText , ""Text"", ReplacementsTable, ""Word To Replace"", ""Replace With"" )
in
Result ",
Result = " Changed Text Expected
———————
THE bear sat on THE chair
THE bear sat next to THE dragon
THE dragon chased THE bear
THE dragon sat on THE chair
THE catamaran sails through THE water "]}]
in
Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))

Enjoy & stay queryious 🙂

Comments (48) Write a comment

  1. Pingback: Transform text with exceptions in Power BI and Power Query – The BIccountant

Leave a Reply

Die Datenschutzbestimmungen finden Sie hier: / Please find the privacy policy here: https://wp.me/P6lgsG-Rz