Remove repeating characters from a string in Power BI and Power Query

Repeating spaces often cause problems when cleaning up your data. My new function “Text.RemoveRepeatingCharacters” can come to the rescue here.

Imagine you have a table like this:

Challenge

To further work with this data, it would often be best if there was just one space between the words and not many.

The following function will do this for you:

Function Text.RemoveRepeatingCharacters


let func =
(String as text, optional Delimiter as text) as text =>
let
delimiter = if Delimiter = null then " " else Delimiter,
TextToList = List.Buffer(Text.Split(String, delimiter)),
FilterList = List.Select(TextToList, each _ <> ""),
Result = Text.Combine(FilterList, delimiter)
in
Result
, documentation = [
Documentation.Name = " Text.RemoveRepeatingCharacters
", Documentation.Description = " Removes repeating characters of the delimiter from a <code>String</code>
" , Documentation.LongDescription = " Removes repeating characters of the delimiter from a <code>String</code>. <code>Delimiter</code> is by default set to space/blank.
", Documentation.Category = " Text
", Documentation.Source = " http://wp.me/p6lgsG-M9 .
", Documentation.Author = " Imke Feldmann: www.TheBIccountant.com .
", Documentation.Examples = {[Description = " Removes repeating characters of the delimiter from a string: https://wp.me/p6lgsG-M9 .
" , Code = " RemoveRepeatingCharacters(""Imke Feldmann"")
", Result = " ""Imke Feldmann""
"]}]
in
Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))

How to use

It takes 2 arguments: The Text/String and the Delimiter. The delimiter is an optional argument and by default set to space ” “. So you can leave it blank if that’s fine for you or enter a different value (like “,” for a comma) if needed.

How it works

It splits the text up into a list using the delimiter from the 2nd parameter (4: TextToList). Where one delimiter directly follows another, the element in the list will be empty. The next step (5: FilterList) then filters the list and removes these empty fields. In the last step (6: Result) the remaining (non-empty) fields will be reassembled, using the delimiter again. That way, just one delimiter will be left.

Edit 28-Jan-2018: While searching the web to see if one of my next blogpost-topics have already been published somewhere else already, I came across Ivan Bond’s blogpost who used this same technique over 2 years ago here: https://bondarenkoivan.wordpress.com/2015/10/11/transform-table-column-using-own-function-in-power-query/ . It’s a very good read and you will also learn how to use a function like this to transform an existing column instead of adding a new one to perform the operation like in my example above, so don’t miss it.

Edit: If you are wondering why I didn’t use a recursive approach via List.Accumulate or List.Generate for it, check out Rick de Groots article where he does a performance comparison of this approach against them:
Removing Excess Spaces Between Words in Power Query (gorilla.bi)
This is by far the fastest approach 🙂

Enjoy & stay queryious 🙂

Comments (16) Write a comment

  1. Pingback: Removing Repeating Characters In Power Query – Curated SQL

  2. Pingback: Power Query and a new Power BI Desktop update | Guy in a Cube

  3. Imke – rather than create a custom function — my approach has been to nest the three functions into a single line of code ie: Text.Combine ( List.Select( Text.Split (….. ) ) ) — enabling the query to execute seamless – absent the invocation requirement associated with a custom function.

    Thanks – Drewbbc.

    Reply

  4. Imke,

    A nice post. It got me thinking of how to do the opposite or something like that. I want to look for a character string that is in the following format: 99-99-999-99×9 where 9 is a numeric value and x is a character value. Much like a sin or postal code.
    Do you have any insight on how to do that?

    thx,
    wes

    Reply

    • Thank you.
      Not sure if I understood your request correctly, but please check this query:
      let
      String = "99-99-999-99x9",
      Test = "asldf asldkfj 12-34-567-89a0 asfdkjö sfadjk",
      fnProfile = (String) => {Text.PositionOfAny(String, {"0".."9"}, Occurrence.All), Text.PositionOfAny(String, {"A".."z"}, Occurrence.All), Text.PositionOfAny(String, {"-"}, Occurrence.All)},
      StringProfile = fnProfile(String),
      Custom1 = Text.Split(Test, " "),
      Custom2 = List.Select(Custom1, each fnProfile(_)=StringProfile)
      in
      Custom2

      Reply

  5. I think so.
    I will try it in Power BI on the weekend. I plan to use it to a column like this one:

    Title
    name change for01-07-009-06W2
    12-04-010-28W1 and 12-05-010-28W1 overlapping when they should not be
    change surface hole location for well 16-23-002-01W2

    Reply

      • And just to avoid errors when the input is null you could make the first argument nullable text and check for nulls to avoid Text.Combine from speading errors. The version I use is:

        ( String as nullable text, optional TrimChar as nullable text ) as nullable text =>
        let
        trimChar = if TrimChar is null then ” ” else TrimChar as text,
        Split = Text.Split( String, trimChar ) as list,
        Remove = List.RemoveItems( Split, { “” } ) as list,
        Combine = Text.Combine( Remove, trimChar ) as nullable text,
        Result = if String null then Combine else null as nullable text
        in
        Result

        Thanks for all the nice articles.

        Reply

  6. It appears that if the input string contains multiples of the delimiter to remove at the end of the string, this removes ALL of those occurrences from the end of the input string and not leaving just the one occurrence as expected. Is anyone else seeing this behavior? Try this as String –> “Johnny Be Good???” and a “?” as the Delimiter. Version 1908 Build 11929.20708

    Reply

Leave a Reply