RegEx in Power BI and Power Query in Excel with Java Script

From time to time Huang Caiguang sends me some geeky M-code that turns out to be very useful (like this for example). Today it’s utilizing java script code for RegEx (regular expressions): The Web.Page function can execute JS code and you can pass strings from you M-environment to it with simple string concatenation. So escape the js-code and use the ampersand (“&”) to reference the string. As a function it looks like so:

let
fx=(x,y)=>Web.Page(
“<script>
var x=”&x&”;var y=new RegExp(‘”&y&”‘,’g’);var b=x.match(y);document.write(b);
</script>”)[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in fx(“””hello012中国1235″””,”\\d+”)

This code extracts all decimals (“\\d+”) from the string provided as the first argument and concatenates them with a comma. But just try for yourself by pasting the code into the advanced editor.

This doesn’t work in PowerBI service, so if you need it there, you can switch to an R-script that can be used with a personal gateway. But for Excel users, this is the way to go.

Please vote for a native implementation in M here: https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/8892295-regular-expressions-support-in-power-query

Enjoy and stay queryious 😉

Comments (34) Write a comment

  1. Too complicated =)
    ====
    let
    fx=(s, reg)=>Web.Page(“document.write(‘”&s&”‘.match(/”&reg&”/));”),
    tbl = fx(“qwer1234asdf”,”\d+”),
    result = tbl{0}[Data]{0}[Children]{1}[Children]{0}[Text]
    in
    result

    Reply

  2. Pingback: Power BI Lingo, RegEx, Data and BI Summit and more... (April 26, 2018) | Guy in a Cube

  3. I am trying to do something similar, but I am too new to Power Query / M to get anything working.

    I have a table with a column called Title with some text. I am trying to find any row that has Visio as a word within the title, but exclude anything such as vision, visionary, …

    I can’t get the example above to work. Is it possible to provide a sample file with a custom function, along with the appropriate parameters ([Field] as text, RegEx), returning true or false (if found or not found).

    Reply

    • Hello Daniel,
      if my understanding is correct, this is fairly easy to implement. Please paste the following code into the advanced editor and see what happens:

      let
      Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCssszsxXSCxOSVPSUUosVorVgYrlwQTTk8CCIJ5CGUgGKFaclQYVzIGJ5hWDDEgtqkpUio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Title = _t, #"Other columns" = _t]),
      #"Filtered Rows" = Table.SelectRows(Source, each List.Contains(Text.Split([Title], " "), "Visio", Comparer.OrdinalIgnoreCase))
      in
      #"Filtered Rows"

      This is a case-insensitive implementation. If you need a case sensitive, then use Comparer.Ordinal instead.
      /Imke

      Reply

  4. Thanks for the quick reply.

    The following row from your code seems to return a table with any row containing visio (case insensitive). I would like to see this in a custom column with a TRUE (contains visio) or FALSE (does not contain visio).

    Table.SelectRows(Source, each List.Contains(Text.Split([Title], ” “), “Visio”, Comparer.OrdinalIgnoreCase))

    I want to provide this to some employees who can then simply filter on a few columns (I will have a column for Visio, Project, Adobe, Antidote, …

    It would be cool, if those could all be in the same column with the software as the result instead of true/false, but that might be a phase 2 – I would be very happy to have the 1st part working. The difficulty with phase 2 is that I may have multiple software in the same title (ie: Install visio and project). In phase 1, I plan on doing this as multiple columns with visio true/false and project true/false). Phase 2, I would like “VISIO,PROJECT” as the result in a custom column.

    Reply

    • Hi Daniel,
      phase 1 is no problem, just take the filter logic to a new column instead: Table.AddColumn(Source, “FilterColumns”, each List.Contains(Text.Split([Title], ” “), “Visio”, Comparer.OrdinalIgnoreCase))
      I don’t have a clear picture what you want to achieve in phase 2. So please provide a link to an Excel file where you’ve outlined the situation as is and the desired outcome.
      Thanks, Imke

      Reply

  5. Hi, I managed to do what I wanted using multiple columns. I have provided a number of rows below to show what I would like.

    Sample Data…Column Title is what we check and Custom_Software_Column would return specific words found in the title column separated by a comma or other delimiter.
    (At the moment I am specifically looking for Visio, Project, Adobe, Antidote but this could change).

    Title Custom_Software_Column
    Install Visio Visio
    Install Visio and Project Visio, Project
    Other issue1
    Install Project Project
    Other issue2
    Remove Antidote Antidote

    I managed a solution by using your code (one column per keyword) and then using another column with “if then else” logic. The is probably a better way that would eliminate the interim columns, but the solution works.

    Reply

  6. I faced an issue with special non printable characters in the input strings, I removed them from the input string first using Text.Clean function to bypass it:

    let
    fx=(InputString,RegexPattern)=>
    let
    CleanText = Text.Clean(InputString),
    Match = Web.Page(

    var x=(‘”&CleanText&”‘);var y=new RegExp(‘”&RegexPattern&”‘,’g’);var b=x.match(y);document.write(b);

    “)[Data]{0}[Children]{0}[Children]{1}[Text]{0}
    in Match
    in fx

    Hope this helps.

    Reply

  7. Pingback: Regexp in Power Query using JavaScript - Tutorial Guruji

  8. Pingback: [SOLVED] Regexp in Power Query using JavaScript – BugsFixing

  9. Has anyone successfully implemented regex in power query (BI) M language using the .replace instead of just the .match

    Reply

  10. I got the .replace to work using grouping and replacing with groups, the code for the function is below:

    et fx=(text,regex,repl)=>
    Web.Page(

    var x='”&text&”‘;
    var z='”&repl&”‘;
    var y=new RegExp(‘”&regex&”‘,’g’);
    var b=x.replace(y, z);
    document.write(b);

    “)[Data]{0}[Children]{0}[Children]{1}[Text]{0}
    in
    fx

    In PQ a M code example of the usage is below:

    #"Added Custom4" = Table.AddColumn(#"Added Custom3", "Clean_First", each Regex_Replace([First], " \\d.\\d+| \\d", " "))

    in
    #”Added Custom4″

    Reply

    • How to overcome this error –> Expression.Error: The name ‘Regex_Replace’ wasn’t recognized. Make sure it’s spelled correctly.

      Reply

  11. That error is because you first have to create function with the code and name the function Regex_Replace
    Then you can call the function as in the example m code

    Reply

  12. I tried this approach, but the result is a piece of code and not the expected string value. 🙁

    Reply

  13. on the first line of the code the “L” in let was omitted when copying the code for the Function

    Reply

  14. I stumbled upon this and was inspired use regex to extract legislative bill numbers from unstructured text fields. Some of the text fields contain multiple bill numbers, and when I test my regex on regex101.com it returns multiple matches per record, but within PQ it’s only returning the first match it finds. I’m guessing that this is because I’m using the Table.AddColumn function to capture the matches without nesting a list function, but could be guessing wrong. Any thoughts?

    #”Add BillParts” = Table.AddColumn(#”Changed Type”, “BillParts”, each re(“/(((S|A).?\s?(\d+))-?([A-L|a-l]?))/”,[Focus Number])),

    Reply

    • Answering my own question: To return all matches, you need to use regex’s global and multiline switches at the end of the command (/gm). That solved the problem.

      #”Add BillParts” = Table.AddColumn(#”Changed Type”, “BillParts”, each re(“/(((S|A).?\s?(\d+))-?([A-L|a-l]?))/gm”,[Focus Number])),

      Reply

Leave a Reply