Return all occurrences of text between delimiters in Power BI and Power Query

The M-function Text.BetweenDelimiters does what it says and returns the text between 2 delimiters in the query editor of Power BI and Power Query. But irrespective of its optional arguments you can use to select specific occurrences, it will always only return one string found.

Problem

How to retrieve all occurrences of strings between in a text field like here for example: I’d like to get a list that contains “Python” and “R”:

“Meanwhile I prefer +Python!+ to +R!+”

Meaning all strings found between the delimiters “+” and “!+”.

Solution

This function will return the desired list:


let func =
(InputText as text, StartDelimiter as text, EndDelimiter as text, optional Alternative_Dummy as text) =>
let
// If the end delimiter contains the same characters than the start delimiter then that portion has to be replaced in the end parameter by a dummy. Otherwise the query fails.
AlternativeDummy = Alternative_Dummy,
Dummy = if AlternativeDummy is null then "*+*" else AlternativeDummy,
String = InputText,
ParaStart = StartDelimiter,
ParaEnd = EndDelimiter,
// If the delimiters are identical, a different logic has to be applied.
IdenticalDelimiters = ParaEnd = ParaStart,
// Start of logic for identical delimiters
SplitText = Text.Split(String, ParaStart),
ResultIdenticalDelimiters = List.Alternate(SplitText,1,1),
// Start of logic for different start- and end-delimiters: Create dummy end-delimiter.
DummyParaEnd = Text.Replace(ParaEnd, ParaStart, Dummy),
DummyString = Text.Replace(String, ParaEnd, DummyParaEnd),
SplitByParaStart = List.Skip(Text.Split(DummyString, ParaStart),1),
ConvertToTable = Table.FromList(SplitByParaStart, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
ExtractBeforeParaEnd = Table.AddColumn(ConvertToTable, "Result", each Text.BeforeDelimiter([Column1], DummyParaEnd)),
Cleanup = Table.RemoveColumns(ExtractBeforeParaEnd,{"Column1"})[Result],
Result = if Text.Contains(String, ParaStart) and Text.Contains(String, ParaEnd)
then (if IdenticalDelimiters
then ResultIdenticalDelimiters
else Cleanup)
else error Error.Record( "At least one of the delimiters is not found in the string" )
in
Result ,
documentation = [
Documentation.Name = " Text.BetweenDelimitersOccAll ",
Documentation.Description = " Returns all occurrances of text strings between a delimiter pair. ",
Documentation.LongDescription = " Returns all occurrances of text strings between a delimiter pair. Optional parameter Alternative_Dummy can be used if Dummy shall not be *+*. ",
Documentation.Category = " Text ",
Documentation.Source = " https://wp.me/p6lgsG-Y5 ",
Documentation.Version = " 1.0 ",
Documentation.Author = " Imke Feldmann: www.TheBIccountant.com: https://wp.me/p6lgsG-Y5 . ",
Documentation.Examples = {[Description = " See this blogpost: https://wp.me/p6lgsG-Y5 ",
Code = " TextBetweenDelimitersOccAll(""Meanwhile I prefer *Python!* to *R!*"", ""*"", ""!*"") ",
Result = " {""Python"", ""R""} "]}]
in
Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))

How it works

In the first 3 mandatory parameters you fill the text to be analysed and the delimiters between the strings shall be extracted.

But there is also a 4th optional parameter that you will only have to use in the rare case that your 1st delimiter contains an “°”: My function works on the principle that a split by the 1st delimiter will actually just split the text on the 1st delimiters position. Now, when that string is also included in the 2nd delimiter, the text would also be split at the position of the 2nd delimiter. To prevent this, I introduced a Dummy that replaces the occurrence of the 1st delimiter-string within the 2nd delimiter. So in the case above, the 2nd delimiter “!+” will be replaced to “!°” before the splitting starts.

So far so good … BUT: If the 1st delimiter would actually be “°” instead of “+” here, my function would not return correct results. In that case, you’d have to fill in a value in the 4th optional parameter that is not contained in your current delimiters. So I tried to choose a Dummy-sign that will hopefully very seldom be used as the delimiter and you will never have to use this option 😉

The file to download contains the function and a query-version that allows you to follow each step if you want to see how the function is built.

Link to file:  TextBetweenDelimitersOccAll.zip

Enjoy & stay queryious 😉

Comments (5) Write a comment

  1. Hi, not sure if I understand where and how exactly to put my values. I have different delimiters: [ and :
    Where do I put including the dummy value?

    Reply

  2. I knew I’d find you’d done this already! I’m using to parse TEXTDATA to find table names used in long CPUTIME queries.

    Reply

Leave a Reply