Advanced type detection in Power BI and Power Query: Table.ClassifyMixedColumnTypes

This is not a proper blogpost, just a quick share of a function I’ve created today which I think will be very useful for others as well:

Automatic type detection will assign only one type to a column, but sometimes there are multiple types in it and then the type “any” will be applied, which means no type at all. In these cases, often the type of the individual elements/rows in those columns are the key to the necessary ETL-transformation operations to be performed. So you would like to be able to write statements like this:

Table.AddColumn(PreviousStep, “Event”, each if Type.Is(Value.Type([MyColumn]), type text) then ThisEvent else AnotherEvent)

(reading: Add a column with a new Event that depends on if the type of the current row in MyColumn is text: then do ThisEvent else do AnotherEvent)

My advanced type detection function will identify and allocate different data types. It will try to apply a couple of type conversion operations and in case of success, apply the type to the individual cell/record field and in case of failure move on with the next tries. If none of the type conversions succeeds, it will return the original state (any). It takes the name of your table and a list of the column names on which it should be applied as the input-parameters.

(Table, ListOfColumnNames) =>
let
Table.ClassifyMixedColumnTypes.m = Table.TransformColumns(Table.TransformColumnTypes(Table, List.Transform(ListOfColumnNames, each {_, type text})), List.Transform(ListOfColumnNames, each {_ , each try Number.From(_) otherwise try Date.From(_) otherwise  try Text.From(_) otherwise _}))
in
Table.ClassifyMixedColumnTypes.m

So this function will apply different data types (expand as you like/need) within one column on a row-by-row basis – which is what I’ve been looking for quite a while 🙂

Thank you so much Bill Szysz for showing me how to use the List.Transform-trick in order to bulk-apply transformations!

If you’re as lazy as me, you could be tempted to pass “Table.ColumnNames” to the “ListOfColumnNames”-parameter – but this might slow your query down! (Guessing – not much practical experience gained yet)

A warning at the end, that this is of course error-prone – as some strings like “3.2” for example are not unambiguous and can get converted as a date although in the specific context should be numbers (and vice versa). So you should check properly and – if needed – perform some additional transformations before in order to guarantee the correct outcome. But at least in my case it did exactly what was needed.

Please share your thoughts/experiences with this function & stay queryious 🙂

Comments (5) Write a comment

  1. Hi Imke,

    This is great. Thanks heaps for sharing. I have a continuing question, this changes the cells to the new type but the columns are still any. How do we change columns too?

    Thanks.

    Reply

    • Hi Sonali,
      good idea: I’ve created a function that automatically transforms all column types to the types of the fields of their first rows here: http://wp.me/p6lgsG-E5
      Of course, this doesn’t make sense if you have different types in one column, as only one type can be assigned to a column.
      Cheers, Imke

      Reply

      • Thanks for the quick response Imke, really appreciate it. The new function though works fine so far as it hits a null column and errors out. Any ideas how to overcome that?

        Reply

        • Hi Sonali,
          do you mean the function above? I cannot replicate that behaviour. Please check out this code with some sample data – it works for me:

          let
          Function = (Table, ListOfColumnNames) =>
          let
          Table.ClassifyMixedColumnTypes.m = Table.TransformColumns(Table.TransformColumnTypes(Table, List.Transform(ListOfColumnNames,
          each {_, type text})), List.Transform(ListOfColumnNames,
          each {_ , each try Number.From(_) otherwise try Date.From(_) otherwise try Text.From(_) otherwise _}))
          in
          Table.ClassifyMixedColumnTypes.m,

          Table = #table({"A","B","C","D"},{{"5.5","01.01.2016","1",""},{"s","10","1","2"},{"01.01.2017","asdf","1","3"},{"42736.00","01-01-17","","4"}}),
          CallFunction = Function(Table, Table.ColumnNames(Table))
          in
          CallFunction

          Reply

Leave a Reply