Create a Dimension Table with Power Query: Avoid the case sensitivity bug!

Creating a Dimension table from a fact table using Power Query is really straightforward using the Remove Duplicates function.

However – you might experience a problem if the key to your Dimension table that you’re extracting from the Fact table is text and not number format. Power Query is case sensitive and will consider “Car” and “car” as different, returning both after the remove duplicates step. Once you load this into your Power Pivot data model, it will be shown there as “Car” and “Car” or “car” and “car”, depending on which term was the first in the list (will always take the first one).

This further means that you will not be able to connect you new Dimension table to your Fact table as the Dimension table now has dups.

To overcome this (and because it might be good practice anyway):

Create a numerical index column that you use as your new key. So start off with creating your new Dimension table from your Fact table, called “Dimension”:

let
    Source = Excel.CurrentWorkbook(){[Name="YourFactTable"]}[Content],
    RemoveDups = Table.Distinct(Source, {"DimKey"}),
    AddIndex = Table.AddIndexColumn(RemoveDups, "Index", 0, 1),
    RemoveCols = Table.RemoveColumns(AddIndex,{"FactKey", "FactAttr"})
in
    RemoveCols

This is basically a distinct, added index and removal of all the columns that should stay in the Fact table.

Then create your new Fact table, joining with your Dim-Table in order to retrieve the Index-Column. Then just remove the columns that stay in your Dim-Table now:

let
    Source = Excel.CurrentWorkbook(){[Name="Tabelle3"]}[Content],
    Merge = Table.NestedJoin(Source,{"DimKey"},Dimension,{"DimKey"},"NewColumn"),
    Expand = Table.ExpandTableColumn(Merge, "NewColumn", {"Index"}, {"Index"}),
    RemoveCols = Table.RemoveColumns(Expand,{"DimKey", "DimAttr"})
in
    RemoveCols
PQCreateDimTable3.xlsx

 

edit 19 Aug 2015:

If you don’t want to consider Car and car to be different keys but the same, you can unify the values in your key column using: Text.Upper, Text.Lower or Text.Proper.

Enjoy & stay queryious 🙂

 

Comments (2) Write a comment

  1. Hi,
    I’m a bit new to power BI, and still trying to wrap my head around how this works, as it’s crucial for properly joining a couple salesforce tables I have. One table has a 15 digit version of the ID, other has an 18 digit version of the ID. When I take LEFT(18digitID,15) to get them to match, I run into the issue described above, and am told I have duplicates.

    I’m not fully understanding how adding an Index to one of these tables will help me, since the index will be specific to one table and not the other.

    Thanks,
    Matt

    Reply

    • Hi Matt,
      the index will become your new numerical key in my example: First you create it in your fact-table (on distinct old key), then do a “lookup-back” to your whole fact table. In order to mirror this new key to your Dim-table, you merge your fact-table with your dimension-table on their old keys (in PowerQuery or the query editor, where the match will still be case sensitive). There you “lookup” the new numerical key (index) from your fact table to your Dim-table.
      This will produce the numerical key on both tables who will be discovered as distinct from the data model / Power Pivot.

      I’m not sure if this is the solution to the problem you’ve described. The difference to my example seems to be that you don’t start with a (case sensitive) match at all. If your 18digit-table is the Dim-table that is supposed to have distinct keys, you would need to “loose” some rows. (If it is the other way around, then there shouldn’t be a problem with duplicate keys).

      Reply

Leave a Reply