Create a Dimension Table with Power Query: Avoid the case sensitivity bug!

Creating a Dimension table from a fact table using Power Query is really straightforward using the Remove Duplicates function.

However – you might experience a problem if the key to your Dimension table that you’re extracting from the Fact table is text and not number format. Power Query is case sensitive and will consider “Car” and “car” as different, returning both after the remove duplicates step. Once you load this into your Power Pivot data model, it will be shown there as “Car” and “Car” or “car” and “car”, depending on which term was the first in the list (will always take the first one).

This further means that you will not be able to connect you new Dimension table to your Fact table as the Dimension table now has dups.

To overcome this (and because it might be good practice anyway):

Create a numerical index column that you use as your new key.

Two queries to success

1) Your new dimension table

So start off with creating your new Dimension table from your Fact table, called “Dimension”:

let    
    Source = Excel.CurrentWorkbook(){[Name="YourFactTable"]}[Content], 
    RemoveDups = Table.Distinct(Source, {"DimKey"}),
    AddIndex = Table.AddIndexColumn(RemoveDups, "Index", 0, 1),
    RemoveCols = Table.RemoveColumns(AddIndex,{"FactKey", "FactAttr"})
in
    RemoveCols

Name this query “Dimension”. This is basically a distinct, added index and removal of all the columns that should stay in the Fact table.

2) Your new fact table

Then create your new Fact table, joining with your Dim-Table on “the old key” (“DimKey” in my example below)  in order to retrieve the Index-Column. As this will be done in a case insensitive way in Power Query, “CAR” and “car” will actually continue to be considered different and will get 2 new “Index”-keys. Then just remove the columns that stay in your Dim-Table now:

let
    Source = Excel.CurrentWorkbook(){[Name="YourFactTable"]}[Content],
    Merge = Table.NestedJoin(Source,{"DimKey"},Dimension,{"DimKey"},"NewColumn"),
    Expand = Table.ExpandTableColumn(Merge, "NewColumn", {"Index"}, {"Index"}),
    RemoveCols = Table.RemoveColumns(Expand,{"DimKey", "DimAttr"})
in
    RemoveCols

Name this query “MyNewFactTable”.

Make sure to use the new “Index”-columns as your keys in your model.

PQCreateDimTable3.xlsx

edit 19 Aug 2015:

If you don’t want to consider Car and car to be different keys but the same, you can unify the values in your key column using: Text.Upper, Text.Lower or Text.Proper.

Enjoy & stay queryious 🙂

 

Comments (15) Write a comment

  1. Hi,
    I’m a bit new to power BI, and still trying to wrap my head around how this works, as it’s crucial for properly joining a couple salesforce tables I have. One table has a 15 digit version of the ID, other has an 18 digit version of the ID. When I take LEFT(18digitID,15) to get them to match, I run into the issue described above, and am told I have duplicates.

    I’m not fully understanding how adding an Index to one of these tables will help me, since the index will be specific to one table and not the other.

    Thanks,
    Matt

    Reply

    • Hi Matt,
      the index will become your new numerical key in my example: First you create it in your fact-table (on distinct old key), then do a “lookup-back” to your whole fact table. In order to mirror this new key to your Dim-table, you merge your fact-table with your dimension-table on their old keys (in PowerQuery or the query editor, where the match will still be case sensitive). There you “lookup” the new numerical key (index) from your fact table to your Dim-table.
      This will produce the numerical key on both tables who will be discovered as distinct from the data model / Power Pivot.

      I’m not sure if this is the solution to the problem you’ve described. The difference to my example seems to be that you don’t start with a (case sensitive) match at all. If your 18digit-table is the Dim-table that is supposed to have distinct keys, you would need to “loose” some rows. (If it is the other way around, then there shouldn’t be a problem with duplicate keys).

      Reply

  2. I may be missing something obvious here, but as PowerBI ignores the case when loading the data and uses the first instance of each key if finds, the load process therefore loses the case sensitivity of the key (I’m referring to a Salesforce token) so I don’t understand how you can then merge?

    Reply

    • Power BI ignores the case when loading the data to the data model. BUT not, when loading the data from the source into the query editor. As this happens before the load to the data model, in that stage you can create the index that serves as the new technical key.
      To make it clear: This index will be re-created everytime you refresh the data. So it might assign different keys on your items with each refresh, depending on the order of your source data.
      I recommend playing with the sample file I’ve attached.

      Reply

  3. Thanks for this solution!
    I’m new to PowerQuery so it took me a while to understand. My confusion came at the Merging stage so I’ll try and explain how I did it for anyone else finding this solution.
    1. Get Data > Load your “Fact file”
    2. Follow the steps to remove the duplicates, excess columns and to add an index
    3. Name as “index” and load (I loaded to data model only)
    4. Get Data > Load your “Fact file” again
    5. Remove duplicates (my file personally also contained some non case related duplication which I forgot about and stumped me for a good 30 minutes!)
    6. Click on Merge and add the columns from your Index column, matched using your ‘problem’ column (the one with case-sensitive duplicates)
    7. Load your fact table, now with an index column.

    As I’m writing this, I realise that what might be possible (and maybe intended from the instructions) is that instead of reloading the fact table, you merge the query directly into the index (hence the instructions to remove excess columns again). If I manage to solve this I’ll re-comment here.

    Good luck!

    Reply

  4. Did anyone ever post a video of this solution? I’m running into this exact issue creating a 1 to many relationship between two tables. I get adding and merging the index column into my fact table, but what about the lookup table in a one to many relationship? If I add an index column into my lookup table, wouldn’t the index number be different from record to record and the relationship wouldn’t work?

    Reply

    • Hi David,
      sorry that my post hasn’t been clear enough. I’ve adjusted the wording to make it a bit clearer (hopefully).
      Just as a reminder: You just create an index-column once (in the first step, when creating the dimension table).
      After that you add the index-column to the fact-table by joining on the original Key-column (! called “DimKey” in my example).
      As that merge in Power Query is case sensitive, “CAR” and “car” would actually get 2 different indices (new index-keys).
      Make sure to use these new index-keys in your model.

      Also, please check out the new download.
      Hope, this works for you – if not, please let me know and I might consider posting a video as well 🙂
      /Imke

      Reply

      • The merge in power query is case sensitive you CAR and car would actually get 2 different indices. Great. However, when I place the dimension and fact columns in my table visual, then it would show me either of of CAR or car and all the facts of CAR and car. How do we fix that?

        Even if I use a slicer, it will consider CAR and car as same.

        Reply

  5. Thanks imke, Your description was fine, I’m just now getting my feet wet with PBI and I just couldn’t wrap my head around the merge part. I did get it to work without using the advanced editor. Here are the steps below that I did to get it to work.

    Get Data and open the Fact Table.
    Add an index column. Close & Apply
    Get Data and open the Lookup Table.
    Click the Merge Queries button under Home/Combine.
    On this window, Select the Key field on Fact table the Key field on Lookup table and press OK.
    This left me with a column that just said Table. At the top, next to the column name, there is a filter icon. I selected that, which showed me all of the column names from my Fact table. I removed everything but the index column and my column was now populated with the index numbers from my Fact table.

    I then used these two column to create the relationship in my model.

    Probably not the BEST way to go about it, but it worked for me! Thank you so much for your idea!!!!!

    Reply

  6. I had a little trouble following this as I’m a little slow when working late. I stumbled upon a different approach that I think works with a little less work if your case sensitive data to join on is a fixed length.

    This is a bit of a convoluted solution – you convert the case sensitive field to their character code with dashes between them and join on that.  The whole endeavor looks ridiculous in code but it does work.  All my ID’s were constant length 15 digit salesforce id’s so I just made it big, long, and ugly, but it could probably be converted to something more generic.  While ugly, it’s easy to reproduce anywhere the id is and then joins work.  The dashes are to ensure that the stray perfect combination of 2 and three digit character codes don’t overlap in a weird way to make a collision.  Hope it’s useful.  I made this a calculated column and joined on it in all the tables that needed the relationship.

    Text.Combine({Number.ToText(Character.ToNumber(Text.At([Id], 0))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 1))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 2))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 3))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 4))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 5))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 6))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 7))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 8))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 9))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 10))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 11))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 12))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 13))), “-“, Number.ToText(Character.ToNumber(Text.At([Id], 14)))})

    Reply

  7. Hey, my problem is that i want my dashboard to contain URLs. Some of the URLs just differ regarding the upper and lower cases. This means that PowerBI sees them as the same link, wich again means that im not able to land on the right website for one of those links. Is there any way so solve this problem?
    Thanks, already!

    Reply

Leave a Reply