Dynamic Benford’s Law measures in Power BI and Power Pivot

Benford’s Law compares the frequency distribution of leading digits to its (empirically proven) natural counterpart. This can then be used to detect fraud and errors.

Dynamic Benford's Law measures in Power BI and Power Pivot

Comparison between Benford distribution and actual

The green columns show how often each number should be the first digit in numbers that should follow the Benford-distribution. In black you’ll see the actual distribution of first digits within my table. Lastly, the red line shows the percentual absolute deviations between actual and Benford values.

In this example, there is a relatively high occurrence of numbers starting with 4 and 5. So this could be a sign for fraudulent manipulations.

The Benford Distribution

First you need a table with the Benford-distribution. Just load it as a disconnected table to your data model and name it “BenfordTable”. The “Value”-field from this table will be taken as x-axis for the visualisations. As the Benford-distribution is logarithmic, it can quickly be created with the following DAX-code:


BenfordTable =
ADDCOLUMNS (
GENERATESERIES ( 1, 9 ),
"BenfordDistribution", LOG10 ( 1 + ( 1 / [Value] ) )
)

Benford’s law Measure

The measure calculates how often a number starts with one of the BenfordNumbers (1..9) compared to the total number of rows in the FactTable.


BenfordMeasure =
VAR BenfordNumber =
MAX ( BenfordTable[Value] )
VAR CountBenfordNumber =
CALCULATE (
COUNTROWS ( FactTable ),
LEFT ( FactTable[Value], 1 ) * 1 = BenfordNumber
)
VAR CountTotal =
COUNTROWS ( FactTable )
RETURN
DIVIDE ( CountBenfordNumber, CountTotal )

If there are blank in the Value-column of the table to be analyzed, the measure has to be adjusted by filtering them out in the VAR CountTotal: ( CALCULATE(COUNTROWS(FactTable), FactTable[Value] <> BLANK()) )

Please note that you can create as many measures as you need in one model. So if you have multiple columns to investigate, just write a measure for each.

Benford’s law Variance Measure

To calculate the difference between target and actual, I use a MAXX-aggregation. This returns the maximum difference there is for a number. I also use this in a card visual if I want to add a data driven alert. So I don’t have to check and eyeball the chart regularly, but can just use this in a card visual in a dashboard. Then I’ll set a threshold value for the alert and will not miss any alarming developments.


Deviation % =
MAXX (
ADDCOLUMNS (
BenfordTable,
"Diff", ABS ( [BenfordMeasure] – BenfordTable[BenfordDistribution] ) / BenfordTable[BenfordDistribution]
),
[Diff]
)

 

Why DAX?

I’m using measures here instead of a calculated column (in the Benford-Table) because this allows me to filter and slice my table. This allows for making advanced and flexible analysis like comparing different values against each other or over time against the Benford distribution:

Dynamic Benford's Law measures in Power BI and Power Pivot

Benford’s Law Charts: Comparisons with various dimensions

 

Enjoy & stay queryious 😉

Comments (10) Write a comment

  1. What happens with 0.01? I think it should be Benfords Law number 1, because the first non zero digit is 1. Therefore I multiply the absolute value by 100 for transactions and then take the first digit. Also I use ABS to eliminate “-” signs.

    Reply

    • Yes, you should prepare your figures accordingly before you apply the measures.
      Thanks for the hint!
      /Imke

      Reply

  2. Hey can you help me understand what i should change if i want to addapt the measure to reach the first 2 digits?

    Reply

    • Hi John,
      you need to adjust the Benford table like so:

      Benford_Table_1st2digits =
      ADDCOLUMNS(
      GENERATESERIES(10,99)
      ,"BenfordDistribution", LOG10(1+(1/[Value]))
      ,"FirstDigit", LEFT([Value],1)
      ,"SecondDigit", RIGHT([Value],1)
      )

      and in row 7 of the measure you have to take the 2 digits from the left: LEFT ( FactTable[Value], 2 ) * 1 = BenfordNumber

      /Imke

      Reply

  3. How would you get this logic to work on journal entries? Negative numbers as -234 and numbers leading with zero as 0.001 will be able to occur.

    Reply

    • Hi Casper,
      if negative numbers have to be considered as well, you have to take the absolute values instead (ABS-function).
      If you’re one scale with most below 1, then you should multiply them by a factor of 10 (10,100, 1000…) so that the first non-zero digits will sit before the decimal separator.
      /Imke

      Reply

  4. Hi. Thanks for this, it is very helpful. I have a question, when I click in the chart (actual distribution), it does not show the list of details (say list of all amounts start with number 9 when I click value 9 on X axis for example). How to make this link? Thanks.

    Reply

  5. Hi How can I modify this to Last 2 Digits,2nd Order and Summation when my Columns has decimal Number ranging from 6.42 to 7937938.04

    Reply

Leave a Reply