Import text from pdf files in Power BI

While Power BI will soon provide functions to import tables from pdf-files, there might be occasions when you actually need to import text from pdf files (in unstructured form). With a little help from R in Power BI you can do exactly that. (And don’t worry: No need to learn R here: The necessary R-code is already included in my function below. All you need is to have R installed your machine). Please also note that at the time of writing the refresh of these queries in the service is only supported with the personal gateway and not with the enterprise version.

Prerequisites

You can use the function below just like a normal M-function, just pass the (URL- or file-) path to it. All you have to take care of is that a instance of R is running on your machine. If this is new to you, check out Ruth Pozuelo’s video showing all the necessary steps: How to install R for Power BI

There is one package required: pdftools. The video above also shows how to install it.

Function

Import text from PDF files:

You can try calling this function for a pdf-file from the internet like the M formula language specification like this:

ImportPdfText(“http://download.microsoft.com/download/8/1/A/81A62C9B-04D5-4B6D-B162-D28E4D848552/Power%20Query%20Formula%20Language%20Specification%20(October%202016).pdf”)

If you want to import local files from your computer, just paste the full file-path instead of the URL. You don’t have to care about the direction of the slashes, both versions (forward and backwards) are accepted.

How to use

The script will return a table with one row for each page in the pdf-file by default. But it has an optional 2nd parameter that will return one row per pdf-text-line instead, if you put 1 into it. A page index and a row index will help navigating the result.

The 3rd parameter is an optional owner password for the pdf and the 4th the optional user password. If you’re using them, you have to enter null for the previous optional parameters. The following example shows how to use a user password while leaving the others “empty”:

ImportPdfText(“MyPdfPath”,null, null, “MyPassword”)

Enjoy & stay queryious 🙂

Remove repeating characters from a string in Power BI and Power Query

Repeating spaces often cause problems when cleaning up your data. My new function “Text.RemoveRepeatingCharacters” can come to the rescue here.

Imagine you have a table like this:

Challenge

To further work with this data, it would often be best if there was just one space between the words and not many.

The following function will do this for you:

Function Text.RemoveRepeatingCharacters

How to use

It takes 2 arguments: The Text/string and the Delimiter. So for our example you have to enter a space, but you can use any sign there for other use cases as well.

How it works

It splits the text up into a list using the delimiter from the 2nd parameter (4: TextToList). Where one delimiter directly follows another, the element in the list will be empty. The next step (5: FilterList) then filters the list and removes these empty fields. In the last step (6: Result) the remaining (non-empty) fields will be reassembled, using the delimiter again. That way, just one delimiter will be left.

Enjoy & stay queryious 🙂

Date.DatesBetween to retrieve dates between 2 dates in Power BI and Power Query

Today I’m sharing a handy function with you that allows you to retrieve all or just a couple of dates between 2 given dates: Date.DatesBetween.

Usage

This function takes 3 parameters:

  1. From- or Start-date
  2. To- or End-date
  3. A selection of ONE of these intervals: Year, Quarter, Month, Week or Day

All dates will be created at the end of the chosen interval: So if you want to analyse events with a duration for example, where you want to transform your data to show one day per (monthly) event, this function generates month-end-dates for every month within the timespan. Please not that if the To-/End-date is within a month, the last element of the list will NOT be that day, but the day of the end of that month.

The default-value for the 3rd parameter is “Day”, so if you omit the specification, the function will return a list of all days in between.

Definition

How it works

This function uses Function.Invoke to create a compact code. In step “CaseFunction” (rows 12-18), a table is created that contains all the elements for a case-selection. The first column contains the case that is selected by the 3rd parameter. The other columns contain the functions and expressions for the cases.

Table with all different elements for each case

Step “Case”  filters that table down to one remaining record:

Case = CaseFunctions{[Case = TimeInterval]},

It uses a special row-selector { [NameOfTheColumn = Condition] } that only works for columns containing unique keys.

Step “DateFunction” has the command that creates the list of dates:

DateFunction = List.Transform({0..Case[NumberOfAddedTIs]-1}, each Function.Invoke(Case[LastDateInTI], {Function.Invoke(Case[TypeOfAddedTI], {From, _})} ))

Starting with creating a list with the length of the number of intervals (yellow). The next evaluation step uses  Function.Invoke (green): This allows us to work with the function name as a variable from our Case-record. So whatever has been selected for “TypeOfAddedTI” (Time Interval) will be executed with From and the respective number from our list as its parameter (like: Date.AddWeeks(From, 2) for the third item of the list if “Week” is selected. The last evaluation-step (orange) will be Function.Invoke(CaseLastDateInTI) which shifts every returned date at the end of its Interval (here: Date.EndOfWeek).

Of course, this coding style is a matter of taste, but I found it worth sharing.

Enjoy & stay queryious 😉

Right Aligning Text in Power BI: Format Improvements for Easy Profit&Loss Reports

Edit 14-Dec-2017: Now that we can right-align text measures in PowerBI, a SWITCH-measure like here: http://www.thebiccountant.com/2017/04/24/kpis-in-easy-profit-and-loss-for-powerbi/#comment-719 is the best alternative in my eyes. No need to read on 🙂

As shown in my last part of the Easy P&L-series, Power BI unfortunately still lacks some fundamental formatting options like:

  1. Right aligning text (please vote for it here: Right align text in Power BI – edit 15th Nov: We’re there: Right aligning text is available now: https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-november-2017-feature-summary/)
  2. Display numbers in different formats within one column (either to be implemented as a “neutral” format for Switch-measures, where the referenced measures carry the formatting attributes already, or as a part of a formula-based conditional formatting) (Thanks Matt for the voting-link: Conditional format SWITCH measure)

So for the moment I choose between the following workaround-options:

  1. Display %-values in a separate column
  2. Format numbers as text and fill up with spaces so that all end up right aligned
  3. See the suggestion from Matt Allington in the comments below (very nice)

Right aligning text or percentage figure in new column

For both options the preparations are the same:

Read more

Bulk-extracting Power Query M-code from multiple pbix files in Power BI

If you want to audit or analyse the M-code of multiple Power BI pbix-files at once, you start with either:

  1. a from-folder query where you filter all files of interest or
  2. a table with the full file-path-specification of the files to be analysed in “Column1”.

Then you add a column where you call the function that extracts the M-code:

Function to extract the M-code

This code is a variation of Igors function which retrieves the code from an opened pbix-file. So now you can apply it to closed pbix-files as well.

For method 1 you call it like so (as it takes the full string for the file-path as its parameter):

Read more

Create a function library in Power BI using M-Extensions

Having the ability to use own M-function the same way than native functions in Power BI and Excel has been one of my biggest wishes for quite some time. So I was more than amazed to see Frank Tonsen’s comment showing a way to do exactly this in PowerBI, that has been available for almost half a year now: M-Extensions as part of the custom connectors.

Unlike custom connectors who show up in the import-dialogue and provide a custom tailored option for importing data or creating queries, M-Extensions don’t show up explicitly anywhere in Power BI: They just do their M(agic) job to make the functions that you’ve defined in them accessible, as if they were inbuilt native functions: Type their name into the formula bar like this (1):

And enjoy the function description (2: if you’ve specified it in the definition, which is optional):

Simplest example

  1. Your functions: Number.Double and Number.Triple
  2. Combined with the keyword “shared” and separated by “;”
  3. Prefix by “section MyLibrary” gives this text:
section MyLibrary;
shared Number.Double = (Number as number) =>
2 * Number;
shared Number.Triple = (Number as number) =>
3 * Number;

 

How to make M-Extensions work

Read more

Custom Connector to import Google Sheets with OAuth2 authentication in PowerBI

Recently I came across the need to connect to Google Sheets with a secure authentication process quite often, so I will share with you how and to what extend I got the custom connector working that I found here. It uses OAuth2 authentication, so you can share your workbook with selected colleagues and they will be prompted to enter their credentials in Power BI if they try to access these files.

Edit: As it turns out, the credentials work for all Google accounts. So you can download my .mez  and simply paste it into your Custom Connectors-path without touching Visual studio ( create path: [My Documents]\Microsoft Power BI Desktop\Custom Connectors ). I believe this will work for the 1st 100 users and then you have to create you own. But if you want to use it in production, I’d strongly recommend to create your own anyway (otherwise continue with section “Use Google Sheets Data Connector in Power BI Desktop”) :

Setup Google API

Go to the Google Developer API and if you don’t have a project yet, just create one:

Go to “Credentials” -> Create credentials and choose “OAuth client ID”:

Choose “Web application”, adjust the “Name” if you like and paste the redirect-url into “Authorized redirect URLs”: https://preview.powerbi.com/views/oauthredirect.html

This will return the client ID and secret for your connector:

 

Adjust the connector in Visual Studio

Read more

Number.Mod rescue pack for Power BI and Power Query

If you use the M-function Number.Mod in Power BI or Power Query and expect the same result like in Excel or DAX, you are probably in good company.

But if the signs of the number and the divisor are not the same, M will differ from Excel and DAX:

Number.Mod in M is different

This is by design, so you can use this this formula instead, if you need matching results:

[Number] – [Divisor] * Number.RoundDown( [Number] / [Divisor] )

Enjoy & stay queryious 🙂

How to open a complex JSON record in Power BI and Power Query

Today I’ll show you a very useful technique how to deal with a JSON record that contains a wild mixture of different elements like this:

If you click on one of the expandable elements, their content will be shown, but you’ll loose all the “surrounding” information (metadata) that is visible now. This is often an issue, regardless if you want to create multiple tables from it to build a star-schema or just need a handful of fields or a denormalized table. But with a little help from M, you’re good to go:

Table.FromRecords( { MyJsonRecord } )

Will returns this:

With this move, every expansion of one of the expandable elements will keep the existing data in place:

Create one big flat table

Simply expand one element after each other to create a denormalized table

Create star schema

For multiple tables, keep this query and reference it to create you (sub-)tables. Always keep the Id-column as the key (!) to combine all the tables in your data model later. (Provided you use this in a function for multiple entities/series)

Best is to play with it, so just past this code into the advanced editor:

 

If your JSON-record has a different structure with “just” header and data in different fields, this technique will be more suitable for you: http://www.thebiccountant.com/2016/04/23/universal-json-opener-for-quandl/

Enjoy & stay queryious 🙂