Bulk-extract Power Query M-code from multiple Excel files at once

Some time ago I published a function that extracts all M-code from Power BI (.pbix)-files. Today I publish the pendant to Bulk-extract Power Query M-code from multiple Excel-files at once. The code contains many elements from the before mentioned, so please refer to that article for reference.

How to use

The function below has just one parameter where you either fill in a full filename (incl. path) of an Excel file, or a folder path where multiple files reside. The function will automatically detect the right modus and spit out the M-code.

If you want to analyse code from multiple Excel-files that don’t sit in one folder, you just create a table with a column that hold the full filenames (one in each row). Then create a new column that where you call the function that references the first column (this blogpost shows how to do it ).

You don’t have to edit or unzip the files in advance, this function will do all of that automagically.

The function code


let func =
(FileOrFolderPath as text) =>
let
CreateTable = if Text.End(FileOrFolderPath,5) = ".xlsx" or Text.End(FileOrFolderPath,5) = ".xlsm"
then #table({"Content", "Name"}, {{File.Contents(FileOrFolderPath), FileOrFolderPath}})
else Folder.Files(FileOrFolderPath) ,
FetchQueries = Table.AddColumn(CreateTable, "FetchQueries", each try fnFetchQueries([Content]) otherwise #table({"Column1"}, {{null}})),
#"Removed Other Columns" = Table.SelectColumns(FetchQueries,{"Name", "FetchQueries"}),
#"Expanded FetchQueries" = Table.ExpandTableColumn(#"Removed Other Columns", "FetchQueries", {"Column1"}, {"QueryCode"}),
// Helper functions
// Fetch queries from [Content]-column
fnFetchQueries = (Source as binary) =>
let
UnzipFile = fnUnzipFile(Source),
// The filename where the queries reside is not known in beforehand. Just that it contains "customXml/item"
FilterItemFiles = Table.SelectRows(UnzipFile, each Text.StartsWith([FileName], "customXml/item") and not Text.Contains([FileName], "Props")),
FetchQueriesFromBinary = Table.AddColumn(FilterItemFiles, "fnGetQueriesFromBinary", each fnGetQueriesFromBinary([Content])),
RemoveErrorRows = Table.RemoveRowsWithErrors(FetchQueriesFromBinary, {"fnGetQueriesFromBinary"}){0}[fnGetQueriesFromBinary]
in
RemoveErrorRows,
// Extracts the queries from the binary
fnGetQueriesFromBinary = (GrabItem1 as binary) =>
let
ParseAsXml = Xml.Tables(GrabItem1,null,1252),
GrabText = ParseAsXml{0}[#"Element:Text"],
BinaryFromText = Binary.FromText(GrabText, BinaryEncoding.Base64),
UnzipSection1 = fnUnzipBinary(BinaryFromText, "Formulas/Section1.m"),
TransformBinaryToList = Lines.FromBinary(UnzipSection1),
ConvertToTable = Table.FromList(TransformBinaryToList, Splitter.SplitByNothing(), null, null, ExtraValues.Error)
in
ConvertToTable,
// Unzips the xlsx
fnUnzipFile = (ZIPFile) =>
let
Header = BinaryFormat.Record([
MiscHeader = BinaryFormat.Binary(14),
BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
FileSize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
ExtrasLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
]),
HeaderChoice = BinaryFormat.Choice(
BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
each if _ <> 67324752 // not the IsValid number? then return a dummy formatter
then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
else BinaryFormat.Choice(
BinaryFormat.Binary(26), // Header payload – 14+4+4+2+2
each BinaryFormat.Record([
IsValid = true,
Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
Extras = BinaryFormat.Text(Header(_)[ExtrasLen]),
Content = BinaryFormat.Transform(
BinaryFormat.Binary(Header(_)[BinarySize]),
(x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
)
]),
type binary // enable streaming
)
),
ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
Entries = List.Transform(
List.RemoveLastN( ZipFormat(ZIPFile), 1),
(e) => [FileName = e[Filename], Content = e[Content] ]
)
in
Table.FromRecords(Entries),
//Unzips the binary content
fnUnzipBinary = (binaryZip,fileName) =>
let
//shorthand
UInt32 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,ByteOrder.LittleEndian),
UInt16 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16,ByteOrder.LittleEndian),
//ZIP file header fixed size structure
Header = BinaryFormat.Record([
MiscHeader = BinaryFormat.Binary(14),
CompressedSize = UInt32,
UncompressedSize = UInt32,
FileNameLen = UInt16,
ExtraFieldLen = UInt16]),
//ZIP file header dynamic size structure
FileData = (h)=> BinaryFormat.Record([
FileName = BinaryFormat.Text(h[FileNameLen]),
ExtraField = BinaryFormat.Text(h[ExtraFieldLen]),
UncompressedData = BinaryFormat.Transform(
BinaryFormat.Binary(h[CompressedSize]),
(x) => try
Binary.Buffer(Binary.Decompress(x, Compression.Deflate))
otherwise null)]),
//Parsing the binary in search for PKZIP header signature
ZipIterator = BinaryFormat.Choice(UInt32, (signature) => if signature <> 0x04034B50
then BinaryFormat.Record([FileName=null])
else BinaryFormat.Choice(Header,(z)=>FileData(z))),
ZipFormat = BinaryFormat.List(ZipIterator),
out = List.Select(ZipFormat(binaryZip), each _[FileName]=fileName)
in
out{0}[UncompressedData]
in
#"Expanded FetchQueries" ,
documentation = [
Documentation.Name = " Xlsx.ExtractQueries ",
Documentation.Description = " Extracts all queries from files in folder or xlsx-files ",
Documentation.LongDescription = " Extracts all queries from files in folder or xlsx-files ",
Documentation.Category = " Other ",
Documentation.Source = " www.TheBIccountant.com . https://wp.me/p6lgsG-112 . ",
Documentation.Version = " 3.0 – extracts code from xlsm as well ",
Documentation.Author = " Imke Feldmann: www.TheBIccountant.com. https://wp.me/p6lgsG-112 . ",
Documentation.Examples = {[Description = " ",
Code = " ",
Result = " "]}]
in
Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))

This function is particularly useful if you want to compare code or audit your Excel-files.

Enjoy and stay queryious 😉

Comments (11) Write a comment

  1. Do you know if there is a way to do this but for all queries contained within the same workbook? For example, if you use #sections[Section1] to get the queries in the file you are in, and then to extract the M code for those queries without referencing an external file?

    Thanks!

    Reply

    • Yes, just check all queries in the query pane -> right-click mouse – copy -> paste into a text editor or your choices and you’ll get the full query-code of all your queries.

      Reply

  2. Hello.
    I apologize for my English, which is not very good.
    Is it possible to extract zip files with password in power query? For example, by setting the zip file password for unpacking directly in the function?

    Thank you for your help.

    Reply

  3. Absolutely fantastic code. Have been searching for something like this for a long time. Thank you very much 🙂

    Reply

  4. I am excited about the code! However, when I read an order path that has more subfolders in it, unfortunately this is not shown in the result.

    Furthermore, it is a pity that you can click through normal queries step by step to see what happens, but this is unfortunately not possible with functions.

    Because so I am unfortunately not able to understand what happens in the code. Therefore the question, is there a way to split this ingenious function into one or more queries to be able to understand the steps?

    Reply

  5. that’s an amazing function!!!

    Is it possible in any way to split the output table QueryCode column into actual queries?
    Split could be done by the shared keyword, the name is there, the code is there… and then creating all the queries just read from the given/referenced file? Thereby, either overwriting existing queries or creating/adding new ones instead? This way one could update all code in a file with the code from a reference file read by your wonderful function… That would be something… 🙂

    Reply

      • Hi Imke,

        thanks for your reply.

        I tried that function ‘MetaQueriesPBIX’ and yes it splits QueryCode column into query name, step name and code.

        What I was asking is, if the QueryCode column information can be converted into real, executable queries? Currently I use Expression.Evaluate(…) to read M-code from lots of separate text files. However, I would prefer to get all query code from a single file (text file created by copy/pasting all queries OR using something like your ExtractQueries). I just don’t know how to split this into single queries and then use Expression.Evaluate on each part or whatever else woks.

        If you have any idea regarding this… 😉

        Reply

  6. Hi Imke, thanks for these great information and resources. This helps me massively to backup and keeping my m code consistent without the need to evaluate expressions from external files.

    Also it allows to set Todo tags and task its in m code and analyze those.

    Do you have resources on how to do the same besides power bi and excel files also for dataflows in power apps platform?

    Reply

Leave a Reply