Tips to download files from webpages in Power Query and Power BI

When downloading data from the web, it’s often best to grab the data from APIs that are designed for machine-to-machine communication than from the site that’s actually visible on the screen. Not only is the download usually faster, but you also often get more additional parameters that can be very useful. In this article I’m going to show you how to retrieve the relevant URLs for downloading files from webpages (without resorting to external tools like Fiddler) and how to tweak them to your needs.

Retrieving the URL to download files from webpages

Say I want to download historical stock prices from this webpage:

https://finance.yahoo.com/quote/AAPL/history?p=AAPL

The screen will show a link to a download:

Tips to download files from webpages in Power Query and Power BI

Webpage with button to download csv file

If I click on the button, a download dialogue will appear and some browsers will even show me the URL that’s behind it:

Tips to download files from webpages in Power Query and Power BI

Download dialogue with link shown at the bottom of the screen

But when I close the dialogue, the URL will disappear as well. Fortunately in some browsers, you will be able to grab that URL in the options of the link to the downloaded file like in Chrome-based browsers like so:

Grab download link from options under elipsis

But there are other ways as well. Sometimes, a mouse right-click on the download-button reveals a link that takes me to the download link:

But that’s not guaranteed to work everywhere as well. The last resort for me is to inspect the element: Rightclick the download link and choose “Inspect” (or “Inspect element”) instead:

This opens up the full monty of your site. You should then be able to find the URL near the highlighted position (indicating the element that you’re inspecting):

Tips to download files from webpages in Power Query and Power BI

Grab URL to download files from webpages

Tweaking the URL

Now let’s examine the catch and see how we can exploit it:

https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=1504461150&period2=1586083550&interval=1d&events=history

The first part up until the “?” is the main query, but after the question mark we see 4 query parameters:
Start- and Endate, interval and events. In this case, they correspond to the options on the webpage itself. Playing around with the parameters and checking the resulting URLs reveal that “wk” can be used to retrieve weekly data and “mo” is the abbreviation for monthly data. Leaves the question how to decipher the parameters for start- and endate.

As it turns out, they are noted as Unix timestamps . They represent a timestamps as the number of seconds after the start of 1st January 1970. So to transform a date to it, one has to:

  1. Determine its distance to 1/1/1970: Subtract #duration(25569,0,0,0) and then
  2. Convert it to seconds: * 86400

This is what the formula would look like:

Number.From ( DateTime.From ( DateTimeInput )  - #duration ( 25569,0,0,0 ) )  * 86400

Now you’re able to determine the interval for the download dynamically from whatever the refresh of your data reveals. Hope you found this useful.

BTW: The formula to calculate it the other way around looks like so:

#datetime ( 1970,1,1,0,0,0 ) +  Duration.From ( UnixTimestamp / 86400 )

Closing with a general hint on how you have to adjust the queries if they shall be refreshable in the Power BI service: http://blog.datainspirations.com/2018/02/17/dynamic-web-contents-and-power-bi-refresh-errors/

Enjoy & stay healthy & queryious 😉

Comments (2) Write a comment

Leave a Reply

Die Datenschutzbestimmungen finden Sie hier: / Please find the privacy policy here: https://wp.me/P6lgsG-Rz