Web Scraping 2: Scrape multiple pages at once and refresh in Power BI service

In the first post of this series I’ve explained how to simultaneously load and combine multiple (similar structured) tables from ONE PAGE. This post handles the other direction, how to combine tables from DIFFERENT PAGES into one (or to scrape multiple webpages at once).

Overview

Theoretically, this is quite a simple process:

  1. Create a query that fetches and transforms the data from one webpage
  2. Transform this query to a function
  3. Apply this function to a table that holds codes and URLs for the multiple websites

Sources

Reza Rad has described this process beautifully here. The only things that’s missing there is the trick to make these queries refresh in PBI service. My solution is inspired by the work of Stacia Varga and Chris Webb:

My method

Instead of using the RelativePath-field for the dynamic path-elements, I use just the Query-field for the dynamic path instead. But I use the syntax for the relative path, which makes creating the query a bit easier in my eyes. It starts with the leading element (yellow), indicating the position where it will be inserted within the URL-string. Then comes the parameter (green), followed by the rest of the URL (purple).

Check out my video also where I show some useful tricks to create robust web queries that work for multiple pages and also in the future:

Enjoy & stay queryious 😉

Comments (2) Write a comment

  1. Pingback: Power Query “Folder Import” auf Basis einer Excel-Fileliste realisieren – Linearis :: Self-Service Business Intelligence

Leave a Reply