Web Scraping 2: Scrape multiple pages at once and refresh in Power BI service

In the first post of this series I’ve explained how to simultaneously load and combine multiple (similar structured) tables from ONE PAGE. This post handles the other direction, how to combine tables from DIFFERENT PAGES into one (or to scrape multiple webpages at once).

Overview

Theoretically, this is quite a simple process:

  1. Create a query that fetches and transforms the data from one webpage
  2. Transform this query to a function
  3. Apply this function to a table that holds codes and URLs for the multiple websites

Sources

Reza Rad has described this process beautifully here. The only things that’s missing there is the trick to make these queries refresh in PBI service. My solution is inspired by the work of Stacia Varga and Chris Webb:

My method

Instead of using the RelativePath-field for the dynamic path-elements, I use just the Query-field for the dynamic path instead. But I use the syntax for the relative path, which makes creating the query a bit easier in my eyes. It starts with the leading element (yellow), indicating the position where it will be inserted within the URL-string. Then comes the parameter (green), followed by the rest of the URL (purple).

Check out my video also where I show some useful tricks to create robust web queries that work for multiple pages and also in the future:

Enjoy & stay queryious 😉

Web Scraping 1: Combine multiple tables from one page in Power BI and Power Query

This is a step-by-step description of how to combine multiple tables from one webpage into one table with categories as output. You can also apply this technique to combine tables from other sources as well (like from folder method for example or multiple different webpages (see in an upcoming article)).

Sometimes the page you want to scrape has multiple tables like here:

0 – Combine multiple tables into one: Input

And you want to combine them into 1 with a Category-column like so:

1 – Combine multiple tables into one: Result

Overview

I will present 2 methods here:

  1. Append-method: This is the obvious one and is fast for just a few tables.
  2. Add-Column-method: A bit more complicated but will be faster for a large number of tables and is also suitable for a dynamic number of tables.

You will also find 2 options at the end of this article:

  1. Use custom functions for multi-step table transformations
  2. Use dynamic filters to select the desired tables

 

Append method

Read more