Web Scraping 1: Combine multiple tables from one page in Power BI and Power Query

This is a step-by-step description of how to combine multiple tables from one webpage into one table with categories as output. You can also apply this technique to combine tables from other sources as well (like from folder method for example or multiple different webpages (see in an upcoming article)).

Sometimes the page you want to scrape has multiple tables like here:

0 – Combine multiple tables into one: Input

And you want to combine them into 1 with a Category-column like so:

1 – Combine multiple tables into one: Result

Overview

I will present 2 methods here:

  1. Append-method: This is the obvious one and is fast for just a few tables.
  2. Add-Column-method: A bit more complicated but will be faster for a large number of tables and is also suitable for a dynamic number of tables.

You will also find 2 options at the end of this article:

  1. Use custom functions for multi-step table transformations
  2. Use dynamic filters to select the desired tables

 

Append method

Call your webpage:

2 – Import from Web

3 – enter URL

Next select the tables you need in the navigation pane:

4 – Select relevant tables

This will return 3 separate queries:

5 – different queries are generated automtically

You then add a column “Category” on each of these queries and state which is which (“Top Gainers”, “Top Decliners” and “Top Actives”):

6 – create new column

7 – Column name and category value

Append all these queries:

8 – Append as new

Add all queries to be appended to the right window:

9 – Select all tables to append

Add column method

The first method can get a bit tedious if you have a large number of queries, then this might be better:

Select just one table from the navigation pane (no matter which) and delete all steps apart from the 1st (Source):

10 – delete steps for 2nd option

That will return a table with all content elements of the page:

11 – All content elements in one list

Add an Index column:

12 – Add Index column

and filter the desired tables (here: Index 2,4 and 6) (an option for a dynamic filter is described at the end of this article)

13 – select relevant tables from index column

A sneak in to the table contents shows us that the headers are in the first rows of the table:

14 – sneak shows headers in first row

So we add another column where we promote the headers like this:

15 – Add new column where headers are promoted

Then you can use a conditional column for the Categories:

16 – Add conditional column to allocate the category values

Now remove all columns you don’t need and expand “WithHeaders”:

16 – Expand new column and deselect unwanted columns

Option1: Use custom functions

If the transformations on the tables contain multiple steps instead of just one here (Table.PromoteHeaders), you can use a custom function and use that in the Add-Column-step instead like Matt has described here. Just be aware, that custom functions don’t work in the service (at the time of writing).

Option2: Dynamic filter

If you are not sure that your desired tables always sit on the same relative position in the webpage and therefore the use of an index will not work, you can use more dynamic approaches for filtering like this for example:

Add a column which tries to access the first value in “Column3” of the Data-table:

17 – Dynamic filter option

If that returns “Symbol”, it’s your table and you can filter on that instead. As you can see, this formula might return errors:

18 – Errors are returned if the column “Column3” is not part of the Data-Table

So you have to remove them first before your filter on that column will work:

19 – Remove rows with errors

If you don’t like the conditional column to create the categories, watch out for my next article where I show more options to add columns to a table.

Enjoy & stay queryious;-)

Comments (3) Write a comment

  1. Excellent advice, particularly on use of custom functions.

    I also appreciate you showing a ‘real world’ example to explain the theory. This helps me conceptualize the benefits.

    Thank you.

    Reply

  2. Pingback: Web Scraping With Power BI – Curated SQL

Leave a Reply