MrExcel Publishing
Your One Stop for Excel Tips & Solutions

Scrape Webpages Using Power Query


October 25, 2017 - by Bill Jelen

Scrape Webpages Using Power Query

Power Query is pretty powerful. But I am about to do Power Query Squared... writing a query for one page and then having Excel perform the same query for a whole list of web pages.


Watch Video

  • Today's trick is adapted from the M is for Data Monkey book
  • Build a query to get data from one web page
  • Edit the query to change it into a function with (VariableName)=> before Let
  • Change the hard-coded URL to VariableName
  • Rename the query to fxWeather
  • Close & Load. The data will disappear.
  • Use Excel Trickery to create a table of all URL's
  • Create a query from that table.
  • Add a new column of Weather =fxWeather([URL])
  • Expand the column. Uncheck Prefix
  • Amazing!

Auto-Generated Transcript

  • Learn Excel from MrExcel podcast episode
  • 2056 power query squared hey welcome
  • back to the mr. cell net cast I'm bill
  • chillin I came across this trick while I
  • was preparing to do a seminar at a
  • conference in Dallas called axela
  • Palooza greatest Excel conference name
  • ever you should check it out
  • every September in Dallas and credit to
  • Ken polls and Miguel Escobar because I
  • had to do an hour on power query so of
  • course I pulled out their awesome book
  • the world's greatest book on power query
  • I was flipping through the book and I
  • saw they had one technique and I said
  • wait a second I'm gonna see if I can
  • adapt this technique and this to me is
  • awesome and power query and here's what
  • we're gonna do we're gonna take one
  • power query and then we're gonna run
  • that power query that query dozens of
  • times all right and so the example that
  • I came up with was where I wanted to
  • pull data from a web page all right and
  • I went out to I just looked for some web
  • pages I could that I could use as an
  • example I ended up at Weather
  • Underground and here's the URL and you
  • can see that I was in Dallas Fort Worth
  • so we're pulling data for Dallas and it
  • looks like January 2nd 2015 so right
  • there in the URL is the parameters right
  • and that is a URL that is just ripe for
  • making this thing work we'll take a
  • quick look at the web page although it's
  • not so important see there's a lot of
  • different data out here on the web page
  • and I decided I was just going to try
  • and get precipitation and high and low
  • temperature and here's that query and
  • let me tell you right here that this
  • podcast is not about how to create this
  • query new query from other sources
  • from web specify the URL and then a
  • bunch of steps that I'm not going to
  • detail here to get my final answer of
  • max 10 min temp and precipitation the
  • point is you're going to take your own
  • query and make it work for a bunch of
  • things so I click close and load and
  • this query is working it's returning my
  • one row everything is awesome and I'm
  • going to come back in I'm going to edit
  • this query and I'm going to go to view
  • advanced editor
  • I'm gonna take this query and I'm gonna
  • make it be a function alright so right
  • here before the word let I press ENTER
  • and in parentheses I'm gonna give it a
  • variable my URL and then equal sign
  • greater than a little arrow there right
  • cool
  • and then down here where they have the
  • URL in quotes I want to get rid of the
  • entire URL including the quotes and then
  • type my variable name my URL all right
  • so what we're saying is we're gonna pass
  • it a URL and it's gonna do that same
  • query but with whatever URL we happen to
  • pass it now a couple of of disconcerting
  • things here when I click done man all my
  • applied steps are gone in there you want
  • me to enter a parameter just ignore all
  • that we're gonna rename this we're gonna
  • call it FX weather FX of course being
  • the abbreviation for function and you
  • have to really remember this name and
  • remember which letters are capitalized
  • that's going to be very important in a
  • couple of minutes home close and load
  • and bam everything is gone oh no but
  • that's okay all right so we know it's
  • there it's a connection only now I'm
  • gonna come over here and this is just
  • straight old Excel alright so here's the
  • URL I broke it out into the first part
  • of the you are the ending part of the
  • URL I know that I need to take the date
  • I need to format it in this weird format
  • of year month and day so I use the text
  • function to do that put in a starting
  • date here I can even change the airport
  • so now I'm back and forth let's do MCO
  • for Orlando and let's do some recent
  • data so it'll start at 10 1 2016 alright
  • so now we have this great little table
  • set up here in by the way it does have
  • to be a table you have to use format as
  • table or ctrl t so you know this is just
  • grabbing that date and then plus 1 plus
  • 1 plus 1 I format it I build the URL
  • alright now we're gonna build a query
  • from this table alright and there's my
  • information I'm going to add a new
  • column add a custom column the columns
  • were going to be called weather and the
  • form is going to be equal FX weather
  • make sure it's the exact same case same
  • upper and lower case letters and we'll
  • insert that field called URL like that
  • closing parenthesis no syntax error
  • click ok they want to know about privacy
  • here this is all public data click Save
  • all right so there's our date it's funny
  • they've actually changed my format into
  • something that I it doesn't look like
  • what I started out with
  • and then here's weather with the expand
  • symbol so I'm gonna click the expand
  • symbol uncheck use original column name
  • is prefix I want the max the min the
  • precipitation click OK alright and now
  • all I need is the date and that
  • information out there so I'll
  • right-click and remove this column right
  • click and remove this column over here I
  • don't need that time so I'll say that
  • this is just a date
  • alright see what it's doing every date
  • that I'm passing it it's returning the
  • high the low and the precipitation for
  • Orlando every row here is going out to a
  • different web page just imagine if it
  • wasn't 15 rows but 5,000 rows you'd set
  • it up to run overnight
  • I used to write macros for this in fact
  • one of the web pages out at MrExcel is
  • how to build a macro to scrape web pages
  • from a thousand different web pages at a
  • site not necessary anymore
  • with power query now when I close and
  • load it's funny the preview here is
  • showing me all the results when I close
  • and load they're actually going to go do
  • each query and so right now we have
  • preview rows loaded and it will take a
  • good long time for this information to
  • get updated so go to lunch do something
  • especially if you're doing more than 15
  • rows and it's funny the preview is
  • correct but they're still going out and
  • could chunka chunka chunka ng through
  • each individual row and there it's
  • loaded is this amazing or what hey I do
  • a lot of Excel seminars the power Excel
  • seminar in Orlando Florida look at these
  • beautiful temperatures we have down here
  • on November 4 2016 my afternoon will be
  • all about power query power bi power
  • pivot power map so I'm going to invite
  • you down to Orlando to check out this
  • seminar there'll be a link there
  • the top right hand corner alright so
  • recap today's trick is from this book M
  • is for data monkey we build a query to
  • get one webpage and then edit that query
  • to change it into a function so right
  • before the let statement variable name
  • equals greater than and then change the
  • hard-coded URL to whatever that variable
  • name is rename the query it affects
  • whether close and load that data
  • disappears and then we use some sort of
  • Excel trickery to create a table of all
  • the URLs we want to crawl create a query
  • from that table this has to be a ctrl T
  • table add a new column of weather equals
  • equal FX weather and again it has to
  • match the case there square brackets URL
  • expand that column uncheck prefix BAM it
  • is amazing
  • thanks to canon miguel for writing this
  • book thanks to you for stopping by hope
  • to see you in Orlando on November 4th
  • 2016 see you next time for another net
  • cast from MrExcel

Download File

Download the sample file here: Podcast2056.xlsm

Title Photo: Tama66 / Pixabay


Bill Jelen is the author / co-author of
Excel Subtotals Straight to the Point

I used to use the Subtotals feature daily after downloading mainframe data. This book covers every tip and trick for using Subtotals.