MrExcel Publishing
Your One Stop for Excel Tips & Solutions

Scrape Webpages Using Power Query


October 25, 2017 - by Bill Jelen

Scrape Webpages Using Power Query

Power Query is pretty powerful. But I am about to do Power Query Squared... writing a query for one page and then having Excel perform the same query for a whole list of web pages.


Watch Video

  • Today's trick is adapted from the M is for Data Monkey book
  • Build a query to get data from one web page
  • Edit the query to change it into a function with (VariableName)=> before Let
  • Change the hard-coded URL to VariableName
  • Rename the query to fxWeather
  • Close & Load. The data will disappear.
  • Use Excel Trickery to create a table of all URL's
  • Create a query from that table.
  • Add a new column of Weather =fxWeather([URL])
  • Expand the column. Uncheck Prefix
  • Amazing!

Auto-Generated Transcript

  • Learn Excel from MrExcel podcast episode
  • 2056 power query squared hey welcome
  • back to the mr. cell net cast I'm bill
  • chillin I came across this trick while I
  • was preparing to do a seminar at a
  • conference in Dallas called axela
  • Palooza greatest Excel conference name
  • ever you should check it out
  • every September in Dallas and credit to
  • Ken polls and Miguel Escobar because I
  • had to do an hour on power query so of
  • course I pulled out their awesome book
  • the world's greatest book on power query
  • I was flipping through the book and I
  • saw they had one technique and I said
  • wait a second I'm gonna see if I can
  • adapt this technique and this to me is
  • awesome and power query and here's what
  • we're gonna do we're gonna take one
  • power query and then we're gonna run
  • that power query that query dozens of
  • times all right and so the example that
  • I came up with was where I wanted to
  • pull data from a web page all right and
  • I went out to I just looked for some web
  • pages I could that I could use as an
  • example I ended up at Weather
  • Underground and here's the URL and you
  • can see that I was in Dallas Fort Worth
  • so we're pulling data for Dallas and it
  • looks like January 2nd 2015 so right
  • there in the URL is the parameters right
  • and that is a URL that is just ripe for
  • making this thing work we'll take a
  • quick look at the web page although it's
  • not so important see there's a lot of
  • different data out here on the web page
  • and I decided I was just going to try
  • and get precipitation and high and low
  • temperature and here's that query and
  • let me tell you right here that this
  • podcast is not about how to create this
  • query new query from other sources
  • from web specify the URL and then a
  • bunch of steps that I'm not going to
  • detail here to get my final answer of
  • max 10 min temp and precipitation the
  • point is you're going to take your own
  • query and make it work for a bunch of
  • things so I click close and load and
  • this query is working it's returning my
  • one row everything is awesome and I'm
  • going to come back in I'm going to edit
  • this query and I'm going to go to view
  • advanced editor
  • I'm gonna take this query and I'm gonna
  • make it be a function alright so right
  • here before the word let I press ENTER
  • and in parentheses I'm gonna give it a
  • variable my URL and then equal sign
  • greater than a little arrow there right
  • cool
  • and then down here where they have the
  • URL in quotes I want to get rid of the
  • entire URL including the quotes and then
  • type my variable name my URL all right
  • so what we're saying is we're gonna pass
  • it a URL and it's gonna do that same
  • query but with whatever URL we happen to
  • pass it now a couple of of disconcerting
  • things here when I click done man all my
  • applied steps are gone in there you want
  • me to enter a parameter just ignore all
  • that we're gonna rename this we're gonna
  • call it FX weather FX of course being
  • the abbreviation for function and you
  • have to really remember this name and
  • remember which letters are capitalized
  • that's going to be very important in a
  • couple of minutes home close and load
  • and bam everything is gone oh no but
  • that's okay all right so we know it's
  • there it's a connection only now I'm
  • gonna come over here and this is just
  • straight old Excel alright so here's the
  • URL I broke it out into the first part
  • of the you are the ending part of the
  • URL I know that I need to take the date
  • I need to format it in this weird format
  • of year month and day so I use the text
  • function to do that put in a starting
  • date here I can even change the airport
  • so now I'm back and forth let's do MCO
  • for Orlando and let's do some recent
  • data so it'll start at 10 1 2016 alright
  • so now we have this great little table
  • set up here in by the way it does have
  • to be a table you have to use format as
  • table or ctrl t so you know this is just
  • grabbing that date and then plus 1 plus
  • 1 plus 1 I format it I build the URL
  • alright now we're gonna build a query
  • from this table alright and there's my
  • information I'm going to add a new
  • column add a custom column the columns
  • were going to be called weather and the
  • form is going to be equal FX weather
  • make sure it's the exact same case same
  • upper and lower case letters and we'll
  • insert that field called URL like that
  • closing parenthesis no syntax error
  • click ok they want to know about privacy
  • here this is all public data click Save
  • all right so there's our date it's funny
  • they've actually changed my format into
  • something that I it doesn't look like
  • what I started out with
  • and then here's weather with the expand
  • symbol so I'm gonna click the expand
  • symbol uncheck use original column name
  • is prefix I want the max the min the
  • precipitation click OK alright and now
  • all I need is the date and that
  • information out there so I'll
  • right-click and remove this column right
  • click and remove this column over here I
  • don't need that time so I'll say that
  • this is just a date
  • alright see what it's doing every date
  • that I'm passing it it's returning the
  • high the low and the precipitation for
  • Orlando every row here is going out to a
  • different web page just imagine if it
  • wasn't 15 rows but 5,000 rows you'd set
  • it up to run overnight
  • I used to write macros for this in fact
  • one of the web pages out at MrExcel is
  • how to build a macro to scrape web pages
  • from a thousand different web pages at a
  • site not necessary anymore
  • with power query now when I close and
  • load it's funny the preview here is
  • showing me all the results when I close
  • and load they're actually going to go do
  • each query and so right now we have
  • preview rows loaded and it will take a
  • good long time for this information to
  • get updated so go to lunch do something
  • especially if you're doing more than 15
  • rows and it's funny the preview is
  • correct but they're still going out and
  • could chunka chunka chunka ng through
  • each individual row and there it's
  • loaded is this amazing or what hey I do
  • a lot of Excel seminars the power Excel
  • seminar in Orlando Florida look at these
  • beautiful temperatures we have down here
  • on November 4 2016 my afternoon will be
  • all about power query power bi power
  • pivot power map so I'm going to invite
  • you down to Orlando to check out this
  • seminar there'll be a link there
  • the top right hand corner alright so
  • recap today's trick is from this book M
  • is for data monkey we build a query to
  • get one webpage and then edit that query
  • to change it into a function so right
  • before the let statement variable name
  • equals greater than and then change the
  • hard-coded URL to whatever that variable
  • name is rename the query it affects
  • whether close and load that data
  • disappears and then we use some sort of
  • Excel trickery to create a table of all
  • the URLs we want to crawl create a query
  • from that table this has to be a ctrl T
  • table add a new column of weather equals
  • equal FX weather and again it has to
  • match the case there square brackets URL
  • expand that column uncheck prefix BAM it
  • is amazing
  • thanks to canon miguel for writing this
  • book thanks to you for stopping by hope
  • to see you in Orlando on November 4th
  • 2016 see you next time for another net
  • cast from MrExcel

Download File

Download the sample file here: Podcast2056.xlsm

Title Photo: Tama66 / Pixabay


Bill Jelen is the author / co-author of
Excel Dynamic Arrays Straight to the Point

The new Dynamic Array Functions are just one side-effect of an effort to completely rewrite the Calculation Engine in Excel. Joe McDaid and the rest of the CALC team have the laid the groundwork for all future functions in Excel. Yes, the first crop of SORT, SORTBY, FILTER, UNIQUE, SEQUENCE and RANDARRAY are awesome and powerful, but they are just the first of many new functions that will come to Office 365 over the coming years.