Importing table data from 300 local HTML files. Can I do it faster than 3-4 minutes?

d0rian

Active Member
Joined
May 30, 2015
Messages
251
Goal: getting data from 300 web pages (each has stock price data in exactly the same layout / format, a different stock on each page) into an Excel file.

My current process is pretty good, but takes 3-4 minutes total (during which Excel is "thinking" / hourglass-hanging and thus unusable for anything else). Hoping for a different method that might cut that way down. Current steps:
  1. Use a Firefox extension to batch scrape all 300 pages (takes ~20 seconds, pretty happy with speed of this step) and download the content to a local directory (which it saves as .aspx files)
  2. Convert the .aspx files to .html files via a simple Windows batch rename (1-2 seconds, nearly instantaneous). Each file is ~100kb, so all files together are 25-30MB.
  3. Here's the time-consuming step that I'm hoping to improve: I have a VBA script that loops through all 300 HTML files, parsing them and pasting the table data on the first empty row of a worksheet, such that it all ends up in a single table. This is the part that takes ~3 minutes, during which Excel is hourglassing and can't do anything else. It seems like there should be a much faster way to accomplish this...yes, there are 300 files (~25MB in total), but they're all just sitting there locally in a single directory. Is there some method I haven't thought of that would suck all of them into Excel in less than 3 minutes?
Misc considerations:
  • I know some versions of Excel have a 'get data from web' wizard, but I don't think this is an option for me because the 300 pages I'm scraping are behind a password/login wall. (Moreover, even if they weren't, I don't think this would be much faster; I'm pretty happy with the ~20 seconds it's taking me to download all of the data locally...I just feel like it should be much faster to get that local data into Excel.)
  • I gather that it's the HTML formatting / parsing that's causing the slowdown? I don't think I have any option to scrape/download the data in a format other than aspx / html. Is parsing the data locally with perhaps a different, non-excel program better? I tried aggregating the 300 100kb HTML files into a single 25MB HTML file and then pointing Excel at that, but it didn't help much speed-wise.
Open to any out-of-the-box suggestions.

TLDR: I have 300 local HTML files with table data in the same format. Seems like I should be able to get it into Excel faster than the ~3 minutes it currently takes, but don't know how.
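(Responding to the "parse locally with a different, non-Excel program" idea from the OP: one common approach is to pre-parse the 300 HTML files into a single CSV with a small script, then just open the CSV in Excel, which is essentially instant. Below is a minimal sketch using only the Python standard library's `html.parser`; it assumes each page's stock table is made of ordinary `<tr>`/`<td>`/`<th>` tags, and the directory and output names are hypothetical placeholders.)

```python
import csv
import glob
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, one row per <tr>."""
    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows
        self._row = None      # row currently being built
        self._in_cell = False
        self._cell = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th") and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)

def extract_tables(html_dir, out_csv):
    # One pass over every file; all rows buffered in memory,
    # then a single CSV write at the end.
    all_rows = []
    for path in sorted(glob.glob(html_dir + "/*.html")):
        parser = TableExtractor()
        with open(path, encoding="utf-8", errors="replace") as f:
            parser.feed(f.read())
        all_rows.extend(parser.rows)
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(all_rows)
    return len(all_rows)

# Hypothetical usage: extract_tables("C:/scraped", "C:/scraped/all.csv")
```

Parsing ~25MB of HTML this way typically takes seconds rather than minutes, because nothing touches Excel's object model until the final CSV open. If the real pages have multiple tables per file, the extractor would need a filter on the table you want.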
 


d0rian

Active Member
Joined
May 30, 2015
Messages
251
Excel 2007, though I have access to Excel 365, which if memory serves has enhanced import-data-from-web functionality. But as I wrote in the OP, I'm not sure that will work, since I don't think it can grab data from behind a paywall. (Not sure what you mean by 'confidential'; as I mentioned, the HTML pages that I'm scraping are behind a login wall since it's a paid service.)

But just taking a step back for a moment, as I wrote in the OP, I've got the scraping/downloading parts of the process completing in a relatively quick ~20 seconds...I just have to think there's some method of importing data from 300 local HTML files that'd be quicker than the 3 mins it's currently taking me.
 

sandy666

Well-known Member
Joined
Oct 24, 2015
Messages
7,077
If they are not confidential, could you zip them, share the file via OneDrive, Google Drive, Dropbox, or any similar service, and paste the link here?
 

GWteB

Well-known Member
Joined
Sep 10, 2010
Messages
1,090
Office Version
  1. 2013
Platform
  1. Windows
TLDR: I have 300 local HTML files with table data in the same format. Seems like I should be able to get it into Excel faster than the ~3 minutes it currently takes, but don't know how.

In general, it is recommended to access the worksheet for reading and writing as few times as possible. Using arrays can be helpful in this. Within time-consuming procedures it is also recommended (given the architecture and behaviour of a multitasking OS like Windows) to let the OS know at regular intervals that your VBA is still running, by yielding to the OS with the DoEvents statement.
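(The buffering principle above — build everything in memory, then write in one shot instead of cell by cell — is the same in any language. A minimal Python analogue, with hypothetical names; in VBA terms, this corresponds to filling a 2-D array and assigning it to a `Range` once, rather than writing each cell inside the loop:)

```python
import csv

def write_buffered(rows, out_path):
    # Accumulate all rows in memory first...
    buffered = [list(r) for r in rows]
    # ...then perform a single open/write/close, instead of
    # re-opening the output for every row (the slow pattern).
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(buffered)
    return len(buffered)
```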
 
