Best way to web scrape using excel as database

DianaBanana

Board Regular
Joined
Mar 10, 2014
Messages
71
Does anyone have a recommended way to scrape data off the web and get it to populate into an excel document? For example, I am going through a hotel site and I'd like to scrape all the different hotel room rates for that particular hotel location and store it in an excel document. I'd like to be able to re-run this (macro or code) weekly and see the different rates? I'm manually doing it and putting it into excel but there must be a better way.
 
Follow these instructions from my earlier post to set a reference to each of the libraries listed at the top of the code:
Ok, I'm sorry I missed that, I enabled that. Now, I'm getting an error message on the next line - Dim lnk As HTMLLinkElement - same error message as before
 
Upvote 0

Excel Facts

How to create a cell-sized chart?
Tiny charts, called Sparklines, were added to Excel 2010. Look for Sparklines on the Insert tab.
Ok, I'm sorry I missed that, I enabled that. Now, I'm getting an error message on the next line - Dim lnk As HTMLLinkElement - same error message as before
lol, yup, you have to do the same thing for the other two libraries: Microsoft HTML Object Library, and Microsoft VBScript Regular Expressions #. Just choose the regular expressions one with the highest number. Mine said 5.5.
 
Upvote 0
:) OK I did that now. It started running, but for some reason between each hotel it sent me a form with an "OK" button message - like La QUinta Inn Hotel in AZ with 2 floors Con't?
Is there a way for it to run w/out these forms? And I could not see the output because I cancelled..is there a way to see output after say 5 hotels (just to see what it's doing)?
 
Upvote 0
That's correct. There is no output besides those message boxes. The purpose of that sample was to show you how to navigate through the pages and retrieve information from them. The rest is up to you or anyone nice enough to take on the entire project for you. :biggrin:
 
Upvote 0
That's correct. There is no output besides those message boxes. The purpose of that sample was to show you how to navigate through the pages and retrieve information from them. The rest is up to you or anyone nice enough to take on the entire project for you. :biggrin:
Oh man - so that's not you? Ok, I am going to try to work on this but can you help me with some questions, which of these words indicate where exactly it's looking? Is it possible for me to "record" a macro from excel into html for one hotel and then just repeat it 800+ times? :) - Wend
regex.Pattern = "(Floors|Rooms): \d+"
For Each mch In regex.Execute(ie2.Document.DocumentElement.innerHTML)
If MsgBox(ie2.Document.Title & vbCr & mch & vbCr & vbCr & "Continue?", vbYesNo) <> vbYes Then
GoTo Quit
 
Upvote 0
Oh man - so that's not you?
Nope, sorry. I actually thought the code might scare you into doing this manually until you learned VBA, which wouldn't be such a bad idea as this is not a great project to start VBA on. But, I'll never discourage someone who wants to learn. In fact, I think your desire to get this project done will only drive your learning.

I've tried teaching people VBA before and one common problem was that they didn't have a project like this to challenge them.

...which of these words indicate where exactly it's looking?
This pattern tells it what to look for:
Code:
regex.Pattern = "(Floors|Rooms): \d+"
Look here for a visual demonstration of how that works: Regex Tester (Permalink)
And research Regular Expressions for more info. I used regular expressions here because the elements that contained the info didn't have id's. When they do have id's you can use this method: getElementById method (Internet Explorer)

Is it possible for me to "record" a macro from excel into html for one hotel and then just repeat it 800+ times? :)
No, not that easy, unfortunately.
 
Upvote 0
Nope, sorry. I actually thought the code might scare you into doing this manually until you learned VBA, which wouldn't be such a bad idea as this is not a great project to start VBA on. But, I'll never discourage someone who wants to learn. In fact, I think your desire to get this project done will only drive your learning.

I've tried teaching people VBA before and one common problem was that they didn't have a project like this to challenge them.


This pattern tells it what to look for:
Code:
regex.Pattern = "(Floors|Rooms): \d+"
Look here for a visual demonstration of how that works: Regex Tester (Permalink)
And research Regular Expressions for more info. I used regular expressions here because the elements that contained the info didn't have id's. When they do have id's you can use this method: getElementById method (Internet Explorer)


No, not that easy, unfortunately.

Angel,
1) How does it know to go from the hotel listings to click on each link? Is it the "navigate lnk"?
2) I am simplifying the process. I am just going to pull up the best available rate that is on the first page along with the room #. Where it says on the page "enjoy rates from". I cannot seem to replicate the regex test.
3) Once it can identify those two items (number of rooms and lowest rate), what tells it to bring it back into excel?
 
Upvote 0
Well If you know HTML and VBA, one way to do it is with the Microsoft company On-line collection. I'm sure you will discover some examples by searching on the 'screen' or something like that.

I just started learning vba so when I google for codes not following it that well...
 
Upvote 0
Angel,
1) How does it know to go from the hotel listings to click on each link? Is it the "navigate lnk"?
Yes, the Navigate method follows the links. This code uses a For Each loop to iterate through the links. Look into VBA loops for a better understanding of that.

2) I am simplifying the process. I am just going to pull up the best available rate that is on the first page along with the room #. Where it says on the page "enjoy rates from". I cannot seem to replicate the regex test.
The pattern for this gets much more complex. This one should work: (remember you're testing against the actual source HTML, not what you see in your web browser)
Code:
Enjoy Rates From:(?:.|\n)+\$((?:\d+,+)*\d+\.\d+)
You can use the SubMatches property to extract just the dollar amount like so:
Code:
            regex.Pattern = "Enjoy Rates From:(?:.|\n)+\$((?:\d+,+)*\d+\.\d+)"
            For Each mch In regex.Execute(ie2.Document.DocumentElement.innerHTML)
                If MsgBox(ie2.Document.Title & vbCr & mch.SubMatches(0) & vbCr & vbCr & "Continue?", vbYesNo) <> vbYes Then

3) Once it can identify those two items (number of rooms and lowest rate), what tells it to bring it back into excel?
Currently nothing does. Getting the data onto a spreadsheet should be one of the easier parts of this project though. Look at some samples and give it a shot. Then, report back here and we'll help you out.

I just started learning vba so when I google for codes not following it that well...
That post, which looked like a mangled version of my first post was made by a spam bot. Forget about it, lol.
 
Upvote 0
Yes, the Navigate method follows the links. This code uses a For Each loop to iterate through the links. Look into VBA loops for a better understanding of that.


The pattern for this gets much more complex. This one should work: (remember you're testing against the actual source HTML, not what you see in your web browser)
Code:
Enjoy Rates From:(?:.|\n)+\$((?:\d+,+)*\d+\.\d+)
You can use the SubMatches property to extract just the dollar amount like so:
Code:
            regex.Pattern = "Enjoy Rates From:(?:.|\n)+\$((?:\d+,+)*\d+\.\d+)"
            For Each mch In regex.Execute(ie2.Document.DocumentElement.innerHTML)
                If MsgBox(ie2.Document.Title & vbCr & mch.SubMatches(0) & vbCr & vbCr & "Continue?", vbYesNo) <> vbYes Then


Currently nothing does. Getting the data onto a spreadsheet should be one of the easier parts of this project though. Look at some samples and give it a shot. Then, report back here and we'll help you out.


That post, which looked like a mangled version of my first post was made by a spam bot. Forget about it, lol.
im trying to run the macro and it says the macros in this workbook are disabled?
 
Upvote 0

Forum statistics

Threads
1,215,219
Messages
6,123,687
Members
449,117
Latest member
Aaagu

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top