Scraping data from a web page

rawdata

New Member
Joined
Oct 12, 2009
Messages
31
Hello,

I'm attempting to scrape data from a web page. The screen scrape that I am using works fine. Here is a high-level overview of what it does and where I'm having problems:

1) Open up Internet Explorer (this works fine)
2) Log-in (this works fine)
3) Go to web page (this works fine)
4) Scrape the data (this works only if there is data to be scraped. If the data is non-existent, then the macro wants me to debug it).

I'm using ie.document.Links(Link# here).Click. If the link isn't there, then vba wants me to end or debug.

I'm thinking that the best direction is to tell VBA if no link number exists to pass it up and move on to the next.

Does anyone have any insight on the best thing to do here and how to do it?

I'm very grateful for any advice and practical code solutions.
 

Excel Facts

Wildcard in VLOOKUP
Use =VLOOKUP("Apple*" to find apple, Apple, or applesauce
Hi and Welcome to MrExcel

Seeing your code usually helps so perhaps you could post it up for the benefit of everyone.

That aside I think you want to start your code with the following

Code:
On Error Resume Next

Which may work but I couldn't be totally sure without seeing what you're trying to actually do
 
Upvote 0
Dave,

It's funny you bring up the On Error Resume Next. I was just looking into that. Here is my code. It is not the cleanest. Nevetheless, ie.document.Links(15).Click is where my question begins to pick up. When I travel to Links(15), if the ie.document.Links(22).Click on the following screen is non-existent, it would be nice have the code travel back to the preceding page and go to Links(16). And so on, and so forth. I hope this makes sense.

This isn't all of the code, but hopefully it will give the idea:

Sub Integrity_Get_BatchCard()
Dim ie As Object, iebody As String, strURL As String, strUsername As String, strPassword As String
Dim lRow As Long
Dim abc As String
Dim striEst As String
Dim ele As Object
Dim a As String
Dim b As String
Dim c As String
a = DateAdd("d", -1, Now)
b = DateAdd("d", -2, Now)
c = DateAdd("d", -3, Now)

Set ie = CreateObject("InternetExplorer.Application")
strURL = "https://www.webpage1.aspx"
strUsername = "XXXXXXXX"
strPassword = "XXXXXXXX"
ie.navigate strURL
ie.Visible = True

While ie.Busy
DoEvents
Wend

ie.Document.All("txtUserID").Value = strUsername
ie.Document.All("txtpassword").Value = strPassword
ie.Document.All("btnSubmit").Click

While ie.Busy
DoEvents
Wend

ie.navigate "https://www.webpage2.aspx?rpt=BA"
While ie.Busy
DoEvents
Wend

Workbooks.Add.SaveAs FileName:= _
"C:\Documents\My FileName" & _
Format(a, " mmddyyyy") & ".xls", FileFormat:=xlNormal, Password:="", WriteResPassword:="", _
ReadOnlyRecommended:=False, CreateBackup:=False
ie.Document.All("txtStartDate").Value = Format(c, "mm/dd/yyyy")
ie.Document.All("txtEndDate").Value = Format(c, "mm/dd/yyyy")
ie.Document.All("btnSubmit").Click

While ie.Busy
DoEvents
Wend

ie.Document.Links(15).Click

While ie.Busy
DoEvents
Wend

ie.Document.Links(22).Click
While ie.Busy
DoEvents
Wend

For lRow = 2 To 2

While ie.Busy
DoEvents
Wend

Set sht = Sheets("Sheet1")
RowCount = 1
For Each itm In ie.Document.All
sht.Range("A" & RowCount) = itm.tagname
sht.Range("B" & RowCount) = itm.classname
sht.Range("C" & RowCount) = itm.ID
sht.Range("D" & RowCount) = Left(itm.innertext, 1000)
RowCount = RowCount + 1
Next itm
Next lRow
ie.navigate "https://www.webpage3.aspx"
While ie.Busy
DoEvents
Wend
ie.Document.Links(23).Click
While ie.Busy
DoEvents
Wend
For lRow = 2 To 2
While ie.Busy
DoEvents
Wend
Set sht = Sheets("Sheet1")
RowCount = RowCount + 1
For Each itm In ie.Document.All
sht.Range("A" & RowCount) = itm.tagname
sht.Range("B" & RowCount) = itm.classname
sht.Range("C" & RowCount) = itm.ID
sht.Range("D" & RowCount) = Left(itm.innertext, 1000)
RowCount = RowCount + 1
Next itm
Next lRow
 
Upvote 0
Dave,

It worked! I actually used the On Error Goto Line code.

However, when I return to go to the previous page and click the next link (ie.document.Links(16).Click) and run the routine of attempting to click on links 22 and 23 for that page, i get a run-time error 91 on ie.document.Links(22).Click.

Are my variables incorrectly declared?

P.S. i'm working with Excel 2007.
 
Upvote 0
Maybe something like:
Code:
    Dim firstPageLink As Integer, nextPageLink22Exists As Boolean

    With ie
        firstPageLink = 15
        nextPageLink22Exists = False
        Do While firstPageLink < .Document.links.Length And Not nextPageLink22Exists
            .Document.links(firstPageLink).Click
            While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
            If .Document.links.Length >= 23 Then
                nextPageLink22Exists = True
            Else
                .GoBack
                While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
                firstPageLink = firstPageLink + 1
            End If
        Loop
        
        If nextPageLink22Exists Then
            .Document.links(22).Click
            While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
        End If
    End With
Which replaces:
Code:
ie.Document.Links(15).Click

While ie.Busy
DoEvents
Wend

ie.Document.Links(22).Click
While ie.Busy
DoEvents
Wend
IMHO it's far better to use logic as shown above, rather than On Error, to anticipate potential error situations so that they don't occur in the first place. And the DOM objects provide enough functionality that you shouldn't need On Error anyway.
 
Upvote 0

Forum statistics

Threads
1,216,172
Messages
6,129,290
Members
449,498
Latest member
Lee_ray

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top