Excel vba scraper - guidance part II

Anka

New Member
Joined
Oct 20, 2012
Messages
45
Office Version
  1. 2016
  2. 2010
Platform
  1. Windows
Hi all,

I'm start from here , but seems that was the easiest part...
In the second part, I try to take the data from all rows that corresponding with the Bet365 bookmaker (for example), from this page.

I'm able, do that but my vba is very slow.
Is there a way to do the task without IE as I have hundreds of similar links and using ie is slower?
I need to use another method to do this job.
This is an example of link that I need to scrape.
How difficult is to scrape ajax sites because this site have this kind of pages (just for the odds)?

And the final result have to be something like this.
image.jpg


Thank you in advance.
 
Last edited:
I thought that IE change would make a bigger difference to the run time.

Try this code instead. As with your code, the URLs are expected to be in column D in the MatchDetails sheet starting at D2, and the extracted data is written to the Ou_Odds sheet.

It calls the HTMLDocument CreateDocumentFromUrl method on each URL and is quite fast - about 35 seconds for 20 URLs (Windows 10, IE 11.576). The only slight issue is that each call to CreateDocumentFromUrl causes Windows to display a temporary window saying:
You'll need a new app to open this
about
Look for an app in the Store
[checkbox] Always use this app
[button] OK

I think somehow it is trying to open the webpage "about:", or "about:#ou".

Code:
Public Sub Extract_Data()
    
    Dim matchURLs As Range, matchURL As Range
    Dim URL As String
    Dim destSheet As Worksheet
    Dim HTMLdoc As HTMLDocument, HTMLdoc2 As HTMLDocument
    Dim table As HTMLTable
    Dim tRows As IHTMLElementCollection
    Dim tRow As HTMLTableRow
    Dim r As Long
        
    Set destSheet = Worksheets("Ou_Odds")
    With destSheet
        .UsedRange.ClearContents
        .Range("A1:K1").Value = Array("Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs", "Bookmaker", "Total", "Over", "Under")
    End With
    r = 2
    
    With Worksheets("MatchDetails")
        Set matchURLs = .Range("D2", .Cells(Rows.Count, "D").End(xlUp))
    End With
   
    For Each matchURL In matchURLs
    
        URL = matchURL.Value
        If Right(URL, 3) <> "#ou" Then URL = URL & "#ou"

        Set HTMLdoc2 = New HTMLDocument
        Set HTMLdoc = HTMLdoc2.createDocumentFromUrl(URL, "")
        While HTMLdoc.readyState <> "complete": DoEvents: Wend

        Do
            Set table = HTMLdoc.getElementById("sortable-1")
            DoEvents
        Loop While table Is Nothing
        
        Set tRows = HTMLdoc.getElementsByTagName("TR")
        
        For Each tRow In tRows
            
            If tRow.innerText Like "*bet365*" Then

                destSheet.Cells(r, "A").Value = HTMLdoc.getElementsByClassName("list-breadcrumb__item")(2).innerText
                destSheet.Cells(r, "B").Value = HTMLdoc.getElementsByClassName("list-breadcrumb__item")(3).innerText
                destSheet.Cells(r, "C").Value = HTMLdoc.getElementById("match-date").innerText
                destSheet.Cells(r, "D").Value = HTMLdoc.getElementsByTagName("h2")(0).innerText
                destSheet.Cells(r, "E").Value = HTMLdoc.getElementsByTagName("h2")(2).innerText
                destSheet.Cells(r, "F").Value = HTMLdoc.getElementById("js-score").innerText
                destSheet.Cells(r, "G").Value = HTMLdoc.getElementById("js-partial").innerText

                destSheet.Cells(r, "H").Value = tRow.Cells(0).innerText
                destSheet.Cells(r, "I").Value = tRow.Cells(3).innerText
                destSheet.Cells(r, "J").Value = tRow.Cells(4).innerText
                destSheet.Cells(r, "K").Value = tRow.Cells(5).innerText
                
                r = r + 1
                
            End If
            
        Next
        
    Next

    Set HTMLdoc = Nothing
    Set HTMLdoc2 = Nothing
    
End Sub
 
Upvote 0

Excel Facts

Convert text numbers to real numbers
Select a column containing text numbers. Press Alt+D E F to quickly convert text to numbers. Faster than "Convert to Number"
Thank you for you do not have abandoned this discussion.
In my case it happens what you said: it is trying to open the webpage "about:", or "about:#ou".
My system is Windows 7 Pro 32 bit and Office Pro Plus 2010.
Maybe I must to check/add something special in References ?

image.jpg
 
Upvote 0
I should have said that you need to set a reference (Tools -> References in the VBA Editor) to Microsoft HTML Object Library.
 
Upvote 0
I'm sorry, this (and Microsoft Internet Controls) is already set.
I ask if is necessary to set something else, more special.
 
Upvote 0
No, only the HTML Object Library is needed. The createDocumentFromUrl method is part of the HTMLDocument class, which is in the HTML Object Library.

You don't need the Internet Controls, though it shouldn't matter if that is also selected.
 
Upvote 0
In this case something don't work. :(
Why this works in your system, but does not work in my system ?
 
Upvote 0
I don't know why it doesn't work for you. CreateDocumentFromUrl is implemented by the underlying Trident (IE/MSHTML) layout engine - see https://en.wikipedia.org/wiki/Trident_(layout_engine). With you running IE11, it should be using the latest, Trident/7.0. You can check this by typing javascript:alert(navigator.userAgent) in the IE address bar. Maybe the Windows version affects it somehow, I don't know.

What debugging have you tried? Does it get past the CreateDocumentFromUrl line? Try commenting out the While readyState line and stepping through the code with the F8 key.
 
Upvote 0
If you run the code (normally), first open IE page what you saw in the print screen below. After this remains to running without any result. I must to stop the code from stop button.
If I use the F8 key, the code do the same things, and remain to make an infinite loop betwen this lines.

Code:
Set table = HTMLdoc.getElementById("sortable-1")
            DoEvents
        Loop While table Is Nothing

image.jpg
 
Upvote 0
I have the same problem with IE8 on Win XP. It opens "about:blank#ou" in IE and loops infinitely in the getElementById("sortable-1") loop because table is always Nothing (because that id doesn't exist).

Furthermore, if you write the contents of HTMLDoc to a text file:

Code:
        Set HTMLdoc2 = New HTMLDocument
        Set HTMLdoc = HTMLdoc2.createDocumentFromUrl(URL, "")
        While HTMLdoc.readyState <> "complete": DoEvents: Wend
        Log_HTML HTMLdoc.body.parentElement.outerHTML

Code:
Private Sub Log_HTML(HTML As String)
    Static num As Integer
    num = num + 1
    Open ThisWorkbook.Path & "\HTML" & num & Format(Now, " hh mm ss") & ".txt" For Output As #1
    Print #1, HTML
    Close #1
End Sub
there is no odds data amongst the HTML tags, which means the Ajax request that the page sends to get the odds data is not working. On Windows 10/IE11 all the odds data is in the HTML and it can be extracted from HTMLdoc, as shown by my code.

I think it's best to abandon the CreateDocumentFromUrl method and I will try direct XMLhttp/WinHttp requests.
 
Upvote 0
Just a couple of things.

If IE is the issue then explore the use of Firefox/Chrome via the installation of Selenium. https://code.google.com/archive/p/selenium-vba/

Also one poster stated that AJAX responses are JSON. While I may be missing the point of what he's trying to say AJAX queries return anything they are designed to (which will be interpreted via the browser and returned as HTML). AJAX itself refers only to the asynchronous nature of the query (with subsequent update to the IE page) and that's the only thing you have to account for when adding AJAX capability to your scraping.
 
Upvote 0

Forum statistics

Threads
1,214,643
Messages
6,120,707
Members
448,981
Latest member
recon11bucks

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top