Excel vba scraper - guidance part II

Anka · Jan 6, 2017

Hi all,

I'm start from here , but seems that was the easiest part...
In the second part, I try to take the data from all rows that corresponding with the Bet365 bookmaker (for example), from this page.

I'm able, do that but my vba is very slow.
Is there a way to do the task without IE as I have hundreds of similar links and using ie is slower?
I need to use another method to do this job.
This is an example of link that I need to scrape.
How difficult is to scrape ajax sites because this site have this kind of pages (just for the odds)?

And the final result have to be something like this.

Thank you in advance.

John_w · Sep 4, 2017

gywjdqks said:
JOHN. please
I ran the code, but there's a problem. Input is omitted if a bookmaker does not exist.
If there are no bookmaker, the following information shall be entered.
< "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs", >
Is it possible?

What URL are you using?

The code works correctly with http://www.betexplorer.com/soccer/italy/serie-a-2015-2016/empoli-torino/bFpRibLT/#ou, for example.

gywjdqks · Sep 4, 2017

John_w said:
What URL are you using?

The code works correctly with http://www.betexplorer.com/soccer/italy/serie-a-2015-2016/empoli-torino/bFpRibLT/#ou, for example.

ex url : http://www.betexplorer.com/soccer/italy/serie-a-2015-2016/empoli-torino/bFpRibLT/#ou
my bookmaker : pinncle

The above address works because the pinnacle exists. But it doesn't work unless there is a pinnacle. (Skip it out.)

I ran the code, but there's a problem. Input is omitted if a bookmaker does not exist.
If there are no bookmaker, the following information shall be entered.
< "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs", >

ex)
bookmaker yes : < "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs", "Bookmaker", "Total", "Over", "Under" >
bookmaker non : < "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs", >

John_w · Sep 5, 2017

This code is an update to https://www.mrexcel.com/forum/excel...guidance-part-ii-post4725612.html#post4725612 which outputs the "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs" values if the specified bookmaker doesn't exist, otherwise the "Bookmaker", "Total", "Over", "Under" fields are also output to Excel cells.

Code:

Public Sub Extract_Data3()
    
    Dim httpReq As WinHttp.WinHttpRequest
    Dim HTMLdoc As HTMLDocument
    Dim para As HTMLParaElement
    Dim tRows As IHTMLElementCollection
    Dim tRow As HTMLTableRow
    Dim URLs As Range, URL As Range
    Dim destSheet As Worksheet
    Dim parts As Variant
    Dim matchURL As String, matchOddsURL As String
    Dim r As Long
    Dim matchData(1 To 7) As Variant
    Dim HTML As String
    Dim bookmakerFound As Boolean
    Dim startTime As Single
    
    startTime = Timer
    
    Set destSheet = ThisWorkbook.Worksheets("XMLhttp_Odds")
    With destSheet
        .UsedRange.ClearContents
        .Range("A1:K1").Value = Array("Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs", "Bookmaker", "Total", "Over", "Under")
    End With
    r = 2
    
    With ThisWorkbook.Worksheets("MatchDetails")
        Set URLs = .Range("D2", .Cells(Rows.Count, "D").End(xlUp))
    End With
    
    Set httpReq = New WinHttp.WinHttpRequest
    
    For Each URL In URLs
        
        'Request the match result page by removing "#ou" from the URL
        
        matchURL = Replace(URL.Value, "#ou", "")
        Debug.Print matchURL
        With httpReq
            .Open "GET", matchURL, False
            .setRequestHeader "Host", "www.betexplorer.com"
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"     'Windows 10, IE11
            .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
            .setRequestHeader "Accept-Language", "en-US,en;q=0.5"
            .setRequestHeader "Upgrade-Insecure-Requests", "1"
            .setRequestHeader "Referer", "http://www.betexplorer.com/soccer/italy/serie-a-2015-2016/results/"
            .send
            Debug.Print .Status, .statusText
            
            'Put response in a HTMLDocument for parsing
            
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
        End With
        DoEvents
        
        matchData(1) = HTMLdoc.getElementsByClassName("list-breadcrumb__item")(2).innerText
        matchData(2) = HTMLdoc.getElementsByClassName("list-breadcrumb__item")(3).innerText
        
        'Date and time of match is in data-dt attribute of "match-date" P element:
        '< P id=match-date class=list-details__item__date data-dt="15,5,2016,18,00">< /P>
        
        Set para = HTMLdoc.getElementById("match-date")
        parts = Split(para.getAttribute("data-dt"), ",")
        matchData(3) = DateSerial(parts(2), parts(1), parts(0)) + TimeSerial(parts(3), parts(4), 0)
        
        matchData(4) = HTMLdoc.getElementsByTagName("h2")(0).innerText
        matchData(5) = HTMLdoc.getElementsByTagName("h2")(2).innerText
        matchData(6) = HTMLdoc.getElementById("js-score").innerText
        matchData(7) = HTMLdoc.getElementById("js-partial").innerText
        
        'Construct match odds URL from match result URL
        
        parts = Split(URL.Value, "/")
        matchOddsURL = "http://www.betexplorer.com/gres/ajax/matchodds.php?p=1&b=ou&e=" & parts(7)
        Debug.Print matchOddsURL
        
        'Request the match odds data.  The response is a JSON string containing HTML which itself contains the odds data
        
        With httpReq
            .Open "GET", matchOddsURL, False
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"     'Windows 10, IE11
            .setRequestHeader "Accept", "application/json, text/javascript, */*; q=0.01"
            .setRequestHeader "Accept-Language", "en-US,en;q=0.5"
            .setRequestHeader "X-Requested-With", "XMLHttpRequest"
            .setRequestHeader "Referer", URL.Value
            .send
            Debug.Print .Status, .statusText
            
            'Extract HTML from JSON response and put in HTMLDocument
            
            HTML = Mid(.responseText, Len("{'odds':'") + 1)     'remove {"odds":" at the start
            HTML = Left(HTML, Len(HTML) - 2)                    'remove "} at the end
            HTML = Replace(HTML, "\n", "")                      'remove newlines
            HTML = Replace(HTML, "\", "")                       'remove escape characters (assumes that every "\" is the escape character preceding the escaped character)
            
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = HTML
        End With
        DoEvents
    
        bookmakerFound = False
        Set tRows = HTMLdoc.getElementsByTagName("TR")
        For Each tRow In tRows
            If tRow.getElementsByTagName("TABLE").Length = 0 Then       'no inner tables?
                If InStr(1, tRow.innerText, "bet365", vbTextCompare) > 0 Then
                    bookmakerFound = True
                    destSheet.Cells(r, "A").Resize(1, 7) = matchData
                    destSheet.Cells(r, "H").Value = tRow.Cells(0).innerText
                    destSheet.Cells(r, "I").Value = tRow.Cells(3).innerText
                    destSheet.Cells(r, "J").Value = tRow.Cells(4).getAttribute("data-odd")
                    destSheet.Cells(r, "K").Value = tRow.Cells(5).getAttribute("data-odd")
                    r = r + 1
                End If
            End If
        Next
        
        If Not bookmakerFound Then
            destSheet.Cells(r, "A").Resize(1, 7) = matchData
            r = r + 1
        End If
        
    Next
    
    Debug.Print "Elapsed time = " & Timer - startTime & " seconds"
    Debug.Print "Elapsed time = " & Format(TimeSerial(0, 0, Timer - startTime), "hh:mm:ss")
    
End Sub

gywjdqks · Sep 5, 2017

Your code has been applied and executed. Perfect. You are a genius. My troubles are gone !!! Thank you.

gywjdqks · Sep 11, 2017

John, how do I amend the code when I have three bookmakers? I modified the code alone, but if there was no last bookmaker, no row was added. It's my limit. I'm sorry.

ex) < "Country" ~ "Halfs", "Bookmaker", "Total", "bet365 Over", "bet365 Under" , "Bookmaker", "Total", "188bet Over", "188bet Under", "Total", "Bookmaker", "pinnacle Over", "pinnacle Under" >

+This code is an update to https://www.mrexcel.com/forum/excel-...ml#post4725612 which outputs the "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs" values if the specified bookmaker doesn't exist, otherwise the "Bookmaker", "Total", "Over", "Under" fields are also output to Excel cells.

DoEvents

bookmakerFound = False
Set tRows = HTMLdoc.getElementsByTagName("TR")
For Each tRow In tRows
If tRow.getElementsByTagName("TABLE").Length = 0 Then 'no inner tables?
If InStr(1, tRow.innerText, "bet365", vbTextCompare) > 0 Then
bookmakerFound = True
destSheet.Cells(r, "A").Resize(1, 7) = matchData
destSheet.Cells(r, "H").Value = tRow.Cells(0).innerText
destSheet.Cells(r, "I").Value = tRow.Cells(3).innerText
destSheet.Cells(r, "J").Value = tRow.Cells(4).getAttribute("data-odd")
destSheet.Cells(r, "K").Value = tRow.Cells(5).getAttribute("data-odd")
r = r + 1
End If
End If
Next

If Not bookmakerFound Then
destSheet.Cells(r, "A").Resize(1, 7) = matchData
r = r + 1
End If

gywjdqks · Sep 12, 2017

I don't know how to delete the text. John, you don't have to answer any questions. I am satisfied with your reply. If we get a problem later, we'll write again.

gywjdqks · Sep 26, 2017

Let me ask you a question again. Please help me.
John, how do I amend the code when I have three bookmakers? I modified the code alone, but if there was no last bookmaker, no row was added. It's my limit. I'm sorry.

ex) < "Country" ~ "Halfs", "Bookmaker", "Total", "bet365 Over", "bet365 Under" , "Bookmaker", "Total", "188bet Over", "188bet Under", "Total", "Bookmaker", "pinnacle Over", "pinnacle Under" >

+This code is an update to https://www.mrexcel.com/forum/excel-...ml#post4725612 which outputs the "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs" values if the specified bookmaker doesn't exist, otherwise the "Bookmaker", "Total", "Over", "Under" fields are also output to Excel cells.

DoEvents

bookmakerFound = False
Set tRows = HTMLdoc.getElementsByTagName("TR")
For Each tRow In tRows
If tRow.getElementsByTagName("TABLE").Length = 0 Then 'no inner tables?
If InStr(1, tRow.innerText, "bet365", vbTextCompare) > 0 Then
bookmakerFound = True
destSheet.Cells(r, "A").Resize(1, 7) = matchData
destSheet.Cells(r, "H").Value = tRow.Cells(0).innerText
destSheet.Cells(r, "I").Value = tRow.Cells(3).innerText
destSheet.Cells(r, "J").Value = tRow.Cells(4).getAttribute("data-odd")
destSheet.Cells(r, "K").Value = tRow.Cells(5).getAttribute("data-odd")
r = r + 1
End If
End If
Next

If Not bookmakerFound Then
destSheet.Cells(r, "A").Resize(1, 7) = matchData
r = r + 1
End If

John_w · Sep 27, 2017

gywjdqks said:
Let me ask you a question again. Please help me.
John, how do I amend the code when I have three bookmakers? I modified the code alone, but if there was no last bookmaker, no row was added. It's my limit. I'm sorry.

ex) < "Country" ~ "Halfs", "Bookmaker", "Total", "bet365 Over", "bet365 Under" , "Bookmaker", "Total", "188bet Over", "188bet Under", "Total", "Bookmaker", "pinnacle Over", "pinnacle Under" >

So you want the columns as shown above, for 3 bookmakers? The code needs quite a few changes. This code should work for any number of bookmakers - just put the names in the bookmakers array.

Code:

Public Sub Extract_Data4()
    
    Dim httpReq As WinHttp.WinHttpRequest
    Dim HTMLdoc As HTMLDocument
    Dim para As HTMLParaElement
    Dim tRows As IHTMLElementCollection
    Dim tRow As HTMLTableRow
    Dim URLs As Range, URL As Range
    Dim destSheet As Worksheet
    Dim parts As Variant
    Dim matchURL As String, matchOddsURL As String
    Dim r As Long, i As Long
    Dim matchData(1 To 7) As Variant
    Dim HTML As String
    Dim bookmakers As Variant, numRows() As Long, maxRows As Long
    Dim startTime As Single
    
    startTime = Timer
    
    bookmakers = Array("bet365", "188bet", "pinnacle")
    ReDim numRows(UBound(bookmakers))
    
    Set destSheet = ThisWorkbook.Worksheets("XMLhttp_Odds")
    With destSheet
        .UsedRange.ClearContents
        .Range("A1:G1").Value = Array("Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs")
        For i = 0 To UBound(bookmakers)
            .Range("H1").Offset(, i * 4).Resize(1, 4) = Array("Bookmaker", "Total", bookmakers(i) & " Over", bookmakers(i) & " Under")
        Next
    End With
    r = 2
    
    With ThisWorkbook.Worksheets("MatchDetails")
        Set URLs = .Range("D2", .Cells(Rows.Count, "D").End(xlUp))
    End With
    
    Set httpReq = New WinHttp.WinHttpRequest
    
    For Each URL In URLs
        
        'Request the match result page by removing "#ou" from the URL
        
        matchURL = Replace(URL.Value, "#ou", "")
        With httpReq
            .Open "GET", matchURL, False
            .setRequestHeader "Host", "www.betexplorer.com"
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"     'Windows 10, IE11
            .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
            .setRequestHeader "Accept-Language", "en-US,en;q=0.5"
            .setRequestHeader "Upgrade-Insecure-Requests", "1"
            .setRequestHeader "Referer", "http://www.betexplorer.com/soccer/italy/serie-a-2015-2016/results/"
            .send
            Debug.Print .Status, .statusText
            
            'Put response in a HTMLDocument for parsing
            
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
        End With
        DoEvents
        
        matchData(1) = HTMLdoc.getElementsByClassName("list-breadcrumb__item")(2).innerText
        matchData(2) = HTMLdoc.getElementsByClassName("list-breadcrumb__item")(3).innerText
        
        'Date and time of match is in data-dt attribute of "match-date" P element:
        '< P id=match-date class=list-details__item__date data-dt="15,5,2016,18,00">< /P>
        
        Set para = HTMLdoc.getElementById("match-date")
        parts = Split(para.getAttribute("data-dt"), ",")
        matchData(3) = DateSerial(parts(2), parts(1), parts(0)) + TimeSerial(parts(3), parts(4), 0)
        
        matchData(4) = HTMLdoc.getElementsByTagName("h2")(0).innerText
        matchData(5) = HTMLdoc.getElementsByTagName("h2")(2).innerText
        matchData(6) = HTMLdoc.getElementById("js-score").innerText
        matchData(7) = HTMLdoc.getElementById("js-partial").innerText
        
        'Construct match odds URL from match result URL
        
        parts = Split(URL.Value, "/")
        matchOddsURL = "http://www.betexplorer.com/gres/ajax/matchodds.php?p=1&b=ou&e=" & parts(7)
        
        'Request the match odds data.  The response is a JSON string containing HTML which itself contains the odds data
        
        With httpReq
            .Open "GET", matchOddsURL, False
            '.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0" 'Windows 10, Firefox
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"     'Windows 10, IE11
            .setRequestHeader "Accept", "application/json, text/javascript, */*; q=0.01"
            .setRequestHeader "Accept-Language", "en-US,en;q=0.5"
            .setRequestHeader "X-Requested-With", "XMLHttpRequest"
            .setRequestHeader "Referer", URL.Value
            .send
            Debug.Print .Status, .statusText
            
            'Extract HTML from JSON response and put in HTMLDocument
            
            HTML = Mid(.responseText, Len("{'odds':'") + 1)     'remove {"odds":" at the start
            HTML = Left(HTML, Len(HTML) - 2)                    'remove "} at the end
            HTML = Replace(HTML, "\n", "")                      'remove newlines
            HTML = Replace(HTML, "\", "")                       'remove escape characters (assumes that every "\" is the escape character preceding the escaped character)
            
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = HTML
        End With
        DoEvents
        
        'Initialise number of rows found for each bookmaker
        
        For i = 0 To UBound(numRows)
            numRows(i) = 0
        Next

        'Find data for each bookmaker and extract to Excel
        
        Set tRows = HTMLdoc.getElementsByTagName("TR")
        For Each tRow In tRows
            If tRow.getElementsByTagName("TABLE").Length = 0 Then       'no inner tables?
                For i = 0 To UBound(bookmakers)
                    If InStr(1, tRow.innerText, bookmakers(i), vbTextCompare) > 0 Then
                        destSheet.Cells(r + numRows(i), "H").Offset(, i * 4).Value = tRow.Cells(0).innerText
                        destSheet.Cells(r + numRows(i), "I").Offset(, i * 4).Value = tRow.Cells(3).innerText
                        destSheet.Cells(r + numRows(i), "J").Offset(, i * 4).Value = tRow.Cells(4).getAttribute("data-odd")
                        destSheet.Cells(r + numRows(i), "K").Offset(, i * 4).Value = tRow.Cells(5).getAttribute("data-odd")
                        numRows(i) = numRows(i) + 1
                    End If
                Next
            End If            
        Next
        
        'Find maximum number of rows output for all bookmakers
        
        maxRows = 0
        For i = 0 To UBound(numRows)
            If numRows(i) > maxRows Then maxRows = numRows(i)
        Next
        
        'Set the next start row
        
        If maxRows > 0 Then
            destSheet.Cells(r, "A").Resize(maxRows, 7) = matchData
            r = r + maxRows
        Else
            destSheet.Cells(r, "A").Resize(1, 7) = matchData
            r = r + 1
        End If
        
    Next
    
    Debug.Print "Elapsed time = " & Timer - startTime & " seconds"
    Debug.Print "Elapsed time = " & Format(TimeSerial(0, 0, Timer - startTime), "hh:mm:ss")
    
End Sub

gywjdqks · Sep 28, 2017

Thank you for your efforts. The code changed so much that it took time. I can't make a code like you. However, the application is available for the code you provided. I did it again. I applied, and I studied again about the code you gave me. Thank you very much. You're the best.

bonzo1971 · Oct 15, 2017

Hi to all,
someone can help me to import data from this link: http://www.betexplorer.com/results/soccer/?year=2017&month=10&day=13 thanks in advice for everyone can help me i'm a newbie in Excel VBA.
Regards
Vincenzo

Excel vba scraper - guidance part II

Anka

New Member

John_w

MrExcel MVP

Excel Facts

gywjdqks

New Member

John_w

MrExcel MVP

gywjdqks

New Member

gywjdqks

New Member

gywjdqks

New Member

gywjdqks

New Member

John_w

MrExcel MVP

gywjdqks

New Member

bonzo1971

New Member

Similar threads

Forum statistics

Share this page

Excel vba scraper - guidance part II

New Member

MrExcel MVP

Excel Facts

New Member

MrExcel MVP

New Member

New Member

New Member

New Member

MrExcel MVP

New Member

New Member

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock