Excel vba scraper - guidance part II

Anka

New Member
Joined
Oct 20, 2012
Messages
45
Office Version
  1. 2016
  2. 2010
Platform
  1. Windows
Hi all,

I'm start from here , but seems that was the easiest part...
In the second part, I try to take the data from all rows that corresponding with the Bet365 bookmaker (for example), from this page.

I'm able, do that but my vba is very slow.
Is there a way to do the task without IE as I have hundreds of similar links and using ie is slower?
I need to use another method to do this job.
This is an example of link that I need to scrape.
How difficult is to scrape ajax sites because this site have this kind of pages (just for the odds)?

And the final result have to be something like this.
image.jpg


Thank you in advance.
 
Last edited:
Upvote 0

Excel Facts

Add Bullets to Range
Select range. Press Ctrl+1. On Number tab, choose Custom. Type Alt+7 then space then @ sign (using 7 on numeric keypad)
What URL are you using?

The code works correctly with http://www.betexplorer.com/soccer/italy/serie-a-2015-2016/empoli-torino/bFpRibLT/#ou, for example.

ex url : http://www.betexplorer.com/soccer/italy/serie-a-2015-2016/empoli-torino/bFpRibLT/#ou
my bookmaker : pinncle

The above address works because the pinnacle exists. But it doesn't work unless there is a pinnacle. (Skip it out.)

I ran the code, but there's a problem. Input is omitted if a bookmaker does not exist.

If there are no bookmaker, the following information shall be entered.
< "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs", >

ex)
bookmaker yes : < "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs", "Bookmaker", "Total", "Over", "Under" >
bookmaker non : < "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs", >
 
Upvote 0
This code is an update to https://www.mrexcel.com/forum/excel...guidance-part-ii-post4725612.html#post4725612 which outputs the "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs" values if the specified bookmaker doesn't exist, otherwise the "Bookmaker", "Total", "Over", "Under" fields are also output to Excel cells.

Code:
Public Sub Extract_Data3()
    
    Dim httpReq As WinHttp.WinHttpRequest
    Dim HTMLdoc As HTMLDocument
    Dim para As HTMLParaElement
    Dim tRows As IHTMLElementCollection
    Dim tRow As HTMLTableRow
    Dim URLs As Range, URL As Range
    Dim destSheet As Worksheet
    Dim parts As Variant
    Dim matchURL As String, matchOddsURL As String
    Dim r As Long
    Dim matchData(1 To 7) As Variant
    Dim HTML As String
    Dim bookmakerFound As Boolean
    Dim startTime As Single
    
    startTime = Timer
    
    Set destSheet = ThisWorkbook.Worksheets("XMLhttp_Odds")
    With destSheet
        .UsedRange.ClearContents
        .Range("A1:K1").Value = Array("Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs", "Bookmaker", "Total", "Over", "Under")
    End With
    r = 2
    
    With ThisWorkbook.Worksheets("MatchDetails")
        Set URLs = .Range("D2", .Cells(Rows.Count, "D").End(xlUp))
    End With
    
    Set httpReq = New WinHttp.WinHttpRequest
    
    For Each URL In URLs
        
        'Request the match result page by removing "#ou" from the URL
        
        matchURL = Replace(URL.Value, "#ou", "")
        Debug.Print matchURL
        With httpReq
            .Open "GET", matchURL, False
            .setRequestHeader "Host", "www.betexplorer.com"
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"     'Windows 10, IE11
            .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
            .setRequestHeader "Accept-Language", "en-US,en;q=0.5"
            .setRequestHeader "Upgrade-Insecure-Requests", "1"
            .setRequestHeader "Referer", "http://www.betexplorer.com/soccer/italy/serie-a-2015-2016/results/"
            .send
            Debug.Print .Status, .statusText
            
            'Put response in a HTMLDocument for parsing
            
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
        End With
        DoEvents
        
        matchData(1) = HTMLdoc.getElementsByClassName("list-breadcrumb__item")(2).innerText
        matchData(2) = HTMLdoc.getElementsByClassName("list-breadcrumb__item")(3).innerText
        
        'Date and time of match is in data-dt attribute of "match-date" P element:
        '< P id=match-date class=list-details__item__date data-dt="15,5,2016,18,00">< /P>
        
        Set para = HTMLdoc.getElementById("match-date")
        parts = Split(para.getAttribute("data-dt"), ",")
        matchData(3) = DateSerial(parts(2), parts(1), parts(0)) + TimeSerial(parts(3), parts(4), 0)
        
        matchData(4) = HTMLdoc.getElementsByTagName("h2")(0).innerText
        matchData(5) = HTMLdoc.getElementsByTagName("h2")(2).innerText
        matchData(6) = HTMLdoc.getElementById("js-score").innerText
        matchData(7) = HTMLdoc.getElementById("js-partial").innerText
        
        'Construct match odds URL from match result URL
        
        parts = Split(URL.Value, "/")
        matchOddsURL = "http://www.betexplorer.com/gres/ajax/matchodds.php?p=1&b=ou&e=" & parts(7)
        Debug.Print matchOddsURL
        
        'Request the match odds data.  The response is a JSON string containing HTML which itself contains the odds data
        
        With httpReq
            .Open "GET", matchOddsURL, False
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"     'Windows 10, IE11
            .setRequestHeader "Accept", "application/json, text/javascript, */*; q=0.01"
            .setRequestHeader "Accept-Language", "en-US,en;q=0.5"
            .setRequestHeader "X-Requested-With", "XMLHttpRequest"
            .setRequestHeader "Referer", URL.Value
            .send
            Debug.Print .Status, .statusText
            
            'Extract HTML from JSON response and put in HTMLDocument
            
            HTML = Mid(.responseText, Len("{'odds':'") + 1)     'remove {"odds":" at the start
            HTML = Left(HTML, Len(HTML) - 2)                    'remove "} at the end
            HTML = Replace(HTML, "\n", "")                      'remove newlines
            HTML = Replace(HTML, "\", "")                       'remove escape characters (assumes that every "\" is the escape character preceding the escaped character)
            
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = HTML
        End With
        DoEvents
    
        bookmakerFound = False
        Set tRows = HTMLdoc.getElementsByTagName("TR")
        For Each tRow In tRows
            If tRow.getElementsByTagName("TABLE").Length = 0 Then       'no inner tables?
                If InStr(1, tRow.innerText, "bet365", vbTextCompare) > 0 Then
                    bookmakerFound = True
                    destSheet.Cells(r, "A").Resize(1, 7) = matchData
                    destSheet.Cells(r, "H").Value = tRow.Cells(0).innerText
                    destSheet.Cells(r, "I").Value = tRow.Cells(3).innerText
                    destSheet.Cells(r, "J").Value = tRow.Cells(4).getAttribute("data-odd")
                    destSheet.Cells(r, "K").Value = tRow.Cells(5).getAttribute("data-odd")
                    r = r + 1
                End If
            End If
        Next
        
        If Not bookmakerFound Then
            destSheet.Cells(r, "A").Resize(1, 7) = matchData
            r = r + 1
        End If
        
    Next
    
    Debug.Print "Elapsed time = " & Timer - startTime & " seconds"
    Debug.Print "Elapsed time = " & Format(TimeSerial(0, 0, Timer - startTime), "hh:mm:ss")
    
End Sub
 
Upvote 0
Your code has been applied and executed. Perfect. You are a genius. My troubles are gone !!! Thank you.
 
Upvote 0
John, how do I amend the code when I have three bookmakers? I modified the code alone, but if there was no last bookmaker, no row was added. It's my limit. I'm sorry.

ex) < "Country" ~ "Halfs", "Bookmaker", "Total", "bet365 Over", "bet365 Under" , "Bookmaker", "Total", "188bet Over", "188bet Under", "Total", "Bookmaker", "pinnacle Over", "pinnacle Under" >

+This code is an update to https://www.mrexcel.com/forum/excel-...ml#post4725612 which outputs the "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs" values if the specified bookmaker doesn't exist, otherwise the "Bookmaker", "Total", "Over", "Under" fields are also output to Excel cells.

DoEvents

bookmakerFound = False
Set tRows = HTMLdoc.getElementsByTagName("TR")
For Each tRow In tRows
If tRow.getElementsByTagName("TABLE").Length = 0 Then 'no inner tables?
If InStr(1, tRow.innerText, "bet365", vbTextCompare) > 0 Then
bookmakerFound = True
destSheet.Cells(r, "A").Resize(1, 7) = matchData
destSheet.Cells(r, "H").Value = tRow.Cells(0).innerText
destSheet.Cells(r, "I").Value = tRow.Cells(3).innerText
destSheet.Cells(r, "J").Value = tRow.Cells(4).getAttribute("data-odd")
destSheet.Cells(r, "K").Value = tRow.Cells(5).getAttribute("data-odd")
r = r + 1
End If
End If
Next

If Not bookmakerFound Then
destSheet.Cells(r, "A").Resize(1, 7) = matchData
r = r + 1
End If
 
Upvote 0
I don't know how to delete the text. John, you don't have to answer any questions. I am satisfied with your reply. If we get a problem later, we'll write again.
 
Upvote 0
Let me ask you a question again. Please help me.
John, how do I amend the code when I have three bookmakers? I modified the code alone, but if there was no last bookmaker, no row was added. It's my limit. I'm sorry.

ex) < "Country" ~ "Halfs", "Bookmaker", "Total", "bet365 Over", "bet365 Under" , "Bookmaker", "Total", "188bet Over", "188bet Under", "Total", "Bookmaker", "pinnacle Over", "pinnacle Under" >

+This code is an update to https://www.mrexcel.com/forum/excel-...ml#post4725612 which outputs the "Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs" values if the specified bookmaker doesn't exist, otherwise the "Bookmaker", "Total", "Over", "Under" fields are also output to Excel cells.

DoEvents

bookmakerFound = False
Set tRows = HTMLdoc.getElementsByTagName("TR")
For Each tRow In tRows
If tRow.getElementsByTagName("TABLE").Length = 0 Then 'no inner tables?
If InStr(1, tRow.innerText, "bet365", vbTextCompare) > 0 Then
bookmakerFound = True
destSheet.Cells(r, "A").Resize(1, 7) = matchData
destSheet.Cells(r, "H").Value = tRow.Cells(0).innerText
destSheet.Cells(r, "I").Value = tRow.Cells(3).innerText
destSheet.Cells(r, "J").Value = tRow.Cells(4).getAttribute("data-odd")
destSheet.Cells(r, "K").Value = tRow.Cells(5).getAttribute("data-odd")
r = r + 1
End If
End If
Next

If Not bookmakerFound Then
destSheet.Cells(r, "A").Resize(1, 7) = matchData
r = r + 1
End If
 
Upvote 0
Let me ask you a question again. Please help me.
John, how do I amend the code when I have three bookmakers? I modified the code alone, but if there was no last bookmaker, no row was added. It's my limit. I'm sorry.

ex) < "Country" ~ "Halfs", "Bookmaker", "Total", "bet365 Over", "bet365 Under" , "Bookmaker", "Total", "188bet Over", "188bet Under", "Total", "Bookmaker", "pinnacle Over", "pinnacle Under" >
So you want the columns as shown above, for 3 bookmakers? The code needs quite a few changes. This code should work for any number of bookmakers - just put the names in the bookmakers array.

Code:
Public Sub Extract_Data4()
    
    Dim httpReq As WinHttp.WinHttpRequest
    Dim HTMLdoc As HTMLDocument
    Dim para As HTMLParaElement
    Dim tRows As IHTMLElementCollection
    Dim tRow As HTMLTableRow
    Dim URLs As Range, URL As Range
    Dim destSheet As Worksheet
    Dim parts As Variant
    Dim matchURL As String, matchOddsURL As String
    Dim r As Long, i As Long
    Dim matchData(1 To 7) As Variant
    Dim HTML As String
    Dim bookmakers As Variant, numRows() As Long, maxRows As Long
    Dim startTime As Single
    
    startTime = Timer
    
    bookmakers = Array("bet365", "188bet", "pinnacle")
    ReDim numRows(UBound(bookmakers))
    
    Set destSheet = ThisWorkbook.Worksheets("XMLhttp_Odds")
    With destSheet
        .UsedRange.ClearContents
        .Range("A1:G1").Value = Array("Country", "League", "Date & Time", "Home", "Away", "Score", "Halfs")
        For i = 0 To UBound(bookmakers)
            .Range("H1").Offset(, i * 4).Resize(1, 4) = Array("Bookmaker", "Total", bookmakers(i) & " Over", bookmakers(i) & " Under")
        Next
    End With
    r = 2
    
    With ThisWorkbook.Worksheets("MatchDetails")
        Set URLs = .Range("D2", .Cells(Rows.Count, "D").End(xlUp))
    End With
    
    Set httpReq = New WinHttp.WinHttpRequest
    
    For Each URL In URLs
        
        'Request the match result page by removing "#ou" from the URL
        
        matchURL = Replace(URL.Value, "#ou", "")
        With httpReq
            .Open "GET", matchURL, False
            .setRequestHeader "Host", "www.betexplorer.com"
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"     'Windows 10, IE11
            .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
            .setRequestHeader "Accept-Language", "en-US,en;q=0.5"
            .setRequestHeader "Upgrade-Insecure-Requests", "1"
            .setRequestHeader "Referer", "http://www.betexplorer.com/soccer/italy/serie-a-2015-2016/results/"
            .send
            Debug.Print .Status, .statusText
            
            'Put response in a HTMLDocument for parsing
            
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
        End With
        DoEvents
        
        matchData(1) = HTMLdoc.getElementsByClassName("list-breadcrumb__item")(2).innerText
        matchData(2) = HTMLdoc.getElementsByClassName("list-breadcrumb__item")(3).innerText
        
        'Date and time of match is in data-dt attribute of "match-date" P element:
        '< P id=match-date class=list-details__item__date data-dt="15,5,2016,18,00">< /P>
        
        Set para = HTMLdoc.getElementById("match-date")
        parts = Split(para.getAttribute("data-dt"), ",")
        matchData(3) = DateSerial(parts(2), parts(1), parts(0)) + TimeSerial(parts(3), parts(4), 0)
        
        matchData(4) = HTMLdoc.getElementsByTagName("h2")(0).innerText
        matchData(5) = HTMLdoc.getElementsByTagName("h2")(2).innerText
        matchData(6) = HTMLdoc.getElementById("js-score").innerText
        matchData(7) = HTMLdoc.getElementById("js-partial").innerText
        
        'Construct match odds URL from match result URL
        
        parts = Split(URL.Value, "/")
        matchOddsURL = "http://www.betexplorer.com/gres/ajax/matchodds.php?p=1&b=ou&e=" & parts(7)
        
        'Request the match odds data.  The response is a JSON string containing HTML which itself contains the odds data
        
        With httpReq
            .Open "GET", matchOddsURL, False
            '.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0" 'Windows 10, Firefox
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"     'Windows 10, IE11
            .setRequestHeader "Accept", "application/json, text/javascript, */*; q=0.01"
            .setRequestHeader "Accept-Language", "en-US,en;q=0.5"
            .setRequestHeader "X-Requested-With", "XMLHttpRequest"
            .setRequestHeader "Referer", URL.Value
            .send
            Debug.Print .Status, .statusText
            
            'Extract HTML from JSON response and put in HTMLDocument
            
            HTML = Mid(.responseText, Len("{'odds':'") + 1)     'remove {"odds":" at the start
            HTML = Left(HTML, Len(HTML) - 2)                    'remove "} at the end
            HTML = Replace(HTML, "\n", "")                      'remove newlines
            HTML = Replace(HTML, "\", "")                       'remove escape characters (assumes that every "\" is the escape character preceding the escaped character)
            
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = HTML
        End With
        DoEvents
        
        'Initialise number of rows found for each bookmaker
        
        For i = 0 To UBound(numRows)
            numRows(i) = 0
        Next

        'Find data for each bookmaker and extract to Excel
        
        Set tRows = HTMLdoc.getElementsByTagName("TR")
        For Each tRow In tRows
            If tRow.getElementsByTagName("TABLE").Length = 0 Then       'no inner tables?
                For i = 0 To UBound(bookmakers)
                    If InStr(1, tRow.innerText, bookmakers(i), vbTextCompare) > 0 Then
                        destSheet.Cells(r + numRows(i), "H").Offset(, i * 4).Value = tRow.Cells(0).innerText
                        destSheet.Cells(r + numRows(i), "I").Offset(, i * 4).Value = tRow.Cells(3).innerText
                        destSheet.Cells(r + numRows(i), "J").Offset(, i * 4).Value = tRow.Cells(4).getAttribute("data-odd")
                        destSheet.Cells(r + numRows(i), "K").Offset(, i * 4).Value = tRow.Cells(5).getAttribute("data-odd")
                        numRows(i) = numRows(i) + 1
                    End If
                Next
            End If            
        Next
        
        'Find maximum number of rows output for all bookmakers
        
        maxRows = 0
        For i = 0 To UBound(numRows)
            If numRows(i) > maxRows Then maxRows = numRows(i)
        Next
        
        'Set the next start row
        
        If maxRows > 0 Then
            destSheet.Cells(r, "A").Resize(maxRows, 7) = matchData
            r = r + maxRows
        Else
            destSheet.Cells(r, "A").Resize(1, 7) = matchData
            r = r + 1
        End If
        
    Next
    
    Debug.Print "Elapsed time = " & Timer - startTime & " seconds"
    Debug.Print "Elapsed time = " & Format(TimeSerial(0, 0, Timer - startTime), "hh:mm:ss")
    
End Sub
 
Upvote 0
Thank you for your efforts. The code changed so much that it took time. I can't make a code like you. However, the application is available for the code you provided. I did it again. I applied, and I studied again about the code you gave me. Thank you very much. You're the best.
 
Upvote 0

Forum statistics

Threads
1,214,641
Messages
6,120,693
Members
448,979
Latest member
DET4492

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top