VBA Web scrape site and extract variable data

ziggyfo

New Member
Joined
Mar 24, 2021
Messages
24
Office Version
  1. 2019
Platform
  1. Windows
  2. MacOS
Hi geniuses ! :)
I have very little to offer in this regard and hoping somebody could help me if its not too big of an ask. I have been browsing for codes and tried various methods but have not had much success.

What im trying to achieve is scrape the below site for race results, however the results are variable and will change from day to day.

Each day the results will be updated as they happen, so below you will see some times that have not yet happened. Once they have happened they will populate 3 Places. Its worth noting that the color numbers are not text and appear to be images. I am looking for a code that can download all available races through out the day as and when its run to update with the new races.



TM1.1.jpg


I guess the formatting can can be anything really, but if it could be extract almost as is, that would be phenomenal! I have made an example below for reference

Screenshot 2021-04-10 at 18.00.01.jpg


The important part would be to be able to see the 1st, 2nd, 3rd winners so i can process the information further into a database.

Cheers
 

Excel Facts

Excel Can Read to You
Customize Quick Access Toolbar. From All Commands, add Speak Cells or Speak Cells on Enter to QAT. Select cells. Press Speak Cells.
Hi !​
I have been browsing for codes and tried various methods but have not had much success.
Any method works as this is the easiest kind of webpages for scraping !​
Directions :​
  1. On VBE side the reference Microsoft HTML Object Library must be activated.
  2. Paste this beginner starter demonstration to the worksheet module :
VBA Code:
Sub Demo1()
    Dim S$(3, 3), oDoc(2) As New HTMLDocument, C%, R&, oMeet As HTMLDivElement, F&, oRace As HTMLDivElement, oElt As Object, L%
        If UsedRange.Rows.Count > 3 Then Rows("4:" & UsedRange.Rows.Count).Delete
    With [D1]
        If .Text = "" Then .Value2 = "today"
        If IsNumeric(.Value2) Then S(0, 0) = Format(.Value2, "yyyy-mm-dd") Else S(0, 0) = .Text
    End With
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "https://www.timeform.com/greyhound-racing/results/" & S(0, 0), False
        .setRequestHeader "DNT", "1"
         On Error Resume Next
        .send
         If .Status = 200 Then S(0, 1) = .responseText
    End With
        On Error GoTo 0
        If S(0, 1) = "" Then Beep: Exit Sub
        oDoc(0).body.innerHTML = S(0, 1)
        Application.ScreenUpdating = False
        C = 2:  R = 5
    For Each oMeet In oDoc(0).getElementsByClassName("waf-meeting w-container")
            oDoc(1).body.innerHTML = oMeet.innerHTML
            Cells(R - 1, 1).Value2 = Split(Application.Clean(LTrim(oDoc(1).querySelector("h3").innerText)), " Top")(0)
            F = R
        For Each oRace In oDoc(1).getElementsByClassName("waf-result w-content")
                oDoc(2).body.innerHTML = oRace.innerHTML
                Set oElt = oDoc(2).querySelector("a")
                Erase S
            If oElt Is Nothing Then
                S(0, 0) = oDoc(2).querySelector("b").innerText
            Else
                    S(0, 0) = oElt.innerText
                    L = 0
                For Each oElt In oDoc(2).querySelector("table").Rows
                    L = L + 1
                    S(L, 0) = oElt.Cells(0).innerText
                    S(L, 1) = oElt.Cells(1).Children(0).alt
                    S(L, 2) = oElt.Cells(1).Children(1).innerText
                    S(L, 3) = oElt.Cells(2).innerText
                Next
            End If
            With Cells(R, C).Resize(4, 4)
                .Rows(1).HorizontalAlignment = xlCenterAcrossSelection
                .Value2 = S
            End With
                If C < 12 Then C = C + 5 Else C = 2: R = R + 2 - (Cells(R + 1, C).Text > "") * 3
        Next
            If C = 2 Then R = R + 1 Else C = 2: R = R + 3 - (Cells(R + 1, C).Text > "") * 3
            Rows(F & ":" & R - 3).Group
    Next
        Range("A:A,D:D,I:I,N:N").Columns.AutoFit
        Application.ScreenUpdating = True
        Erase oDoc
End Sub
➡️ Do you like it ? ⏩ So thanks to click on the bottom right ?Like icon ! ↘️
 
Upvote 0
Solution
Hi !​

Any method works as this is the easiest kind of webpages for scraping !​
Directions :​
  1. On VBE side the reference Microsoft HTML Object Library must be activated.
  2. Paste this beginner starter demonstration to the worksheet module :
VBA Code:
Sub Demo1()
    Dim S$(3, 3), oDoc(2) As New HTMLDocument, C%, R&, oMeet As HTMLDivElement, F&, oRace As HTMLDivElement, oElt As Object, L%
        If UsedRange.Rows.Count > 3 Then Rows("4:" & UsedRange.Rows.Count).Delete
    With [D1]
        If .Text = "" Then .Value2 = "today"
        If IsNumeric(.Value2) Then S(0, 0) = Format(.Value2, "yyyy-mm-dd") Else S(0, 0) = .Text
    End With
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "https://www.timeform.com/greyhound-racing/results/" & S(0, 0), False
        .setRequestHeader "DNT", "1"
         On Error Resume Next
        .send
         If .Status = 200 Then S(0, 1) = .responseText
    End With
        On Error GoTo 0
        If S(0, 1) = "" Then Beep: Exit Sub
        oDoc(0).body.innerHTML = S(0, 1)
        Application.ScreenUpdating = False
        C = 2:  R = 5
    For Each oMeet In oDoc(0).getElementsByClassName("waf-meeting w-container")
            oDoc(1).body.innerHTML = oMeet.innerHTML
            Cells(R - 1, 1).Value2 = Split(Application.Clean(LTrim(oDoc(1).querySelector("h3").innerText)), " Top")(0)
            F = R
        For Each oRace In oDoc(1).getElementsByClassName("waf-result w-content")
                oDoc(2).body.innerHTML = oRace.innerHTML
                Set oElt = oDoc(2).querySelector("a")
                Erase S
            If oElt Is Nothing Then
                S(0, 0) = oDoc(2).querySelector("b").innerText
            Else
                    S(0, 0) = oElt.innerText
                    L = 0
                For Each oElt In oDoc(2).querySelector("table").Rows
                    L = L + 1
                    S(L, 0) = oElt.Cells(0).innerText
                    S(L, 1) = oElt.Cells(1).Children(0).alt
                    S(L, 2) = oElt.Cells(1).Children(1).innerText
                    S(L, 3) = oElt.Cells(2).innerText
                Next
            End If
            With Cells(R, C).Resize(4, 4)
                .Rows(1).HorizontalAlignment = xlCenterAcrossSelection
                .Value2 = S
            End With
                If C < 12 Then C = C + 5 Else C = 2: R = R + 2 - (Cells(R + 1, C).Text > "") * 3
        Next
            If C = 2 Then R = R + 1 Else C = 2: R = R + 3 - (Cells(R + 1, C).Text > "") * 3
            Rows(F & ":" & R - 3).Group
    Next
        Range("A:A,D:D,I:I,N:N").Columns.AutoFit
        Application.ScreenUpdating = True
        Erase oDoc
End Sub
➡️ Do you like it ? ⏩ So thanks to click on the bottom right ?Like icon ! ↘️

Hi Marc, Thank you for getting back to me with this great piece of work. I am however getting an error, i have installed the HTML object, but i am getting a run time error on :

If UsedRange.Rows.Count > 3 Then

Line 3 of your code. Do you know if there is an additional object required?
 
Upvote 0
Hi, another alternative

Activate the library
E242a77.png


VBA Code:
Sub Demo2()
    Dim http As New MSXML2.XMLHTTP60
    Dim html As New MSHTML.HTMLDocument
    Dim elements As MSHTML.IHTMLElementCollection
    Dim element As MSHTML.IHTMLElement
    Dim r As Long
    Dim c As Integer
    Cells.Clear
    With http
        .Open "GET", "https://www.timeform.com/greyhound-racing/results/today"
        .Send
        html.body.innerHTML = .responseText
    End With
    Set elements = html.getElementsByClassName("waf-result w-content")
    r = 1: c = 1
    For Each element In elements
        Cells(r, c) = element.innerText
        c = c + 1
        If c = 4 Then
            c = 1: r = r + 1
        End If
    Next element
    Cells.EntireColumn.ColumnWidth = 55
    Cells.EntireRow.AutoFit
End Sub
 
Upvote 0
Line 3 of your code. Do you know if there is an additional object required?
No well rocks on my side as it is so it seems you did not follow my direction #2 - the bad reader classic trap ! - : where is located the code ? …​
And I forgot to say : for Windows only obviously …​
 
Last edited:
Upvote 0
MrGes,​

first thanks for the Like !​
Your code well works on my side but all race data are within a single cell so different of the expected shown in the initial post and meeting names are missing …​
Could be a mess for him to feed his database.​
 
Last edited:
Upvote 0
@Marc L this is not a problem, just integrate the code
VBA Code:
Sub Demo2()
    Dim http As New MSXML2.XMLHTTP60
    Dim html As New MSHTML.HTMLDocument
    Dim elements As MSHTML.IHTMLElementCollection
    Dim element As MSHTML.IHTMLElement
    Dim myTxt As Variant
    Dim r As Long
    Dim c As Integer
    Cells.Clear
    With http
        .Open "GET", "https://www.timeform.com/greyhound-racing/results/today"
        .Send
        html.body.innerHTML = .responseText
    End With
    Set elements = html.getElementsByClassName("waf-result w-content")
    r = 1: c = 1
    For Each element In elements
        On Error Resume Next
        myTxt = Split(element.innerText, Chr(10))
        Cells(r, c) = myTxt(0)
        Cells(r + 1, c) = myTxt(1)
        Cells(r + 2, c) = myTxt(2)
        Cells(r + 3, c) = myTxt(3)
        Cells(r + 4, c) = myTxt(4)
        c = c + 1
        If c = 4 Then
            c = 1: r = r + 6
        End If
    Next element
    Cells.EntireColumn.ColumnWidth = 55
    Cells.EntireRow.AutoFit
End Sub
 
Upvote 0
That's better but what about the missing meeting names if really they are necessary, let's wait his answer …​
 
Upvote 0
No well rocks on my side as it is so it seems you did not follow my direction #2 - the bad reader classic trap ! - : where is located the code ? …​
And I forgot to say : for Windows only obviously …​
@Marc L this is not a problem, just integrate the code
VBA Code:
Sub Demo2()
    Dim http As New MSXML2.XMLHTTP60
    Dim html As New MSHTML.HTMLDocument
    Dim elements As MSHTML.IHTMLElementCollection
    Dim element As MSHTML.IHTMLElement
    Dim myTxt As Variant
    Dim r As Long
    Dim c As Integer
    Cells.Clear
    With http
        .Open "GET", "https://www.timeform.com/greyhound-racing/results/today"
        .Send
        html.body.innerHTML = .responseText
    End With
    Set elements = html.getElementsByClassName("waf-result w-content")
    r = 1: c = 1
    For Each element In elements
        On Error Resume Next
        myTxt = Split(element.innerText, Chr(10))
        Cells(r, c) = myTxt(0)
        Cells(r + 1, c) = myTxt(1)
        Cells(r + 2, c) = myTxt(2)
        Cells(r + 3, c) = myTxt(3)
        Cells(r + 4, c) = myTxt(4)
        c = c + 1
        If c = 4 Then
            c = 1: r = r + 6
        End If
    Next element
    Cells.EntireColumn.ColumnWidth = 55
    Cells.EntireRow.AutoFit
End Sub

Thank you Mrges, you code works great. As Marc mentioned, there are some important information that is required on the sheet that is not currently being extracted.

The race names are vital to be able to use this information and also the cloth number. You will see the 1st 2,nd and 3rd positions all have a colour number before their name, that is the cloth name. As mentioned in the original post, this seems to require an alerted way of extracting from my research.
 
Upvote 0

Forum statistics

Threads
1,215,387
Messages
6,124,637
Members
449,177
Latest member
Sousanna Aristiadou

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top