Changing a loop and finding a HREF links using VBA and appending fetched links from sheet

mattadams84

Board Regular
Joined
Oct 30, 2016
Messages
54
Hi all,

My problem is two-fold. I have a worksheet already created that has a list of Football Leagues (in the rows), the columns hold the data. I have some code that fetches data from a website and populates this sheet (each row contains data corresponding to a league). It currently works, but i need to modify it so that it acts slightly differently.

Code:
'VBE > Tools > References:'1: Microsoft HTML Object library  2: Microsoft Internet Controls
Public Sub GetSoccerStats()
    Dim ie As Object, t As Date
    Dim objDoc As New MSHTML.HTMLDocument, text As String
    Dim lastRow As Long, dataSheet As Worksheet, inputArray(), i As Long


    Const MAX_WAIT_SEC As Long = 10


    Set dataSheet = ThisWorkbook.Worksheets("AVG GOAL DATA")
    Set ie = CreateObject("InternetExplorer.Application")
    With dataSheet
        lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
    End With


    inputArray = dataSheet.Range("C4:E" & lastRow).Value
    inputArray = GetLinks(inputArray)


    Dim results(), r As Long, c As Long
    ReDim results(1 To UBound(inputArray, 1), 1 To 8)


    With ie
        .Visible = True
        For i = LBound(inputArray, 1) To UBound(inputArray, 1)
            r = r + 1
            .navigate2 inputArray(i, 4)


            While .Busy Or .readyState < 4: DoEvents: Wend


            Dim objTable As MSHTML.HTMLTable, objTableRow As MSHTML.HTMLTableRow


            Do
                DoEvents
                On Error Resume Next
                Set objTable = .document.getElementsByClassName("table-main leaguestats")(0)
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While objTable Is Nothing


            If Not objTable Is Nothing Then
                c = 1
                For Each objTableRow In objTable.Rows
                    text = objTableRow.Cells(0).innerText
                    Select Case text
                    Case "Matches played", "Matches remaining", "Home goals", "Away goals"
                        results(r, c) = objTableRow.Cells(1).innerText
                        results(r, c + 1) = objTableRow.Cells(2).innerText
                        c = c + 2
                    End Select
                Next objTableRow
            End If
        Next
        .Quit
    End With
    dataSheet.Range("F4").Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub

The problem is in the next section of code which is the part that fetches the URL links from my spreadsheet.

Code:
Public Function GetLinks(ByRef inputArray As Variant) As Variant    Dim i As Long
    ReDim Preserve inputArray(1 To UBound(inputArray, 1), 1 To UBound(inputArray, 2) + 1)


    For i = LBound(inputArray, 1) To UBound(inputArray, 1)
        inputArray(i, 4) = IIf(inputArray(i, 1) = "CURRENT", inputArray(i, 2), inputArray(i, 3))
    Next
    GetLinks = inputArray
End Function

At the moment this code looks on my sheet to see that if in column C, the word is CURRENT, then use the URL provided in column D. If it doesn't contain that word it takes the URL from column E.

So my first problem is that I need change it so that if it the cell DOES NOT contain the word CURRENT it simply skips the row (or that league) and goes to the next row down and repeats the process. This is because the data i am fetching does not change in the URL for column E, so there is no need to refetch it.

My second problem is that Some leagues towards the end of the season get split in to groups (See betexplorer.com/soccer/…) As you notice when you land on this page it defaults you to giving stats to the 'Championship group' However there is a tab that says 'main'. This is the data i need. The further problem is that not all leagues have this. As far as i can tell, the 'main' stats tab has a different URL within an HREF tag so perhaps the VBA can use the link i provide on the worksheet, check to see if a 'main' tab exists, if it does not then just pull the data from that page, or if it does then redirect to the URL for the 'main' tab and pull the data from that one...

Can anyone help?

Here is a link to my spreadsheet :

https://www.dropbox.com/s/oumlev7guu5ru5h/Goals.xlsm?dl=0
 

Excel Facts

Format cells as date
Select range and press Ctrl+Shift+3 to format cells as date. (Shift 3 is the # sign which sort of looks like a small calendar).
For the first part you need 2 changes, one in the function that gets the links and one in the main code.

In the function try changing the code like this.
Code:
Public Function GetLinks(ByRef inputArray As Variant) As Variant    Dim i As Long
    ReDim Preserve inputArray(1 To UBound(inputArray, 1), 1 To UBound(inputArray, 2) + 1)


    For i = LBound(inputArray, 1) To UBound(inputArray, 1)
        inputArray(i, 4) = IIf(inputArray(i, 1) = "CURRENT", inputArray(i, 2), "SKIP")
    Next
    GetLinks = inputArray
End Function
Now update the main code like this.
Code:
Public Sub GetSoccerStats()
    Dim ie As Object, t As Date
    Dim objDoc As New MSHTML.HTMLDocument, text As String
    Dim lastRow As Long, dataSheet As Worksheet, inputArray(), i As Long


    Const MAX_WAIT_SEC As Long = 10


    Set dataSheet = ThisWorkbook.Worksheets("AVG GOAL DATA")
    Set ie = CreateObject("InternetExplorer.Application")
    With dataSheet
        lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
    End With


    inputArray = dataSheet.Range("C4:E" & lastRow).Value
    inputArray = GetLinks(inputArray)


    Dim results(), r As Long, c As Long
    ReDim results(1 To UBound(inputArray, 1), 1 To 8)


    With ie
        .Visible = True
        For i = LBound(inputArray, 1) To UBound(inputArray, 1)
        
            If inputArray(1, 4) <> "SKIP" Then
                r = r + 1
                .navigate2 inputArray(i, 4)
    
    
                While .Busy Or .readyState < 4: DoEvents: Wend
    
    
                Dim objTable As MSHTML.HTMLTable, objTableRow As MSHTML.HTMLTableRow
    
    
                Do
                    DoEvents
                    On Error Resume Next
                    Set objTable = .document.getElementsByClassName("table-main leaguestats")(0)
                    On Error GoTo 0
                    If Timer - t > MAX_WAIT_SEC Then Exit Do
                Loop While objTable Is Nothing
    
    
                If Not objTable Is Nothing Then
                    c = 1
                    For Each objTableRow In objTable.Rows
                        text = objTableRow.Cells(0).innerText
                        Select Case text
                        Case "Matches played", "Matches remaining", "Home goals", "Away goals"
                            results(r, c) = objTableRow.Cells(1).innerText
                            results(r, c + 1) = objTableRow.Cells(2).innerText
                            c = c + 2
                        End Select
                    Next objTableRow
                End If
            End If
        Next
        
        .Quit
    End With
    
    dataSheet.Range("F4").Resize(UBound(results, 1), UBound(results, 2)) = results
    
End Sub
That should skip any URLs that aren't marked as 'CURRENT'.
 
Upvote 0
Thanks, this works to some extent, as instead of loading up the URL it tries to load up a page that says https://skip and obviously that dosnt work. Is there anyway to avoid it loading up that page and just skipping to the next row...

I imagine the code being as follows :

Code:
Public Function GetLinks(ByRef inputArray As Variant) As Variant    Dim i As Long
    ReDim Preserve inputArray(1 To UBound(inputArray, 1), 1 To UBound(inputArray, 2) + 1)




    For i = LBound(inputArray, 1) To UBound(inputArray, 1)
        inputArray(i, 4) = IIf(inputArray(i, 1) = "CURRENT", inputArray(i, 2), --- SKIP TO NEXT LINE IN SPREADSHEET----)
    Next
    GetLinks = inputArray
End Function
 
Upvote 0
I have also tried to set the word "CURRENT" to a variable. This is because i have 2 buttons to run the macro, one that will look for CURRENT and one that will look for LAST. I tried doing this in the first SUB:

Code:
If Application.Caller = "Current" Then    SelectedButton = "CURRENT"
    
Else
 
SelectedButton = "LAST"
    
End If

And then this is in the second:

Code:
   Public Function GetLinks(ByRef inputArray As Variant) As Variant   Dim i As Long
    ReDim Preserve inputArray(1 To UBound(inputArray, 1), 1 To UBound(inputArray, 2) + 1)

    For i = LBound(inputArray, 1) To UBound(inputArray, 1)
        inputArray(i, 4) = IIf(inputArray(i, 1) = SelectedButton, inputArray(i, 2), "SKIP")
    Next
    GetLinks = inputArray
End Function

However it does not work. Any ideas?
 
Last edited:
Upvote 0
Did you make the changes to the main sub that I suggested?

In particular this part, which should 'sklp' the URL if it isn't current.

Code:
If inputArray(1, 4) <> "SKIP" Then

    ' code to go to URL

End If
 
Upvote 0
There's no reference to 'LAST' in the code I posted or in the original code.:eek:
 
Upvote 0
Ah, sorry i probably wasn't clear. Basically in column C on the sheet it will either contain the word CURRENT or LAST. Basically I have two buttons on the sheet, one for CURRENT and one for LAST. If i push the 'Current' button i want it to go through each row and fetch the data for where CURRENT is in column C. If it says LAST i want it to ignore that row and go to the next row. What you coded 'works' but when it reaches a row with the word LAST in column C the browser loads a page https://skip. Hope thats clearer
 
Upvote 0
There was no mentions of buttons or use of Application.Caller in the original code.

Can you post your current code?

How are you using the 2 buttons?
 
Upvote 0
The two buttons are at the top of my sheet and are called 'Current' and 'Last'

Here is the code:

Code:
Option Explicit
Dim SelectedButton As String


Sub ClearConents()


With ActiveSheet
    .Range("B3:BP" & .Range("A3").End(xlDown).Row).ClearContents
End With


End Sub


'VBE > Tools > References:'1: Microsoft HTML Object library  2: Microsoft Internet Controls
Public Sub GetSoccerStats()
    Dim ie As Object, t As Date
    Dim objDoc As New MSHTML.HTMLDocument, text As String
    Dim lastRow As Long, dataSheet As Worksheet, inputArray(), i As Long
 
 
    Const MAX_WAIT_SEC As Long = 10
 
    Set dataSheet = ThisWorkbook.Worksheets("AVG GOAL DATA")
    Set ie = CreateObject("InternetExplorer.Application")
    With dataSheet
        lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
    End With
 
 
    inputArray = dataSheet.Range("C4:E" & lastRow).Value
    inputArray = GetLinks(inputArray)
    
    If Application.Caller = "Current" Then
    SelectedButton = "CURRENT"
    End If
 
    Dim results(), r As Long, c As Long
    ReDim results(1 To UBound(inputArray, 1), 1 To 8)
 
 
    With ie
        .Visible = True
        For i = LBound(inputArray, 1) To UBound(inputArray, 1)
        [COLOR=#333333] If inputArray(1, 4) <> "SKIP" Then[/COLOR]
         r = r + 1
            .Navigate2 inputArray(i, 4)
 
 
            While .Busy Or .readyState < 4: DoEvents: Wend
 
            ' may need additional wait here
            Dim objTable As MSHTML.HTMLTable, objTableRow As MSHTML.HTMLTableRow
           
            If .document.querySelectorAll(".list-tabs--secondary").Length > 0 Then
                'championship tab present
                'switch to main
                .document.querySelector(".list-tabs--secondary a").Click
 
                While .Busy Or .readyState < 4: DoEvents: Wend
            Else                                 'you don't need this part
                'Championship tab is not present
            End If
 
            t = Timer
            Do
                DoEvents
                On Error Resume Next
                Set objTable = .document.getElementsByClassName("table-main leaguestats")(0)
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While objTable Is Nothing
 
 
            If Not objTable Is Nothing Then
                c = 1
                For Each objTableRow In objTable.Rows
                    text = objTableRow.Cells(0).innerText
                    Select Case text
                    Case "Matches played", "Matches remaining", "Home goals", "Away goals"
                        results(r, c) = objTableRow.Cells(1).innerText
                        results(r, c + 1) = objTableRow.Cells(2).innerText
                        c = c + 2
                    End Select
                Next objTableRow
            End If
            Set objTable = Nothing
        Next
        .Quit
    End With
    dataSheet.Range("F4").Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
 
Public Function GetLinks(ByRef inputArray As Variant) As Variant
    Dim i As Long
    ReDim Preserve inputArray(1 To UBound(inputArray, 1), 1 To UBound(inputArray, 2) + 1)
 
 
    For i = LBound(inputArray, 1) To UBound(inputArray, 1)
        inputArray(i, 4) = IIf(inputArray(i, 1) = SelectedButton, inputArray(i, 2), inputArray(i, 3))
    Next
    GetLinks = inputArray
End Function
 
Last edited:
Upvote 0

Forum statistics

Threads
1,215,430
Messages
6,124,846
Members
449,194
Latest member
HellScout

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top