Web Scraping Add problem in list

fredrerik84

Active Member
Joined
Feb 26, 2017
Messages
383
Hi I would like to start this thread to thank the community here and especially Domenic for helping me together the code bellow:

Rich (BB code):
Rich (BB code):
    Set HTMLDoc = IE.document
Sub testing2()




    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim HTMLRows As MSHTML.IHTMLElementCollection
    Dim HTMLRow As MSHTML.IHTMLElement
    Dim League As String
    Dim sheet As Worksheet
    Dim r As Long
    Dim i As Long
    Dim j As Long
    Dim K As Long
    Dim vMatch As Variant
    
    
    Set sheet = ActiveWorkbook.Sheets("Conversion")
    IE.Navigate "http://www.xscores.com/soccer/scheduled_games/24-05"
    IE.Visible = True
    
With IE
        Do While .Busy Or .readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
    End With
    
    Set HTMLDoc = IE.document
    Set HTMLRows = HTMLDoc.getElementById("scoretable").getElementsByTagName("tr")
    
    Columns("B:F").ClearContents
     Columns("B:F").Borders.LineStyle = xlNone
    
    r = 2
    For i = 0 To HTMLRows.Length - 3
        If HTMLRows(i).Cells.Length - 1 >= 10 Then
            Cells(r, "B").Value = HTMLRows(i).Cells(0).innerText
            Cells(r, "B").Borders(xlEdgeLeft).Color = RGB(91, 155, 213)
            League = HTMLRows(i).Cells(IIf(i > 0, 4, 3)).innerText
            Cells(r, "C").Value = League
            vMatch = Application.VLookup(League, sheet.Range("V1:X62"), 3, False)
            If Not IsError(vMatch) Then
                Cells(r, "C").Value = vMatch
            Else
                Cells(r, "C").Value = League
            End If
            Cells(r, "D").Value = HTMLRows(i).Cells(IIf(i > 0, 5, 4)).innerText & " (" & HTMLRows(i).Cells(IIf(i > 0, 6, 5)).innerText & ")"
            If i > 0 Then
                Cells(r, "E").Value = GetCountry(HTMLRows(i).Cells(3).getElementsByTagName("a")(0).getAttribute("title"))
            Else
                Cells(r, "E").Value = "Country"
            End If
            Cells(r, "F").Value = HTMLRows(i).Cells(IIf(i > 0, 9, 7)).innerText & " (" & HTMLRows(i).Cells(IIf(i > 0, 10, 8)).innerText & ")"
            Cells(r, "F").Borders(xlEdgeRight).Color = RGB(91, 155, 213)
            r = r + 1
        End If
    Next i
    Cells(r - 1, 2).Resize(, 5).Borders(xlEdgeBottom).Color = RGB(91, 155, 213)
    MsgBox i - 4 & " Matches has been Loaded.", vbInformation
    IE.Quit
    Set HTMLDoc = Nothing
    Set HTMLRows = Nothing
    Set HTMLRow = Nothing
    
End Sub

but I have made some changes to this code as this page has also "all_games" link so the line in red above will be: IE.Navigate "http://www.xscores.com/soccer/all_games/24-05" but this gives following problem: as this is just as summerize page for Finished_games, Live_games and up_comming games so this site handles this with adding an Add in the middle of the list. , Is there an way/code I can try to jump over this without going into infinite loop Any help would be greatly appreciated , my sulltion is to run the code from 3 links but I relly hope there is an better way

 
Last edited:
Wow this is great :) this code is sick, you will not believe how much this will help, thank you so much..

It's a little late here so will have to wait untill tomorrow to add latest changes..

I think ill add all your changes for sure except I think 5sec is enough waiting time to kill it, after all its 2017.. :)
 
Upvote 0

Excel Facts

Excel Joke
Why can't spreadsheets drive cars? They crash too often!
hi i tested your latest edits and included it in my full code gonna try to fetch the other sports results *** well

here is my working code:

Code:
Option Explicit




Sub testing2()




    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim HTMLRows As MSHTML.IHTMLElementCollection
    Dim HTMLRow As MSHTML.IHTMLElement
    Dim League As String
    Dim sheet As Worksheet
    Dim R As Long
    Dim i As Long
    Dim j As Long
    Dim K As Long
    Dim L As Long
    Dim MSG As Long
    Dim vMatch As Variant
    Dim Sport As String
    Dim GameStatus As String
    Dim vStatus As Variant
    Dim mDate As String
    Dim sKO As String
    Dim strURL As String
    Dim sngStartTime As Single
    Dim strErrMsg As String
    Dim blnSuccessful As Boolean
    
    
    Const MAX_WAIT_SEC As Integer = 6 'change wait time as desired
    Const URL As String = "http://www.xscores.com/"
    
    If TypeName(ActiveSheet) <> "Worksheet" Then
        MsgBox "Please make sure that a worksheet is the active sheet, " & _
            "and try again.", vbExclamation
        Exit Sub
    End If
    
    Sport = Cells(6, "j")
    GameStatus = Cells(9, "j")
    mDate = Cells(12, "j")
    Set sheet = ActiveWorkbook.Sheets("Conversion")
    vStatus = Application.VLookup(GameStatus, sheet.Range("AA2:AB5"), 2, False)
    strURL = "http://www.xscores.com/" & Sport & "/" & vStatus & "/" & Format(mDate, "dd-mm")
    
    On Error GoTo ErrHandler
    
    If vStatus = "scheduled_games" Then
       L = 0
       MSG = 5
    ElseIf vStatus = "live_games" Then
       L = 0
    ElseIf vStatus = "finished_games" Then
       L = 0
       MSG = 4
    Else
       L = 0
       MSG = 3
       
    End If
    
    With Application
        .DisplayAlerts = False
        .EnableEvents = False
        .ScreenUpdating = False
    End With
    
    
    With IE
        .Navigate strURL
        .Visible = False
        sngStartTime = Timer
        Do While .Busy Or .readyState <> READYSTATE_COMPLETE
            DoEvents
            If Timer - sngStartTime > MAX_WAIT_SEC Then
                strErrMsg = "Unable to connect to :" & vbCrLf & vbCrLf & strURL
                GoTo ErrHandler
            End If
        Loop
    End With
            
    Set HTMLDoc = IE.document
    Set HTMLRows = HTMLDoc.getElementById("scoretable").getElementsByTagName("tr")
    
    Columns("B:G").ClearContents
     Columns("B:G").Borders.LineStyle = xlNone
    
    R = 2
    Columns("F").NumberFormat = "@"
    For i = L To HTMLRows.Length - 1
        sKO = HTMLRows(i).Cells(0).innerText
        If i = 0 Or IsDate(sKO) Then
            Cells(R, "B").Value = sKO
            Cells(R, "B").Borders(xlEdgeLeft).Color = RGB(91, 155, 213)
            League = HTMLRows(i).Cells(IIf(i > 0, 4, 3)).innerText
            Cells(R, "C").Value = League
            vMatch = Application.VLookup(League, sheet.Range("V1:X62"), 3, False)
            If Not IsError(vMatch) Then
                Cells(R, "D").Value = vMatch
            Else
                Cells(R, "D").Value = League
            End If
            Cells(R, "E").Value = HTMLRows(i).Cells(IIf(i > 0, 5, 4)).innerText & " (" & HTMLRows(i).Cells(IIf(i > 0, 6, 5)).innerText & ")"
            If i > 0 Then
                Cells(R, "C").Value = GetCountry(HTMLRows(i).Cells(3).getElementsByTagName("a")(0).getAttribute("title"))
            Else
                Cells(R, "C").Value = "Country"
            End If
            Cells(R, "F").Value = HTMLRows(i).Cells(IIf(i > 0, 14, 11)).innerText
            Cells(R, "G").Value = HTMLRows(i).Cells(IIf(i > 0, 9, 7)).innerText & " (" & HTMLRows(i).Cells(IIf(i > 0, 10, 8)).innerText & ")"
            Cells(R, "G").Borders(xlEdgeRight).Color = RGB(91, 155, 213)
            R = R + 1
        End If
    Next i
    Cells(R - 1, 2).Resize(, 6).Borders(xlEdgeBottom).Color = RGB(91, 155, 213)
    blnSuccessful = True
    
ExitHandler:
    If Not IE Is Nothing Then
        IE.Quit
    End If
    
    With Application
        .DisplayAlerts = True
        .EnableEvents = True
        .ScreenUpdating = True
    End With
    
    Set IE = Nothing
    Set HTMLDoc = Nothing
    Set HTMLRows = Nothing
    Set HTMLRow = Nothing
    Set sheet = Nothing
    
    If blnSuccessful Then
        MsgBox i - MSG & " Matches has been Loaded.", vbInformation
    End If
    
    Exit Sub
    
ErrHandler:
    If Len(strErrMsg) > 0 Then
        MsgBox strErrMsg, vbCritical, "Error"
        GoTo ExitHandler
    Else
        If Err <> 0 Then
            MsgBox "Error " & Err.Number & ":  " & Err.Description, vbCritical, "Error"
            Resume ExitHandler
        End If
    End If
    
End Sub




 Sub Clear()
     Dim sheet As Worksheet
     
     Columns("B:G").ClearContents
     Columns("B:G").Borders.LineStyle = xlNone
     Cells(2, "B").Value = "K/O"
     Cells(2, "C").Value = "Country"
     Cells(2, "D").Value = "League"
     Cells(2, "E").Value = "Home (LP)"
     Cells(2, "F").Value = "-"
     Cells(2, "G").Value = "Away (LP)"
 End Sub
Function GetCountry(ByVal sTitle As String) As String
    Dim sCountry As String
    sCountry = Mid(sTitle, InStrRev(sTitle, " ") + 1)
    GetCountry = sCountry
End Function

did I edit it correct ?
 
Upvote 0
I would probably move On Error GoTo ErrHandler just before Sport = Cells(6, "j") just in case of an error within that section. Other than that, anything else I think is really a matter of preference. For example, I would have kept the two lines clearing the contents and borders before setting Application.ScreenUpdating to False so that the worksheet gets cleared even if there's a time out or an error occurs.
 
Upvote 0
Yes I guess you are right I will adjust those smaller things before moving on to next part of this code..

I really appreciate all the help :)
 
Upvote 0
So now im trying to sort all of this data , the plan is to make many dynamic dependent list(s)

Here is what I got so far:

Rich (BB code):
Sub UpdateKey()
Dim lr As Long
Dim lr2 As Long
Dim Rng As Range
Dim i As Long
Dim j As Long


    With Application
       .ScreenUpdating = False
    End With
 
    i = 3
    lr = Cells(Rows.Count, 3).End(xlUp).Row
    lr2 = Cells(Rows.Count, 3).End(xlUp).Row
    Columns("N:BT").ClearContents


    Range("C" & 3 & ":C" & lr).Copy
    Worksheets("Test").Range("N3").PasteSpecial Paste:=xlPasteValues
    Application.CutCopyMode = False
    
    With Range("N" & 3 & ":N" & lr2)
        .Value = .Value
        .RemoveDuplicates Columns:=1, Header:=xlNo
        
        On Error Resume Next
        .SpecialCells(xlCellTypeBlanks).Delete xlShiftUp
        On Error GoTo 0
    End With
    Range("N" & 3 & ":N" & lr2).CurrentRegion.Sort _
    key1:=Range("N" & 3 & ":N" & lr2), order1:=xlAscending, _
    key2:=Range("N" & 3 & ":N" & lr2), order2:=xlAscending, Header:=xlNo
    
    
    For j = 1 To lr2
       Cells(2, 12 + j).Value = Cells(j, "N")
    Next j
    
    With Application
       .ScreenUpdating = True
    End With
    
End Sub


 
Sub trysort()


    Dim lr As Long
    Dim i As Long
    Dim j As Long
    Dim iVal As Integer
    lr = Cells(Rows.Count, 3).End(xlUp).Row
    iVal = Application.WorksheetFunction.CountIf(Range("C" & 3 & ":C" & lr), Cells(3, "N"))
    


    For i = 1 To lr
     If Cells(3, "N").Value = Cells(i, "C") Then
       For j = 2 To iVal
          Cells(1 + j, "O").Value = Cells(i, "D")
       Next j
     End If
    Next i
End Sub

I have several problems with this:

* To make it copy all leagues for desired outcome
* also make this code loop trough all countries to store all leagues without duplicates

then im gonna move on to the teams

What im I doing wrong in this code ?
 
Upvote 0
For help with a new question, please start a new thread...
 
Upvote 0
Ok sorry . I think I will be able to work this out myself. :)

I have a more related question. You helped with this:

Code:
Function GetCountry(ByVal sTitle As String) As String
    Dim sCountry As String
    sCountry = Mid(sTitle, InStrRev(sTitle, " ") + 1)
    GetCountry = sCountry
End Function

Your simple function :) hehe its not as simple for me, Id imagine its quite similar to make a function to also get the league name as its only in short form on the page.

Could you please help ... . I was only able to get it to return "table" in all rows
 
Upvote 0
Actually a little update i used this code:

Code:
Cells(R, "D").Value = HTMLRows(i).Cells(4).getElementsByTagName("a")(0).getAttribute("*******")

which I think can be used. Returns:

leagueData('37949','PRIMERA DIVISION','ARGENTINA','L','2016/2017','PRIMERA DIVISION'); return false;"

Now If its its somehow possible to extract "PRIMERA DIVISION" from this it will be golden
 
Upvote 0
Try...

Code:
Cells(R, "D").Value = GetLeague(HTMLRows(i).Cells(4).getElementsByTagName("a")(0).getAttribute("on_click"))

...which calls the following function...

Code:
Function GetLeague(ByVal sLeagueData As String) As String
    Dim sLeague As String
    sLeague = Mid(sLeagueData, InStr(1, sLeagueData, "','") + 3)
    sLeague = Left(sLeague, InStr(1, sLeague, "','") - 1)
    GetLeague = sLeague
End Function
 
Upvote 0

Forum statistics

Threads
1,214,793
Messages
6,121,617
Members
449,039
Latest member
Mbone Mathonsi

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top