Web Scape
Page 1 of 2 12 LastLast
Results 1 to 10 of 14

Thread: Web Scape
Thanks Thanks: 0 Likes Likes: 0

  1. #1
    Board Regular
    Join Date
    Jan 2009
    Location
    uk bexhill sussex
    Posts
    1,638
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Web Scape

    Hello

    I need to get the data from this webpage

    Meeting Results

    in a specific layout

    if you see there are 12 races on this page.
    so i would like 12 lines of data as setout below. I have given the class names as help

    "track", "grade", "date", "datetime", "essential greyhound" 1-6(in trap order if possible), "first essential fin" 1-6, "sp" 1-6


    I can actually do the above, but my problems come when in some races, perhaps non runners means that maybe only 5 runners, then all of my code goes wrong. I ma not able to account for the numbers going out of sync.

    part of my code below so you can see my approach.

    BTW, dont have to use ie, but i dont know any other way.

    Code:
    Sub IE_getdogs()
        Dim ie As InternetExplorer
        Set ie = New InternetExplorer
            With ie
                .Visible = False
                .Navigate "http://www.gbgb.org.uk/resultsMeeting.aspx?id=14000"
                While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
                Set HTMLdoc = .Document
            End With
            Set dogname = HTMLdoc.getElementsByClassName("essential greyhound")
            Set trap = HTMLdoc.getElementsByClassName("trap")
            Set sp = HTMLdoc.getElementsByClassName("sp")
                For i = 1 To dogname.Length - 1
                    If dogname(i).innerText <> "Greyhound" Then
                        Range("'sheet1'!a" & i) = dogname(i).innerText
                        Range("'sheet1'!G" & i) = trap(i).innerText
                        Range("'sheet1'!M" & i) = sp(i).innerText
                    End If
                Next
                ie.Quit
    End Sub
    Thanks for looking, its doing my head in.

    Dave
    Some one always knows more than me,thats why I am here.

  2. #2
    MrExcel MVP
    Join Date
    Mar 2004
    Location
    Canada
    Posts
    18,796
    Post Thanks / Like
    Mentioned
    32 Post(s)
    Tagged
    5 Thread(s)

    Default Re: Web Scape

    First, instead of using IE, we can use the XML object. You'll see that it works almost instantaneously. Secondly, set a reference to the following object libraries...

    Code:
    Microsoft XML, v6.0 (or whatever version you have)
    
    Microsoft HTML Object Library
    Then, copy the following code into your module, and simply run it...

    Code:
    Option Explicit
    
    Sub GetDogs()
    
        Dim XMLReq As New MSXML2.XMLHTTP60
        Dim HTMLDoc As New HTMLDocument
        Dim HTMLHeaders As MSHTML.IHTMLElementCollection
        Dim HTMLResults As MSHTML.IHTMLElementCollection
        Dim wksDest As Worksheet
        Dim NextRow As Long
        Dim i As Long
        Dim j As Long
        Dim k As Long
    
        XMLReq.Open "GET", "http://www.gbgb.org.uk/resultsMeeting.aspx?id=14000", False
        XMLReq.send
        
        If XMLReq.Status <> 200 Then
            MsgBox "Problem:" & vbNewLine & vbNewLine & XMLReq.Status & " - " & XMLReq.StatusText
            Exit Sub
        End If
        
        HTMLDoc.body.innerHTML = XMLReq.responseText
        Set XMLReq = Nothing
        
        Set wksDest = Worksheets("Sheet1")
        
        wksDest.Columns("D").NumberFormat = "@"
        
        Set HTMLHeaders = HTMLDoc.getElementsByClassName("resultsBlockHeader")
        Set HTMLResults = HTMLDoc.getElementsByClassName("resultsBlock")
        
        NextRow = 1
        For i = 0 To HTMLHeaders.Length - 1
            For j = 0 To HTMLHeaders(i).Children.Length - 1
                wksDest.Cells(NextRow, j + 1).Value = Trim(Split(HTMLHeaders(i).Children(j).innerText, "|")(0))
            Next j
            NextRow = NextRow + 2
            For j = 0 To HTMLResults(i).Children(0).Children.Length - 1
                wksDest.Cells(NextRow, j + 1).Value = HTMLResults(i).Children(0).Children(j).innerText
            Next j
            NextRow = NextRow + 1
            For k = 1 To HTMLResults(i).Children.Length - 1
                If k Mod 3 = 1 Then
                    For j = 0 To HTMLResults(i).Children(k).Children.Length - 1
                        wksDest.Cells(NextRow, j + 1).Value = HTMLResults(i).Children(k).Children(j).innerText
                    Next j
                    NextRow = NextRow + 1
                End If
            Next k
            NextRow = NextRow + 2
        Next i
        
    End Sub
    Hope this helps!
    Last edited by Domenic; May 13th, 2017 at 08:50 PM.

  3. #3
    MrExcel MVP
    Join Date
    Mar 2004
    Location
    Canada
    Posts
    18,796
    Post Thanks / Like
    Mentioned
    32 Post(s)
    Tagged
    5 Thread(s)

    Default Re: Web Scape

    Actually, it looks like not all ID's, such as ID 14002, have prizes listed. Therefore, we need to make the following change in red...

    Code:
    Option Explicit
    
    Sub GetDogs()
    
        Dim XMLReq As New MSXML2.XMLHTTP60
        Dim HTMLDoc As New HTMLDocument
        Dim HTMLHeaders As MSHTML.IHTMLElementCollection
        Dim HTMLResults As MSHTML.IHTMLElementCollection
        Dim wksDest As Worksheet
        Dim NextRow As Long
        Dim i As Long
        Dim j As Long
        Dim k As Long
    
        XMLReq.Open "GET", "http://www.gbgb.org.uk/resultsMeeting.aspx?id=14002", False
        XMLReq.send
        
        If XMLReq.Status <> 200 Then
            MsgBox "Problem:" & vbNewLine & vbNewLine & XMLReq.Status & " - " & XMLReq.StatusText
            Exit Sub
        End If
        
        HTMLDoc.body.innerHTML = XMLReq.responseText
        Set XMLReq = Nothing
        
        Set wksDest = Worksheets("Sheet1")
        
        wksDest.Columns("D").NumberFormat = "@"
        
        Set HTMLHeaders = HTMLDoc.getElementsByClassName("resultsBlockHeader")
        Set HTMLResults = HTMLDoc.getElementsByClassName("resultsBlock")
        
        NextRow = 1
        For i = 0 To HTMLHeaders.Length - 1
            For j = 0 To HTMLHeaders(i).Children.Length - 1
                wksDest.Cells(NextRow, j + 1).Value = Trim(Split(HTMLHeaders(i).Children(j).innerText & "|", "|")(0))
            Next j
            NextRow = NextRow + 2
            For j = 0 To HTMLResults(i).Children(0).Children.Length - 1
                wksDest.Cells(NextRow, j + 1).Value = HTMLResults(i).Children(0).Children(j).innerText
            Next j
            NextRow = NextRow + 1
            For k = 1 To HTMLResults(i).Children.Length - 1
                If k Mod 3 = 1 Then
                    For j = 0 To HTMLResults(i).Children(k).Children.Length - 1
                        wksDest.Cells(NextRow, j + 1).Value = HTMLResults(i).Children(k).Children(j).innerText
                    Next j
                    NextRow = NextRow + 1
                End If
            Next k
            NextRow = NextRow + 2
        Next i
        
    End Sub

  4. #4
    Board Regular
    Join Date
    Jan 2009
    Location
    uk bexhill sussex
    Posts
    1,638
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Web Scape

    HI Domenic

    You absolute genius, that is so fast i can harldy believe it, i think i must learn more about "MSXML2.XMLHTTP60"
    I bet its more reliable within a loop.

    The outout looks great, but i want to get it out onto the page in a different configuration.
    if youy look at this link, you will see it has a non runner in race 2 and 3, Meeting Results
    I want to do more with the data once it is output and when runners are missing, the numbers go out of sync, sort of out of blocks, you have it blocks of 11, but race 2 is a block of 10.

    Ulitmatly, i would like to get the data out like this, i made a small code to demonstrate the layout.

    My question is, as i have allot of data to get out, should i try to adjust the output code you supplied to obtain the layout, or build further codes to convert output into a new sheet output as i wish.

    is it even possible to get my output directly, i cant wait to hear your feedback.

    thanks again dominic.

    Dave

    Code:
    Sub output_layout_example()
    
    Range("a1") = "sheffield"
    Range("b1") = "13/05/2017"
    Range("c1") = "17:25"
    Range("d1") = "A4"
    
    Range("e1") = "Unique Nero"
    Range("f1") = "Valiant Striker"
    Range("g1") = "Breakthenews"
    Range("h1") = "Glenbuck Rose"
    Range("i1") = "Geelo Racer"
    Range("j1") = "Yahoo Angela"
    
    Range("k1") = "2"
    Range("l1") = "1"
    Range("m1") = "3"
    Range("n1") = "6"
    Range("o1") = "5"
    Range("p1") = "2"
    
    Range("q1") = "4/1 , text not date"
    Range("r1") = "3/1, text not date"
    Range("s1") = "6/1, text not date"
    Range("t1") = "4/1, text not date"
    Range("u1") = "2/1F, text not date"
    Range("v1") = "3/1, text not date"
    
    ' when a runner is missing, the data would be blank for that trap number
    
    '"track", "grade", "date", "datetime", "essential greyhound" 1-6(in trap order if possible), "first essential fin" 1-6, "sp" 1-6
    
    End Sub
    Some one always knows more than me,thats why I am here.

  5. #5
    MrExcel MVP
    Join Date
    Mar 2004
    Location
    Canada
    Posts
    18,796
    Post Thanks / Like
    Mentioned
    32 Post(s)
    Tagged
    5 Thread(s)

    Default Re: Web Scape

    Hi Dave,

    Yeah, as you can see, XML is pretty fast. There are times, though, where you'll need to use IE instead. I believe you'll need to use it when a website contains JavaScript or some other dynamic content.

    With regards to your desired output, just to be sure, how would the order take place for the second race?

  6. #6
    Board Regular
    Join Date
    Jan 2009
    Location
    uk bexhill sussex
    Posts
    1,638
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Web Scape

    Hi Domenic

    ok, glad you asked, i had 1 error within what i sent you before.
    I have rebuild the code to show the output for both race 1 and 2.

    Basically, i wish to account for the issues when there is not 6 runners.
    and then
    transpose the data i suppose. at least thats what i would be trying to do if adjusting it after it was downloaded.

    Thanks for looking at the domenic, although i really wonder if it is possible?

    Still cannot believe how fast the the data arrives, not to mention how clean it is to run.

    Code:
    Sub output_layout_example()
    
    ''''''''''race 1
    Range("a1") = "sheffield"
    Range("b1") = "13/05/2017"
    Range("c1") = "19:25"
    Range("d1") = "A4"
    
    Range("e1") = "Unique Nero"
    Range("f1") = "Valiant Striker"
    Range("g1") = "Breakthenews"
    Range("h1") = "Glenbuck Rose"
    Range("i1") = "Geelo Racer"
    Range("j1") = "Yahoo Angela"
    
    Range("k1") = "2"
    Range("l1") = "1"
    Range("m1") = "3"
    Range("n1") = "6"
    Range("o1") = "5"
    Range("p1") = "4"
    
    Range("q1") = "4/1 , text not date"
    Range("r1") = "3/1, text not date"
    Range("s1") = "6/1, text not date"
    Range("t1") = "4/1, text not date"
    Range("u1") = "2/1F, text not date"
    Range("v1") = "3/1, text not date"
    
    '''''''''''''''''race 2
    Range("a2") = "sheffield"
    Range("b2") = "13/05/2017"
    Range("c2") = "19:40"
    Range("d2") = "D2"
    
    Range("e2") = "Dash Away Wink"
    Range("f2") = "Night Of Thunder"
    Range("g2") = "Lightfoot Niamh"
    Range("h2") = "Its Jack"
    Range("i2") = ""
    Range("j2") = "Swift Darius"
    
    Range("k2") = "2"
    Range("l2") = "5"
    Range("m2") = "4"
    Range("n2") = "1"
    Range("o2") = ""
    Range("p2") = "3"
    
    Range("q2") = "3/1 , text not date"
    Range("r2") = "4/1, text not date"
    Range("s2") = "9/2, text not date"
    Range("t2") = "9/4, text not date"
    Range("u2") = ""
    Range("v2") = "7/4F, text not date"
    
    ' when a runner is missing, the data would be blank for that trap number
    
    '"track", "grade", "date", "datetime", "essential greyhound" 1-6(in trap order if possible), "first essential fin" 1-6, "sp" 1-6
    
    End Sub
    Last edited by SQUIDD; May 14th, 2017 at 12:48 PM.
    Some one always knows more than me,thats why I am here.

  7. #7
    MrExcel MVP
    Join Date
    Mar 2004
    Location
    Canada
    Posts
    18,796
    Post Thanks / Like
    Mentioned
    32 Post(s)
    Tagged
    5 Thread(s)

    Default Re: Web Scape

    I have rebuild the code to show the output for both race 1 and 2.
    Ah yes, now that makes sense.

    Thanks for looking at the domenic, although i really wonder if it is possible?
    Yes it is, I'll have a go at it when I get a chance.

    Still cannot believe how fast the the data arrives, not to mention how clean it is to run.
    Last edited by Domenic; May 14th, 2017 at 02:09 PM.

  8. #8
    Board Regular
    Join Date
    Jan 2009
    Location
    uk bexhill sussex
    Posts
    1,638
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Web Scape

    Thankyou so much domanic.

    The logic behind it blows my mind.

    Dave
    Some one always knows more than me,thats why I am here.

  9. #9
    Board Regular
    Join Date
    Jan 2009
    Location
    uk bexhill sussex
    Posts
    1,638
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Web Scape

    Hi Domenic

    I have build some code, if you have 2 sheets, "sheet2" and "sheet3"
    run this code, it actually gives my desired output, although im sure i have made this way more complicated than it needs to be.

    run the "RUN_ME" Macro

    I thought this code would serve as possibly 4 purposes.

    1, you can see the desired output
    2, in case it was too difficult to do, this is option B
    3, maybe seeing this code, someone might be able to write it better
    4, Because i have done this, i totally understand if you dont like the task of doing it directly, having done this now, i realise just how complex it is, well to me anyway.

    Thanks as always, dave.

    Code:
    Sub RUN_ME()
    starttime1 = Now
    get_data
    Range("'SHEET2'!C26") = Now - starttime1
    Range("'SHEET2'!b26") = "time to download"
    starttime2 = Now
    fix_data
    Range("'SHEET2'!C27") = Now - starttime2
    Range("'SHEET2'!b27") = "time to adjust data"
    End Sub
    Sub get_data()
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    Sheets("SHEET3").Select
    Range("'sheet3'!A:H").ClearContents
    Range("'sheet2'!A1:Z25").ClearContents
        Dim XMLReq As New MSXML2.XMLHTTP60
        Dim HTMLDoc As New HTMLDocument
        Dim HTMLHeaders As MSHTML.IHTMLElementCollection
        Dim HTMLResults As MSHTML.IHTMLElementCollection
        Dim wksDest As Worksheet
        Dim NextRow As Long
        Dim i As Long
        Dim j As Long
        Dim k As Long
    
        XMLReq.Open "GET", "http://www.gbgb.org.uk/resultsMeeting.aspx?id=14000", False '147942
        XMLReq.send
        
        If XMLReq.Status <> 200 Then
            MsgBox "Problem:" & vbNewLine & vbNewLine & XMLReq.Status & " - " & XMLReq.StatusText
            Exit Sub
        End If
        
        HTMLDoc.body.innerHTML = XMLReq.responseText
        Set XMLReq = Nothing
        
        Set wksDest = Worksheets("Sheet3")
        
        wksDest.Columns("C").NumberFormat = "General"
        wksDest.Columns("D").NumberFormat = "@"
        
        Set HTMLHeaders = HTMLDoc.getElementsByClassName("resultsBlockHeader")
        Set HTMLResults = HTMLDoc.getElementsByClassName("resultsBlock")
        
        NextRow = 1
        For i = 0 To HTMLHeaders.Length - 1
            For j = 0 To HTMLHeaders(i).Children.Length - 1
                wksDest.Cells(NextRow, j + 1).Value = Trim(Split(HTMLHeaders(i).Children(j).innerText, "|")(0))
            Next j
            NextRow = NextRow + 2
            For j = 0 To HTMLResults(i).Children(0).Children.Length - 1
                wksDest.Cells(NextRow, j + 1).Value = HTMLResults(i).Children(0).Children(j).innerText
            Next j
            NextRow = NextRow + 1
            For k = 1 To HTMLResults(i).Children.Length - 1
                If k Mod 3 = 1 Then
                    For j = 0 To HTMLResults(i).Children(k).Children.Length - 1
                        wksDest.Cells(NextRow, j + 1).Value = HTMLResults(i).Children(k).Children(j).innerText
                    Next j
                    NextRow = NextRow + 1
                End If
            Next k
            NextRow = NextRow + 2
        Next i
    End Sub
    Sub fix_data()
    
    lr = Range("A" & Rows.Count).End(xlUp).Row
        For a = lr To 1 Step -1
            If Range("B" & a) = "" Then Range("a" & a).EntireRow.Delete
        Next
        
        track = Range("A1")
        RACECOUNT = Application.WorksheetFunction.CountIf(Range("A:A"), track)
        p = 9
            For s = 1 To RACECOUNT
                For q = 1 To 6
                    If Range("A" & p) <> track And Range("A" & p) = "Fin" Then Rows(p - 1).Insert
                        For vv = 1 To 6
                            If Range("A" & p) <> track And Range("A" & p) = vv Then Rows(p - vv - 1).Insert
                        Next vv
                Next q
                p = p + 8
            Next s
            
        w = 2
        For u = 1 To RACECOUNT
            Set rng = Range("A" & w & ":D" & w + 6)
            Set coll = Range("c" & w)
            rng.sort Key1:=coll, Order1:=xlAscending, Header:=xlYes
            w = w + 8
        Next u
        
    q = 3
    qq = 7
        For JJ = 1 To RACECOUNT
            For j = 1 To 5
                If Range("C" & q) <> j Then Range("A" & q & ":d" & qq).Cut Destination:=Range("A" & q + 1)
                q = q + 1
            Next j
        q = q + 3
        qq = qq + 8
        Next JJ
        
        c = 1
        cC = 3
        For e = 1 To RACECOUNT
            Range("'SHEET3'!A" & cC - 2 & ":D" & cC - 2).Copy Destination:=Range("'SHEET2'!A" & c)
            Range("'SHEET3'!B" & cC & ":B" & cC + 5).Copy
            Range("'SHEET2'!E" & c).PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:=False, TRANSPOSE:=True
            Range("'SHEET3'!A" & cC & ":A" & cC + 5).Copy
            Range("'SHEET2'!K" & c).PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:=False, TRANSPOSE:=True
            Range("'SHEET3'!D" & cC & ":D" & cC + 5).Copy
            Range("'SHEET2'!Q" & c).PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:=False, TRANSPOSE:=True
        c = c + 1
        cC = cC + 8
        Next e
    Application.CutCopyMode = False
    Sheets("SHEET2").Select
    Application.DisplayAlerts = True
    Application.ScreenUpdating = True
        
    End Sub
    Some one always knows more than me,thats why I am here.

  10. #10
    MrExcel MVP
    Join Date
    Mar 2004
    Location
    Canada
    Posts
    18,796
    Post Thanks / Like
    Mentioned
    32 Post(s)
    Tagged
    5 Thread(s)

    Default Re: Web Scape

    I haven't had a chance to look at your latest code, but here's my version. It uses the Dictionary object to maintain a list of traps, along with their corresponding finishes. First, make sure you set the following references...

    Code:
    1) Microsoft XML, v6.0
    
    2) Microsoft HTML Object Library
    
    3) Microsoft Scripting Runtime
    Then try...

    Code:
    Option Explicit
    
    Sub GetDogs()
    
        Dim XMLReq As New MSXML2.XMLHTTP60
        Dim HTMLDoc As New HTMLDocument
        Dim HTMLHeaders As MSHTML.IHTMLElementCollection
        Dim HTMLResults As MSHTML.IHTMLElementCollection
        Dim HTMLResult As MSHTML.IHTMLElement
        Dim dicTraps As New Scripting.Dictionary
        Dim wksDest As Worksheet
        Dim NextRow As Long
        Dim i As Long
        Dim j As Long
        Dim k As Long
        Dim l As Long
    
        XMLReq.Open "GET", "http://www.gbgb.org.uk/resultsMeeting.aspx?id=147942", False
        XMLReq.send
        
        If XMLReq.Status <> 200 Then
            MsgBox "Problem:" & vbNewLine & vbNewLine & XMLReq.Status & " - " & XMLReq.StatusText
            Exit Sub
        End If
        
        HTMLDoc.body.innerHTML = XMLReq.responseText
        Set XMLReq = Nothing
        
        Set wksDest = Worksheets("Sheet1")
        
        wksDest.Cells.ClearContents
        
        wksDest.Columns("Q:V").NumberFormat = "@"
        
        Set HTMLHeaders = HTMLDoc.getElementsByClassName("resultsBlockHeader")
        Set HTMLResults = HTMLDoc.getElementsByClassName("resultsBlock")
        
        NextRow = 1
        For i = 0 To HTMLHeaders.Length - 1
            For j = 0 To 3
                wksDest.Cells(NextRow, j + 1).Value = Trim(Split(HTMLHeaders(i).Children(j).innerText & "|", "|")(0))
            Next j
            For k = 1 To HTMLResults(i).Children.Length - 1 Step 3
                dicTraps.Add Key:=HTMLResults(i).Children(k).Children(2).innerText, Item:=HTMLResults(i).Children(k).Children(0).innerText
            Next k
            On Error Resume Next
            For l = 1 To 6
                Set HTMLResult = HTMLResults(i).Children(dicTraps.Item(CStr(l)) * 3 - 3 + 1)
                If Not HTMLResult Is Nothing Then
                    wksDest.Cells(NextRow, 5 + l - 1).Value = HTMLResult.Children(1).innerText
                    wksDest.Cells(NextRow, 11 + l - 1).Value = HTMLResult.Children(0).innerText
                    wksDest.Cells(NextRow, 17 + l - 1).Value = HTMLResult.Children(3).innerText
                    Set HTMLResult = Nothing
                End If
            Next l
            On Error GoTo 0
            Set dicTraps = Nothing
            NextRow = NextRow + 1
        Next i
        
    End Sub
    Last edited by Domenic; May 14th, 2017 at 07:02 PM.

Some videos you may like

User Tag List

Like this thread? Share it with others

Like this thread? Share it with others

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •