Racing Post: Web Scraping

white_lightning

New Member
Joined
Jun 7, 2011
Messages
22
Hi Guys

Seems a few other people are interested in this subject so I would like to keep it going to try and bounce a few ideas around.

@Norie gave me alot of help to get me started on another thread and his code helped me retrieve some data from the main page of the racing post.

Was wondering if anyone could elaberate on using the IE object data model and give some examples.

Also whats the diff between innertext and outertext?

Im now looking to get more data, i.e horse form, age, weight etc

Any help appreciated.
 

Excel Facts

Return population for a City
If you have a list of cities in A2:A100, use Data, Geography. Then =A2.Population and copy down.
Well a good place to look for information for the DOM (Document Object Model) is here http://msdn.microsoft.com/en-us/library/ms533050(VS.85).aspx.

That's really for reference though and you probably won't find much information on how to do particular things.

I find it very useful.

The difference between outer and inner text is the latter gets or sets the text of an object and the former gets the text between the tags.

In reality I don't think it makes a lot of difference which you use, though I'm sure it will in some cases.

If it does make a difference just change it.

Anyway, here's the code I came up with do get the runners for a race:
Rich (BB code):
Option Explicit
 
Sub GetHorses()
Dim IE As Object
Dim doc As Object
Dim divRunners As Object
Dim divHorse As Object
Dim divCard As Object
Dim tblRaceHeader As Object
Dim elmt As Object
Dim lnk As Object
Dim strURL As String
Dim ws As Worksheet
Dim rng As Range
 
    strURL = "http://www.racingpost.com/horses2/cards/card.sd?race_id=533485&r_date=2011-06-20"
 
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
 
        .navigate strURL
 
        Do While .Busy: DoEvents: Loop
        Do While .ReadyState <> 4: DoEvents: Loop
 
        .Visible = True
 
        Set ws = Worksheets.Add
 
        Set rng = ws.Range("A1")
 
        Set doc = IE.Document
 
        ' this small table has the stat headers
 
        Set tblRaceHeader = doc.getelementbyid("lc_cardHead")
 
        GetTableData tblRaceHeader, rng 
 
        ws.Range("B1").Insert xlShiftToRight
 
        ws.Range("B1") = "FORM"
 
        Set rng = rng.Offset(1)
 
        Set divRunners = doc.getelementbyid("lc_sortBlock")
 
        Set divRunners = divRunners.getelementsbytagname("DIV")
 
        For Each divHorse In divRunners
 
            If divHorse.classname = "cardItem" Then    
 
                For Each elmt In divHorse.all
 
                    Select Case elmt.tagname
 
                        Case "TABLE"
 
                            Select Case elmt.classname
                                Case "cardGl" ' this table has the runners and stats
                                    GetTableData elmt, rng
 
                            End Select
 
                        Case "P"
 
                    End Select
                Next elmt
            End If
        Next divHorse
 
        IE.Quit
 
        Set IE = Nothing
 
    End With
 
    ws.Cells.WrapText = False
 
    ws.Range("A1:J1").EntireColumn.AutoFit
 
    Application.Goto Worksheets(1).Range("A1"), Scroll:=True
 
End Sub

It needs this slightly altered code to get the data.

Rich (BB code):
Sub GetTableData(ByRef tbl, rng As Range, Optional HasHeader = False)
Dim cl As Object
Dim rw As Object
Dim hdr As Object
Dim I As Long
 
    For Each rw In tbl.Rows
 
        For Each cl In rw.Cells
 
            rng.NumberFormat = "@" ' format destination cell as text so form and weight
                                              ' not interpreted as dates
 
            rng.Value = cl.outertext
 
            Set rng = rng.Offset(, 1)
 
        Next cl
 
        ' if the row in the table has a header add it after the rest of the data
        ' this is to get the URLS for the cards of each individual race
 
        If HasHeader Then
 
            Set hdr = rw.getelementsbytagname("TH")(0)
 
            If hdr.all.Length > 0 Then
                Set rng = rng.Offset(, 1)
                rng.Value = hdr.all(0).href
            End If
 
        End If
 
        Set rng = Cells(rng.Row + 1, 1)
 
    Next rw
 
End Sub

The header change is really only specific to main page and is for getting the links.

The other change can probably be applied with any table to prevent Excel/VBA misinterpreting data.

At the moment I've hard-coded the URL for a particular race (I think it's the 6:30 at Chepstow) and this is the result:

<TABLE style="WIDTH: 505pt; BORDER-COLLAPSE: collapse" border=0 cellSpacing=0 cellPadding=0 width=674><COLGROUP><COL style="WIDTH: 57pt; mso-width-source: userset; mso-width-alt: 2779" width=76><COL style="WIDTH: 38pt; mso-width-source: userset; mso-width-alt: 1828" width=50><COL style="WIDTH: 101pt; mso-width-source: userset; mso-width-alt: 4937" width=135><COL style="WIDTH: 26pt; mso-width-source: userset; mso-width-alt: 1280" width=35><COL style="WIDTH: 29pt; mso-width-source: userset; mso-width-alt: 1426" width=39><COL style="WIDTH: 109pt; mso-width-source: userset; mso-width-alt: 5302" width=145><COL style="WIDTH: 83pt; mso-width-source: userset; mso-width-alt: 4059" width=111><COL style="WIDTH: 21pt; mso-width-source: userset; mso-width-alt: 1024" width=28><COL style="WIDTH: 18pt; mso-width-source: userset; mso-width-alt: 877" width=24><COL style="WIDTH: 23pt; mso-width-source: userset; mso-width-alt: 1133" width=31><TBODY><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 57pt; HEIGHT: 15pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 height=20 width=76>NO. DRAW </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; WIDTH: 38pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 width=50>FORM</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; WIDTH: 101pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 width=135>HORSE </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; WIDTH: 26pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 width=35>AGE </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; WIDTH: 29pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 width=39>WGT </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; WIDTH: 109pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 width=145>TRAINER RTF% </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; WIDTH: 83pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 width=111>JOCKEY </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; WIDTH: 21pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 width=28>OR </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; WIDTH: 18pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 width=24>TS </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; WIDTH: 23pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 width=31>RPR</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl64 height=20></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl64></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl64></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl64></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl64></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl64></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl64></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl64></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl64></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl64></TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 height=20>11 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Ballina Blue </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>3</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>8-12</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>John Gallagher43 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Tadhg O'Shea </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63></TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 height=20>22 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>50 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Band Of Thunder21 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>3</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>8-12</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Andrew Balding41 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>David Probert </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>29 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>62</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 height=20>34 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>733-64 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Captain Sharpe13J p </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>3</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>8-12</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Bernard Llewellyn100 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Tom McLaughlin </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>65 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>79 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>71</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 height=20>43 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>6 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Cluain Dara10 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>3</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>8-7</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Mark Johnston33 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Royston Ffrench </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>40 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>81</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 height=20>55 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>0-242 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Galivant21 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>3</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>8-7</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>J W Hills75 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Martin Dwyer </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>84 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>83 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>103</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63 height=20>66 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>49-038 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Miskin Diamond9 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>3</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>8-7</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Bryn Palling36 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>Neil Chalmers </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>46 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>45 </TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl63>74</TD></TR></TBODY></TABLE>

Note, I realise the race name isn't there but that can probably be dealt with.

I've also just noticed that there are numbers after the horse and trainer names.

On the page those are superscript, eg Band Of Thunder<SUP>21</SUP>.

I think I know how to deal with that too but perhaps not right now.:)
 
Upvote 0
@Norie

You really are a genius!

That code works a treat, thanks you have helped me alot.

The formating of the little numbers I will work on but what you have produced is great.

I have read lots of posts on this subject and various different ways to achieve this data retreival, this including, web querys, lots of parsing, use of instr() and mid() and RegEx however the way you have shown me is by far the best and efficient and its pretty straight forward when you sit back and analyse the code.

Do you use and software to help identify the data/blocks of code you require or do you just go thru the html manually and hunt for the tables etc?

Il prob have a few more questions in the morning but thanks again.
 
Upvote 0
I've done some code for the superscripts, but it isn't ideal and I can't test it anymore tonight - the cards have just changed.

Would it actually be needed?

One reason I ask is the more you do things that deal with specifics the less 'portable' the code becomes.

Also you could end up with a whole load of redundant code that will do handle all these little details but really isn't needed to get
the 'raw' data.
 
Upvote 0
@Norie

The Superscript numbers are not needed on the draw, trainer % or the jockey weight but are needed after the horses name, the numbers relate to the days since last run and could hopefully be dispalyed in a seperate column.

Another problem that has now popped up and this data is a must and could be a pain to get is on some lines after the horses name there are small images, see link below

http://www.racingpost.com/horses2/cards/card.sd?race_id=533647&r_date=2011-06-25

now these images are contained within a span element and there are 4 different images that equate to three things. i.e BF=beaten fav, CD=course and distance winner, C=course winner, D=won at this distance before.

Just to complicated things, sometimes it can show a seperate C and D image when in reallity it means the same as the CD image. So in effect there are only three possible outcomes from the 4 images and I would hopefully like them retrieved and put into 3 seperate columns, BF, Course, Distance with a 1 in each column if there is an image if that makes sense.


The code for the images come right after the superscript numbers after the horses name that I mentioned above and the code for the images looks like (removed<>):

HTML:
span title="Course Distance" 
img src="http://ui.racingpost.com/ico/distance-c.gif" class="cdbf" alt=""
img src="http://ui.racingpost.com/ico/distance-d.gif" class="cdbf" alt=""
img src="http://ui.racingpost.com/ico/distance-bf.gif" class="cdbf" alt=""
/span
How hard would this be? Could a case statement just be inbed inside the loop that retrieves the horses name?

Looking for help please.
 
Upvote 0
Why a case statement?

What decisions/comparisons need to be made to get the data?

Would just finding out what images are in this span be enough to go on?
 
Upvote 0
What decisions/comparisons need to be made to get the data?

Would just finding out what images are in this span be enough to go on?

@Norie

This is where i get slightly lost as to the object model element stuff.

We can tell by the html code what the image is as each one is named differently.

I was thinking something along the lines:

After we retreive the horses name and days since last run,

For every span element -

If img scr="r...post.com/ico/distance-d.gif" then cell e(row) = "1"
If img scr="r...post.com/ico/distance-c.gif" then cell f(row) = "1"
If img scr="r...post.com/ico/distance-bf.gif" then cell g(row) = "1"
If img scr="r...post.com/ico/distance-cd.gif" then cell e(row) = "1" & cell f(row) = "1"


Next

would this not accumplish something that im looking for or is there a far easier way to do what i want?
 
Upvote 0
Wouldn't an output of DC, BF or whatever be enough?

You could then do whatever you need to do later once you've got the raw data.

Incorporating something that will add extra columns wouldn't be straightforward.
 
Upvote 0
Extracting the raw data is the main focus and yes as long as i got the info i could format it later but was just wondering if it would be more efficient to do it this way or something similar inside the loop code that was already written to same having to do extra code/formating later on.

Would i be looking to treat the span elements the same as say a div element to extract the image data required?

Would it not be easy enough to just add additional columns at the end of the data that was already there, i.e. one to the right of the RPR data.

What did you manage to work out for the superscript along side the horses name? This is the days since last run and ideally this would be in a seperate column, is this going to be a big problem?

Could i use instr() or RegEx to seperate the data we have already retreived?
 
Upvote 0
Do you actually want the image(s)?

Wouldn't the letters, perhaps separate by a column be enough to work with?

As for the superscript thing, yes it could be extracted into a separate column as could the letters, whatever they are/mean.

I just don't think it's a good idea to keep on altering code that works in the most part just for one specific webpage.

What I would prefer to do is have all the data in one cell, perhaps delimited.

eg Classic Punch - 14 - CD or Beaumont's Party - 22 - C, D, F

Does that make sense?
 
Upvote 0

Forum statistics

Threads
1,224,522
Messages
6,179,292
Members
452,902
Latest member
Knuddeluff

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top