Scraping website data into excel & Access via a txt file problem

fragle

New Member
Joined
Dec 4, 2008
Messages
15
Hi i was wondering if somebody would be kind enough to help me as i dont have any experience with VBA and i am having a small problem with it.
What the code should do is go to a sports website and download the results into excel and then break the title into fields and put that data into a race details txt doc and the rest of the data into a results txt doc so that i can then transfer that into a database, but the problem i am having is that not all the fields are being entered into the race details txt doc, the racecourse and race times are missing and i dont know how to go about fixing this problem as i have spent the last few weeks trying to sort it myself but i cannot.
I am using excel 2007 with vista home premium

'********************************************************
'Select Tools - References
'and add a reference to the Microsoft HTML Object Library
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

'********************************************************
Public Sub SLResults()

Application.ScreenUpdating = False

'Enter RaceIDs in sheet "RaceIDS" column A
'Create a range name for RaceIDS
Sheets("RaceIDS").Select
ActiveWorkbook.Names.Add Name:="RIDS", RefersToR1C1:="" _
& "=offset(RaceIDS!R1C1,0,0,counta(RaceIDS!C1),3)"


'Make "Results" the active sheet
Sheets("Results").Select

Dim ie As Object, Table As Object
Dim tblRow As Object, tblCell As Object
Dim myrow As Integer

Dim startURL As String
Dim endURL As String
Dim raceid As String
Dim strURL As String


myrow = 1
startURL = "http://horses.sportinglife.com/Full_Results/0,12493,"
endURL
= ",00.html"

Set ie = CreateObject("InternetExplorer.Application")


'Read raceids into array
Dim arrRaceIDs As Variant
arrRaceIDs = Range("RIDS").Value

'loop through array to get each result
For i = 1 To UBound(arrRaceIDs)
raceid = arrRaceIDs(i, 1)
racetime = Format(arrRaceIDs(i, 2), "HH:mm")
racecourse = arrRaceIDs(i, 3)

If raceid = "" Then MsgBox "No RaceID": Exit Sub

strURL = startURL & raceid & endURL

ie.Navigate strURL

'Wait in case page hasn't loaded properly
Sleep (500)

Do While ie.busy: DoEvents: Loop
Do While ie.ReadyState <> 4: DoEvents: Loop

Set mydoc = ie.document

'Get Race Information
For Each Tag In mydoc.all
If InStr(Tag.innerHTML, "race_title_hdr") > 0 Then
race = Tag.innertext
End If

'Get Winning Time
If InStr(Tag.innerHTML, "race_wintime_detail") > 0 Then
wintime = Trim(Replace(Tag.innertext, "Winning Time: ", ""))
End If
Next


'Use the carriage return/Line Feed to split the string into an array
Dim arrRaceDetails As Variant
arrRaceDetails = Split(race, vbCrLf)

'Loop through each item in the array
'note arrays generally start at 0 not 1
For z = LBound(arrRaceDetails) To UBound(arrRaceDetails)

'Get RaceDate
racedate = arrRaceDetails(0)

'Get separate date elements
myweekday = Split(racedate, " ")(0)

'Remove weekday to get a date string
mymonth = Replace(racedate, myweekday, "")

'Convert string to date then to month number
mymonth = Month(CDate(mymonth))

myday = Split(racedate, " ")(2)
myyear = Split(racedate, " ")(3)
'convert to date
racedate = DateSerial(myyear, mymonth, myday)


racename = arrRaceDetails(2) ' this is the 3rd element in the array
raceconds = arrRaceDetails(3) 'this is the 4th element in the array

'split raceconds into various fields
'e.g. Split(raceconds, ",")(0), " means return all text before the first ","
'replace certain text strings with "" e.g. Class 6 becomes 6
racevalue = Replace(Split(raceconds, ",")(0), " added", "")
raceages = Trim(Split(raceconds, ",")(1))
racedist = Trim(Split(raceconds, ",")(2))
raceclass = Replace(Trim(Split(raceconds, ",")(3)), "Class ", "")
racerunners = Replace(Trim(Split(raceconds, ",")(5)), " ran", "")
racegoing = Replace(arrRaceDetails(4), "Going: ", "")

Next

''Remove carriage return / line feed characters then write to cell
Cells(myrow, 1) = Replace(Replace(race, Chr(13), ""), Chr(10), "")
Cells(myrow, 1).Font.Bold = True

'Write RaceDetails to text file
Open "C:\USERS\\DOCUMENTS\SL_RACE_DETAILS.txt" For Append As #1

Print #1, raceid & ";" & racetime & ";" & racecourse & ";" _
& racename & ";" & racevalue & ";" & raceages & ";" _
& racedist & ";" & raceclass & ";" _
& racerunners & ";" & racegoing & ";" & wintime & ";" & racedate

Close #1

'Get Race Result from table 6 of the web page
Set Table = ie.document.all.tags("table").Item(6)

myrow = myrow + 1
mystr = ""

For Each tblRow In Table.Rows

mycount = mycount + 1

For Each tblCell In tblRow.Cells

'use delimiter to separate fields - semi colon preferred
mydelim = ";"

Select Case Trim(tblCell.innertext)

Case "NR"
mystr = mystr & mydelim & tblCell.innertext
mycount = 0

Case "Pos."
mycount = 0
mystr = mystr & mydelim & tblCell.innertext

Case "Non-Runners"
mystr = ""

Case Else
mystr = mystr & mydelim & tblCell.innertext

End Select

Next tblCell


'Write result line (result and comments in running joined together)
If mycount = 0 Or mycount = 2 Then

Dim writeFile As Boolean 'True or False
writeFile = False

Select Case mycount
Case 0
'do not write to file i.e. writefile=False unless NR row
If Left(mystr, 3) = ";NR" Then
writeFile = True
Else
writeFile = False
End If

Case 2
writeFile = True

End Select

Cells(myrow, 1) = Mid(mystr, 2)

'Write RaceDetails to text file
If writeFile = True And Len(mystr) > 1 Then
Open "C:\USERS\ \DOCUMENTS\SL_RACE_RESULTS.txt" For Append As #1

Print #1, raceid, mystr

Close #1
End If


'reset values
mystr = ""
myrow = myrow + 1
mycount = 0

End If


Next tblRow

'insert 2 blank rows between race results
myrow = myrow + 2
raceid = ""
Next i 'next raceid

ie.Quit: Set ie = Nothing

'Split text to columns using delimiter (;)
'treat each delimiter separately
Columns("A:A").Select
Selection.TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=True, _
Semicolon:=True, Comma:=False, Space:=False, Other:=False, FieldInfo _
:=Array(Array(1, 1), Array(2, 1), Array(3, 1), Array(4, 1), Array(5, 2), Array(6, 1), _
Array(7, 1), Array(8, 2), Array(9, 2), Array(10, 1), Array(11, 1)), _
TrailingMinusNumbers:=True

Range("D:D,F:F,G:G,K:K").Select
Range("K1").Activate
Selection.Columns.AutoFit
Columns("A:C").Select

Range("C1").Activate
With Selection
.HorizontalAlignment = xlLeft
End With

Columns("J:J").Select
Selection.Delete Shift:=xlToLeft
Range("A1").Select

Application.ScreenUpdating = True

End Sub

 

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
I have also been thinking if this code could be improved so that it placed results directly into access rather than going via excel and txt docs
 
Upvote 0
Could somebody please help me with this or explain what is wrong with the code, i have been trying to solve the problem myself but not having any knowledge of VBA i am just stabing in the dark.
This is the result i am getting at the moment.
312846;;;Hartfield Maiden Stakes (aw);£4000;3yo only;1m;5;12;Good to Firm (Turf Course;All Weather;Standard);1m 37.42s;06/06/2009
312847;;;E.b.f. Maiden Fillies' Stakes;£5500;2yo only;6f;5;5;Good to Firm (Turf Course;All Weather;Standard);1m 10.87s;06/06/2009
312848;;;Betdaq The Betting Exchange Fillies' Handicap;£4500;3yo plus;5f;5;8;Good to Firm (Turf Course;All Weather;Standard);0m 57.2s;06/06/2009

These are the results i have been able to achieve by playing about with the code.
312846;;; Lingfield;18:50;£4000;3yo only;1m;5;12;£4000 added, 3yo only, 1m, Class 5, £2730 penalty, 12 ran ;1m 37.42s;
312847;;; Lingfield;19:20;£5500;2yo only;6f;5;5;£5500 added, 2yo only, 6f, Class 5, £3562 penalty, 5 ran ;1m 10.87s;
312848;;; Lingfield;19:50;£4500;3yo plus;5f;5;8;£4500 added, 3yo plus, 5f, Class 5, £3071 penalty, 8 ran ;0m 57.2s

As you can see if i retrieve the course and time then i lose the race title

but this is how i need it to be.

312846;;; Lingfield;18:50;Hartfield Maiden Stakes;£4000;3yo only;1m;5;12;£4000 added, 3yo only, 1m, Class 5, £2730 penalty, 12 ran ;1m 37.42s;
312847;;; Lingfield;19:20;E.b.f. Maiden Fillies' Stakes;£5500;2yo only;6f;5;5;£5500 added, 2yo only, 6f, Class 5, £3562 penalty, 5 ran ;1m 10.87s;
312848;;; Lingfield;19:50;Betdaq The Betting Exchange Fillies' Handicap;£4500;3yo plus;5f;5;8;£4500 added, 3yo plus, 5f, Class 5, £3071 penalty, 8 ran ;0m 57.2s
 
Upvote 0
Hi
yeh so do i but dont hold much hope out on here though as they dont seem very helpfull with anything like this on here so i will probably look somewhere else for an answer, posted other things on here before without any joy they seem very selective about what they will help you with but it is a good piece of code and if i could just solve this problem then it would save me loads a headaches, all i want to do is be able to put it into a database for my own use so hope you find a solution to your problem mate.
 
Upvote 0
Not sure what you problem was,an atachment is helpful to understand your needs.
As an expert of datamining & VBA,I'm looking forward to helping you,provide more details,please.
Best Regards
Northwolves
 
Upvote 0
Could somebody please help me with this or explain what is wrong with the code, i have been trying to solve the problem myself but not having any knowledge of VBA i am just stabing in the dark.
This is the result i am getting at the moment.
312846;;;Hartfield Maiden Stakes (aw);£4000;3yo only;1m;5;12;Good to Firm (Turf Course;All Weather;Standard);1m 37.42s;06/06/2009
312847;;;E.b.f. Maiden Fillies' Stakes;£5500;2yo only;6f;5;5;Good to Firm (Turf Course;All Weather;Standard);1m 10.87s;06/06/2009
312848;;;Betdaq The Betting Exchange Fillies' Handicap;£4500;3yo plus;5f;5;8;Good to Firm (Turf Course;All Weather;Standard);0m 57.2s;06/06/2009

These are the results i have been able to achieve by playing about with the code.
312846;;; Lingfield;18:50;£4000;3yo only;1m;5;12;£4000 added, 3yo only, 1m, Class 5, £2730 penalty, 12 ran ;1m 37.42s;
312847;;; Lingfield;19:20;£5500;2yo only;6f;5;5;£5500 added, 2yo only, 6f, Class 5, £3562 penalty, 5 ran ;1m 10.87s;
312848;;; Lingfield;19:50;£4500;3yo plus;5f;5;8;£4500 added, 3yo plus, 5f, Class 5, £3071 penalty, 8 ran ;0m 57.2s

As you can see if i retrieve the course and time then i lose the race title

but this is how i need it to be.

312846;;; Lingfield;18:50;Hartfield Maiden Stakes;£4000;3yo only;1m;5;12;£4000 added, 3yo only, 1m, Class 5, £2730 penalty, 12 ran ;1m 37.42s;
312847;;; Lingfield;19:20;E.b.f. Maiden Fillies' Stakes;£5500;2yo only;6f;5;5;£5500 added, 2yo only, 6f, Class 5, £3562 penalty, 5 ran ;1m 10.87s;
312848;;; Lingfield;19:50;Betdaq The Betting Exchange Fillies' Handicap;£4500;3yo plus;5f;5;8;£4500 added, 3yo plus, 5f, Class 5, £3071 penalty, 8 ran ;0m 57.2s
Hi Northwolves
The problem i am having is that when the data is copied to the text file it is not moving the course and race time as in the example above sorry but i dont have permission to attach my workbook.
 
Upvote 0

Forum statistics

Threads
1,215,460
Messages
6,124,949
Members
449,198
Latest member
MhammadishaqKhan

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top