Reliable responsible web scrape code

mikecro (New Member, joined Aug 9, 2014, 10 messages)
Firstly, this is a hobby, not work, as will become apparent. I am a reasonable VBA programmer but have never done anything web-related before.

I want to scrape a number of pages from the English Premier League Fantasy Football site. I have some basic code, but it hangs, sometimes after a few scrapes and sometimes after 2+. Is it timeout handling I am missing? I am trying to be responsible by sleeping between scrapes, and I have tried longer waits. I also need to change my 404 detection in main. I have done a lot of research but just want a reliable version of GetWebSource; after that I will just have some JSON objects to process.

Any pointers? Code below.

Code:
Option Explicit

Const URLBase As String = "http://fantasy.premierleague.com/web/api/elements/"
Const MaxPlayerNum As Long = 1000

#If VBA7 Then
    ' Office 2010 or later (32- or 64-bit). PtrSafe is required on 64-bit Office.
    ' dwMilliseconds is a DWORD, so it stays Long even on 64-bit.
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    ' Office 2007 and earlier
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Sub main()
    Dim urlstring As String
    Dim PlayerJSon As String
    Dim i As Long
    Dim maxplayer As Long

    maxplayer = -1

    For i = 1 To MaxPlayerNum
        urlstring = URLBase & CStr(i)
        Debug.Print urlstring

        PlayerJSon = GetWebSource(urlstring)
        'Debug.Print PlayerJSon

        ' GetWebSource returns "" for a 404 (or any failed request),
        ' so test the length instead of searching for "404 Not Found".
        If Len(PlayerJSon) > 0 Then
            maxplayer = i
        End If

        DoEvents          ' let Excel repaint and notice Ctrl+Break
        Sleep 1000        ' pause between requests to be polite to the server
        Debug.Print maxplayer
    Next i
End Sub

Public Function GetWebSource(ByVal Url As String) As String
    ' Late binding: no reference to Microsoft XML is required.
    Dim xml As Object

    GetWebSource = ""
    On Error GoTo Failed

    Set xml = CreateObject("Microsoft.XMLHTTP")
    With xml
        ' Synchronous request: .send does not return until the response
        ' has arrived (readyState = 4), so no polling loop is needed.
        .Open "GET", Url, False
        .send

        ' Do not loop on .Status: for a synchronous request it will not
        ' change, so any status other than 200 (e.g. 500, 503) would
        ' loop forever. Only a 200 response counts as a page.
        If .Status = 200 Then GetWebSource = .responseText
    End With

Failed:
    ' Reached normally or after a network error; either way, release the object.
    Set xml = Nothing
End Function
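If the hangs are timeouts, one option is to use MSXML2.ServerXMLHTTP.6.0 instead of Microsoft.XMLHTTP, because it has a setTimeouts method, and to test the HTTP status code rather than searching the body for "404 Not Found". The following is a minimal sketch for a separate standard module, assuming MSXML 6 is available (it is included with recent Windows versions). FetchUrl, FindHighestPlayerId, the timeout values and the cutoff of 20 consecutive 404s are hypothetical choices, not part of the original code, and the cutoff assumes player ids are roughly contiguous.

```vba
Option Explicit

' Sketch only: names, timeouts and the miss cutoff are illustrative assumptions.
Const URL_BASE As String = "http://fantasy.premierleague.com/web/api/elements/"

' Returns the HTTP status code, or 0 if the request failed (timeout, DNS error, ...).
' On a 200 response the page text is returned through Body.
Public Function FetchUrl(ByVal Url As String, ByRef Body As String) As Long
    Dim http As Object
    Body = ""
    FetchUrl = 0
    On Error GoTo Failed

    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    ' resolve, connect, send and receive timeouts in milliseconds (set before Open)
    http.setTimeouts 5000, 5000, 10000, 10000
    http.Open "GET", Url, False     ' synchronous: send returns when the request completes
    http.send

    FetchUrl = http.Status
    If FetchUrl = 200 Then Body = http.responseText
    Exit Function

Failed:
    ' A timeout raises an error on send and lands here; report it as status 0
    Debug.Print "Request failed: " & Url & " (" & Err.Description & ")"
    FetchUrl = 0
End Function

' Scan ids, stopping early after a run of consecutive 404s.
Public Sub FindHighestPlayerId()
    Const MAX_ID As Long = 1000
    Const MAX_MISSES As Long = 20   ' assumption: ids are roughly contiguous
    Dim id As Long, httpStatus As Long, misses As Long, highest As Long
    Dim json As String

    highest = -1
    For id = 1 To MAX_ID
        httpStatus = FetchUrl(URL_BASE & CStr(id), json)
        Select Case httpStatus
            Case 200
                highest = id
                misses = 0
            Case 404
                misses = misses + 1
                If misses >= MAX_MISSES Then Exit For
            Case Else
                ' 0 (timeout/network error) or a server error: log and move on
                Debug.Print "id " & id & ": status " & httpStatus & ", skipped"
        End Select
        DoEvents
        ' roughly one request per second (Now is only resolved to whole seconds)
        Application.Wait Now + TimeSerial(0, 0, 1)
    Next id

    Debug.Print "Highest player id found: " & highest
End Sub
```

Because a timeout now surfaces as an error that FetchUrl turns into status 0, a slow or stalled request costs at most the configured timeouts instead of blocking indefinitely.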
 

Found the problem by replacing the Sleep call with Application.Wait. I don't know what was wrong with the Sleep code.
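For reference, the Application.Wait approach can be wrapped in a small helper. This is a minimal sketch; PauseSeconds is a hypothetical name, and it only works in Excel, since Application.Wait is an Excel method. Two caveats: according to Excel's documentation, Wait also suspends Excel activity while it runs, and because Now is only resolved to whole seconds, a one-second wait may return after less than a full second.

```vba
' Hypothetical helper: pause for a whole number of seconds with Excel's Application.Wait.
' Excel-only; not available in Word or Access VBA.
Public Sub PauseSeconds(ByVal Seconds As Integer)
    If Seconds <= 0 Then Exit Sub
    DoEvents                                     ' let Excel process pending events first
    Application.Wait Now + TimeSerial(0, 0, Seconds)
End Sub
```

In the loop in main, `PauseSeconds 1` would take the place of `Sleep 1000`.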
 