Import data from multiple pages of a website into a single Excel sheet

V S S Sarma

New Member
Joined
Jul 13, 2012
Messages
11
I am a cricket buff and like to see statistics. But they run to 47 pages running to 9,394 rows in a website. Copy and paste is not a solution. I look forward to your help in either developing an Excel VBA macro or any other way. Please see the first two pages here:

Page 1
http://stats.espncricinfo.com/ci/engine/stats/index.html?class=2;filter=advanced;orderby=start;size=200;spanmax1=12+Jul+2012;spanmin1=13+Jul+2009;spanval1=span;template=results;type=batting;view=innings;wrappertype=print
Page 2
http://stats.espncricinfo.com/ci/engine/stats/index.html?class=2;filter=advanced;orderby=start;page=2;size=200;spanmax1=12+Jul+2012;spanmin1=13+Jul+2009;spanval1=span;template=results;type=batting;view=innings;wrappertype=print

I will be grateful to receive an immediate response.
 

Warship

Well-known Member
Joined
Jul 17, 2007
Messages
1,051
Doing all 47 pages at once may cause their site to block your IP.
Though usually only temporarily - a few minutes to a few days.
You could do it in sessions by changing "For i = 1 To 47" to the appropriate page numbers.
After each session be sure to rename the "Stats" sheet as the code starts by deleting and re-creating it.
After all sessions combine the various "Stat" sheets.

Doing all 47 at once will take a few minutes to complete and you won't see anything happening.
The cursor will only spin occasionally.
You wil get "Query complete" when it finishes.

Code:
Sub QueryWeb()
    Dim i As Integer
    Dim firstRow As Integer
    Dim lastRow As Integer
    Dim nextRow As Integer
    Dim URLstart As String
    Dim URLend As String
    Dim shStats As Worksheet
    Dim shQuery As Worksheet
    Dim rgQuery As Range
    Dim found As Range
    Dim TimeOutWebQuery
    Dim TimeOutTime
    Dim objIE As Object
    Application.ScreenUpdating = False
    URLstart = "http://stats.espncricinfo.com/ci/engine/stats/index.html?class=2;filter=advanced;orderby=start;page="
    URLend = ";size=200;spanmax1=12+Jul+2012;spanmin1=13+Jul+2009;spanval1=span;template=results;type=batting;view=innings;wrappertype=print"
    Application.DisplayAlerts = False
    On Error Resume Next
    Sheets("Stats").Delete
    On Error GoTo 0
    Application.DisplayAlerts = True
    Sheets.Add after:=Sheets(Sheets.Count)
    ActiveSheet.Name = "Stats"
    Set shStats = Sheets("Stats")
    For i = 1 To 47
        Sheets.Add after:=Sheets(Sheets.Count)
        Set shQuery = ActiveSheet
        Set objIE = CreateObject("InternetExplorer.Application")
        With objIE
            .Visible = False
            .Navigate CStr(URLstart & i & URLend)
        End With
        TimeOutWebQuery = 10
        TimeOutTime = DateAdd("s", TimeOutWebQuery, Now)
        Do Until objIE.ReadyState = 4
            DoEvents
            If Now > TimeOutTime Then
                objIE.stop
                GoTo ErrorTimeOut
            End If
        Loop
        objIE.ExecWB 17, 2
        objIE.ExecWB 12, 2
        shQuery.Range("A1").Select
        shQuery.PasteSpecial NoHTMLFormatting:=True
        objIE.Quit
        Set objIE = Nothing
        Set found = shQuery.Columns(1).Find("Player", , , xlWhole)
        If Not found Is Nothing Then
            firstRow = found.Row
            If i > 1 Then firstRow = firstRow + 1
        Else
            GoTo FormatError
        End If
        Set found = shQuery.Columns(1).Find("Page ", found, , xlPart)
        If Not found Is Nothing Then
            lastRow = found.Row - 1
        Else
            GoTo FormatError
        End If
        Set rgQuery = shQuery.Rows(firstRow & ":" & lastRow)
        nextRow = shStats.Cells(Rows.Count, "A").End(xlUp).Row
        If nextRow > 1 Then nextRow = nextRow + 1
        rgQuery.Copy shStats.Cells(nextRow, 1)
        Application.DisplayAlerts = False
        shQuery.Delete
        Application.DisplayAlerts = True
    Next i
    shStats.Columns.AutoFit
    MsgBox "Query complete"
    Exit Sub
FormatError:
    MsgBox "Format Error"
    Exit Sub
ErrorTimeOut:
    objIE.Quit
    Set objIE = Nothing
    MsgBox "WebSite Error"
End Sub
 

V S S Sarma

New Member
Joined
Jul 13, 2012
Messages
11
You are so good, my friend. I am grateful to you. Thanks a lot.

Since no one was coming forward, I made three macros and solved it. Please see:

(1)
Sub webquery()
startrow = 1
For i = 1 To 55
If i = 1 Then
curl = "URL;http://stats.espncricinfo.com/ci/engine/stats/index.html?class=2;filter=advanced;orderby=start;size=200;spanmax1=30+Jun+2012;spanmin1=01+Jan+2009;spanval1=span;template=results;type=batting;view=innings;wrappertype=print"
Else
curl = "URL;http://stats.espncricinfo.com/ci/engine/stats/index.html?class=2;filter=advanced;orderby=start;page=" & i & ";size=200;spanmax1=30+Jun+2012;spanmin1=01+Jan+2009;spanval1=span;template=results;type=batting;view=innings;wrappertype=print"
End If

With ActiveSheet.QueryTables.Add(Connection:=curl, Destination:=Range("$A$" & startrow))
.Name = "Webquery" & i
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "3"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
startrow = startrow + 202
Next i
End Sub
(2)
Sub DelRows()
Application.ScreenUpdating = False
With Sheets("Sheet2").Range("A3", Range("A" & Rows.Count).End(xlUp))
.AutoFilter Field:=1, Criteria1:="Innings by innings list"
.Offset(1).EntireRow.Delete
.AutoFilter
End With
Application.ScreenUpdating = True
End Sub
(3)
Sub DelRows2()
Application.ScreenUpdating = False
With Sheets("Sheet2").Range("A3", Range("A" & Rows.Count).End(xlUp))
.AutoFilter Field:=1, Criteria1:="Player"
.Offset(1).EntireRow.Delete
.AutoFilter
End With
Application.ScreenUpdating = True
End Sub

This has worked. But then, I see that you are a unique guy, wanting to help. Thanks again.
 

Warship

Well-known Member
Joined
Jul 17, 2007
Messages
1,051
You are quite welcome. Glad you got it worked out.

I find Table Queries are good for one-time use.
Because over time webmasters may change the table's number and you'll have to modify your code.
But if it works for you that's the important thing.

Please don't discount MrExcel.com's help that quick.
There are plenty of helpful folks here and are truly a great bunch.
I'm far from unique but thank you anyway.

Hope to see you here again soon.
 

WHardy

New Member
Joined
Jun 3, 2013
Messages
1
Hello!
I'm trying to do something similiar to what is described here. Unfortunately, my VBA skills are very poor and I'm not sure how to modify the code for it to work in my case.

What I'm trying to import is the data from the following website: Names Pg. 1 (only the number after page= changes), on the names, gender and origins. However, I get errors along the way.

Would any of you be so kind as to help me with this?

Regards,
WH
 

Warship

Well-known Member
Joined
Jul 17, 2007
Messages
1,051
I haven't been getting on the board much.
Maybe you still need help.

As there are 4000+ pages this will take a while.
It is possible your IP may be blocked before it finishes.
You will see your progress in the status bar (lower left of excel window).

Your WB is saved after each page so if it fails you'll have up to the failure saved.

You'll have some cleaning to do when it's done.

Please let me know how it goes...

Code:
Sub BabyNames()
    Dim nextRow As Integer, i As Integer
    Application.ScreenUpdating = False
    Application.DisplayStatusBar = True
    For i = 1 To 2 'this is the page range to be captured. At the time there was 4083 total.
        Application.StatusBar = "Processing Page " & i
        nextRow = ActiveSheet.Cells(Rows.Count, "A").End(xlUp).Row + 1
        With ActiveSheet.QueryTables.Add(Connection:= _
            "URL;http://babynames.merschat.com/index.cgi?function=Search&page=" & i, _
            Destination:=Range("A" & nextRow))
            
            .Name = "index.cgi?function=Search&page=4083"
            .FieldNames = True
            .RowNumbers = False
            .FillAdjacentFormulas = False
            .PreserveFormatting = True
            .RefreshOnFileOpen = False
            .BackgroundQuery = True
            .RefreshStyle = xlInsertDeleteCells
            .SavePassword = False
            .SaveData = True
            .AdjustColumnWidth = True
            .RefreshPeriod = 0
            .WebSelectionType = xlSpecifiedTables
            .WebFormatting = xlWebFormattingNone
            .WebTables = "22"
            .WebPreFormattedTextToColumns = False
            .WebConsecutiveDelimitersAsOne = True
            .WebSingleBlockTextImport = False
            .WebDisableDateRecognition = False
            .WebDisableRedirections = False
            .Refresh BackgroundQuery:=False
        End With
        ThisWorkbook.Save
    Next i
    Application.StatusBar = False
End Sub
 

trainheavy

New Member
Joined
Apr 24, 2015
Messages
4
Hi,

I'm looking to do something similar in importing data ESPN. I'm trying to figure out a way to query data for batters vs left hand and batters vs right hand. The problem is, it only lists 40 per page and I don't know how to query all the pages onto a single page in excel. This is what I'm referring to: 2015 Regular Season MLB Baseball Batting Statistics and League Leaders - Major League Baseball - ESPN. Any help would be greatly appreciated.
 

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
6,001
Hi,

I'm looking to do something similar in importing data ESPN. I'm trying to figure out a way to query data for batters vs left hand and batters vs right hand. The problem is, it only lists 40 per page and I don't know how to query all the pages onto a single page in excel. This is what I'm referring to: 2015 Regular Season MLB Baseball Batting Statistics and League Leaders - Major League Baseball - ESPN. Any help would be greatly appreciated.
Page 1 URL is:
HTML:
http://espn.go.com/mlb/stats/batting/_/split/31/count/1/qualified/false
I guessed the "/count/1" part based on the other pages, to give a uniform URL.

Page 2 URL is:
HTML:
http://espn.go.com/mlb/stats/batting/_/split/31/count/41/qualified/false
Page 3 URL is:
HTML:
http://espn.go.com/mlb/stats/batting/_/split/31/count/81/qualified/false
With that, use a For n = 1 to 121 Step 40 loop (or whatever the last page is) with the BabyNames QueryTables code, updating the Connection string URL with the value of 'n' and the Destination cell argument, as needed.
 

Forum statistics

Threads
1,082,574
Messages
5,366,387
Members
400,887
Latest member
tporeda

Some videos you may like

This Week's Hot Topics

Top