Extracting Tables from website

ExcelStudent00

New Member
Joined
Oct 13, 2016
Messages
2
Hello everyone,

I've come here with no other options. Usually I'm able to figure out the solutions to the problems that I run into but this time I am truly stumped. If possible, please help me out.

I am trying to extract tables from the following url: Pittsburgh Steelers at New England Patriots - September 10th, 2015 | Pro-Football-Reference.com

I am able to successfully extract the Scoring, Game Info, Officials, Expected Points Summary, and Team Stats tables, but I am finding it impossible to access the "Passing, Rushing and Receiving" and "Defense" tables.

I can see that the design for these tables are different. They seem to be wrapped up or hidden, but I can't extract them via the following methods.

This is the code I'm using:

Certain tables I can grab by ID:
Code:
[INDENT=2]                Set elemCollection = Doc.getElementById(tableId)
                
                For r = 0 To (elemCollection.Rows.Length - 1)
                    marker = 1
                    For C = 0 To (elemCollection.Rows(r).Cells.Length - 1)
                        ws2.Cells(r + 1 + LR, C + marker) = elemCollection.Rows(r).Cells(C).innerText
                    Next C
                Next r
[/INDENT]

Certain tables I can grab by ClassName:
Code:
                Set elemCollection = Doc.getElementsByClassName(tableId)
                
                For Each objOne In elemCollection
                    For r = 0 To (objOne.Rows.Length - 1)
                        marker = 1
                        For C = 0 To (objOne.Rows(r).Cells.Length - 1)
                            ws2.Cells(r + 1 + LR, C + marker) = objOne.Rows(r).Cells(C).innerText
                        Next C
                    Next r
                Next

I have both references Microsoft HTML Object Library and Microsoft Internet Controls activated.

--------------

As a side-note, I first tried to access the tables via Query Tables. I did find success, but I didn't like how I had to do it. I couldn't access the table through a regular query, so I did it manually and recorded the macro.

Here's the code:
Code:
Sub TestingWebQueryImport()
'
' TestingWebQueryImport Macro
'


'
    Sheets.Add After:=ActiveSheet
    ActiveWorkbook.Queries.Add name:="Defense Table", Formula:= _
        "let" & Chr(13) & "" & Chr(10) & "    Source = Web.Page(Web.Contents(""http://www.pro-football-reference.com/boxscores/201610060sfo.htm""))," & Chr(13) & "" & Chr(10) & "    Data27 = Source{27}[Data]," & Chr(13) & "" & Chr(10) & "    #""Changed Type"" = Table.TransformColumnTypes(Data27,{{""Player"", type text}, {""Tm"", type text}, {""Def Interceptions Int"", type text}, {""Def Interceptions Yds"", type text}, {""Def Interceptions TD"", type text}," & _
        " {""Def Interceptions Lng"", type text}, {""Sacks & Tackles Sk"", type text}, {""Sacks & Tackles Tkl"", type text}, {""Sacks & Tackles Ast"", type text}, {""Fumbles FR"", type text}, {""Fumbles Yds"", type text}, {""Fumbles TD"", type text}, {""Fumbles FF"", type text}})" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & "    #""Changed Type"""
    Sheets.Add After:=ActiveSheet
    With ActiveSheet.ListObjects.Add(SourceType:=0, Source:= _
        "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""Defense Table""" _
        , destination:=Range("$A$1")).QueryTable
        .CommandType = xlCmdSql
        .CommandText = Array("SELECT * FROM [Defense Table]")
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .PreserveColumnInfo = True
        .ListObject.DisplayName = "Defense_Table"
        .Refresh BackgroundQuery:=False
    End With
    ActiveWorkbook.Queries.Add name:="Passing, Rushing, & Receiving Table", _
        Formula:= _
        "let" & Chr(13) & "" & Chr(10) & "    Source = Web.Page(Web.Contents(""http://www.pro-football-reference.com/boxscores/201610060sfo.htm""))," & Chr(13) & "" & Chr(10) & "    Data25 = Source{25}[Data]," & Chr(13) & "" & Chr(10) & "    #""Changed Type"" = Table.TransformColumnTypes(Data25,{{""Player"", type text}, {""Tm"", type text}, {""Passing Cmp"", type text}, {""Passing Att"", type text}, {""Passing Yds"", type text}, {""Passing TD"", type text}," & _
        " {""Passing Int"", type text}, {""Passing Sk"", type text}, {""Passing Yds2"", type text}, {""Passing Lng"", type text}, {""Passing Rate"", type text}, {""Rushing Att"", type text}, {""Rushing Yds"", type text}, {""Rushing TD"", type text}, {""Rushing Lng"", type text}, {""Receiving Tgt"", type text}, {""Receiving Rec"", type text}, {""Receiving Yds"", type text}, {" & _
        """Receiving TD"", type text}, {""Receiving Lng"", type text}, {""Fumbles Fmb"", type text}, {""Fumbles FL"", type text}})" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & "    #""Changed Type"""
    Sheets.Add After:=ActiveSheet
    With ActiveSheet.ListObjects.Add(SourceType:=0, Source:= _
        "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""Passing, Rushing, & Receiving Table""" _
        , destination:=Range("$A$1")).QueryTable
        .CommandType = xlCmdSql
        .CommandText = Array("SELECT * FROM [Passing, Rushing, & Receiving Table]")
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .PreserveColumnInfo = True
        .ListObject.DisplayName = "Passing__Rushing____Receiving_Table"
        .Refresh BackgroundQuery:=False
    End With
    ActiveWorkbook.Queries.Add name:="Scoring Table", Formula:= _
        "let" & Chr(13) & "" & Chr(10) & "    Source = Web.Page(Web.Contents(""http://www.pro-football-reference.com/boxscores/201610060sfo.htm""))," & Chr(13) & "" & Chr(10) & "    Data16 = Source{16}[Data]," & Chr(13) & "" & Chr(10) & "    #""Changed Type"" = Table.TransformColumnTypes(Data16,{{""Quarter"", Int64.Type}, {""Time"", type time}, {""Tm"", type text}, {""Detail"", type text}, {""ARI"", Int64.Type}, {""SFO"", Int64.Type}})" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & "    #""Changed Type"""
    Sheets.Add After:=ActiveSheet
    With ActiveSheet.ListObjects.Add(SourceType:=0, Source:= _
        "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""Scoring Table""" _
        , destination:=Range("$A$1")).QueryTable
        .CommandType = xlCmdSql
        .CommandText = Array("SELECT * FROM [Scoring Table]")
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .PreserveColumnInfo = True
        .ListObject.DisplayName = "Scoring_Table"
        .Refresh BackgroundQuery:=False
    End With
    ActiveWorkbook.Queries.Add name:="Team Stats Table", Formula:= _
        "let" & Chr(13) & "" & Chr(10) & "    Source = Web.Page(Web.Contents(""http://www.pro-football-reference.com/boxscores/201610060sfo.htm""))," & Chr(13) & "" & Chr(10) & "    Data23 = Source{23}[Data]," & Chr(13) & "" & Chr(10) & "    #""Changed Type"" = Table.TransformColumnTypes(Data23,{{"""", type text}, {""ARI"", type text}, {""SFO"", type text}})" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & "    #""Changed Type"""
    Sheets.Add After:=ActiveSheet
    With ActiveSheet.ListObjects.Add(SourceType:=0, Source:= _
        "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""Team Stats Table""" _
        , destination:=Range("$A$1")).QueryTable
        .CommandType = xlCmdSql
        .CommandText = Array("SELECT * FROM [Team Stats Table]")
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .PreserveColumnInfo = True
        .ListObject.DisplayName = "Team_Stats_Table"
        .Refresh BackgroundQuery:=False
    End With
    Selection.ListObject.QueryTable.Refresh BackgroundQuery:=False
    Selection.ListObject.QueryTable.Refresh BackgroundQuery:=False
    Selection.ListObject.QueryTable.Refresh BackgroundQuery:=False
    Selection.ListObject.QueryTable.Refresh BackgroundQuery:=False
    Range("I14").Select
End Sub

I didn't like how heavy this method was. It seems to be adding each individual query to the workbook, and then adding 3 worksheets for each table after doing a select all to extract each tables data. Is it possible to do this any cleaner or smoother?

I'd really appreciate the help, thanks.
 

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
For more clarification:

I can see the table id's that I need when I look at the page source, but I get an error when I refer to them. Almost as if the tables are wrapped or hidden.

Is anyone familiar with tables like this?

Btw this is the page that I want to extract tables from : Arizona Cardinals at San Francisco 49ers - October 6th, 2016 | Pro-Football-Reference.com

It's the "Passing, Rushing, and Receiving" and "Defense" tables that I can't get access to. The table id's are "player_offense" and "player_defense" but they can't be touched, meanwhile the other tables like the team table can be extracted as normal
 
Upvote 0

Forum statistics

Threads
1,215,202
Messages
6,123,625
Members
449,109
Latest member
Sebas8956

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top