htmlTable: collecting and correlating

tweedle

Well-known Member
Joined
Aug 1, 2010
Messages
1,559
Hi, would appreciate some assistance on this one.
So we have HTMLtables with Col labels and Row Labels and values at the intersects.

For each intersect cell, I need to collect the Column Title, Row Label and data value (in the intersection).
[Can collect to an array]

Here's what I have so far [less the miserable failures]
Code:
Option Compare Database
Sub CollectCLSearchDemographics()
    Dim IEDoc As HTMLDocument
    Dim tbl As HTMLTable
    'http://msdn.microsoft.com/en-us/library/aa752084(v=vs.85).aspx
    rootURL = "[URL]http://www.clrsearch.com/[/URL]"
    midURL = "Raleigh_Demographics/"
    endURL = "NC/27610/"
    fullURL = rootURL & midURL & endURL
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.Navigate fullURL
    Do While ie.busy
        DoEvents
    Loop
    Set IEDoc = ie.document
    With IEDoc
        Set tbl = ie.document.getElementById("tab-crime")
        'Errors - object variable not set
        'Debug.Print tbl.cells.Item(Index:=1)
        For Each ele In ie.document.getElementsByTagName("table"): i = i + 1: Next
        MsgBox i 'Reports 33 Tables
    End With
    Exit Sub
    ie.Quit
End Sub

From http://www.clrsearch.com/Raleigh_Demographics/NC/27610/

This is one of many tables to collect.
HTML:
                        2010 Crime Rate Indexes
                        Raleigh, NC 27610
                        North Carolina
                        United States
 
 
 
 
 
 
                        Total Crime Risk Index
                        90
                        115
                        100
 
 
 
                        Murder Risk Index
                        89
                        107
                        100
 
 
 
                        Rape Risk Index
                        66
                        87
                        100
 
 
 
                        Robbery Risk Index
                        77
                        93
                        100
 
 
 
                        Assault Risk Index
                        49
                        88
                        100
 
 
 
                        Burglary Risk Index
                        84
                        160
                        100
 
 
 
                        Larceny Risk Index
                        127
                        110
                        100
 
 
 
                        Motor Vehicle Theft Risk Index
                        70
                        86
                        100

Thoughts?
Am I on the best path?
 
Last edited:

Excel Facts

What is the fastest way to copy a formula?
If A2:A50000 contain data. Enter a formula in B2. Select B2. Double-click the Fill Handle and Excel will shoot the formula down to B50000.
You could probably iterate through the td and tr elements of each table.
 
Upvote 0
How to do that though without hundreds of string functions...finding the right object model;

this is closer to what I was seeking:

Code:
Option Compare Database
Sub CollectCLSearchDemographics()
    Dim IEDoc As HTMLDocument
    Dim tbl As HTMLTable
    Dim hTitle As HTMLTitleElement
    Dim hRow As HTMLTableRow
    Dim hCol As HTMLTableCol
    Dim hCell As HTMLTableCell
    'http://msdn.microsoft.com/en-us/library/aa752084(v=vs.85).aspx
    rootURL = "[URL]http://www.clrsearch.com/[/URL]"
    midURL = "Raleigh_Demographics/"
    endURL = "NC/27610/"
    fullURL = rootURL & midURL & endURL
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.Navigate fullURL
    Do While ie.busy
        DoEvents
    Loop
    Dim objHTMLTable As HTMLTable
    Set IEDoc = ie.document
    With IEDoc
        Set hDataTables = IEDoc.getElementsByClassName("data-table")
        For TableIdx = 0 To hDataTables.length - 1
            Set objHTMLTable = hDataTables(TableIdx)
            'http://msdn.microsoft.com/en-us/library/ms537443(v=VS.85).aspx
            For RowIdx = 0 To objHTMLTable.rows.length - 1
                Set oCells = objHTMLTable.rows.Item(RowIdx).cells
                CellsLength = oCells.length
                For j = 0 To CellsLength - 1 Step oCells.length
                    If CellsLength = 4 Then
                        Debug.Print oCells.Item(j).innerHTML; Tab(60); oCells.Item(j + 1).innerHTML, oCells.Item(j + 2).innerHTML, oCells.Item(j + 3).innerHTML
                    End If
                Next j
            Next RowIdx
        Next
    End With
    ie.Quit
End Sub
 
Upvote 0
That code is basically what I meant.

As far as I can see you are iterating through the table element.

Is it not working?

Can tell me the tables you are interested in?

I think I know but I only had a quick look last night.
 
Upvote 0
This is working, and close enough that I can work w/it to parse to an array and ultimately a db.table. Just needed to get the lower level processes worked out before cycling through higher level loops.
I'm ultimately interested in [all of them/zip code] to match against R/E properties in undefined future queries. (need to see what we get before we start trimming off parts).

Next step; work with the page.form.submit...
Pretty new to coding against web controls; promises to be a learning experience.

Thanks for taking a peek.
 
Last edited:
Upvote 0
I've had a go at this.

Usually I would put the data in Excel but I've always meant to try some kind of database approach.

Here's what I came up with, pretty basic and the table design is somewhat limited by the way they are organised on the page.



Anyway here it is:

Rich (BB code):
Option Compare Database
Option Explicit

Sub CollectCLSearchDemographics()
Dim IE As Object
Dim doc As Object
Dim divs As Object
Dim divContent As Object
Dim tbls As Object
Dim tbl As Object
Dim tblData
Dim CurDB As DAO.Database
Dim tblNew As DAO.TableDef
Dim fld As DAO.Field
Dim strTblName As String
Dim NoTbl As Long
Dim I As Long
Dim baseURL As String
Dim searchURL As String
 
    baseURL = "http://www.clrsearch.com/"

    searchURL = "Raleigh_Demographics/NC/27610/"
 
    Set IE = CreateObject("InternetExplorer.Application")
 
    'IE.Visible = True ' optional
 
    IE.Navigate baseURL & searchURL
 
    Do While IE.busy
        DoEvents
    Loop
 
    Set doc = IE.Document

    Set divs = IE.Document.getelementsbytagname("DIV")
 
    For Each divContent In divs

        If divContent.classname = "tabcontent3" Then
            
            Set tbls = divContent.getelementsbytagname("TABLE")
            
            strTblName = GetIntials(divContent.getelementsbytagname("H4")(0).outerText)
 
            For Each tbl In tbls

                tblData = GetTableData(tbl)
                
                Set tblNew = CurrentDb.CreateTableDef("Table" & NoTbl & "-" & strTblName)
                For I = 1 To 4
                    Set fld = tblNew.CreateField("Field" & I, DAO.DataTypeEnum.dbText)
                    tblNew.Fields.Append fld
                Next I
 
                CurrentDb.TableDefs.Append tblNew
 
                PopulateTable "Table" & NoTbl & "-" & strTblName, tblData
                
                NoTbl = NoTbl + 1
'
            Next tbl

        End If

    Next divContent
 
    IE.Quit
 
End Sub
 
Function GetTableData(ByRef tbl As Object) As Variant
Dim cl As Object
Dim rw As Object
Dim arrdata()
Dim x As Long
Dim y As Long
 
    ReDim arrdata(tbl.rows.length, tbl.rows(0).cells.length)
    
    For Each rw In tbl.rows
 
        For Each cl In rw.cells

            arrdata(x, y) = cl.innertext
            y = y + 1

        Next cl
        y = 0
        x = x + 1
    Next rw

    GetTableData = arrdata
    
End Function
 
Function GetIntials(strText As String) As String
Dim arrWords
Dim I As Long
    arrWords = Split(strText, " ")
    For I = LBound(arrWords) To UBound(arrWords)
        GetIntials = GetIntials & Left(arrWords(I), 1)
    Next I
End Function
 
Function PopulateTable(tblName As String, arrdata As Variant) As Boolean
Dim rst As Recordset
Dim I As Long
Dim J As Long
 
    Set rst = CurrentDb.OpenRecordset(tblName)
       
    For I = LBound(arrdata, 1) To UBound(arrdata, 1)
    
        With rst
            .AddNew
            
            For J = LBound(arrdata, 2) To UBound(arrdata, 2) - 1
            
            .Fields(J).Value = arrdata(I, J)
        
            Next J
            
           .Update
        End With
        
    Next I

End Function
Pretty sure some of the code could be moved out of the main sub, eg the creation of the tables.
 
Upvote 0
You somehow expect me to believe that in 54,873 posts, you've never done this in Access...:)

Thanks for taking a swing; you present some very interesting functions.
 
Upvote 0
I don't think I've ever seen anyone ask how to get the data into Access, always Excel.

Don't know why, perhaps because the data needs a lot of work to get it into a decent format for use in a database.

Who knows?:)
 
Upvote 0
Greetings;
Thought I would report back as to what the 'interim final version' is:

Below is the Make Table code, the primary collector and two supporting string-parsing functions as well as your GetIntials function.

Thanks again for getting me pushed in the right direction.

Whoops! Late for work - I hate that (not that I'm late, that I have to go at all).

Code:
Sub MakeTable()
'---------------------------------------------------------------------------------------
' Procedure : MakeTable
' Author    : tweedle
' Citation  : [URL]http://www.mrexcel.com/forum/showthread.php?p=2810971#post2810971[/URL]
' Adapted By: tweedle
' Date      : Tue 2011-08-02
' Purpose   : Creates table Demographix; Loaded by CollectCLSearchDemographics
'---------------------------------------------------------------------------------------
'
Dim tdef As TableDef
Dim fld As Field
Set tdef = CurrentDb.CreateTableDef("Demographix")
Set fld = tdef.CreateField("Year", dbInteger, 4)
tdef.Fields.Append fld
Set fld = tdef.CreateField("MasterCategory", dbText, 255)
tdef.Fields.Append fld
Set fld = tdef.CreateField("MasterCatCd", dbText, 10)
tdef.Fields.Append fld
Set fld = tdef.CreateField("StatDescription", dbText, 70)
tdef.Fields.Append fld
Set fld = tdef.CreateField("ZipCd", dbText, 5)
tdef.Fields.Append fld
Set fld = tdef.CreateField("StateCd", dbText, 2)
tdef.Fields.Append fld
Set fld = tdef.CreateField("ZipVal", dbDouble)
tdef.Fields.Append fld
Set fld = tdef.CreateField("StateVal", dbDouble)
tdef.Fields.Append fld
Set fld = tdef.CreateField("USVal", dbDouble)
tdef.Fields.Append fld
Set fld = tdef.CreateField("StatValType", dbText, 2)
tdef.Fields.Append fld
CurrentDb.TableDefs.Append tdef
Set fld = Nothing
Set tdef = Nothing
End Sub

Code:
Sub CollectCLSearchDemographics()
'---------------------------------------------------------------------------------------
' Procedure : CollectCLSearchDemographics
' Author    : Tweedle
' Citation  : [URL]http://www.mrexcel.com/forum/showthread.php?p=2810971#post2810971[/URL]
' Adapted By: Tweedle
' Date      : Tue 2011-08-02
' Purpose   : Collects a single page of demographic statistics from CLSearch.com
'             Loads data to table Demographix
'---------------------------------------------------------------------------------------
'
    Dim IEDoc As HTMLDocument
    Dim tbl As HTMLTable
    Dim hTitle As HTMLTitleElement
    Dim hRow As HTMLTableRow
    Dim hCol As HTMLTableCol
    Dim hCell As HTMLTableCell
    Dim rsDemoG As Recordset
    Set rsDemoG = CurrentDb.OpenRecordset("Demographix", dbOpenTable)
    'http://msdn.microsoft.com/en-us/library/aa752084(v=vs.85).aspx
    rootURL = "[URL]http://www.clrsearch.com/[/URL]"
    midURL = "Raleigh_Demographics/"
    endURL = "NC/27610/"
    fullURL = rootURL & midURL & endURL
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate fullURL
    Do While IE.busy
        DoEvents
    Loop
    Dim objHTMLTable As HTMLTable
    Set IEDoc = IE.Document
    With IEDoc
        Set hDataTables = IEDoc.getElementsByClassName("data-table")
        For TableIdx = 0 To hDataTables.length - 1
            Set objHTMLTable = hDataTables(TableIdx)
            'http://msdn.microsoft.com/en-us/library/ms537443(v=VS.85).aspx
            For RowIdx = 0 To objHTMLTable.rows.length - 1
                Set oCells = objHTMLTable.rows.Item(RowIdx).cells
                CellsLength = oCells.length
                For J = 0 To CellsLength - 1 Step oCells.length
                    If CellsLength = 4 Then
                        Debug.Print _
                                oCells.Item(J).innerText; _
                                Tab(60); _
                                oCells.Item(J + 1).innerHTML, _
                                oCells.Item(J + 2).innerHTML, _
                                oCells.Item(J + 3).innerHTML
                        If RowIdx = 0 Then
                            vYear = Left(oCells.Item(J).innerText, 4)
                            MasterCatDesc = Trim(Right(oCells.Item(J).innerText, Len(oCells.Item(J).innerText) - 4))
                            MasterCatCd = GetIntials(Trim(Right(oCells.Item(J).innerText, Len(oCells.Item(J).innerText) - 4)))
                            ZipCD = Right(Trim(oCells.Item(J + 1).innerHTML), 5)
                            StateCd = Left(Right(Trim(oCells.Item(J + 1).innerHTML), 8), 2)
                            StateLong = oCells.Item(J + 2).innerHTML
                            StatDescription = Null
                            StatVal = Null
                            StatValType = Null
                        Else
                            OK = True
                            If Trim(oCells.Item(J + 0).innerText) = "Total Area Household Income" Then OK = False
                            If OK Then
                                StatDescription = oCells.Item(J + 0).innerText
                                ZipStatVal = oCells.Item(J + 1).innerText
                                StateStatVal = oCells.Item(J + 2).innerText
                                USStatVal = oCells.Item(J + 3).innerText
                                StatValType = ""
                                rsDemoG.AddNew
                                rsDemoG!Year = vYear
                                rsDemoG!MasterCategory = MasterCatDesc
                                rsDemoG!MasterCatCd = MasterCatCd
                                rsDemoG!StatDescription = StatDescription
                                rsDemoG!ZipCD = ZipCD
                                rsDemoG!StateCd = StateCd
                                rsDemoG!ZipVal = GetValue(ZipStatVal)
                                rsDemoG!StateVal = GetValue(Trim(StateStatVal))
                                rsDemoG!USVal = GetValue(Trim(USStatVal))
                                rsDemoG!StatValType = GetValType(Trim(ZipStatVal))
                                rsDemoG.Update
                            End If
                        End If
                    End If
                Next J
            Next RowIdx
        Next
    End With
    IE.Quit
End Sub
Function GetValue(v)
'---------------------------------------------------------------------------------------
' Procedure : GetValue
' Author    : Tweedle
' Citation  :
' Adapted By: Tweedle
' Date      : Tue 2011-08-02
' Purpose   : Returns the numeric value from string with specific non-numeric characters embedded
'---------------------------------------------------------------------------------------
'
    strOut = Trim(v)
    strOut = Replace(strOut, "N/A", "0", Compare:=vbTextCompare)
    strOut = Replace(strOut, "%", "", Compare:=vbTextCompare)
    strOut = Replace(strOut, Chr(34), "", Compare:=vbBinaryCompare)
    strOut = Replace(strOut, "°F", "", Compare:=vbTextCompare)
    strOut = Replace(strOut, "°C", "", Compare:=vbTextCompare)
    strOut = Replace(strOut, "$", "", Compare:=vbTextCompare)
    strOut = Replace(strOut, ",", "", Compare:=vbTextCompare)
    GetValue = CCur(Trim(strOut))
End Function
Function GetValType(v)
'---------------------------------------------------------------------------------------
' Procedure : GetValType
' Author    : Tweedle
' Citation  :
' Adapted By: Tweedle
' Date      : Tue 2011-08-02
' Purpose   : Identifies what type of value is being collected
'---------------------------------------------------------------------------------------
'
    GetValType = "#"
    If Right(v, 1) = "%" Then GetValType = "%"
    If Right(v, 1) = Chr(34) Then GetValType = "in"
    If Right(v, 2) = "°F" Then GetValType = "°F"
    If Right(v, 2) = "°C" Then GetValType = "°C"
    If Left(v, 1) = "$" Then GetValType = "$"
End Function
 
Function GetIntials(strText As String) As String
'---------------------------------------------------------------------------------------
' Procedure : GetIntials
' Author    : Norie [[URL]http://www.mrexcel.com/forum/member.php?u=26558][/URL]
' Citation  : [URL]http://www.mrexcel.com/forum/showthread.php?p=2810694#post2810694[/URL]
' Adapted By:
' Date      : Tue 2011-08-02
' Purpose   : Creates abbreviated code from longer strings of words
'---------------------------------------------------------------------------------------
'
    Dim arrWords
    Dim I As Long
    arrWords = Split(strText, " ")
    For I = LBound(arrWords) To UBound(arrWords)
        GetIntials = GetIntials & Left(arrWords(I), 1)
    Next I
End Function
 
Upvote 0

Forum statistics

Threads
1,224,516
Messages
6,179,231
Members
452,898
Latest member
Capolavoro009

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top