Scraping web works on same kind of webpage except one.

Certified

Board Regular
Joined
Jan 24, 2012
Messages
189
My company uses a webpage created by ASP.NET to upload data from difference sources (people). Employees from different regions input data on the webpage and I consolidate the data into a report.

I am writing a vba code to pull (scrap) info from the page to an Excel sheet.

The first part of the code creates the link and saves the website data into a variable (htmldoc).

Code:
Sub getdatafromsubmission(schedule As String)    
    Dim xmlPage As New MSXML2.XMLHTTP60
    Dim htmlDoc As New MSHTML.HTMLDocument
    Dim urlLink As String
  
    
    urlLink = "http://w7gs-5n1h9y1/AMGSubmission/" & schedule & "list.cshtml"
    Debug.Print urlLink
    
    
    xmlPage.Open "GET", urlLink, False
    xmlPage.send
    
    'creates a new html document from XMLPage.
    htmlDoc.body.innerHTML = xmlPage.responseText
    
    'THis sub process the html
    ProcessHTMLPage htmlDoc, schedule


End Sub


Next I call another sub to scrap the data and paste into an Excel Sheet.

Code:
Sub ProcessHTMLPage(htmlPage As MSHTML.HTMLDocument, schedule As String)

    'class name = table ewTable


    Dim htmlTable As MSHTML.IHTMLElement
    Dim htmltables As MSHTML.IHTMLElementCollection
    Dim htmlRow As MSHTML.IHTMLElement
    Dim htmlcell As MSHTML.IHTMLElement
    Dim rowNum As Long, colNum As Integer
    Dim x As Integer
    Dim y As Integer
        
    
    
    'Note: htmltable(s) is collection of elements not a single element.
    Set htmltables = htmlPage.getElementsByTagName("table")
    
    For Each htmlTable In htmltables
        'Debug.Print htmlTable.className
        
        x = 1
        For Each htmlRow In htmlTable.getElementsByTagName("tr")
            'Debug.Print vbTab & htmlRow.innertext
            
            y = 1
            '//This prints out any level below the "tr" element
            For Each htmlcell In htmlRow.Children
                'Debug.Print vbTab & htmlcell.innertext
                
                Sheet3.Activate
                
                Cells(x, y) = htmlcell.innertext
                
            y = y + 1
            Next htmlcell
        
        x = x + 1
        Next htmlRow
    
    Next htmlTable
End Sub

I use the same code for 4 different web pages all created by the same template.

As you can see above the code first finds the "table" tag name. before going further.

This is where my issue starts -

The codes works fine for the first 3 pages, but errors on the 4th. I seems like the code can't find the "table" tag name.

I check the html code in Chrome and the page does have a table id.

Does anyone have any idea or suggestion on what I can do?
 

Excel Facts

Did you know Excel offers Filter by Selection?
Add the AutoFilter icon to the Quick Access Toolbar. Select a cell containing Apple, click AutoFilter, and you will get all rows with Apple

Forum statistics

Threads
1,141,046
Messages
5,703,925
Members
421,321
Latest member
blusky4

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top