Auto-Update Data from Same Table of a URL

flagen

New Member
Joined
Jun 22, 2015
Messages
16
I'm new to this forum. I have some experience with VBA but none with HTML or Javascript. I am trying to get excel to reference (and regularly refresh once a day) 5 different data tables from a single url of a website that changes the data depending on 5 different regions of a U.S. map. I tried a web query with 5 different connections, but once it refreshes, all 5 tables in excel reference the last set of data used. It won't let me select all 5 tables at once with a single connection. I have no idea how to web scrape the tables I need as I don't understand the html/java language the website uses in its source code. Any direction would help. I am not sure what part of the source code you would need to solve this so I thought I would wait on posting that. The site does use a username and password.
 

Excel Facts

When did Power Query debut in Excel?
Although it was an add-in in Excel 2010 & Excel 2013, Power Query became a part of Excel in 2016, in Data, Get & Transform Data.
Hello flagen,

This is a daunting task even for more experienced coders. You should post the URL and point out which tables you are interested in.

Once I see what you want to scrape, I can write a macro for you.
 
Upvote 0
Thanks Leith,

Will I need to provide the username/password in order for you to visualize it? I'm afraid I'm not at liberty to provide that information. I apologize if that has an obvious answer.
 
Upvote 0
Hello flagen,

Since you mention a user name and password, this must be on a private network (intranet) or a restricted public site. In either case, I would need to see the HTML code for the page you want to scrape data from. You can save the HTML page source as a text file and upload it to a file sharing site or email it to me. I can send my email address to you via private message.
 
Upvote 0
Hello flagen,

Here is the code so far. This will scrape the text from each table in the URL list. Currently, I have 3 entries in the list. These are the files that you sent me. You will need to change these to match the sites you want to scrape. The code you need to change is marked in blue.

After you have logged on, the macro will navigate to each site in the URL list. The table data will scraped and saved to "Sheet1" with one table under the previous one. This worked with files you sent me. Hopefully this work for you as well.

Rich (BB code):
' Thread:   http://www.mrexcel.com/forum/excel-questions/863282-auto-update-data-same-table-url.html
' Poster:   flagen (James Durst) www.mrexcel.com

Global HTMLdoc As Object
Global PageSrc As String

Function GetElemText(ByRef Elem As Object, ByRef ElemText As String, Optional ByVal Suffix As String) As String

  ' Written:  March 18, 2015
  ' Updated:  June 21, 2015
  ' Author:   Leith Ross
  
  ' This is a recursive procedure to extract text from between
  ' an element's start tag and end tag and everything in between.
        
    If Elem Is Nothing Then Exit Function
    
    ElemText = ElemText & Suffix
    
      ' Is this element a text value?
        If Elem.NodeType = 3 Then
          ' Separate text elements with a Pipe character.
            ElemText = ElemText & Elem.NodeValue & "|"
        Else
          ' Keep parsing - Element contains other non text elements.
            For Each Elem In Elem.ChildNodes
              ' Add new line characters to certain tags.
                Select Case UCase(Elem.NodeName)
                    Case Is = "P":  Suffix = "~"
                    Case Is = "BR": Suffix = "~"
                    Case Is = "TR": Suffix = "~"
                    Case Else: Suffix = ""
                End Select
                
                Call GetElemText(Elem, ElemText, Suffix)
            Next Elem
        End If
            
Finished:
    GetElemText = ElemText
    Suffix = ""
    
End Function


Sub ScrapeTables()

    Dim IEapp   As Object
    Dim oDiv    As Object
    Dim Path    As String
    Dim n       As Variant
    Dim oShell  As Object
    Dim r       As Long
    Dim ret     As Long
    Dim running As Boolean
    Dim Text    As String
    Dim URL     As Variant
    Dim URLs    As Variant
    Dim Wks     As Worksheet
       
      ' Select the output worksheet.
        Set Wks = ThisWorkbook.Worksheets("Sheet1")
    
      ' Clear the output worksheet.
        Wks.UsedRange.Clear
        
      ' Web site URL list for web pages that have tables to be scraped.
        ReDim URLs(1 To 3)
        
      ' Assign the URLs to each array element.
        URLs(1) = "file:///C:/Users/Owner/Documents/MrExcel%20Forum/flagen%20(James%20Durst)/source_code_midwest.htm"
        URLs(2) = "file:///C:/Users/Owner/Documents/MrExcel%20Forum/flagen%20(James%20Durst)/source_code_plains.htm"
        URLs(3) = "file:///C:/Users/Owner/Documents/MrExcel%20Forum/flagen%20(James%20Durst)/source_code_western.htm"
        
        Set oShell = CreateObject("Shell.Application")
        
      ' Get the running Internet Explorer application object.
        For n = 0 To oShell.Windows.Count - 1
            If Not oShell.Windows(n) Is Nothing Then
                If oShell.Windows(n).Name = "Internet Explorer" Then
                    running = True
                    Set IEapp = oShell.Windows(n)
                    Exit For
                End If
            End If
        Next n

      ' Is there a running instance of Internet Explorer?
        If Not running Then
            MsgBox "Internet Explorer is Not Running"
            Exit Sub
        End If
        
      ' Scrape table data from each URL in the list.
        For Each URL In URLs
            
            Application.Cursor = xlWait
            
            IEapp.Navigate "about:blank"
            While IEapp.Busy: DoEvents: Wend
            
            IEapp.Navigate URL
            Set HTMLdoc = IEapp.Document
            
          ' Wait until the table becomes available.
            On Error Resume Next
                Do
                    DoEvents
                    Set oDiv = HTMLdoc.getElementById("ctl00_ContentPlaceHolder_divDensity")
                    If Not oDiv Is Nothing Then Exit Do
                Loop
            On Error GoTo 0
            
            Application.Cursor = xlDefault
            
          ' Get the table data.
            Text = ""
            Text = GetElemText(oDiv, Text)
    
          ' Remove HTML new line characters.
            Text = Replace(Text, vbLf, "")
        
          ' Change HTML Non Breaking Space character to a space.
            Text = Replace(Text, Chr(160), " ")
            
            Lines = Split(Text, "~")
            
            For n = 4 To UBound(Lines)
                Text = Lines(n)
                Data = Split(Text, "|")
                
              ' Format the table output.
                With Wks.Range("A2")
                    Select Case n
                        Case 4: .Offset(r, 0).Value = Data(2)
                        Case 5: .Offset(r, 3).Value = Data(3): .Offset(r, 11).Value = Data(5)
                        Case Is > 5: .Offset(r, 0).Resize(1, UBound(Data)).Value = Data
                    End Select
                End With
                
                r = r + 1
            Next n
        
            Set HTMLdoc = Nothing
            
        Next URL
        
End Sub
 
Upvote 0
I tried it out and I ran into an error.

Run-time error '-2147024726 (800700aa)':
Method 'Navigate' of object 'IWebBrowser2' failed


Here is the line debugging highlights:

IEapp.Navigate "about:blank"

Here is what I changed that blue portion of the code to:

' Web site URL list for web pages that have tables to be scraped.

ReDim URLs(1 To 5)

' Assign the URLs to each array element.
URLs(1) = "https://truckstop.com/Decision_Tools/MainReports.aspx?DecisionTool=LoadDensities#WESTERN"
URLs(2) = "https://truckstop.com/Decision_Tools/MainReports.aspx?DecisionTool=LoadDensities#PLAINS"
URLs(3) = "https://truckstop.com/Decision_Tools/MainReports.aspx?DecisionTool=LoadDensities#MIDWEST"
URLs(4) = "https://truckstop.com/Decision_Tools/MainReports.aspx?DecisionTool=LoadDensities#SOUTHERN"
URLs(5) = "https://truckstop.com/Decision_Tools/MainReports.aspx?DecisionTool=LoadDensities#NORTHEAST"

I had the site opened in Internet Explorer and logged in. Do I need to have all five URLs open?
 
Upvote 0
I also tried having each set of data open in 5 tabs in IE. This produced a comparable result to what my web queries were doing. All 5 tables the web scrape output were the Northeast table (it was the last page I opened). No error in this case.
 
Upvote 0
After several attempts the error isn't occurring anymore (not sure what was different). But it continues to only output one of the data sets all 5 times, whichever one is the last data set I clicked on. The URLs don't seem to differentiate from each other. Is there a way to activate the buttons for each region, then scrape the data table each time? Or is there a simple mistake I am making?
 
Last edited:
Upvote 0
Hello flagen,

Your changes to the macro code look fine.

The problem may be related to the version of Internet Explorer you are using. I have Internet Explorer 11. I experienced a similar problem with the output order. For me, it would repeat the last entry. Initially, I tried clearing the cache and deleting the history but that didn't work. I solved the problem by navigating to "about:blank" and then to the URL. This removed all previous entries in the address bar. Perhaps, your version requires a different approach.

The macro opens each of the URLs in a running instance of Internet Explorer using the active page. It is interesting it worked will all 5 URLs open at the same time.
 
Upvote 0
I am using Internet Explorer 11 as well. I downloaded your original code again and re-input my 5 URLs. Now I run into the same error again. Doesn't that error mean the code is trying to do something while "about:blank" is being loaded? Or something along those lines? I realize "While IEapp.Busy: DoEvents: Wend" is intended to prevent that, but that's what I interpreted the error to mean.
 
Upvote 0

Forum statistics

Threads
1,213,565
Messages
6,114,338
Members
448,569
Latest member
Honeymonster123

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top