Auto-Update Data from Same Table of a URL

flagen

New Member
Joined
Jun 22, 2015
Messages
16
I'm new to this forum. I have some experience with VBA but none with HTML or Javascript. I am trying to get excel to reference (and regularly refresh once a day) 5 different data tables from a single url of a website that changes the data depending on 5 different regions of a U.S. map. I tried a web query with 5 different connections, but once it refreshes, all 5 tables in excel reference the last set of data used. It won't let me select all 5 tables at once with a single connection. I have no idea how to web scrape the tables I need as I don't understand the html/java language the website uses in its source code. Any direction would help. I am not sure what part of the source code you would need to solve this so I thought I would wait on posting that. The site does use a username and password.
 
Additionally, I realized the error was happening when I signed out and opened a new IE window to sign in to. If I click on one of the regions of the map and 'open link in new tab' then the error doesn't occur anymore. But the duplicating data of the most recently opened data set happens then. Is there a clear reason why this would be happening?

I apologize for having so many questions.
 
Upvote 0

Excel Facts

What is the last column in Excel?
Excel columns run from A to Z, AA to AZ, AAA to XFD. The last column is XFD.
Hello flagen,

It is most likely a synchronization problem. An HTML document is rendered linearly. Each element or resource must be loaded before the next one. In the mean time, VBA will keep executing.

Browsers do not and are not required by the HTML standard to inform the client (user) the operations have fully finished. This is this the biggest PIA when writing code to automate a browser using VBA. There is simply no good way to sync the two codes.

Maybe the better approach would be to have all 5 pages open at the same time. The macro then can be called after everything has finally loaded and scrape those pages. It would require the macro be tweaked a bit to make it work. What do you think about this approach?
 
Upvote 0
Hmm, that is a problem. I see how that could work, but I'm still not sure that it will reference each data set individually. What would prevent it from referencing the most recent page's data?
 
Upvote 0
Hello flagen,

Each data set can be identified in Internet Explorer by it's URL.
 
Upvote 0
So the #WESTERN, #PLAINS, etc. do differentiate them? I guess I'm confused why it's not working with the current code (ignoring the error message that happens).
 
Upvote 0
Hi Leith,
I wanted to the thank you for all of your help, the code is incredibly helpful. With more understanding I may be able to figure out why the error occurs. I am currently running the macro 5 times and open each data set manually by clicking on each region once at a time. It doesn't make sense why the error is occurring when it tries to navigate "about:blank" though. Or why opening a specific tab of one of the regions results in none of the individual URLs grabbing the correct data. It is frustrating. But again, thank you very much for the help! If you or anyone else has an idea of a solution to these issues, please post it.
 
Upvote 0
Hello flagen,

You're welcome.

These problems with private networks are difficult to troubleshoot without being there. There are so many variables that can affect the code it is almost impossible to narrow down the cause just by Q&A.

Best of luck to you in tracking it down. If I can help you further, let me know.
 
Upvote 0
I ran into a new error, it appears to have passed where the original error occurred, which is possibly good? Now it appears at the highlighted portion of the code. Error message says "Compile error: Ambiguous name detected: ~". Did I accidentally change something simple in the code? I'm not sure why htmldoc is suddenly issues.

Rich (BB code):
Rich (BB code):
' Thread:   http://www.mrexcel.com/forum/excel-questions/863282-auto-update-data-same-table-url.html
' Poster:   flagen (James Durst) www.mrexcel.com


Global HTMLdoc As Object
Global PageSrc As String


Function GetElemText(ByRef Elem As Object, ByRef ElemText As String, Optional ByVal Suffix As String) As String


  ' Written:  March 18, 2015
  ' Updated:  June 21, 2015
  ' Author:   Leith Ross
  
  ' This is a recursive procedure to extract text from between
  ' an element's start tag and end tag and everything in between.
        
    If Elem Is Nothing Then Exit Function
    
    ElemText = ElemText & Suffix
    
      ' Is this element a text value?
        If Elem.NodeType = 3 Then
          ' Separate text elements with a Pipe character.
            ElemText = ElemText & Elem.NodeValue & "|"
        Else
          ' Keep parsing - Element contains other non text elements.
            For Each Elem In Elem.ChildNodes
              ' Add new line characters to certain tags.
                Select Case UCase(Elem.NodeName)
                    Case Is = "P":  Suffix = "~"
                    Case Is = "BR": Suffix = "~"
                    Case Is = "TR": Suffix = "~"
                    Case Else: Suffix = ""
                End Select
                
                Call GetElemText(Elem, ElemText, Suffix)
            Next Elem
        End If
            
Finished:
    GetElemText = ElemText
    Suffix = ""
    
End Function




Sub ScrapeTables()


    Dim IEapp   As Object
    Dim oDiv    As Object
    Dim Path    As String
    Dim n       As Variant
    Dim oShell  As Object
    Dim r       As Long
    Dim ret     As Long
    Dim running As Boolean
    Dim Text    As String
    Dim URL     As Variant
    Dim URLs    As Variant
    Dim Wks     As Worksheet
       
      ' Select the output worksheet.
        Set Wks = ThisWorkbook.Worksheets("Sheet1")
    
      ' Clear the output worksheet.
        Wks.UsedRange.Clear
        
      ' Web site URL list for web pages that have tables to be scraped.
        ReDim URLs(1 To 5)
        
      ' Assign the URLs to each array element.
        URLs(1) = "view-source:https://truckstop.com/Decision_Tools/MainReports.aspx?DecisionTool=LoadDensities#NORTHEAST"
        URLs(2) = "view-source:https://truckstop.com/Decision_Tools/MainReports.aspx?DecisionTool=LoadDensities#SOUTHERN"
        URLs(3) = "view-source:https://truckstop.com/Decision_Tools/MainReports.aspx?DecisionTool=LoadDensities#MIDWEST"
        URLs(4) = "view-source:https://truckstop.com/Decision_Tools/MainReports.aspx?DecisionTool=LoadDensities#PLAINS"
        URLs(5) = "view-source:https://truckstop.com/Decision_Tools/MainReports.aspx?DecisionTool=LoadDensities#WESTERN"
    
        Set oShell = CreateObject("Shell.Application")
        
      ' Get the running Internet Explorer application object.
        For n = 0 To oShell.Windows.Count - 1
            If Not oShell.Windows(n) Is Nothing Then
                If oShell.Windows(n).Name = "Internet Explorer" Then
                    running = True
                    Set IEapp = oShell.Windows(n)
                    Exit For
                End If
            End If
        Next n


      ' Is there a running instance of Internet Explorer?
        If Not running Then
            MsgBox "Internet Explorer is Not Running"
            Exit Sub
        End If
        
      ' Scrape table data from each URL in the list.
        For Each URL In URLs
            
            Application.Cursor = xlWait
            
            IEapp.Navigate "about:blank"
            While IEapp.Busy: DoEvents: Wend
            
            IEapp.Navigate URL
            Set HTMLdoc = IEapp.Document
            
          ' Wait until the table becomes available.
            On Error Resume Next
                Do
                    DoEvents
                    Set oDiv = HTMLdoc.getElementById("ctl00_ContentPlaceHolder_divDensity")
                    If Not oDiv Is Nothing Then Exit Do
                Loop
            On Error GoTo 0
            
            Application.Cursor = xlDefault
            
          ' Get the table data.
            Text = ""
            Text = GetElemText(oDiv, Text)
    
          ' Remove HTML new line characters.
            Text = Replace(Text, vbLf, "")
        
          ' Change HTML Non Breaking Space character to a space.
            Text = Replace(Text, Chr(160), " ")
            
            Lines = Split(Text, "~")
            
            For n = 4 To UBound(Lines)
                Text = Lines(n)
                data = Split(Text, "|")
                
              ' Format the table output.
                With Wks.Range("A2")
                    Select Case n
                        Case 4: .Offset(r, 0).Value = data(2)
                        Case 5: .Offset(r, 3).Value = data(3): .Offset(r, 11).Value = data(5)
                        Case Is > 5: .Offset(r, 0).Resize(1, UBound(data)).Value = data
                    End Select
                End With
                
                r = r + 1
            Next n
        
            Set HTMLdoc = Nothing
            
        Next URL
        
End Sub
 
Upvote 0

Forum statistics

Threads
1,215,917
Messages
6,127,703
Members
449,399
Latest member
VEVE4014

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top