Excel VBA Macro to Scrape Webpages

jalexmiller0

New Member
Joined
Nov 10, 2017
Messages
8
Hi there, I am a complete beginner working on this macro. The goal is to have it input pickup numbers from an excel spreadsheet, put them on the carrier's website form, submit, then pull the results into excel. Which is essentially what I have managed to figure out so far. Now my roadblock is figuring out how to format it into excel so it's easy to read. (If there are more efficient ways to write what I already have please let me know.)


There is a lot of stuff commented out because I have been trying different techniques from all across the web. With varying amounts of success. I would like to get the code to put the results into the same worksheet as the pickup numbers. In the column next to the pickup number column so it can quickly be determined if the shipment has been picked up yet or not.
If that's not possible then outputting to a new sheet also works. Just so long as the information lines up with the corresponding pickup number.

Now on the carriers website after the code submits the pickup numbers. I need it to take "Shipment spotted on trailer JBHUXXXXXX" and if that isn't displayed. I need it to output "Arrived at destination service center" with the time and location as well. An example of this result is under 145-5244858-2.

Everything is stored in tables inside tables with the main table having "boxTableBorder" as the class name.

Code:
Sub TracePickupNumbers()

Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument


IE.Visible = True
IE.navigate "http://www.centraltransportint.com/confirm/trace.aspx"


Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop


Dim str As String
Dim arr() As Variant
Dim tablerow As Integer
Dim tablecol As Integer


arr = Range("A2:A7")


For tablerow = LBound(arr) To UBound(arr)
    For tablecol = LBound(arr, 2) To UBound(arr, 2)
        str = str & arr(tablerow, tablecol) & vbTab
    Next tablecol
    str = str & vbNewLine
Next tablerow


With IE.Document
    .all("_ctl0_lstType").Value = "PuN"
'    .all("_ctl0_lstType").fireevent.onchange() 'Gives syntax error
    .all("_ctl0:traceNumbers").innerText = "1"
    .all("_ctl0:traceSubmit").Click
End With


Application.Wait (Now + TimeValue("00:00:01"))


With IE.Document
    .all("_ctl0:traceNumbers").innerText = str
    .all("_ctl0:traceSubmit").Click
End With


Application.Wait (Now + TimeValue("00:00:02"))


'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dim HTMLElement As MSHTML.IHTMLElement
Dim HTMLElements As MSHTML.IHTMLElementCollection
Dim HTMLBody As MSHTML.IHTMLElement
Dim HTMLRow As MSHTML.IHTMLElement
Dim HTMLhh As MSHTML.IHTMLElement
Dim HTMLdates As MSHTML.IHTMLElement
Dim HTMLss As MSHTML.IHTMLElement
Dim Doc As HTMLDocument
Dim RowNum As Long, ColNum As Integer


Dim count As Long
Dim erow As Long


Application.Wait (Now + TimeValue("00:00:01"))


Set HTMLElements = IE.Document.getElementsByClassName("boxTableBorder")


'count = 0


For Each HTMLElement In HTMLElements


'    If HTMLElement.className = "boxTableBorder" Then
'
'        erow = ThisWorkbook.Sheets("pickups").Cells(Rows.count, 2).End(xlUp).Offset(1, 0).Row
'        Cells(erow, 2) = HTML.getElementsByClassName("hh")(count).innerText
'        Cells(erow, 3) = HTML.getElementsByClassName("dates")(count).innerText
'        Cells(erow, 4) = HTML.getElementsByClassName("ss")(count).innerText
'
'        count = count + 1
'
'    End If
    


'    ThisWorkbook.Sheets("pickups").Range("B2").Value = HTMLElement.className
'    Worksheets.Add
'    Range("A1").Value = HTMLElement.className
'
'    RowNum = 1
'
     For Each HTMLBody In HTMLElement.getElementsByTagName("tbody")
'    For Each HTMLRow In HTMLElement.getElementsByTagName("tbody")
        Debug.Print vbTab & HTMLBody.innerText
'
'        ColNum = 1
'        For Each HTMLCell In HTMLBody.Children
'        For Each HTMLCell In HTMLRow.Children
'        For Each HTMLhh In HTMLBody.getElementsByClassName("hh")
'            Debug.Print vbTab & HTMLhh.innerText
'
'        Next HTMLhh
'
'                For Each HTMLdates In HTMLBody.getElementsByClassName("dates")
'                    Debug.Print vbTab & HTMLdates.innerText
'
'                Next HTMLdates
'                        For Each HTMLss In HTMLBody.getElementsByClassName("ss")
'                            Debug.Print vbTab & HTMLss.innerText
''            Cells(RowNum, ColNum) = HTMLCell.innerText
''            ColNum = ColNum + 1
'                        Next HTMLss
        
        
        
'        RowNum = RowNum + 1
    Next HTMLBody
'    Next HTMLRow


Next HTMLElement


'Range("B2:C20").Select
'Columns("B:B").EntireColumn.AutoFit
'Columns("C:C").EntireColumn.AutoFit


IE.Quit
Set IE = Nothing


End Sub
 
So I have been working with your code and trying to apply it to different types of numbers. Specifically, PO numbers now. This one is really tricky because I can only search five at a time and they have to be entered individually.

There is also two drop down menus to select the month and year. The majority of the time the macro won't have to interact with them. However near the beginning/end of a month it might need to be switched. Can VBA look at the date and, if the date is within the first or last days of a month, open a dialog box that asks the user to enter in the month they want to search for?

I have figured out a way to look at 5 individual numbers at a time and enter them in separately. Then after it runs the rest of the macro, starts back over to look at the next 5 PO numbers.
However, I'm having issues wrapping my mind around the part of the code where it matches the PO number with the PO number cell on the results webpage. I'm not sure how to set them into a group then look at them individually. If that's even what needs to be done. I got down to the below line and now I'm stuck.
Code:
PONumber1 = infoTables(i + 2).Rows(4).Cells(3).innerText

Here is the whole code:
Code:
'Set these references via Tools -> References in VBA editor:
'Microsoft Internet Controls
'Microsoft HTML Object Library




Option Explicit


'https://msdn.microsoft.com/en-us/library/office/gg264421.aspx
'64-Bit Visual Basic for Applications Overview


[URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=If]#If[/URL]  VBA7 Then
    'New VBA version 7 compiler, therefore >= Office 2010
    'PtrSafe means function works in 32-bit and 64-bit Office
    'LongPtr type alias resolves to Long (32 bits) in 32-bit Office, or LongLong (64 bits) in 64-bit Office
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr)
    Private Declare PtrSafe Function SetForegroundWindow Lib "user32.dll" (ByVal hwnd As LongPtr) As LongPtr
    Private Declare PtrSafe Function FindWindowEx Lib "user32" Alias "FindWindowExA" (ByVal hWnd1 As LongPtr, ByVal hWnd2 As LongPtr, ByVal lpsz1 As String, ByVal lpsz2 As String) As LongPtr
[URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=Else]#Else[/URL] 
    'Old VBA version 6 or earlier compiler, therefore <= Office 2007
    Private Declare Sub Sleep Lib "kernel32" (ByVal milliseconds As Long)
    Private Declare Function SetForegroundWindow Lib "user32.dll" (ByVal hwnd As Long) As Long
    Private Declare Function FindWindowEx Lib "user32" Alias "FindWindowExA" (ByVal hWnd1 As Long, ByVal hWnd2 As Long, ByVal lpsz1 As String, ByVal lpsz2 As String) As Long
[URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=End]#End[/URL]  If




Public Sub IE_Trace_PO_Numbers()
    
    Dim IE As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim URL As String
    Dim PONumber1 As Range
    Dim PONumber2 As Range
    Dim PONumber3 As Range
    Dim PONumber4 As Range
    Dim PONumber5 As Range
    Dim traceSelect As HTMLSelectElement
    Dim traceTextArea As HTMLTextAreaElement
    Dim traceButton As HTMLInputButtonElement
    Dim traceNumbers As String
    Dim resultsDiv As HTMLDivElement
    Dim resultsTable As HTMLTable, infoTable As HTMLTable
    Dim infoTables As IHTMLElementCollection
    Dim i As Long, x As Long
    Dim result As String
    Dim puTime As String
    Dim puLocation As String
    Dim PuNumber As String, PoNumberRow As Variant
    


    
    'Get PO numbers from column A in 1st sheet starting at A2
    
    
    For x = 2 To 10000
    
    With Worksheets(1)
        If Trim(Sheet1.Cells(x, 1)) = "" Then
            Exit For
        End If
        
        Set PONumber1 = .Cells(x, 1)  '.Cells(Rows.Count, "A").End(xlUp))
        x = x + 1
        Set PONumber2 = .Cells(x, 1)
        x = x + 1
        Set PONumber3 = .Cells(x, 1)
        x = x + 1
        Set PONumber4 = .Cells(x, 1)
        x = x + 1
        Set PONumber5 = .Cells(x, 1)
        x = x + 1
    End With
    
    
    Debug.Print PONumber1, PONumber2, PONumber3, PONumber4, PONumber5
'    result = infoTable.Rows(0).Cells(1).innerText
'    puTime = infoTable.Rows(1).Cells(1).innerText
'    puLocation = infoTable.Rows(2).Cells(1).innerText
    
    Worksheets(1).Range("B2:k1000").ClearContents
'    traceNumbers = Join(Application.Transpose(PONumber1, PONumber2, PONumber3, PONumber4, PONumber5), vbNewLine)
    
    URL = "http://www.centraltransportint.com/confirm/trace.aspx"


    'Get existing IE window open at page, if any
    
    Set IE = Get_IE_Window2(URL)
    If IE Is Nothing Then Set IE = New SHDocVw.InternetExplorer
    
    With IE
        .Visible = True
        SetForegroundWindow .hwnd
        .navigate URL
        While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
        Set HTMLdoc = .Document
    End With
    
    'Hide IE window until page has updated, otherwise VBA incorrectly references a 2nd IE window (hidden) with no results
    
    IE.Visible = True
    
    With HTMLdoc
        .all("_ctl0_lstType").Value = "PO"
        'Put dummy text in input box and click Trace to trigger page update
        .all("_ctl0:traceNumbers").innerText = "1"
        .all("_ctl0:traceSubmit").Click
        While .ReadyState <> "complete": DoEvents: Wend
        
        'Put trace numbers in input box and click Trace again
        .all("_ctl0:BOLnumber1").innerText = PONumber1
        .all("_ctl0:BOLnumber2").innerText = PONumber2
        .all("_ctl0:BOLnumber3").innerText = PONumber3
        .all("_ctl0:BOLnumber4").innerText = PONumber4
        .all("_ctl0:BOLnumber5").innerText = PONumber5
        .all("_ctl0:traceSubmit").Click
        While .ReadyState <> "complete": DoEvents: Wend
    End With
    
    'Show IE window again
    
    IE.Visible = True
    
    'Extract results
    
    Set resultsDiv = HTMLdoc.getElementById("_ctl0_pnlResultSet")
    Set resultsTable = resultsDiv.getElementsByTagName("TABLE")(0) 'main results table inside div
    Set infoTables = resultsTable.getElementsByTagName("TABLE")    'child tables
    
    i = 0
    While i < infoTables.Length - 1
        
        Set infoTable = infoTables(i)
        
        
'~~~~~~~~~~~This is where i got stuck~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        PONumber1 = infoTables(i + 2).Rows(4).Cells(3).innerText
        Debug.Print PONumber1
        
        ' Just trying to grab the 3 columns of information.
'        If InStr(1, infoTable.Rows(0).Cells(0).innerText, "This shipment requires", vbTextCompare) > 0 Then
'
'            result = infoTable.Rows(0).Cells(0).innerText
' '           Debug.Print result
'
'        ElseIf InStr(1, infoTable.Rows(0).Cells(1).innerText, "Shipment was Spotted", vbTextCompare) > 0 Then
'
'            result = infoTable.Rows(0).Cells(1).innerText
        
        'Else
            
            Set infoTable = infoTables(i + 1)
            result = infoTable.Rows(0).Cells(1).innerText
            puTime = infoTable.Rows(1).Cells(1).innerText
            puLocation = infoTable.Rows(2).Cells(1).innerText
            
        End If
    
        'Find this PO number in Excel cells and if found put result in adjacent cell
        
        PoNumberRow = Application.Match(PuNumber, PONumber1, 0)
        If Not IsError(PoNumberRow) Then
            PONumber1(PoNumberRow, 2).Value = result
            PONumber1(PoNumberRow, 3).Value = puTime
            PONumber1(PoNumberRow, 4).Value = puLocation
        Else
            MsgBox "PO number " & PuNumber & " in results not found in cells " & PONumber1.Address
        End If
        
        'Increment table index for next shipment, depending on whether this shipment has a 'heading' table in bold, e.g. 145-5545080-9
        
        If Trim(infoTables(i).innerText <> "") Then
            i = i + 5   'heading table
        Else
            i = i + 4   'no heading table
        End If
        
    Wend
    
        'Find this PO number in Excel cells and if found put result in adjacent cell
        
        PoNumberRow = Application.Match(PuNumber, PONumber1, 0)
        If Not IsError(PoNumberRow) Then
            PONumber1(PoNumberRow, 2).Value = result
            PONumber1(PoNumberRow, 3).Value = puTime
            PONumber1(PoNumberRow, 4).Value = puLocation
        Else
            MsgBox "PO number " & PuNumber & " in results not found in cells " & PONumber1.Address
        End If
        
    Next x
    'Close IE window if necessary
    
    'IE.Quit
    Set IE = Nothing
    
    Columns("B:B").EntireColumn.AutoFit
    Columns("C:C").EntireColumn.AutoFit
    Columns("D:D").EntireColumn.AutoFit
    
    SetForegroundWindow Application.hwnd
    MsgBox "Done"
    
End Sub




Private Function Get_IE_Window2(URLorName As String) As InternetExplorer


    'Look for an IE browser window or tab already open at the (partial) URL or location name and, if found, return
    'that browser as an InternetExplorer object.  Otherwise return Nothing


    Dim Shell As Object
    Dim IE As InternetExplorer
    Dim i As Variant 'Must be a Variant to index Shell.Windows.Item() array
    
    Set Shell = CreateObject("Shell.Application")
    
    i = 0
    Set Get_IE_Window2 = Nothing
    While i < Shell.Windows.Count And Get_IE_Window2 Is Nothing
        Set IE = Shell.Windows.Item(i)
        If Not IE Is Nothing Then
            'Debug.Print IE.LocationURL, IE.LocationName
            If TypeOf IE Is InternetExplorer And InStr(IE.LocationURL, "file://") <> 1 Then
                'Debug.Print i; IE.LocationName, IE.LocationURL
                If InStr(1, IE.LocationURL, URLorName, vbTextCompare) > 0 Or InStr(1, IE.LocationName, URLorName, vbTextCompare) > 0 Then
                    If Not IE.Busy Then Set Get_IE_Window2 = IE
                End If
            End If
        End If
        i = i + 1
    Wend
    
End Function
 
Upvote 0

Excel Facts

Bring active cell back into view
Start at A1 and select to A9999 while writing a formula, you can't see A1 anymore. Press Ctrl+Backspace to bring active cell into view.

Forum statistics

Threads
1,214,915
Messages
6,122,217
Members
449,074
Latest member
cancansova

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top