Web Scraping with Excel VBA

ToyoMike

New Member
Joined
Jul 5, 2020
Messages
7
Office Version
  1. 2016
Platform
  1. Windows
Hello,

I have some script below in which I am trying to extract the information listed in the red box into an Excel file using VBA. Ideally, I would really love to get everything under the <div class="kioskDockDoor blueBg" line but I'll take any help I can get. I've tried multiple variations of code with no luck. I should mention that this is my first time trying to "scrape" a website. I am also using Chrome if that helps. Below is the only code that I could find that doesn't give any errors but doesn't give a result either.

Sub WebData()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim source As Object

With http
.Open "GET", "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=84", False
.send
html.body.innerHTML = .responseText
End With
For Each source In html.getElementsByClassName(" col-md-6 col-xs-6 rights ")
x = x + 1: Cells(x, 1) = source.getAttribute("kioskLabel")
Cells(x, 2) = source.getAttribute("trailerId")
Next source
End Sub


HTML Class.PNG

Thank you for any help or direction you can provide,

Mike
 

Rijnsent

Well-known Member
Joined
Oct 17, 2005
Messages
1,298
Office Version
  1. 365
Platform
  1. Windows
Hi Mike,
the page you're loading has an issue: after the page is finished loading, the javascript will load the info in the boxes you're after in the background. As in: after the page states "i am ready", they data is delayed by a bit. That causes your weird results. So the easiest solution is to add a couple of seconds waiting time. Furthermore, if you have to repeat code (as you did in your example), there is probably a loop you can use. Finally, I added a auto-login (yes, VBA can do that too) at the start of your code. Finally: a bit more fancy/advanced would be to check the page source and find that e.g. https://whelo.4sightsolution.net/dockKiosk/dockList?dockGroupId=76 gives you the data you're looking for in raw JSON format. So you could simply log in and from there not load all pages, but go for that JSON directly. That's especially practical if you have lots of pages you want to scrape/visit.
I hope the code below is understandable & self-explanatory.
Cheers,
Koen
P.S. Do not forget to change the password if it works...

VBA Code:
Sub Scrape_YMS()

UsrNm = "020885"
Pwd = "Ilovealisha@22"
ListDGI = Array(75, 76, 84, 85, 108)

'Start browser & login
Dim objIE As InternetExplorer
Dim itemEle As Object
Dim d As Object

    Set objIE = New InternetExplorer
    objIE.Visible = True
 
    objIE.navigate "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=76"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    'Login if needed
    If InStr(objIE.LocationURL, "login.jsp") Then
        Set oLogin = objIE.document.getElementsByName("username")(0)
        Set oPassword = objIE.document.getElementsByName("password")(0)
        oLogin.Value = UsrNm
        oPassword.Value = Pwd
        objIE.document.forms(0).submit
        Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    End If
    On Error GoTo err_clear

'For the results, set the start row and Set the sheet so you don't have to repeat the name
ResRw = 2
Set Sht = Sheets("Sheet1")
Sht.Cells.ClearContents

For dgi = 1 To UBound(ListDGI)

    Sht.Cells(ResRw, 2).Value = ListDGI(dgi)
    Sht.Cells(ResRw + 1, 2).Value = ListDGI(dgi)
    
    objIE.navigate "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=" & ListDGI(dgi)
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    
    'Wait some more for JSON to load properly
    Application.Wait (Now + TimeValue("0:00:03"))
    
    For Each itemEle In objIE.document.getElementsByClassname("kioskDockDoor")
         'Every kioskDockDoor goes on a new line, so reset the column to 2
        ResCol = 3
        
        'Debug.Print itemEle.outerHTML
        
        For Each d In itemEle.getElementsByClassname("kioskLabel")
            'Debug.Print d.outerHTML
             'First write the kioskLabel, after that the next Sibling (which should be the trailerId)
             Sht.Cells(ResRw, ResCol).Value = d.innerHTML
             Sht.Cells(ResRw + 1, ResCol).Value = d.nextElementSibling.innerHTML
             'For the next result, add 1 to the column
             ResCol = ResCol + 1
        Next d
        'For the next result, add 2 to the row
        ResRw = ResRw + 2
    Next
    
Next dgi
    
err_clear:
  If Err <> 0 Then
    Err.Clear
  Resume Next
End If

objIE.Quit

End Sub
 

Some videos you may like

Excel Facts

Shade all formula cells
To shade all formula cells: Home, Find & Select, Formulas to select all formulas. Then apply a light fill color.

ToyoMike

New Member
Joined
Jul 5, 2020
Messages
7
Office Version
  1. 2016
Platform
  1. Windows
Good morning Koen,
I truly apologize for the delay in response. I've gotten caught up in other projects since July. I can't thank you enough for your help on this project. After a few small modifications and lots of formulas and conditional formatting, I have a really well working excel project. I appreciate all of the time that you spent helping me. Do you have a patronage page that I can donate to?
Thank you again,
Mike
 

Watch MrExcel Video

Forum statistics

Threads
1,113,907
Messages
5,544,984
Members
410,647
Latest member
LegenDSlayeR
Top