Web Scraping with Excel VBA

ToyoMike

New Member
Joined
Jul 5, 2020
Messages
9
Office Version
  1. 2016
Platform
  1. Windows
Hello,

I have some script below in which I am trying to extract the information listed in the red box into an Excel file using VBA. Ideally, I would really love to get everything under the <div class="kioskDockDoor blueBg" line but I'll take any help I can get. I've tried multiple variations of code with no luck. I should mention that this is my first time trying to "scrape" a website. I am also using Chrome if that helps. Below is the only code that I could find that doesn't give any errors but doesn't give a result either.

Sub WebData()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim source As Object

With http
.Open "GET", "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=84", False
.send
html.body.innerHTML = .responseText
End With
For Each source In html.getElementsByClassName(" col-md-6 col-xs-6 rights ")
x = x + 1: Cells(x, 1) = source.getAttribute("kioskLabel")
Cells(x, 2) = source.getAttribute("trailerId")
Next source
End Sub


HTML Class.PNG

Thank you for any help or direction you can provide,

Mike
 
Hi Mike,
the page you're loading has an issue: after the page is finished loading, the javascript will load the info in the boxes you're after in the background. As in: after the page states "i am ready", they data is delayed by a bit. That causes your weird results. So the easiest solution is to add a couple of seconds waiting time. Furthermore, if you have to repeat code (as you did in your example), there is probably a loop you can use. Finally, I added a auto-login (yes, VBA can do that too) at the start of your code. Finally: a bit more fancy/advanced would be to check the page source and find that e.g. https://whelo.4sightsolution.net/dockKiosk/dockList?dockGroupId=76 gives you the data you're looking for in raw JSON format. So you could simply log in and from there not load all pages, but go for that JSON directly. That's especially practical if you have lots of pages you want to scrape/visit.
I hope the code below is understandable & self-explanatory.
Cheers,
Koen
P.S. Do not forget to change the password if it works...

VBA Code:
Sub Scrape_YMS()

UsrNm = "020885"
Pwd = "Ilovealisha@22"
ListDGI = Array(75, 76, 84, 85, 108)

'Start browser & login
Dim objIE As InternetExplorer
Dim itemEle As Object
Dim d As Object

    Set objIE = New InternetExplorer
    objIE.Visible = True
 
    objIE.navigate "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=76"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    'Login if needed
    If InStr(objIE.LocationURL, "login.jsp") Then
        Set oLogin = objIE.document.getElementsByName("username")(0)
        Set oPassword = objIE.document.getElementsByName("password")(0)
        oLogin.Value = UsrNm
        oPassword.Value = Pwd
        objIE.document.forms(0).submit
        Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    End If
    On Error GoTo err_clear

'For the results, set the start row and Set the sheet so you don't have to repeat the name
ResRw = 2
Set Sht = Sheets("Sheet1")
Sht.Cells.ClearContents

For dgi = 1 To UBound(ListDGI)

    Sht.Cells(ResRw, 2).Value = ListDGI(dgi)
    Sht.Cells(ResRw + 1, 2).Value = ListDGI(dgi)
    
    objIE.navigate "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=" & ListDGI(dgi)
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    
    'Wait some more for JSON to load properly
    Application.Wait (Now + TimeValue("0:00:03"))
    
    For Each itemEle In objIE.document.getElementsByClassname("kioskDockDoor")
         'Every kioskDockDoor goes on a new line, so reset the column to 2
        ResCol = 3
        
        'Debug.Print itemEle.outerHTML
        
        For Each d In itemEle.getElementsByClassname("kioskLabel")
            'Debug.Print d.outerHTML
             'First write the kioskLabel, after that the next Sibling (which should be the trailerId)
             Sht.Cells(ResRw, ResCol).Value = d.innerHTML
             Sht.Cells(ResRw + 1, ResCol).Value = d.nextElementSibling.innerHTML
             'For the next result, add 1 to the column
             ResCol = ResCol + 1
        Next d
        'For the next result, add 2 to the row
        ResRw = ResRw + 2
    Next
    
Next dgi
    
err_clear:
  If Err <> 0 Then
    Err.Clear
  Resume Next
End If

objIE.Quit

End Sub
 
Upvote 0

Excel Facts

Why does 9 mean SUM in SUBTOTAL?
It is because Sum is the 9th alphabetically in Average, Count, CountA, Max, Min, Product, StDev.S, StDev.P, Sum, VAR.S, VAR.P.
Good morning Koen,
I truly apologize for the delay in response. I've gotten caught up in other projects since July. I can't thank you enough for your help on this project. After a few small modifications and lots of formulas and conditional formatting, I have a really well working excel project. I appreciate all of the time that you spent helping me. Do you have a patronage page that I can donate to?
Thank you again,
Mike
 
Upvote 0

Forum statistics

Threads
1,214,649
Messages
6,120,732
Members
448,987
Latest member
marion_davis

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top