Webpage data scraping help needed...

CrashDDL

Board Regular | Joined: Oct 17, 2016 | Messages: 66
Hi,

Could someone tell me why the first macro works but the second doesn't? Both should print "2" as the result, yet only the first one does.

Thanks for looking

Code:
Sub get_commonShips()
    ' Requires references to "Microsoft Internet Controls" (SHDocVw)
    ' and "Microsoft HTML Object Library" (MSHTML)
    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument

    IE.Visible = True
    IE.navigate "https://robertsspaceindustries.com/pledge/ship-upgrades"

    ' Wait for the page to finish loading; DoEvents keeps Excel responsive
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set HTMLDoc = IE.document
    '===============================================================================
    Dim Buttons As MSHTML.IHTMLElementCollection                                    ' find the "Choose a ship" buttons
    '---------------------------------------------------
    Set Buttons = HTMLDoc.getElementsByClassName("choose-ship js-choose-ship")
    Debug.Print Buttons.Length
    '===============================================================================

End Sub

Code:
Sub get_commonShips_XML()
    ' Requires references to "Microsoft XML, v6.0" (MSXML2)
    ' and "Microsoft HTML Object Library" (MSHTML)
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument

    XMLPage.Open "GET", "https://robertsspaceindustries.com/pledge/ship-upgrades", False   ' False = synchronous request, so no wait loop is needed
    XMLPage.send

    HTMLDoc.body.innerHTML = XMLPage.responseText
    '===============================================================================
    Dim Buttons As MSHTML.IHTMLElementCollection                                    ' find the "Choose a ship" buttons
    '---------------------------------------------------
    Set Buttons = HTMLDoc.getElementsByClassName("choose-ship js-choose-ship")
    Debug.Print Buttons.Length
    '===============================================================================

End Sub
 

John_w

MrExcel MVP | Joined: Oct 15, 2007 | Messages: 6,222
Because the initial page (ship-upgrades.html) includes JavaScript code, which a browser (your IE.navigate) automatically loads and runs to request the other parts of the page. Your XMLHttp request only downloads ship-upgrades.html itself and does not run any of that JavaScript.

You can see the extra requests by looking at the Network tab in IE's Developer Tools (press the F12 key).
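To check this explanation yourself, here is a minimal sketch that counts the matching elements in the raw HTML returned by XMLHTTP, with no browser involved. It assumes references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"; the names CountByClass, CountInRawHtml and CompareStaticHtml are hypothetical helpers, not part of either library. If the raw HTML gives 0 while the IE macro gives 2, that would be consistent with the buttons being added by JavaScript after the initial page loads.

```vba
' Hypothetical helper: counts elements carrying all of the given class names in an HTML string.
' Requires a reference to "Microsoft HTML Object Library" (MSHTML).
Function CountByClass(ByVal html As String, ByVal className As String) As Long
    Dim doc As New MSHTML.HTMLDocument
    doc.body.innerHTML = html                          ' parse the markup into a DOM we can query
    CountByClass = doc.getElementsByClassName(className).Length
End Function

' Hypothetical helper: downloads a page with XMLHTTP and counts matches in the raw HTML only.
' Requires a reference to "Microsoft XML, v6.0" (MSXML2).
Function CountInRawHtml(ByVal url As String, ByVal className As String) As Long
    Dim req As New MSXML2.XMLHTTP60
    req.Open "GET", url, False                         ' synchronous request
    req.send
    If req.Status <> 200 Then Err.Raise vbObjectError + 1, , "HTTP status " & req.Status
    CountInRawHtml = CountByClass(req.responseText, className)
End Function

' Prints the count found in the static HTML, for comparison with the IE macro's result
Sub CompareStaticHtml()
    Debug.Print CountInRawHtml("https://robertsspaceindustries.com/pledge/ship-upgrades", _
                               "choose-ship js-choose-ship")
End Sub
```

The result depends on the site's markup at the time you run it, so treat it as a quick diagnostic rather than a fixed answer.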
 

CrashDDL

Is there a way to make it work with XMLHTTP, or is the first option the only one?
 

John_w

In theory, it is possible to send multiple XMLHTTP requests that emulate the ones the browser makes, but working out which requests are needed takes a lot of investigation and isn't really worth the effort. Although slower, it is far easier to use the first method and automate IE.
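If you stay with IE automation, bear in mind that readyState can report complete before the page's JavaScript has finished adding elements. A minimal sketch of one way to guard against that, assuming the same references as the first macro: poll the document until at least one matching element appears or a timeout passes. WaitForClass, its timeoutSeconds parameter and get_commonShips_wait are hypothetical names, not part of the IE or MSHTML object models.

```vba
' Hypothetical helper: waits until doc contains at least one element with className,
' or until timeoutSeconds elapse. Returns True if a match was found.
' Requires a reference to "Microsoft HTML Object Library" (MSHTML).
Function WaitForClass(ByVal doc As MSHTML.HTMLDocument, ByVal className As String, _
                      ByVal timeoutSeconds As Double) As Boolean
    Dim startTime As Double
    startTime = Timer
    Do
        If doc.getElementsByClassName(className).Length > 0 Then
            WaitForClass = True
            Exit Function
        End If
        DoEvents                                       ' let IE and Excel process pending events
        ' Timer resets to 0 at midnight; a wrap (Timer < startTime) is treated as a timeout
    Loop While Timer - startTime < timeoutSeconds And Timer >= startTime
End Function

' Hypothetical usage, based on the first macro in this thread
' Requires a reference to "Microsoft Internet Controls" (SHDocVw)
Sub get_commonShips_wait()
    Dim IE As New SHDocVw.InternetExplorer
    IE.Visible = True
    IE.navigate "https://robertsspaceindustries.com/pledge/ship-upgrades"
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    If WaitForClass(IE.document, "choose-ship js-choose-ship", 10) Then
        Debug.Print IE.document.getElementsByClassName("choose-ship js-choose-ship").Length
    Else
        Debug.Print "Timed out waiting for the Choose a ship buttons"
    End If
End Sub
```

The 10-second timeout is an arbitrary choice; polling for the element you actually need is generally more reliable than a fixed delay such as Application.Wait.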
 

CrashDDL

That's what I thought. Thanks for the reply :)
 
