Retrieve id-, tag-, and class names within HTML code

strooman

Office Version: 2016
Platform: Windows
I know how to retrieve information from a webpage. Something like:
Code:
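Dim objPrice As Object
Dim strPrice As String

' Requires a reference to Microsoft Internet Controls
With New InternetExplorer
    .navigate "http://dealshout.com/"

    ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
    Do
        DoEvents
    Loop Until .readyState = 4

    ' Keep the text of the div that carries the price class
    For Each objPrice In .Document.getElementsByTagName("div")
        If objPrice.className = "archiveInfoContent" Then
            strPrice = objPrice.innerText
        End If
    Next objPrice

    .Quit   ' close the hidden browser instance so it does not linger
End With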

To work through the DOM object you can use the following methods:

getElementById()
getElementsByTagName()
getElementsByClassName()

But they all require you to know the id, tag, or class name in advance.

My question: what if I don't know those names? How can I retrieve the id, tag, and class names of the elements on a page?

HTML example:
HTML:
<div id="main">
    <h3>
        <span class="fr"> DealShout.com </span>
        <span class="highlightSearchTerm">Lego</span>
    </h3>
    <div id="archiveWrap">
        <div class="archiveListing">
            <div class="archiveInfoContent"> $119.94 </div>
            <div class="archiveCompareButton">
                <div>
                    <a class="button large" href="http://dealshout.com/toys-and-games/lego-chima-flying-phoenix-fire-temple-70146/" title="LEGO Chima Flying Phoenix Fire Temple (70146)">Compare Prices</a>
                </div>
            </div>
        </div>
    </div>
</div>

I want to retrieve the following data from the above HTML and put them in a sheet like this:

 

You could loop through the HTMLDocument.all collection/array. However it may not contain all the elements, in which case you would have to do a recursive scan of the HTML DOM nodes.
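
The recursive scan mentioned above could look roughly like the sketch below. It is only an illustration, not code from this thread: the procedure names WalkNode and TestWalk and the inline HTML string are made up for the example, and it assumes a reference to the Microsoft HTML Object Library is set. It visits every child node and prints the tag name, id and class name of each element node, indented by depth.

```vba
' Hypothetical helper: print tag, id and class of a node and all its descendants.
Sub WalkNode(ByVal node As Object, ByVal depth As Long)
    Dim i As Long

    ' nodeType 1 = element node; text and comment nodes have no id or class
    If node.nodeType = 1 Then
        Debug.Print Space(depth * 2) & node.tagName & _
                    " | id=" & node.ID & " | class=" & node.className
    End If

    ' Recurse into every child node, one level deeper
    For i = 0 To node.childNodes.Length - 1
        WalkNode node.childNodes.Item(i), depth + 1
    Next i
End Sub

' Hypothetical driver: the HTML string is a shortened, made-up sample.
Sub TestWalk()
    Dim doc As HTMLDocument

    Set doc = New HTMLDocument
    doc.body.innerHTML = "<div id=""main""><h3><span class=""fr"">DealShout.com</span></h3></div>"

    WalkNode doc.body, 0
End Sub
```

Run TestWalk with the Immediate window open (Ctrl+G in the VBE) to see the listing. Starting at doc.body keeps the output short; passing doc.documentElement instead should include the html and head elements as well.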
 

Thanks John, that got me going. I had completely forgotten about the .all collection; thanks for pointing that out. Here's some basic code for anyone who is interested:

Code:
Sub Test()
    Dim n As Long
    Dim ohtm As HTMLDocument
    Dim sHTM As String

    ' Fetch the raw HTML of the page (synchronous request)
    With CreateObject("msxml2.xmlhttp")
        .Open "GET", "http://showmehtml.com/", False
        .send
        sHTM = .responseText
    End With

    ' Load the HTML into a document so the DOM can be inspected
    Set ohtm = New HTMLDocument
    ohtm.body.innerHTML = sHTM

    ' List every element in the .all collection
    For n = 0 To ohtm.all.Length - 1
        'Comment or un-comment
        Debug.Print n & "============================================="
        'Debug.Print "OuterHTML = " & ohtm.all(n).outerHTML
        'Debug.Print "InnerHTML = " & ohtm.all(n).innerHTML
        'Debug.Print "InnerText = " & ohtm.all(n).innerText
        Debug.Print "Classname = " & ohtm.all(n).className
        Debug.Print "Id = " & ohtm.all(n).ID
        Debug.Print "Tagname = " & ohtm.all(n).tagName
        Debug.Print "Nodename = " & ohtm.all(n).nodeName
    Next n

    Set ohtm = Nothing
End Sub

P.S. Set the appropriate references in the VBE via Alt+F11 | Tools | References:
Microsoft XML, v6.0
Microsoft HTML Object Library
and perhaps
Microsoft Internet Controls
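
To get the names into a sheet rather than the Immediate window, something along these lines might work. It is a minimal sketch under assumptions: the column layout (index, tag name, id, class name) is a guess, since the layout asked for in the question is not shown, the procedure name ListElementNames is made up, the HTML string is a shortened sample, and the output overwrites columns A to D of the active sheet.

```vba
' Hypothetical example: write tag, id and class of every element to the active sheet.
' Requires a reference to the Microsoft HTML Object Library.
Sub ListElementNames()
    Dim doc As HTMLDocument
    Dim el As Object
    Dim ws As Worksheet
    Dim r As Long

    Set doc = New HTMLDocument
    doc.body.innerHTML = "<div id=""main""><h3><span class=""fr"">DealShout.com</span>" & _
                         "<span class=""highlightSearchTerm"">Lego</span></h3></div>"

    Set ws = ActiveSheet   ' assumption: the active sheet may be overwritten
    ws.Range("A1:D1").Value = Array("Index", "Tag name", "Id", "Class name")

    ' One row per element in the .all collection
    r = 2
    For Each el In doc.all
        ws.Cells(r, 1).Value = r - 2
        ws.Cells(r, 2).Value = el.tagName
        ws.Cells(r, 3).Value = el.ID
        ws.Cells(r, 4).Value = el.className
        r = r + 1
    Next el
End Sub
```

Elements without an id or class simply leave those cells empty. To use a real page, replace the sample string with the responseText fetched in the code above.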
 