Read HTML Source Code with VBA

KevinJ

New Member
Joined
Jun 13, 2011
Messages
9
Using VBA, I am trying to retrieve the contents of the Source of a web page (the same as would appear if you right-clicked on the page and chose "View Source") into a variable so I can work on it in VBA (using InStr, etc.).

The problem is I can use code such as
strHTMLText = ie.Document.body.innerText
or
strHTMLText = ie.Document.body.outerText
to retrieve the code, but in either case only part, not all, of the source code is captured. I need ALL the code. Is there some kind of code such as ie.Document.body.allText or similar that would perform this function?

Much obliged!
 

Excel Facts

Will the fill handle fill 1, 2, 3?
Yes! Type 1 in a cell. Hold down Ctrl while you drag the fill handle.
Well it works for me, it's hard to tell why the code you have isn't working without seeing it.

Anyway, ZVI seems to have provided an alternative solution.:)

PS Got to come clean I can stand going through all that HTML, especially when you might not need to.
 
Upvote 0
PS Got to come clean I can stand going through all that HTML, especially when you might not need to.
Norie, I like your most questions and am real fan of your comments! :)
Seems that using of MSXML2.XMLHTTP for parsing purposes is easier & faster than with IE.
 
Upvote 0
Faster getting the source yes but parsing, not so sure.:)
 
Upvote 0
...but parsing, not so sure.:)
For parsing MSXML2 supports DOMDocument:
oReq.Open "GET", strURL, False
oReq.send
oDOM.loadXML oReq.ResponseText
...

where:
oReq is MSXML2.XMLHTTP object
oDOM is MSXML2.DOMDocument30 object
 
Upvote 0
Vladimir

Well there something I didn't know.

I don't think I've ever seen it used though.

Usually it seems that when talking about using this method it's al about Instr, RegExp etc.

I'll need to look into this of getting the DOM further.

Does it allow you to interact with the page?
 
Upvote 0
Norie,

MSXML is an XML parser, but most of the documents on the web are HTML based, not well-formed XML.
MSXML2 objects can be used only for parsing of XML sites or XML Data Island embedded into an HTML Page. That is why InStr, RegExp etc are still so popular.

A few links:
MSDN: Working with XML Document Parts
MSDN: XML Data Islands
MSDN: Using MSXML with HTML

The simple XML parsing example:
Rich (BB code):

Sub Test1_MSXML2()
  ' Early binding, set Tools - Reference - Microsoft XML,v6.0
  ' Modified example from http://msdn.microsoft.com/en-us/library/ms754585(v=vs.85).aspx
  Dim oDOM As New MSXML2.DOMDocument30
  Dim oNodeList As IXMLDOMNodeList
  Dim myErr
  oDOM.async = False
  oDOM.loadXML ("< customer >" & _
                "< first_name > Joe < / first_name > " & _
                " < last_name > Smith < / last_name > < / customer >")
  If oDOM.parseError.errorCode <> 0 Then
    Set myErr = oDOM.parseError
    MsgBox "You have error " & myErr.reason
  Else
    ' Show XML
    MsgBox oDOM.XML, , "oDOM, XML"
    ' Parse Node < customer >
    Set oNodeList = oDOM.getElementsByTagName("customer")
    ' Show the parsing result
    If oNodeList.Length = 0 Then
      MsgBox "Customer not found!"
    Else
      MsgBox oNodeList.Item(0).Text, , "oNodeList, <customer>"
    End If
  End If
End Sub
P.S. As to this Board limitation the XML-markups are auto parses.
So, please delete all space chars from XML string in example (oDOM.loadXML line),
or just copy-paste it from the link http://msdn.microsoft.com/en-us/library/ms754585(v=vs.85).aspx
 
Last edited:
Upvote 0
Vladimir

So are you saying you can't interact dynamically?
 
Upvote 0
Vladimir (or Norie),

I'm going to have to expose how shallow my VBA skills are, but I can't get beyond a point in the code.

Here's your excellent code reproduced

Code:
[COLOR=#00008b]Sub[/COLOR] Test()
  [COLOR=darkblue]Const[/COLOR] URL$ = "http://online.recoveryversion.org/bibleverses.asp?fvid=2901&lvid=2901"
  [COLOR=darkblue]Const[/COLOR] MASK$ = "href=FootNotes.asp?FNtsID="
  [COLOR=darkblue]Dim[/COLOR] txt [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]String[/COLOR], i [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
  [COLOR=darkblue]With[/COLOR] CreateObject("MSXML2.XMLHTTP")
    .Open "GET", URL, [COLOR=darkblue]False[/COLOR]
    .Send
    txt = .ResponseText
  [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]With[/COLOR]
  [COLOR=darkblue]Do[/COLOR]
    i = InStr(i + 1, txt, MASK)
    [COLOR=darkblue]If[/COLOR] i = 0 [COLOR=darkblue]Then[/COLOR] [COLOR=darkblue]Exit[/COLOR] [COLOR=darkblue]Do[/COLOR]
    Debug.Print Val(Mid$(txt, i + Len(MASK), 15))
  [COLOR=darkblue]Loop[/COLOR]
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Sub[/COLOR]

Where Const URL$ is declared, I would like to put in the string variable my code comes up with when it has found that page. Of course, I get a "Constant Expression Required" when I run the code. So, how can I run this code on the variable string???

Sorry to be a bother due to my ignorance. Thanks.
 
Upvote 0
What variable?

As far as I know you haven't posted any code so far.

So we don't know anything about variables.:)
 
Upvote 0

Forum statistics

Threads
1,214,626
Messages
6,120,602
Members
448,974
Latest member
ChristineC

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top