Read HTML Source Code with VBA

KevinJ

New Member
Joined
Jun 13, 2011
Messages
9
Using VBA, I am trying to retrieve the contents of the Source of a web page (the same as would appear if you right-clicked on the page and chose "View Source") into a variable so I can work on it in VBA (using InStr, etc.).

The problem is I can use code such as
strHTMLText = ie.Document.body.innerText
or
strHTMLText = ie.Document.body.outerText
to retrieve the code, but in either case only part, not all, of the source code is captured. I need ALL the code. Is there some kind of code such as ie.Document.body.allText or similar that would perform this function?

Much obliged!
 
Vladimir

So are you saying you can't interact dynamically?
I'm apologizing for delay - was busy. Deep night (GMT+2) is my time for Excel & VBA hobby :)
Here is more interesting example of MSXML2 usage for parsing of Longitude, Latitude from Address:
Rich (BB code):

' ZVI:2011-06-15 http://www.mrexcel.com/forum/showthread.php?p=2752525#post2752525
Sub Test2_MSXML2()

  Const Address$ = "1600 Pennsylvania Ave"
  Const City$ = "Washington"
  Const State$ = "DC"
  Const URL$ = "http://geocoder.us/service/rest/geocode?address="

  Dim oDOM As Object, oNodeList As Object, SendString As String

  SendString = Replace(Address, " ", "+") & ",+" & City & "+" & State

  Set oDOM = CreateObject("MSXML2.DOMDocument")
  With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", URL & SendString, False
    .send
    oDOM.loadXML .responseText
  End With

  If oDOM.parseError.errorCode <> 0 Then
    MsgBox "You have error " & oDOM.parseError.reason
  Else
    ' With-part below is for debug
    With oDOM
      Debug.Print .XML
      Debug.Print "geo:Point = " & .getElementsByTagName("geo:Point").Item(0).Text & vbLf & "where:"
      Debug.Print "Longitude = " & .getElementsByTagName("geo:long").Item(0).Text
      Debug.Print "Latitude  = " & .getElementsByTagName("geo:lat").Item(0).Text
    End With
    ' Parse Node "geo:Point" to get Address, Longitude, Latitude
    Set oNodeList = oDOM.getElementsByTagName("geo:Point")
    ' Show the parsing result
    If oNodeList.Length = 0 Then
      MsgBox "geo:Point not found!"
    Else
      MsgBox oNodeList.Item(0).Text, , "geo:Point = Addr, Long, Lat"
    End If
  End If

  ' Release the object variables
  Set oNodeList = Nothing
  Set oDOM = Nothing

End Sub
 
Last edited:
Upvote 0

Excel Facts

Which lookup functions find a value equal or greater than the lookup value?
MATCH uses -1 to find larger value (lookup table must be sorted ZA). XLOOKUP uses 1 to find values greater and does not need to be sorted.
...Where Const URL$ is declared, I would like to put in the string variable my code comes up with when it has found that page. Of course, I get a "Constant Expression Required" when I run the code. So, how can I run this code on the variable string???
Please provide us more details.
This modification of code concatenates result in Ret string variable
Rich (BB code):

Sub Test1()
  Const URL$ = "http://online.recoveryversion.org/bibleverses.asp?fvid=2901&lvid=2901"
  Const MASK$ = "href=FootNotes.asp?FNtsID="
  Dim txt As String, i As Long, ret As String
  With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", URL, False
    .send
    txt = .responseText
  End With
  Do
    i = InStr(i + 1, txt, MASK)
    If i = 0 Then Exit Do
    'Debug.Print Val(Mid$(txt, i + Len(MASK), 15))
    ret = ret & Val(Mid$(txt, i + Len(MASK), 15)) & ","
  Loop
  ret = Left(ret, Len(ret) - 1)
  Debug.Print ret
End Sub

If you need in function instead of Sub, then try something like this:
Rich (BB code):

Function ParseUrl(Url, Mask) As String
  Dim txt As String, i As Long, ret As String
  With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", Url, False
    .send
    txt = .responseText
  End With
  Do
    i = InStr(i + 1, txt, Mask)
    If i = 0 Then Exit Do
    ParseUrl = ParseUrl & Val(Mid$(txt, i + Len(Mask), 15)) & ","
  Loop
  ParseUrl = Left(ParseUrl, Len(ParseUrl) - 1)
End Function

Sub Test_ParseUrl()
  Dim MyUrl As String, MyMask As String
  MyUrl = "http://online.recoveryversion.org/bibleverses.asp?fvid=2901&lvid=2901"
  MyMask = "href=FootNotes.asp?FNtsID="
  MsgBox ParseUrl(MyUrl, MyMask)
End Sub
 
Last edited:
Upvote 0
Here is more interesting example of MSXML2 usage for parsing of Longitude, Latitude from Address:...
The same, but without MSXML2.XMLHTTP request class object.
It’s possible because MSXML2.DOMDocument class has its own method Load.
Rich (BB code):

' ZVI:2011-06-15 http://www.mrexcel.com/forum/showthread.php?p=2752525#post2752525
Sub Test3_MSXML2()

  Const Address$ = "1600 Pennsylvania Ave"
  Const City$ = "Washington"
  Const State$ = "DC"
  Const Url$ = "http://geocoder.us/service/rest/geocode?address="

  Dim oNodeList As Object, SendString As String

  SendString = Replace(Address, " ", "+") & ",+" & City & "+" & State
  With CreateObject("MSXML2.DOMDocument")
    .async = False
    .Load Url & SendString
    If .parseError.errorCode <> 0 Then
      MsgBox "You have error " & .parseError.reason
    Else
      ' --> Debug part
      Debug.Print .XML
      Debug.Print "geo:Point = " & .getElementsByTagName("geo:Point").Item(0).Text & vbLf & "where:"
      Debug.Print "Longitude = " & .getElementsByTagName("geo:long").Item(0).Text
      Debug.Print "Latitude  = " & .getElementsByTagName("geo:lat").Item(0).Text
      ' <-- End of Debug part
      
      ' Parse Node "geo:Point" to get Address, Longitude, Latitude
      Set oNodeList = .getElementsByTagName("geo:Point")
      ' Show the parsing result
      If oNodeList.Length = 0 Then
        MsgBox "geo:Point not found!"
      Else
        MsgBox oNodeList.Item(0).Text, , "geo:Point = Addr, Long, Lat"
      End If
    End If
  End With
  ' Release object variable
  Set oNodeList = Nothing
End Sub
 
Last edited:
Upvote 0
Norie & Vladimir,

I was able to get my code working, don't know what was wrong when I reported problem earlier on constants/variables. I don't need any more help right now, but I'm posting it just to thank you for your assistance (you don't get any blame for my sloppiness!). You need to have the website http://online.recoveryversion.org/ loaded for it to run - the full code doesn't have that problem.

Code:
[SIZE=3][FONT=Times New Roman]Dim myIE As New InternetExplorer<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>[/FONT][/SIZE]
Sub AbbreviatedCode()
Dim strReference As String
Dim strFootnote As String
Dim strEnglishWord As String
strReference = "Luke 1:2"
strFootnote = "2"
strEnglishWord = "minister"
Call OpenScriptureReference(strReference, strFootnote, strEnglishWord)
End Sub
 
Sub OpenScriptureReference(strReference As String, strFootnote As String, strEnglishWord As String)
Dim ie As Object
Dim strSite As String
Dim strVersePage As String
strSite = "[URL]http://online.recoveryversion.org/[/URL]"
Set ie = GetIE(strSite)
'Load main site page in IE and display
ie.Navigate (strSite)
ie.Visible = True
 
While ie.Busy
    DoEvents  'wait until IE is done loading page.
Wend
'Load page with specified verse reference
ie.Document.all("scriptureString").Value = strReference
ie.Document.forms(0).submit
While ie.Busy
    DoEvents  'wait until IE is done loading page.
Wend
'Capture location URL of verse page
strVersePage = ie.LocationURL
'Credit idea for following code to Vladimir Zakharov and Norie on [URL="http://www.MrExcel.com"]www.MrExcel.com[/URL]
Const Mask$ = "href=FootNotes.asp?FNtsID="
Dim txt As String, i As Long
With CreateObject("MSXML2.XMLHTTP")
  .Open "GET", strVersePage, False
  .send
  txt = .responseText
End With
 
i = InStr(i + 1, txt, Mask)
iFootnotePageNo = Val(Mid$(txt, i + Len(Mask), 15))
iFootnotePageNo = iFootnotePageNo + Val(strFootnote) - 1
'Load page with verse and footnote
ie.Navigate ("[URL]http://online.recoveryversion.org/FootNotes.asp?FNtsID[/URL]=" & iFootnotePageNo)
'Display desired information on word for comparison
'MsgBox "Ref.: " & strReference & vbNewLine & "Footnote: " & strFootnote & vbNewLine & "Word: " & strEnglishWord
MsgBox strFootnote & ". " & strEnglishWord, vbInformation, "Key Words"
End Sub
 
 
'Find an IE window with a matching (partial) URL
'Assumes no frames.
Function GetIE(sAddress As String) As Object
Dim objShell As Object, objShellWindows As Object, o As Object
Dim retVal As Object, sURL As String
Set retVal = Nothing
Set objShell = CreateObject("Shell.Application")
Set objShellWindows = objShell.Windows
'see if IE is already open
For Each o In objShellWindows
sURL = ""
On Error Resume Next
sURL = o.******************
On Error GoTo 0
If sURL <> "" Then
If sURL Like sAddress & "*" Then
Set retVal = o
Exit For
End If
End If
Next o
Set GetIE = retVal
End Function

Thanks to both of you again - I can't believe that three countries - America, Turkey, and Ukraine - can get tied together just over some simple code!

In parting, I hope you enjoy the following link on the Recovery Version Online page you have been helping me access by code: http://online.recoveryversion.org/FootNotes.asp?FNtsID=3983.

Thanks again!
 
Upvote 0
Nice code, Kevin!

Below is a little changed version, see if this suits you.
Interface subroutines:
Rich (BB code):

Sub AbbreviatedCode()
  Dim Ref$, Footnote$, EnglishWord$
  Ref = "Luke 1:2"
  Footnote = 4
  EnglishWord = "minister"
  OpenScriptureReference Ref, Footnote, EnglishWord
End Sub

Private Sub OpenScriptureReference(Ref$, Footnote$, EnglishWord$)

  Const MASK$ = "FootNotes.asp?FNtsID="
  Const SITE$ = "http://online.recoveryversion.org/"
  Const Timeout! = 5
  Dim s$

  ' Find/Create IE navigated to SITE
  With GetIE(SITE, Timeout)

    'Load page with specified verse reference
    .Document.All("scriptureString").Value = Ref
    .Document.Forms(0).Submit

    ' Wait till IE complete loads document
    WaitIE .Application

    'Parse FootnotePageNo
    s = .Document.Body.innerHTML
    s = Val(Mid$(s, InStr(s, MASK) + Len(MASK), 15)) + Val(Footnote) - 1

    'Load page with verse and footnote
    .Navigate SITE & MASK & s
    WaitIE .Application

    'Display desired information on word for comparison
    'MsgBox "Ref.: " & Ref & vbLf & "Footnote: " & Footnote & vbLf & "Word: " & EnglishWord, vbInformation, "Key Words"

    ' Put result info into IE Title
    .Document.Title = "Ref.: " & Ref & ", Footnote: " & Footnote & ", Word: " & EnglishWord

    ' Activate IE
    .Visible = True

  End With

End Sub

Functions:
Rich (BB code):

' ZVI:2011-06-17 Get/Create IE object, navigate it to URL
Function GetIE(ByVal Url As String, Optional Timeout As Single = 5) As Object
  Dim t As Single
  Url = LCase(Url)
  If Not Left(Url, 4) = "http" Then Url = "http://" & Url
  For Each GetIE In CreateObject("Shell.Application").Windows
    If LCase(GetIE.LocationURL) Like Url & "*" Then Exit For
  Next
  On Error GoTo if_err
  If GetIE Is Nothing Then
    Set GetIE = CreateObject("InternetExplorer.Application")
    GetIE.Visible = True
    'AppActivate Application.Caption  '<-- Activate Excel back
  End If
  With GetIE
    .Silent = True
    .Navigate Url
    WaitIE .Application, Timeout
  End With
if_err:
  If Err Then MsgBox Err.Description, vbExclamation, Url
End Function

' ZVI:2011-06-17 Wait until IE completly loads document
Private Sub WaitIE(IE As Object, Optional ByVal Timeout As Single = 5)
  With IE

    ' Wait not busy IE state with timeout
    Timeout = Timer + Timeout
    Application.StatusBar = "IE Busy..."
    While .Busy And Timer < Timeout
      DoEvents
    Wend

    ' Wait IE ready state with timeout
    Application.StatusBar = "IE ReadyState..."
    Timeout = Timer + Timeout
    While .ReadyState <> 4 And Timer < Timeout
      DoEvents
    Wend

    ' Wait the complete loading of document
    Application.StatusBar = "IE Document loading ..."
    While .Document Is Nothing
      DoEvents
    Wend
    Application.StatusBar = False

  End With
End Sub

Regards,
 
Last edited:
Upvote 0
The following code produced this on an Excel worksheet.

<TABLE style="WIDTH: 633pt; BORDER-COLLAPSE: collapse" border=0 cellSpacing=0 cellPadding=0 width=843><COLGROUP><COL style="WIDTH: 43pt; mso-width-source: userset; mso-width-alt: 2084" width=57><COL style="WIDTH: 590pt; mso-width-source: userset; mso-width-alt: 28745" width=786><TBODY><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 43pt; HEIGHT: 15pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65 height=20 width=57>Luke 1:2</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; WIDTH: 590pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65 width=786>2 Even as 1those who from the beginning became aeyewitnesses and 2ministers of the 3word have 4delivered them to us, </TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65 height=20></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65></TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65 height=20></TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65></TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65 height=20>Ref</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65>Link</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65 height=20 align=right>1</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65>http://online.recoveryversion.org/FootNotes.asp?FNtsID=1634</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65 height=20>a</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65>http://online.recoveryversion.org/CrossReferences.asp?XRefID=2461</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65 height=20 align=right>2</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65>http://online.recoveryversion.org/FootNotes.asp?FNtsID=1635</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65 height=20 align=right>3</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65>http://online.recoveryversion.org/FootNotes.asp?FNtsID=1636</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65 height=20 align=right>4</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl65>http://online.recoveryversion.org/FootNotes.asp?FNtsID=1637</TD></TR></TBODY></TABLE>

Rich (BB code):
Option Explicit
Sub ScriptureRefLinks()
Dim ie As Object
Dim doc As Object
Dim inp As Object
Dim divContent As Object
Dim elm As Object
Dim ws As Worksheet
Dim rng As Range
Dim baseURL As String
Dim strChVsLn As String
Dim strVerse As String
Dim strChap As String
 
    baseURL = "http://online.recoveryversion.org/"
 
    strChVsLn = "Luke 1:2"
 
    Set ie = CreateObject("InternetExplorer.Application")
 
    With ie
        .Navigate baseURL
 
        Do While .Busy: DoEvents: Loop
        Do While .ReadyState <> 4: DoEvents: Loop
 
        '.Visible = True
 
        Set doc = ie.Document
 
        Set inp = doc.getelementbyid("Text1")
 
        inp.Value = strChVsLn
 
        doc.forms(0).submit
 
        ' not 100% sure next 3 lines are definitely needed
        Set doc = ie.Document
 
        Do While .Busy: DoEvents: Loop
        Do While .ReadyState <> 4: DoEvents: Loop
 
        Set ws = Worksheets.Add
 
        Set rng = Range("A5")
 
        Set divContent = doc.getelementbyid("content")
 
        strVerse = divContent.innertext
 
        For Each elm In divContent.all
 
            Select Case elm.classname
 
                Case "chapnnum"
                    strChap = strChap & elm.innertext
 
                Case "notes", "ref"
                    rng.Value = elm.innertext
                    rng.Offset(, 1) = elm.href
                    Set rng = rng.Offset(1)
 
                Case "verses"
                    strVerse = elm.innertext
 
            End Select
 
        Next elm
 
        Set rng = rng.Offset(-rng.Row + 1)
 
        rng = strChVsLn    ' Trim(strChap)
 
        rng.Offset(1) = strVerse
 
        rng.Offset(1).WrapText = False
 
        rng.Offset(3).Resize(, 2) = Array("Ref", "Link")
 
    End With
 
    Set ie = Nothing
 
End Sub
 
Upvote 0
Vladimir and Norie,

Thanks for both your further contributions! I'll look them over a bit later, as I have other matters I have to take care of.

OK, it's time for me to come clean - and I apologize! - I'm not actually using Excel at all, but am running this from within MSWord. The reason I was using MrExcel at all was that I was searching on Google and found that this site had some really useful information on VBA-HTML interface which was all I was really needing help on.

I've found this forum (and especially your responses) an excellent resource. Is there perhaps a MrWord forum somewhere?

Thanks again.

Kevin
 
Upvote 0
Kevin

I think most of the code that's been posted could be used from Word VBA with minor modifications.

For example in the code I posted you'd just need to change where the data was ouput to.

I've never actually used Word as the destination for this sort of thing but as I think about it it might have it's advantages and might even be easier.
 
Upvote 0
Norie and Vladimir,

You guys are so great! I promise I'll try to find some time to digest both of your recent code suggestions. I'm sure I'll learn a lot!

God bless you both!
 
Upvote 0
Agree with Norie.
BTW code of post#25 works in Word as well without any changes.
You can try asking about Word VBA code on MrExcel.
At least, there were several such threads here which I participated in.
Regards,
 
Upvote 0

Forum statistics

Threads
1,214,606
Messages
6,120,492
Members
448,967
Latest member
visheshkotha

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top