Read HTML Source Code with VBA

KevinJ (New Member, joined Jun 13, 2011)
Using VBA, I am trying to retrieve the contents of the Source of a web page (the same as would appear if you right-clicked on the page and chose "View Source") into a variable so I can work on it in VBA (using InStr, etc.).

I can use code such as
strHTMLText = ie.Document.body.innerText
or
strHTMLText = ie.Document.body.outerText
to retrieve the page contents, but in either case only part of the source is captured, not all of it. I need ALL of the source. Is there a property such as ie.Document.body.allText (or similar) that returns the complete source?

Much obliged!
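For reference: body.innerText and body.outerText return only the text rendered inside <body>, with the tags stripped, so they cannot contain the full source. A minimal sketch, assuming Internet Explorer automation is still available on the machine (Microsoft has retired IE, so it may not work on newer systems), reads documentElement.outerHTML instead, which is the markup of the whole <html> element. The names GetSourceViaIE and TestGetSourceViaIE are hypothetical. Note that this markup is re-serialized from the browser's DOM, so it can differ from "View Source" (for example, after scripts have changed the page); the XMLHTTP approach in the reply below reads the raw server response instead.

```vba
' Hypothetical helper: load a page in Internet Explorer (late bound) and
' return the markup of the whole document, not just the text of <body>.
Function GetSourceViaIE(ByVal url As String) As String
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.Navigate url
    ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
    Do While ie.Busy Or ie.ReadyState <> 4
        DoEvents
    Loop
    ' innerText/outerText give rendered text only; outerHTML of the
    ' <html> element gives the markup as the browser currently holds it
    GetSourceViaIE = ie.Document.documentElement.outerHTML
    ie.Quit
    Set ie = Nothing
End Function

Sub TestGetSourceViaIE()
    Dim strHTMLText As String
    strHTMLText = GetSourceViaIE("http://online.recoveryversion.org/bibleverses.asp?fvid=2901&lvid=2901")
    Debug.Print Len(strHTMLText)   ' length of the captured markup
End Sub
```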
 

Try this code and check the output in the Immediate window (Ctrl+G in the VBA editor):

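Sub Test()
  ' Page to fetch, and the text that precedes each footnote ID in its source
  Const URL$ = "http://online.recoveryversion.org/bibleverses.asp?fvid=2901&lvid=2901"
  Const MASK$ = "href=FootNotes.asp?FNtsID="
  Dim txt As String, i As Long
  ' Download the raw HTML (the same text as "View Source"); False = synchronous
  With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", URL, False
    .Send
    txt = .ResponseText
  End With
  ' Find every occurrence of MASK and print the number that follows it
  Do
    i = InStr(i + 1, txt, MASK)
    If i = 0 Then Exit Do
    ' Val reads the leading digits and stops at the first character it cannot parse
    Debug.Print Val(Mid$(txt, i + Len(MASK), 15))
  Loop
End Sub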
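When fetching many pages, the same XMLHTTP request can be wrapped in a reusable function. This is a minimal sketch; the names GetPageSource and TestGetPageSource are hypothetical, and the HTTP status check is an addition that is not in the code above.

```vba
' Hypothetical helper: return the raw HTML of a page, or raise an error
' if the server does not answer with HTTP 200.
Function GetPageSource(ByVal url As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False      ' False = wait for the response
        .Send
        If .Status <> 200 Then
            Err.Raise vbObjectError + 1, "GetPageSource", _
                      "HTTP " & .Status & " " & .StatusText & " for " & url
        End If
        GetPageSource = .ResponseText
    End With
End Function

Sub TestGetPageSource()
    Dim html As String
    html = GetPageSource("http://online.recoveryversion.org/bibleverses.asp?fvid=2901&lvid=2901")
    Debug.Print Len(html)                                ' length of the full source
    Debug.Print InStr(1, html, "<body", vbTextCompare)   ' position of <body>, 0 if absent
End Sub
```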

This is a very helpful post!

However, when I use it, special (non-ASCII) characters are not retained; they are converted to question marks.

I'm trying to programmatically download hundreds of webpages like this one:

The Comprehensive Aramaic Lexicon

How do I change the code so that it keeps the UTF-8 characters?
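One common approach, offered as a sketch rather than a definitive fix: instead of relying on ResponseText, take the raw bytes from ResponseBody and decode them explicitly as UTF-8 with ADODB.Stream. This assumes the pages really are UTF-8 encoded. Also note that the VBA Immediate window and MsgBox cannot display many non-ANSI characters and show them as "?", so the text may already be correct even when Debug.Print suggests otherwise; checking the result in a worksheet cell is more reliable. The helper names, the target sheet, and the placeholder URL are hypothetical.

```vba
' Hypothetical helper: decode a byte array as UTF-8 text using ADODB.Stream.
Function Utf8BytesToString(ByVal bytes As Variant) As String
    With CreateObject("ADODB.Stream")
        .Type = 1              ' adTypeBinary: load the raw bytes
        .Open
        .Write bytes
        .Position = 0          ' must rewind before switching the type
        .Type = 2              ' adTypeText
        .Charset = "utf-8"     ' interpret the bytes as UTF-8
        Utf8BytesToString = .ReadText
        .Close
    End With
End Function

' Hypothetical helper: fetch a page and decode its body as UTF-8.
Function GetPageSourceUtf8(ByVal url As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .Send
        GetPageSourceUtf8 = Utf8BytesToString(.ResponseBody)
    End With
End Function

Sub TestUtf8()
    Dim html As String
    html = GetPageSourceUtf8("https://example.com/")   ' placeholder URL
    ' Write to a cell (assumed: first sheet) rather than the Immediate window,
    ' which would show non-ANSI characters as "?"
    ThisWorkbook.Worksheets(1).Range("A1").Value = Left$(html, 1000)
End Sub
```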
 
