import UTF-8 HTML complete source codes of a HTTP webpage via vba

tourist

New Member
Joined
Oct 26, 2016
Messages
1
I have to import several HTTP webpage's source codes into separate Excel worksheets. I found this great code by searching the forum, which does exactly what I need:

Code:
Sub ImportHTMLSource()
Dim FileName As String
Dim FileNum As Long
Dim Sh As Worksheet
FileName = "D:\Source.txt"
FileNum = FreeFile
Open FileName For Output As FileNum
Print #FileNum, GetSource(Range("A1"))
Close FileNum
Set Sh = Worksheets.Add
With Sh.QueryTables.Add(Connection:="TEXT;D:\Source.txt", Destination:=Range("A2"))
.Name = "Source"
.AdjustColumnWidth = True
.TextFileParseType = xlFixedWidth
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileColumnDataTypes = Array(2)
.Refresh BackgroundQuery:=False
End With
End Sub
Function GetSource(sURL As String) As String
Dim oXHTTP As Object
Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
oXHTTP.Open "GET", sURL, False
oXHTTP.send
GetSource = oXHTTP.responsetext
Set oXHTTP = Nothing
End Function

The code is great for English websites but when it comes to other languages like Arabic, it fails and shows Arabic characters as some question marks like "????" while I do have language pack and fonts installed. I tried changing "windows regional and language" to Arabic but it didn't help. also changed the "Source.txt" to "Source.html" (the file that is saved and imported via the code) didn't change the results. It seems a SIMPLE task but is it possible to import full UTF8 html source of a non-English webpage into an Excel worksheet (given that the )? what am I doing wrong?
I have windows 7 and Excel 2013. Thanks in advance.
 

Excel Facts

Repeat Last Command
Pressing F4 adds dollar signs when editing a formula. When not editing, F4 repeats last command.
Welcome to MrExcel.

Try replacing the GetSource = line with:
Code:
    GetSource = StrConv(oXHTTP.responseBody, vbUnicode)
 
Upvote 0
I have to import several HTTP webpage's source codes into separate Excel worksheets. I found this great code by searching the forum, which does exactly what I need:

Code:
Sub ImportHTMLSource()
Dim FileName As String
Dim FileNum As Long
Dim Sh As Worksheet
FileName = "D:\Source.txt"
FileNum = FreeFile
Open FileName For Output As FileNum
Print #FileNum, GetSource(Range("A1"))
Close FileNum
Set Sh = Worksheets.Add
With Sh.QueryTables.Add(Connection:="TEXT;D:\Source.txt", Destination:=Range("A2"))
.Name = "Source"
.AdjustColumnWidth = True
.TextFileParseType = xlFixedWidth
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileColumnDataTypes = Array(2)
.Refresh BackgroundQuery:=False
End With
End Sub
Function GetSource(sURL As String) As String
Dim oXHTTP As Object
Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
oXHTTP.Open "GET", sURL, False
oXHTTP.send
GetSource = oXHTTP.responsetext
Set oXHTTP = Nothing
End Function

The code is great for English websites but when it comes to other languages like Arabic, it fails and shows Arabic characters as some question marks like "????" while I do have language pack and fonts installed. I tried changing "windows regional and language" to Arabic but it didn't help. also changed the "Source.txt" to "Source.html" (the file that is saved and imported via the code) didn't change the results. It seems a SIMPLE task but is it possible to import full UTF8 html source of a non-English webpage into an Excel worksheet (given that the )? what am I doing wrong?
I have windows 7 and Excel 2013. Thanks in advance.
I have just completed such a task. All it is is transposing utf 8 into regular text. UTF8 has 65000+ CHARACTERS in blocks. Basically you need to translate that code page range to Windows Ansi or whatever. Look for my post "Translate UTF-8". Ye VBA code is very simple and effective at doing this. Whatever the language used, it will translate into legible characters.
 
Upvote 0

Forum statistics

Threads
1,215,619
Messages
6,125,875
Members
449,267
Latest member
ajaykosuri

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top