import UTF-8 HTML complete source codes of a HTTP webpage via vba

tourist

New Member
Joined
Oct 26, 2016
Messages
1
I have to import several HTTP webpage's source codes into separate Excel worksheets. I found this great code by searching the forum, which does exactly what I need:

Code:
Sub ImportHTMLSource()
Dim FileName As String
Dim FileNum As Long
Dim Sh As Worksheet
FileName = "D:\Source.txt"
FileNum = FreeFile
Open FileName For Output As FileNum
Print #FileNum, GetSource(Range("A1"))
Close FileNum
Set Sh = Worksheets.Add
With Sh.QueryTables.Add(Connection:="TEXT;D:\Source.txt", Destination:=Range("A2"))
.Name = "Source"
.AdjustColumnWidth = True
.TextFileParseType = xlFixedWidth
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileColumnDataTypes = Array(2)
.Refresh BackgroundQuery:=False
End With
End Sub
Function GetSource(sURL As String) As String
Dim oXHTTP As Object
Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
oXHTTP.Open "GET", sURL, False
oXHTTP.send
GetSource = oXHTTP.responsetext
Set oXHTTP = Nothing
End Function

The code is great for English websites but when it comes to other languages like Arabic, it fails and shows Arabic characters as some question marks like "????" while I do have language pack and fonts installed. I tried changing "windows regional and language" to Arabic but it didn't help. also changed the "Source.txt" to "Source.html" (the file that is saved and imported via the code) didn't change the results. It seems a SIMPLE task but is it possible to import full UTF8 html source of a non-English webpage into an Excel worksheet (given that the )? what am I doing wrong?
I have windows 7 and Excel 2013. Thanks in advance.
 

Excel Facts

Enter current date or time
Ctrl+: enters current time. Ctrl+; enters current date. Use Ctrl+: Ctrl+; Enter for current date & time.
Welcome to MrExcel.

Try replacing the GetSource = line with:
Code:
    GetSource = StrConv(oXHTTP.responseBody, vbUnicode)
 
Upvote 0
I have to import several HTTP webpage's source codes into separate Excel worksheets. I found this great code by searching the forum, which does exactly what I need:

Code:
Sub ImportHTMLSource()
Dim FileName As String
Dim FileNum As Long
Dim Sh As Worksheet
FileName = "D:\Source.txt"
FileNum = FreeFile
Open FileName For Output As FileNum
Print #FileNum, GetSource(Range("A1"))
Close FileNum
Set Sh = Worksheets.Add
With Sh.QueryTables.Add(Connection:="TEXT;D:\Source.txt", Destination:=Range("A2"))
.Name = "Source"
.AdjustColumnWidth = True
.TextFileParseType = xlFixedWidth
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileColumnDataTypes = Array(2)
.Refresh BackgroundQuery:=False
End With
End Sub
Function GetSource(sURL As String) As String
Dim oXHTTP As Object
Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
oXHTTP.Open "GET", sURL, False
oXHTTP.send
GetSource = oXHTTP.responsetext
Set oXHTTP = Nothing
End Function

The code is great for English websites but when it comes to other languages like Arabic, it fails and shows Arabic characters as some question marks like "????" while I do have language pack and fonts installed. I tried changing "windows regional and language" to Arabic but it didn't help. also changed the "Source.txt" to "Source.html" (the file that is saved and imported via the code) didn't change the results. It seems a SIMPLE task but is it possible to import full UTF8 html source of a non-English webpage into an Excel worksheet (given that the )? what am I doing wrong?
I have windows 7 and Excel 2013. Thanks in advance.
I have just completed such a task. All it is is transposing utf 8 into regular text. UTF8 has 65000+ CHARACTERS in blocks. Basically you need to translate that code page range to Windows Ansi or whatever. Look for my post "Translate UTF-8". Ye VBA code is very simple and effective at doing this. Whatever the language used, it will translate into legible characters.
 
Upvote 0

Forum statistics

Threads
1,215,063
Messages
6,122,935
Members
449,094
Latest member
teemeren

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top