Use VBA to download/save HTML file containing Unicode characters

jaredaseltzer

New Member
Joined
May 5, 2015
Messages
3
I am trying to download and save hundreds of webpages as HTML files using the code below. The code works well for pages that contain no special Unicode characters, but not for the pages I need. An example URL that I'm trying to download programmatically from within Excel is http://cal1.cn.huc.edu/get_a_chapter.php?file=51006&sub=1&cset=H

However, all of the Hebrew letters turn into question marks when I download and save the HTML using this code:

Code:
Sub Main()
    Dim url As String, fn As String
    url = "http://cal1.cn.huc.edu/get_a_chapter.php?file=51006&sub=1&cset=H"
    fn = "C:\temp\Joshua-Chapter1.htm"
    CreateFile fn, GetHTML(url)
End Sub

Function GetHTML(URL As String) As String
    Dim objHttp As Object
    
    Set objHttp = CreateObject("MSXML2.XMLHTTP")
    Call objHttp.Open("GET", URL, False)
    Call objHttp.Send("")
    GetHTML = objHttp.ResponseText
End Function


Sub CreateFile(FileName As String, Contents As String)
    ' creates file from string contents
    Dim tempFile As String
    Dim nextFileNum As Long
    
    nextFileNum = FreeFile
    tempFile = FileName
    Open tempFile For Output As #nextFileNum
    Print #nextFileNum, Contents
    Close #nextFileNum
End Sub
How should I change this code so that it does not turn the special Unicode characters into question marks?

John_w

MrExcel MVP
Joined
Oct 15, 2007
Messages
6,266
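Use the data in ResponseBody, which is the raw response as an array of bytes, instead of ResponseText, and write those bytes to the file in binary mode. The question marks most likely come from Print #, which converts the VBA string to the system ANSI code page when writing, so any character outside that code page is lost. The code changes to: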
Code:
Function GetHTML(URL As String) As Byte()
    Dim objHttp As Object
    
    Set objHttp = CreateObject("MSXML2.XMLHTTP")
    objHttp.Open "GET", URL, False
    objHttp.Send ""
    GetHTML = objHttp.ResponseBody
End Function

Sub CreateFile(FileName As String, Contents() As Byte)
    ' creates file from byte array
    Dim nextFileNum As Long
    
    nextFileNum = FreeFile
    Open FileName For Binary Access Write As #nextFileNum
    Put #nextFileNum, , Contents
    Close #nextFileNum
End Sub
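
For completeness, here is a minimal sketch of how the original Main could call the two byte-array routines above. The local variable named data and the Kill step are assumptions added for illustration, not part of the original code: a file opened For Binary is not truncated, so writing a shorter download over a longer existing file would leave stale bytes at the end.
Code:
Sub Main()
    Dim url As String, fn As String
    Dim data() As Byte   ' hypothetical local name for the downloaded bytes

    url = "http://cal1.cn.huc.edu/get_a_chapter.php?file=51006&sub=1&cset=H"
    fn = "C:\temp\Joshua-Chapter1.htm"

    ' Fetch the raw response bytes; no string conversion takes place
    data = GetHTML(url)

    ' Assumption: delete any previous copy so no old bytes remain past the new content
    If Len(Dir(fn)) > 0 Then Kill fn

    CreateFile fn, data
End Sub
Because the bytes are written exactly as the server sent them, the saved file keeps whatever encoding the page uses.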
 

jaredaseltzer

Thank you, John_w. That would have worked, but I found another solution: calling the Windows API function URLDownloadToFile, which saves the response bytes straight to disk:

Code:
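Option Explicit

' 64-bit Office requires PtrSafe and LongPtr; the #Else branch keeps the older declaration
#If VBA7 Then
Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As LongPtr, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As LongPtr) As Long
#Else
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
#End If

Function Download(url As String, fn As String) As Long
    ' The API returns an HRESULT, which needs a Long (an Integer would overflow); 0 means success
    Download = URLDownloadToFile(0, url, fn, 0, 0)
    If Download <> 0 Then MsgBox "Error: " & Download & "."
End Function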
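
Since the goal was to save hundreds of pages, the Download function above could be driven from a worksheet along these lines. This is only a sketch: the procedure name DownloadAll and the sheet layout (URLs in column A, target file paths in column B, starting at row 1 of the active sheet) are assumptions made for illustration.
Code:
Sub DownloadAll()
    ' Assumed layout: column A holds URLs, column B holds full target file paths
    Dim r As Long, lastRow As Long
    Dim url As String, fn As String

    With ActiveSheet
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        For r = 1 To lastRow
            url = CStr(.Cells(r, "A").Value)
            fn = CStr(.Cells(r, "B").Value)
            ' Skip rows where either the URL or the file path is missing
            If Len(url) > 0 And Len(fn) > 0 Then Download url, fn
        Next r
    End With
End Sub
Download already shows a message box when a request fails, so with many rows it may be more practical to record the return value in a spare column instead.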