Use VBA to download/save html file containing unicode characters

jaredaseltzer

New Member
Joined
May 5, 2015
Messages
3
I am trying to download and save hundreds of webpages as HTML files using the code below. The code works well for pages that contain no Unicode characters, but the pages I need do contain them. An example URL that I'm trying to download programmatically from within Excel is http://cal1.cn.huc.edu/get_a_chapter.php?file=51006&sub=1&cset=H

All of the Hebrew letters turn into question marks when I download and save the HTML using this code:

Code:
Sub Main()
    Dim url As String, fn As String
    url = "http://cal1.cn.huc.edu/get_a_chapter.php?file=51006&sub=1&cset=H"
    fn = "C:\temp\Joshua-Chapter1.htm"
    CreateFile fn, GetHTML(url)
End Sub

Function GetHTML(URL As String) As String
    Dim objHttp As Object
    
    Set objHttp = CreateObject("MSXML2.XMLHTTP")
    Call objHttp.Open("GET", URL, False)
    Call objHttp.Send("")
    GetHTML = objHttp.ResponseText
End Function


Sub CreateFile(FileName As String, Contents As String)
    ' creates file from string contents
    Dim tempFile As String
    Dim nextFileNum As Long
    
    nextFileNum = FreeFile
    tempFile = FileName
    Open tempFile For Output As #nextFileNum
    Print #nextFileNum, Contents
    Close #nextFileNum
End Sub

How should I change this code so that it does not turn the Unicode characters into question marks?
 

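Use the data in ResponseBody, which is an array of bytes, instead of ResponseText. ResponseText is decoded into a VBA string, and Print # then writes that string using the system's ANSI code page, so any character the code page cannot represent (here, the Hebrew letters) is saved as a question mark. Writing the raw bytes in binary mode keeps the page's original encoding. The code changes to: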
Code:
Function GetHTML(URL As String) As Byte()
    Dim objHttp As Object
    
    Set objHttp = CreateObject("MSXML2.XMLHTTP")
    objHttp.Open "GET", URL, False
    objHttp.Send ""
    GetHTML = objHttp.ResponseBody
End Function

Sub CreateFile(FileName As String, Contents() As Byte)
    ' creates file from byte array
    Dim nextFileNum As Long
    
    nextFileNum = FreeFile
    Open FileName For Binary Access Write As #nextFileNum
    Put #nextFileNum, , Contents
    Close #nextFileNum
End Sub
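
One caveat with the binary write above: Open ... For Binary does not truncate an existing file, so saving a shorter page over a longer one leaves the old trailing bytes in place. Below is a minimal sketch of one way to guard against that, by deleting any earlier copy first. The names SaveBytes and DemoSaveBytes and the temp file name are hypothetical, chosen only for illustration, and the sketch assumes the target folder already exists.

Code:
Option Explicit

' Hypothetical helper: writes a byte array to FileName,
' replacing any file that is already there.
Sub SaveBytes(FileName As String, Contents() As Byte)
    Dim fileNum As Long

    ' Binary mode does not truncate, so remove any earlier copy first.
    If Len(Dir(FileName)) > 0 Then Kill FileName

    fileNum = FreeFile
    Open FileName For Binary Access Write As #fileNum
    Put #fileNum, , Contents
    Close #fileNum
End Sub

' Hypothetical demo: save 10 bytes, then 3 bytes to the same path,
' and print the size of the file that is left.
Sub DemoSaveBytes()
    Dim demoPath As String
    Dim longData() As Byte, shortData() As Byte

    demoPath = Environ$("TEMP") & "\savebytes-demo.bin"
    longData = StrConv("0123456789", vbFromUnicode)
    shortData = StrConv("abc", vbFromUnicode)

    SaveBytes demoPath, longData
    SaveBytes demoPath, shortData
    Debug.Print FileLen(demoPath)
End Sub

Running DemoSaveBytes and checking the Immediate window shows the size of the second write only. Without the Kill line, the file would keep the length of the first, longer write.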
 

Thank you John_w. This would have worked, but I found another solution: calling the Windows API function URLDownloadToFile, which saves the response straight to disk:

Code:
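Option Explicit

' URLDownloadToFile returns an HRESULT: 0 (S_OK) on success.
' PtrSafe and LongPtr are needed for the declaration to compile in 64-bit Office.
#If VBA7 Then
    Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As LongPtr, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As LongPtr) As Long
#Else
    Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
#End If

' Returns Long rather than Integer: HRESULT error codes overflow an Integer.
Function Download(url As String, fn As String) As Long
    Download = URLDownloadToFile(0, url, fn, 0, 0)
    If Download <> 0 Then MsgBox "Error: " & Download & "."
End Function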
 