Creating a UTF-8 Encoded Text File

JazzSP8

Well-known Member
Joined
Sep 30, 2005
Messages
1,228
Office Version
  1. 2013
  2. 2010
Platform
  1. Windows
Hey All

I'm trying to take some some system data on products from three different systems and load them into a fourth.

I'm dealing with special characters in UTF-8 format and have used Text > Import to allow for that, the special characters display correctly when imported in Excel

Now I've done the complicated bit and have come unstuck at what I thought was the easy bit.

I've combined the data from the three sources and saved it out as a text file with a simple;

Code:
Open "DataTest.txt" For Output As #1
        Print #1, DataWeNeed
Close #1

(The variable is dimmed as a string)

The special characters appear to remain intact when I opened the text file in Notepad.

However, when the chap trying to load them in to the system he came back saying it wouldn't work correctly because the special characters where causing an issue and could I supply it in a UTF-8 friendly format.

So, I worked out I could use Notepad to "Save As > Encoding > UTF-8 file" and sent him that, but, no, that didn't work either and gave him the same issue.

So, I thought perhaps something was getting lost in translation with creating the Excel text file? - Perhaps something I need to add to the 'Open" part of the Macro to create a UTF 8 file from the get go?

Bit of a loss on this one to be honest, any and all suggestions gratefully received? :)
 

Excel Facts

Back into an answer in Excel
Use Data, What-If Analysis, Goal Seek to find the correct input cell value to reach a desired result
Hi

I don't think that excel offers you the option to save the file in utf-8 directly.

You can use a Stream object. This is a quick test I just did

- in the vba editor set the reference to "Microsoft Active Data Objects 6.1 Library"

Used this code:

Code:
Sub Test_utf_8()
Dim st As ADODB.Stream
Dim sPathname As String

sPathname = "c:\tmp\test_utf-8.txt"

' create a stream object
Set st = New ADODB.Stream

' set properties
st.Charset = "utf-8"
st.Type = adTypeText

' open the stream object and write some text
st.Open
st.WriteText "This is a test"

' save
st.SaveToFile sPathname, adSaveCreateOverWrite

End Sub
 
Upvote 0
Hi

I don't think that excel offers you the option to save the file in utf-8 directly.

You can use a Stream object. This is a quick test I just did

- in the vba editor set the reference to "Microsoft Active Data Objects 6.1 Library"

Excellent, thanks a lot for that - I've give it a go and sent if off, he's come back saying he can make it work - Should know in a little while if successful :)

Really appreciated as I was completely baffled! :)
 
Upvote 0
Hey (again) all

Apologies about bumping the thread, but I've another question about this and to save explaining again...

We've got this up and running now thanks to the info above, however we've become to increasingly rely on it and the data sets involved have gotten bigger and bigger and it's taking longer and longer to run. When we started it used to take an hour or two, but now it's getting more like 8 when it's working with 164,000 rows :/

As it was getting to the point where we weren't sure if it was actually doing anything I've put a ticker on the Status Bar that lets us know which row number it is up to so we know it's still working as there was some doubt the first couple of times it got past the three hour mark.

This however allowed us to notice something else, the further down the list it gets, the slower it gets, it starts off at a fairly rapid pace but after a while it starts to slow down to 1 record per second and gets worse from there.

We need to use the method that pgc01 gave us about to give us the correct file format, but I was wondering if the file that is been created is actually held in memory until the final 'SaveToFile'

If the memory is getting less as the file progresses, that would explain the slow down as it gets further in?

The code I have (which works, apart from the slow down) is;

Code:
Sub GenerateFeatures()

Dim st As ADODB.Stream
Dim sPathname As String
Dim OutRow As String

Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual

TimeStart = Now

ChDir ActiveWorkbook.Path
sPathname = "19992403_local_product_en_cc_gb.txt" ''' This is the filename, amend to suit

Set st = New ADODB.Stream
st.Charset = "utf-8"
st.Type = adTypeText

st.Open

st.WriteText "product|en_CC" & vbNewLine

LastRow = ActiveCell.SpecialCells(xlCellTypeLastCell).Row

For x = 2 To LastRow

Windows("SKU_Active.xlsm").Activate

SKU = Cells(x, "B").Value

If SKU = "" Then GoTo GameOver

Application.StatusBar = "Working on Row " & x & " of " & LastRow & " - SKU: " & SKU

Windows("SKU_Data.xlsx").Activate
Set wb = ActiveWorkbook

With wb.Sheets("SKU_Data")
    Set FindRow = .Range("A:A").Find(What:=SKU, LookIn:=xlFormulas)
End With

If Not FindRow Is Nothing Then
    SKURow = FindRow.Row
    CloneRow = FindRow.Row + 1

    OutRow = SKU & "|0|"
    OutRow = OutRow & ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "O").Value & "|"
    OutRow = OutRow & ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "U").Value & "|" & "100000|"
    OutRow = OutRow & ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "U").Value & "|" '' Yes U is used twice :)
    OutRow = OutRow & ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "P").Value & "|"
    OutRow = OutRow & ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "Q").Value & "|"
    OutRow = OutRow & ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "R").Value & "|"
   
'' Get title from Col H
    
    FindSemi = InStr(ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "H").Value, ";")
    
        If FindSemi <> 0 Then
    
            OutRow = OutRow & Left(ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "H").Value, FindSemi) & ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "H").Value & "|"

        Else
        
            OutRow = OutRow & ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "H").Value & ";" & ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "H").Value & "|"

        End If

    FindSemi = 0
        
    '' Collect Data for final part now while we're here

    FindSemi = InStr(ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "I").Value, ";")

    '' Col IT Is for later use as technically we don't need it yet, but saves messing around again later.

    If FindSemi <> 0 Then
    
        ColIOrText = ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "I").Value
        ColIT = Right(ColIOrText, Len(ColIOrText) - InStr(ColIOrText, ";") - 1)
    
    Else

        ColIT = ActiveWorkbook.Sheets("SKU_Data").Cells(SKURow, "I").Value

    End If
        
End If

If OutRow = "" Then GoTo Clean

Windows("SKU_Benefits.xlsx").Activate
Set wb = ActiveWorkbook

With wb.Sheets("Sheet 1")
    Set FindRow = .Range("B:B").Find(What:=SKU, LookIn:=xlFormulas)
End With

If Not FindRow Is Nothing Then

    SKURow = FindRow.Row

    For i = 3 To 32
        If ActiveWorkbook.Sheets("Sheet 1").Cells(SKURow, i).Value <> "" Then
            OutRow = OutRow & ActiveWorkbook.Sheets("Sheet 1").Cells(SKURow, i).Value & ";" ' May have to remove the last ";"
        End If
    Next i

End If

OutRow = Left(OutRow, Len(OutRow))

'' ColIT is now useful :)

OutRow = OutRow & ColIT & "|" & "1" & vbNewLine
 
st.WriteText OutRow

''''''''''''''''''''''''''''''''''''' Check for clones....
Windows("SKU_Data.xlsx").Activate
    
    TestClone = Left(ActiveWorkbook.Sheets("SKU_Data").Cells(CloneRow, "A").Value, Len(SKU))
    
    If TestClone = SKU Then
    
        CloneCount = Application.WorksheetFunction.CountIf(Sheets("SKU_Data").Range("M:M"), SKU)
        
        OutRowLength = Len(OutRow)
        SKULength = Len(SKU)
        
        NewOutRowLength = OutRowLength - SKULength
 
        For Z = CloneRow To CloneRow + CloneCount - 1
        
            CloneOutRow = ActiveWorkbook.Sheets("SKU_Data").Cells(Z, "A").Value & Right(OutRow, NewOutRowLength)
            st.WriteText CloneOutRow
            
        Next Z
        
    End If
    
Clean:
SKU = ""
SKURow = ""
OutRow = ""

Next x

GameOver:

st.SaveToFile sPathname, adSaveCreateOverWrite

Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.StatusBar = "Time Started: " & TimeStart & " - Time Ended: " & Now

End Sub

It's looking through a master list of Active SKU's from one file, getting the associated data from another file and then looking to see if there are any benefits in another and compiling them in to one pipe delimited string. Then looking to see if there are any 'Clone Codes' which also need to have the same data attached (clones are the same product but with a two digit marketing suffix attached to denote where sales have come from).

If it's a natural thing that it is getting slower to run, then I get that - I was just wondering / hoping that there might be a way to speed this up somehow?

As always, thanks in advance for any help that could be provided.
 
Upvote 0
Hi

Sorry, I don't have time these days to study your code, but I looked at it and, if I'm not wrong, you are writing each row to the stream.

This is usually inefficient.

What I'd try is to build the whole text into a string and only at the end do 1 write with the whole text.

In pseudo code, I understood you are doing this

Code:
Dim OutRow as String

For each row

    Build row text into OutRow
    Write OutRow ' 1 write for each row

Next row

In each loop you are doing 1 write.

What I'd test is to instead build a string with the whole text and only then write it. Something like

Code:
Dim OutRow as String
Dim OutText as String

For each row

    Build row text into OutRow
    Add OutRow to OutText
    ' or add directly the output to OutText

next row

' just 1 write for the whole text
Write OutText

Remark: I hope I understood what you are doing. As I said I don't have much time to come to the board and just skimmed through the code

If you test it, please post if it improves the execution time.
 
Upvote 0
Hey PGC

Understand completely that you don't have time to study the code and I really appreciate that you took the time to take a look and reply anyway :)

Yes, you got what I was doing, each time a row was complete I write it to the file - I thought that would be the more efficient way to do it but am always glad to learn something new :)

I'll make the amend as you suggested and see how I get on, I'll post back either way if it made a difference or not :)

Again, thanks for your time.

:)
 
Upvote 0
Hi PGC,

Nice to see your knowledge on Excel VBA. I am looking to see if possible, through VBA to open an xml file and do a find and replace on chinese or japanese characters and save the xml back. Is there anyway around to do it. Because it should be saved in UTF8. I tried several ways to do it. But for chinese and japanese characters the xml never opens correctly.

Thanks.
EP
 
Upvote 0
Hi
Welcome to the board

I tried several ways to do it. But for chinese and japanese characters the xml never opens correctly.

If the file is coded in utf-8 then you can read it using a code similar to the one that I posted for writing.

This is an example:

Code:
Sub Test_utf_8_Read()
Dim st As ADODB.Stream
Dim sPathname As String
Dim sText As String

sPathname = "c:\tmp\test_utf-8.txt"

' create a stream object
Set st = New ADODB.Stream

' set properties
st.Charset = "utf-8"
st.Type = adTypeText

' open the stream object and read the text from the file
st.Open
st.LoadFromFile (sPathname)
sText = st.readtext
st.Close

' the text is in sText, do something with it

' ...

End Sub

P. S.

Like I posted before:

- in the vba editor set the reference to "Microsoft Active Data Objects 6.1 Library"
 
Upvote 0
Thank you PGC. I have tried using the above code and writing into the xml file. But 2 things:
1. How do I read entire file content because reading with this function does not read entire xml file in UTF8(my xml file size is huge so difficult to find what is missing, unfortunately unable to use notepad++ or so to compare files and find what is missing.

2. I did use replace function but how do i write the content to the same xml file in one go instead of line by line. I tried below code, but it is appending to the file.

Dim st As ADODB.Stream
Dim sPathname As String
Dim sText As String


sPathname = "C:\Users\xxxxx\Desktop\Test Data\1.xml"


' create a stream object
Set st = New ADODB.Stream


' set properties
st.Charset = "utf-8"
st.Type = adTypeText
' open the stream object and write some text
st.Open


st.LoadFromFile (sPathname)
sText = st.ReadText


sText = Replace(sText, ActiveSheet.Cells(1, 1).Value, ActiveSheet.Cells(1, 2).Value)


st.WriteText sText


' save
st.SaveToFile sPathname, adSaveCreateOverWrite


st.Close
 
Upvote 0

Forum statistics

Threads
1,216,086
Messages
6,128,736
Members
449,466
Latest member
Peter Juhnke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top