VBA - Text Qualifier - removal

ylafont

New Member
Joined
Jun 21, 2016
Messages
36
Office Version
  1. 365
  2. 2021
Platform
  1. Windows
Hello all, thank you in advance for any assistance.

I have special "csv " files containing specific delimiters (CHR (20) for the column separator and the thorn character chr(254) for the text qualifier)
thorn.png


I found the code below which i have modified slightly and gets me half of the way where i need to go. during the import process I have been able to specify the column separator when importing the file and now i need to remove the text qualifier(chr (254)) during the import process. I added the replace line

WholeLine = Replace(WholeLine, Chr(254), "", vbTextCompare)

below which does the trick, however i have not tired this with large files and this may cause a problem since the character is being deleted and not being used as a "Qualifier "

is there any method of accomplishing this in the code below? thank you in advance again.

VBA Code:
Public Sub ImportTextFile()
'Public Sub ImportTextFile(FName As String, Sep As String)

Dim Sep As String
Dim FName As String
Dim RowNdx As Long
Dim ColNdx As Integer
Dim TempVal As Variant
Dim WholeLine As String
Dim Pos As Integer
Dim NextPos As Integer
Dim SaveColNdx As Integer


Sep = Chr(20)
FName = "C:\Temp\enron.dat"


'disable screen updates
Application.ScreenUpdating = False
'error handling
On Error GoTo EndMacro

'Importing data starts from the selected cell in a worksheet
SaveColNdx = ActiveCell.Column
RowNdx = ActiveCell.Row

'open the file in read mode
Open FName For Input Access Read As #1

'Read the file until End of Line
While Not EOF(1)
    'read line by line
    Line Input #1, WholeLine
    WholeLine = Replace(WholeLine, "", "", vbTextCompare)    'UTF-8
    WholeLine = Replace(WholeLine, "ÿþ", "", vbTextCompare)     'UTF-16 Unicode little endian
    WholeLine = Replace(WholeLine, "þÿ", "", vbTextCompare)     'UTF-16 Unicode big endian
[B]    WholeLine = Replace(WholeLine, Chr(254), "", vbTextCompare)[/B]


    'checking if the line is empty
    If Right(WholeLine, 1) <> Sep Then
        WholeLine = WholeLine & Sep
    End If
    ColNdx = SaveColNdx
    Pos = 1
    NextPos = InStr(Pos, WholeLine, Sep)
    'finding each column data
    While NextPos >= 1
        TempVal = Mid(WholeLine, Pos, NextPos - Pos)
        Cells(RowNdx, ColNdx).Value = TempVal
        Pos = NextPos + 1
        ColNdx = ColNdx + 1
        NextPos = InStr(Pos, WholeLine, Sep)
    Wend
    RowNdx = RowNdx + 1
Wend

Close #1
Exit Sub
 

Excel Facts

Difference between two dates
Secret function! Use =DATEDIF(A2,B2,"Y")&" years"&=DATEDIF(A2,B2,"YM")&" months"&=DATEDIF(A2,B2,"MD")&" days"
When it comes to larger files, I would recommend not to process data (read, modify, copy & paste) line by line and column by column.
In particular, applying repetitive access to worksheet ranges will slow down the process considerably, while Excel offers a special feature to import CSV files.

Excel can handle custom column delimiters, but the choice of text qualifiers is very limited, so prior to importing we better replace them both.
To do this, we could read the binary data of the file in its entirety into the computer's memory, replace the desired bytes and store the modified data in another file.
The newly created CSV file can then be imported into Excel (Ribbon > Data tab > Get external data > from Text).

Note that you are showing us a screenshot of data displayed in a text editor, which most likely supports different Unicode encodings. It can be assumed that the data is stored in ANSI or UTF-16LE encoding, however, we don't know for sure. The exact binary format has therefore yet to be determined. Also note that Excel supports UTF16-BE but does not remove the corresponding Byte Order Mark (BOM) if present, so we have to do that ourselves, unlike UTF8-BOM and UTF16-LE files both of which its BOM is filtered out automatically. If there's no BOM present, we need to try to determine the encoding using the text qualifier your data file uses (þ > &hFE > 254) .

The code below does all of the above. Note the use of a separate function, a dependency of the main procedure.
The code responsible for reading and writing a binary file includes three Class modules, one of which has the PredeclaredID attribute set to true. This means that you first need to save the code for that module using a text editor (eg Notepad) and then import that file within the VBA Editor (keybord shortcut: CTRL M).
The provided code is limited to files with a size of approx 2 GB. Hope this helps.

This goes in a standard module:
VBA Code:
Option Explicit

Public Enum ArrayElement
    SourceQual = 0
    SourceDelim = 1
    TargetQual = 2
    TargetDelim = 3
End Enum

Public Enum DataEncoding
    ANSI = 0
    UTF8 = 1
    UTF16LE = 2
    UTF16BE = 3
End Enum

Public Sub ConvertCSV()

    Const SOURCEFILE        As String = "C:\Temp\enron.dat"             ' << change to suit
    Const DESTINATIONFILE   As String = "C:\Temp\enron(copy).csv"       ' << change to suit

    Const UTF16LE_BOM As String = "ÿþ"  ' FF FE
    Const UTF16BE_BOM As String = "þÿ"  ' FE FF
    Const UTF8BOM_BOM As String = "" ' EF BB BF
   
    Dim UNKNOWN As Boolean, arr(TargetDelim) As String

    ' read entire file into memory
    Dim BinFile     As IBinaryDataFile
    Dim DataBuffer  As String
    Set BinFile = CBinaryDataFile.Create(SOURCEFILE, ForReading)
    With BinFile
        .FileRead .Size, DataBuffer
        .FileClose
    End With

    '  == Try to determine data encoding ==
    ' first check on presence of Byte Order Mark
    If VBA.Left$(DataBuffer, 2) = UTF16LE_BOM Then
        SetQualifiersAndDelimiters UTF16LE, arr

    ElseIf VBA.Left$(DataBuffer, 2) = UTF16BE_BOM Then
        SetQualifiersAndDelimiters UTF16BE, arr
        ' remove unwanted Byte Order Mark
        DataBuffer = VBA.Replace(DataBuffer, UTF16BE_BOM, vbNullString, 1, , vbBinaryCompare)

    ElseIf VBA.Left$(DataBuffer, 3) = UTF8BOM_BOM Then
        SetQualifiersAndDelimiters UTF8, arr

    ' no Byte Order Mark found, could be either UTF-8, ANSI, UTF-16LE or UTF-16BE
    ' now check using known text qualifier (&hFE > &hC3BE)
    ElseIf VBA.Left$(DataBuffer, 2) = "þ" Then
        SetQualifiersAndDelimiters UTF8, arr
   
    ElseIf VBA.Left$(DataBuffer, 2) = "þ" & VBA.Chr$(0) Then
        SetQualifiersAndDelimiters UTF16LE, arr
   
    ElseIf VBA.Left$(DataBuffer, 2) = VBA.Chr$(0) & "þ" Then
        SetQualifiersAndDelimiters UTF16BE, arr
   
    ElseIf VBA.Left$(DataBuffer, 1) = "þ" Then
        SetQualifiersAndDelimiters ANSI, arr
   
    Else
        UNKNOWN = True
    End If

    If Not UNKNOWN Then
        ' perform desired replacement
        DataBuffer = VBA.Replace(VBA.Join(VBA.Split(DataBuffer, arr(SourceDelim)), arr(TargetDelim)), arr(SourceQual), arr(TargetQual))
   
        ' delete former version of destination file (if any)
        If VBA.CreateObject("Scripting.FileSystemObject").FileExists(DESTINATIONFILE) Then
            VBA.Kill DESTINATIONFILE
        End If
   
        ' write new version of destination file
        Set BinFile = CBinaryDataFile.Create(DESTINATIONFILE, ForExclusive)
        With BinFile
            .FileWrite DataBuffer
            .FileClose
        End With
        Set BinFile = Nothing
   
        ' import newly created CSV file
        Dim Rng As Range
        Set Rng = ThisWorkbook.Worksheets.Add.Range("A1")       ' << change to suit
    '   Set Rng = ThisWorkbook.Worksheets("Sheet1").Range("A3")
   
        With Rng.Parent.QueryTables.Add(Connection:="TEXT;" & DESTINATIONFILE, Destination:=Rng)
            .Name = "enron"
            .FieldNames = True
            .PreserveFormatting = True
            .RefreshStyle = xlInsertDeleteCells
            .SaveData = True
            .AdjustColumnWidth = True
            .RefreshPeriod = 0
            .TextFilePlatform = 850
            .TextFileStartRow = 1
            .TextFileParseType = xlDelimited
            .TextFileTextQualifier = xlTextQualifierDoubleQuote
            .TextFileCommaDelimiter = True
            .TextFileTrailingMinusNumbers = True
            .Refresh BackgroundQuery:=False
        End With
       
    Else
        MsgBox "Could not determine data encoding", vbExclamation + vbOKOnly
    End If
End Sub


Private Sub SetQualifiersAndDelimiters(ByVal argEnc As DataEncoding, ByRef argArr() As String)

    Dim NUL As String
    NUL = VBA.Chr$(0)

    Dim OldSep As String, OldQual As String, OldQualUTF8 As String
    OldSep = VBA.Chr$(20)
    OldQual = VBA.Chr$(254)
    OldQualUTF8 = "þ"

    Dim NewSep As String, NewQual As String
    NewSep = ","
    NewQual = """"

    Select Case argEnc

        Case ANSI
            argArr(SourceQual) = OldQual
            argArr(SourceDelim) = OldQual & OldSep & OldQual
            argArr(TargetQual) = NewQual
            argArr(TargetDelim) = NewQual & NewSep & NewQual

        Case UTF8
            argArr(SourceQual) = OldQualUTF8
            argArr(SourceDelim) = OldQualUTF8 & OldSep & OldQualUTF8
            argArr(TargetQual) = NewQual
            argArr(TargetDelim) = NewQual & NewSep & NewQual

        Case UTF16LE
            argArr(SourceQual) = OldQual & NUL
            argArr(SourceDelim) = OldQual & NUL & OldSep & NUL & OldQual & NUL
            argArr(TargetQual) = NewQual & NUL
            argArr(TargetDelim) = NewQual & NUL & NewSep & NUL & NewQual & NUL

        Case UTF16BE
            argArr(SourceQual) = NUL & OldQual
            argArr(SourceDelim) = NUL & OldQual & NUL & OldSep & NUL & OldQual
            argArr(TargetQual) = NUL & NewQual
            argArr(TargetDelim) = NUL & NewQual & NUL & NewSep & NUL & NewQual
    End Select
End Sub


This goes in a class module, to be renamed IBinaryDataFileFactory:
VBA Code:
'Class Name:    IBinaryDataFileFactory
Option Explicit

Public Enum EFileAccess
    ForExclusive = 1
    ForReading = 2
    ForReadingAndWriting = 3
End Enum

' -----------------------------------------------------------------------------------
' Properties which are multiple times needed during creation of the BinaryFile object
' -----------------------------------------------------------------------------------
Public Property Set FSO(ByVal RHS As Object)
End Property
Public Property Get FSO() As Object
End Property
Public Property Let FullFileName(ByVal RHS As String)
End Property
Public Property Get FullFileName() As String
End Property

' ---------------
' One-time Method
' ---------------
Public Sub FileOpenFor(ByVal argAccess As EFileAccess)
End Sub


This goes in a class module, to be renamed IBinaryDataFile:
VBA Code:
'Class Name:    IBinaryDataFile
Option Explicit

' ---------------
' Exposed Methods
' ---------------
Public Sub FileRead(ByVal argByteCount As Long, ByRef argBuffer As String)
End Sub
Public Sub FileWrite(ByRef argBuffer As String)
End Sub
Public Sub FileClose()
End Sub

' ------------------
' Exposed Properties
' ------------------
Public Property Let FilePointer(ByVal RHS As Long)
End Property
Public Property Get FilePointer() As Long
End Property

' ----------------------------
' Exposed Read-Only Properties
' ----------------------------
Public Property Get FullFileName() As String
End Property
Public Property Get Size() As Long
End Property
Public Property Get IsEOF() As Boolean
End Property


This needs to be saved using a text editor as a file named CBinaryDataFile.cls
and then be imported within the VBE:

VBA Code:
VERSION 1.0 CLASS
BEGIN
  MultiUse = -1  'True
END
Attribute VB_Name = "CBinaryDataFile"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
'@PredeclaredId
Option Explicit

Const CLASSNAME As String = "BinaryDataFile Class"

Implements IBinaryDataFile
Implements IBinaryDataFileFactory

Private Type TBinaryDataFile
    AmIValid                As Boolean
    FSO                     As Object
    FullFileName            As String
    FileHandle              As Integer
    FileAccess              As EFileAccess
    IsOpen                  As Boolean
    PointerBefore           As Long
    PointerAfter            As Long
    PointerMax              As Long
    ErrorNumber             As Long
    ErrorDescription        As String
End Type
Private this As TBinaryDataFile

Private Enum EErrorType
    CreateObjectError
    FileEqualsFolder
    FolderDoesNotExist
    WriteAccessIssue
    FileOpenIssue
    FileReadIssue
    FileWriteIssue
End Enum

' ======== FACTORY Procedures ========

Public Function Create(ByVal argFullFileName As String, ByVal argAccess As EFileAccess) As IBinaryDataFile
    Dim Result As CBinaryDataFile
    Set Result = New CBinaryDataFile
    Init Result, argFullFileName, argAccess
    Set Create = Result
End Function

Private Sub IBinaryDataFileFactory_FileOpenFor(ByVal argAccess As EFileAccess)
    Dim TempHandle As Integer
    TempHandle = FileOpenFor(this.FullFileName, argAccess)
    this.IsOpen = True
    this.FileHandle = TempHandle
    this.FileAccess = argAccess
    this.AmIValid = True
    ConfirmFilePointer 1
End Sub

Private Property Set IBinaryDataFileFactory_FSO(ByVal RHS As Object)
    Set this.FSO = RHS
End Property

Private Property Get IBinaryDataFileFactory_FSO() As Object
    Set IBinaryDataFileFactory_FSO = this.FSO
End Property

Private Property Let IBinaryDataFileFactory_FullFileName(ByVal RHS As String)
    this.FullFileName = RHS
End Property

Private Property Get IBinaryDataFileFactory_FullFileName() As String
    IBinaryDataFileFactory_FullFileName = this.FullFileName
End Property

Private Sub Init(ByVal argInstance As IBinaryDataFileFactory, ByVal argFullFileName As String, ByVal argOpenAccess As EFileAccess)
    Set this.FSO = VBA.CreateObject("Scripting.FileSystemObject")
    If IsValidPathAndFileName(argFullFileName) Then
        With argInstance
            .FullFileName = this.FSO.GetAbsolutePathName(argFullFileName)
            Set .FSO = this.FSO
            .FileOpenFor argOpenAccess
            Set this.FSO = Nothing
        End With
    Else
        RaiseError
    End If
End Sub

Private Function IsValidPathAndFileName(ByVal argPath As String) As Boolean
    If this.FSO.FileExists(argPath) Then
        IsValidPathAndFileName = True
    Else
        If Not this.FSO.FolderExists(argPath) Then
            If this.FSO.FolderExists(this.FSO.GetParentFolderName(argPath)) Then
                If IsFileCreatable(argPath) Then
                    IsValidPathAndFileName = True
                Else
                    ' ERROR caused by either:
                    '   - denial of write access in folder
                    '   - use of invalid characters, like  <>*?":/\|
                    '   - long paths are disabled and/or path length exceeds MAX_PATH
                    InitErrorMembers WriteAccessIssue, argPath
                End If
            Else
                InitErrorMembers FolderDoesNotExist, argPath
            End If
        Else
            InitErrorMembers FileEqualsFolder, argPath
        End If
    End If
End Function

Private Function IsFileCreatable(ByVal argFullFileName As String) As Boolean
    Dim TempHandle As Long
    TempHandle = VBA.FreeFile
    On Error Resume Next
    Open argFullFileName For Binary Access Read Write Lock Read Write As #TempHandle
    IsFileCreatable = Not VBA.CBool(VBA.Err.Number)
    this.ErrorNumber = VBA.Err.Number
    this.ErrorDescription = VBA.Err.Description
    Close #TempHandle
    VBA.Kill argFullFileName
End Function

Private Function FileOpenFor(ByVal argFullFileName As String, ByVal argAccess As EFileAccess) As Integer
    Dim FileAccessAttempt As String
    FileOpenFor = VBA.FreeFile
    On Error GoTo SUB_ERROR
    Select Case argAccess
        Case ForExclusive
            FileAccessAttempt = " (for Exclusive Access)"
            ' Open/Create file with exclusive access; other processes are totally locked out until file is closed
            Open argFullFileName For Binary Access Read Write Lock Read Write As #FileOpenFor
        Case ForReading
            FileAccessAttempt = " (for Reading)"
            ' Open file just for reading; other processes may gain read & write access but cannot(!) delete this file until it's closed
            Open argFullFileName For Binary Access Read Shared As #FileOpenFor
        Case ForReadingAndWriting
            FileAccessAttempt = " (for Reading & Writing)"
            ' Open/Create file for reading & writing; other processes can only gain read access (and cannot delete this file until it's closed)
            Open argFullFileName For Binary Access Read Write Lock Write As #FileOpenFor
    End Select
    GoTo SUB_EXIT
SUB_ERROR:
    InitErrorMembers FileOpenIssue, argFullFileName, FileAccessAttempt
    On Error GoTo 0
    RaiseError
SUB_EXIT:
End Function

Private Sub InitErrorMembers(ByVal argType As EErrorType, Optional ByVal argFileName As String, Optional ByVal argAccessType As String = "")
    Const ERRORINTRO As String = CLASSNAME & " object could not be created!" & vbNewLine & "Error on opening file"
    Select Case argType
    ' during creation of this BinaryDataFile Object
        Case FileEqualsFolder
            this.ErrorDescription = ERRORINTRO & argAccessType & ": " & argFileName & vbNewLine & "A folder with the same name already exists"
            this.ErrorNumber = VBA.vbObjectError + 52
        Case FolderDoesNotExist
            this.ErrorDescription = ERRORINTRO & argAccessType & ": " & argFileName & vbNewLine & "One or more folders in this path do not exist"
            this.ErrorNumber = VBA.vbObjectError + 53
        Case WriteAccessIssue
            this.ErrorDescription = ERRORINTRO & argAccessType & ": " & argFileName & vbNewLine & this.ErrorDescription
            this.ErrorNumber = VBA.vbObjectError + this.ErrorNumber
   ' after creation of this BinaryDataFile Object
        Case FileOpenIssue
            this.ErrorDescription = ERRORINTRO & argAccessType & ": " & this.FullFileName & vbNewLine & Err.Description
            this.ErrorNumber = VBA.vbObjectError + Err.Number
        Case FileReadIssue
            this.ErrorDescription = CLASSNAME & " encountered error in reading file:" & vbNewLine & this.FullFileName & vbNewLine & Err.Description
            this.ErrorNumber = VBA.vbObjectError + Err.Number
        Case FileWriteIssue
            this.ErrorDescription = CLASSNAME & " encountered error in writing file:" & vbNewLine & this.FullFileName & vbNewLine & Err.Description
            this.ErrorNumber = VBA.vbObjectError + Err.Number
        Case CreateObjectError
            this.ErrorDescription = "This " & CLASSNAME & " object was not instantiated with its .Create method."
            this.ErrorNumber = VBA.vbObjectError + &H400
    End Select
End Sub

' ---------------------------------
'    PUBLIC Exposed Properties
' ---------------------------------

Private Property Let IBinaryDataFile_FilePointer(ByVal RHS As Long)
    GuardClause
    ConfirmFilePointer RHS
End Property

Private Property Get IBinaryDataFile_FilePointer() As Long
    GuardClause
    IBinaryDataFile_FilePointer = this.PointerAfter
End Property

Private Property Get IBinaryDataFile_IsEOF() As Boolean
    GuardClause
    IBinaryDataFile_IsEOF = IsEOF
End Property

Private Property Get IBinaryDataFile_FullFileName() As String
    GuardClause
    IBinaryDataFile_FullFileName = this.FullFileName
End Property

Private Property Get IBinaryDataFile_Size() As Long
    GuardClause
    IBinaryDataFile_Size = Size
End Property

' --------------------------------
'    PUBLIC Exposed Methods
' --------------------------------

Private Sub IBinaryDataFile_FileRead(ByVal argByteCount As Long, argBuffer As String)
    GuardClause
    On Error GoTo SUB_ERROR
    argByteCount = VBA.IIf(argByteCount > 0, argByteCount, 0)
    ConfirmFilePointer this.PointerAfter
    this.PointerBefore = this.PointerAfter
    ' prevent a possible overflow
    If VBA.CCur(this.PointerBefore) + VBA.CCur(argByteCount) > this.PointerMax Then
        ' if read attempt beyond EOF then omit trailing NULL bytes
        argByteCount = this.PointerMax - this.PointerBefore
    End If
    argBuffer = VBA.String$(argByteCount, VBA.Chr$(255))
    Get #this.FileHandle, this.PointerBefore, argBuffer
    this.PointerAfter = VBA.Seek(this.FileHandle)
    GoTo SUB_EXIT
SUB_ERROR:
    InitErrorMembers FileReadIssue, this.FullFileName
    On Error GoTo 0
    RaiseError
SUB_EXIT:
End Sub

Private Sub IBinaryDataFile_FileWrite(argBuffer As String)
    GuardClause
    On Error GoTo SUB_ERROR
    this.PointerBefore = this.PointerAfter
    Put #this.FileHandle, this.PointerBefore, argBuffer
    this.PointerAfter = VBA.Seek(this.FileHandle)
    GoTo SUB_EXIT
SUB_ERROR:
    InitErrorMembers FileWriteIssue, this.FullFileName
    On Error GoTo 0
    RaiseError
SUB_EXIT:
End Sub

Private Sub IBinaryDataFile_FileClose()
    FileClose
End Sub

' ---------------------
'  Other PRIVATE Stuff
' ---------------------

Private Sub ConfirmFilePointer(ByVal argNewPointer As Long)
    this.PointerMax = Size + 1
    If argNewPointer > this.PointerMax Then
        argNewPointer = this.PointerMax
    ElseIf argNewPointer < 1 Then
        argNewPointer = 1
    End If
    Seek #this.FileHandle, argNewPointer
    this.PointerAfter = argNewPointer
End Sub

Private Function Size() As Long
    If this.FSO.FileExists(this.FullFileName) Then
        If this.IsOpen Then
            Size = VBA.LOF(this.FileHandle)
        Else
            Size = VBA.FileLen(this.FullFileName)
        End If
    End If
End Function

Private Function IsEOF() As Boolean
    If this.IsOpen Then
        IsEOF = (Size <= VBA.Seek(this.FileHandle))
    End If
End Function

Private Sub GuardClause()
    If Not this.AmIValid Then
        InitErrorMembers CreateObjectError
        RaiseError
    End If
End Sub

Private Sub RaiseError()
    If this.ErrorNumber <> 0 Then
       VBA.Err.Raise this.ErrorNumber, CLASSNAME, this.ErrorDescription
    End If
End Sub

Private Sub FileClose()
    If this.IsOpen Then
        Close #this.FileHandle
        this.IsOpen = False
    End If
End Sub

Private Sub Class_Terminate()
    FileClose
    Set this.FSO = Nothing
End Sub
 
Upvote 0

Forum statistics

Threads
1,214,788
Messages
6,121,577
Members
449,039
Latest member
Arbind kumar

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top