CSV Parsing: Stray Comma Conundrum

nigelandrewfoster

Well-known Member
Joined
May 27, 2009
Messages
735
Hiya

I'm using the following function to open a UTF-8 csv file and save it safely as an xlsx file. I use this to preserve the foreign characters from corruption (which happens when I open it directly in Excel):

Code:
Function wsUCF8_CSV_Worksheet(strPathFilename As String, Optional strDestinationFilename As String) As Worksheet ' wb is set to the opened workbook

    Dim wb As Workbook
    
    Set wb = Workbooks.Add
    Set wsUCF8_CSV_Worksheet = wb.Worksheets(1)
    
    With wsUCF8_CSV_Worksheet
        With .QueryTables.Add(Connection:="TEXT;" & strPathFilename, Destination:=.Range("A1"))
             .TextFileParseType = xlDelimited
             .TextFilePlatform = 65001
             .TextFileCommaDelimiter = True
             .Refresh
        End With
    End With
    
    Remove_Workbook_Connections wb
    
    If Not IsMissing(strDestinationFilename) Then
        MakePath "C:\Temp"
        wb.SaveAs "C:\Temp\" & strDestinationFilename, FileFormat:=xlWorkbookDefault
    End If
    
End Function
It works great ... until I encounter an exported CSV that contains commas WITHIN a field (eg. a text field), then it of course splits the line at that point. I cannot see any way around this. How CAN one parse a csv file when a field in that file contains commas AS commas, rather than as a delimiting symbol? I guess the answer is that programmers shouldn't permit fields of csv files to contain (text) commas without some kind of wrapping? I'm talking about the file exports of some sizeable companies.
 
Last edited:

Some videos you may like

Excel Facts

Remove leading & trailing spaces
Save as CSV to remove all leading and trailing spaces. It is faster than using TRIM().

MoshiM

Active Member
Joined
Jan 31, 2018
Messages
349
Office Version
2016
Platform
Windows
IF you add the contents of the file to an array do you know precisely which comma might cause you issues and the number of elements that would be considered abnormal? If so then you do something like the following
Example
Code:
   For i = 0 To UBound(strData)    
        If Len(strData(i)) - Len(Replace(strData(i), ",", "")) >= 191 Then 'checks for the number of columns[-1 less than total since array starts at 0]
            
            strData(i) = ReplaceN(strData(i), ",", "", 186) '(nth comma of interest - 1 ) is 186
        
        End If
    
    Next i
'Can't remember where I found this function
Code:
Function ReplaceN(ByVal str1 As Variant, strFind As String, strReplace As String, N As Long, Optional Count As Long) As String
Dim i As Long, j As Long
Dim strM As String
strM = str1
If Count <= 0 Then Count = 1
For i = 1 To N - 1
    j = InStr(1, strM, strFind) 'char position of the first occurence
    strM = Mid(strM, j + Len(strFind), Len(strM)) 'string that omits the characters before and including j but has the length of the original string
Next i 'this loop will continue to section the string until it has done so N-1 times
If N <= 0 Then
    ReplaceN = str1
Else
    ReplaceN = Mid(str1, 1, Len(str1) - Len(strM)) & Replace(strM, strFind, strReplace, Start:=1, Count:=Count)

End If

End Function
now split the string
 
Last edited:

nigelandrewfoster

Well-known Member
Joined
May 27, 2009
Messages
735
Thanks very much for reply and input. I will let you know once I have had a chance to examine the code. I think I need to examine the troublesome file in greater detail using a different method to find out which combination of characters exactly is causing the additional CRs.
 

MoshiM

Active Member
Joined
Jan 31, 2018
Messages
349
Office Version
2016
Platform
Windows
Thanks very much for reply and input. I will let you know once I have had a chance to examine the code. I think I need to examine the troublesome file in greater detail using a different method to find out which combination of characters exactly is causing the additional CRs.
Just to be sure is it a .csv or a comma delimited .txt file?
 

MoshiM

Active Member
Joined
Jan 31, 2018
Messages
349
Office Version
2016
Platform
Windows
Thanks very much for reply and input. I will let you know once I have had a chance to examine the code. I think I need to examine the troublesome file in greater detail using a different method to find out which combination of characters exactly is causing the additional CRs.
If you don't want to open the .csv in the background and read the contents from the sheet and if the elements inside the .csv file when you use notepad that are supposed to have commas are contained within quotation marks then the following untested function when given a string should return a string with commas that aren't in between quotation marks replaced with question marks.
Code:
Function Change_Comma_Delimiter_Not_Between_Quote(Current_String As String)

Dim Current_String As String, Comma_Here As New Dictionary, Location_Key As Long, Comma_Location As Long, T As Long, _
Comma_Location_Outside_Quotation As Long, Empty_Array_Element_Count As Long


Set Comma_Here = CreateObject("Scripting.Dictionary")


Array_String = Split(Current_String, Chr(34)) 'Split text with a Quotation mark


Location_Key = 1


For X = LBound(Array_String) To UBound(Array_String)
            
        If Array_String(X) = Chr(44) Then 'if the array element is a comma


            For T = 0 To X
                
                If IsEmpty(Array_String(T)) = True Then 'if element is empty
                
                    Empty_Array_Element_Count = Empty_Array_Element_Count + 1
                
                Else
                'gets the length of each preceding element and the current one to determine the location
                Comma_Location_Outside_Quotation = Len(Array_String(T)) + Comma_Location_Outside_Quotation + Empty_Array_Element_Count
                                                                           
                End If
            
            Next T
            
            Comma_Here.Add Location_Key, Comma_Location_Outside_Quotation
            
            Location_Key = Location_Key + 1
            
        End If
    
Next X


For Each element In Comma_Here.Keys


    Current_String = Replace(Current_String, Chr(44), "?", Comma_Here.Items(element), 1)
    'replace 1 comma starting at the location indicated by the dictionary item
    
Next element


Change_Comma_Delimiter_Not_Between_Quote = Current_String




End Function
 
Last edited:

Rick Rothstein

MrExcel MVP
Joined
Apr 18, 2011
Messages
35,937
Office Version
2010
Platform
Windows
I'm using the following function to open a UTF-8 csv file and save it safely as an xlsx file. I use this to preserve the foreign characters from corruption (which happens when I open it directly in Excel):

It works great ... until I encounter an exported CSV that contains commas WITHIN a field (eg. a text field), then it of course splits the line at that point. I cannot see any way around this.
I don't work with UTF-8 text myself, so I don't know if the code in my mini-blog article here...

http://www.excelfox.com/forum/showt...Ignores-Delimiters-Located-Inside-Quote-Marks

will work with that type of text or not, but it does work with "normal" text. Give it a try and let me know. Oh, I should mention, the function posted in that article returns a String array split on the specified delimiter but which ignores splitting on the specified delimiter if it is inside quote marks.
 
Last edited:

Watch MrExcel Video

Forum statistics

Threads
1,099,684
Messages
5,470,116
Members
406,681
Latest member
sachinmasurkar

This Week's Hot Topics

Top