Trouble with .CSV file

NotASuperGenius

Board Regular
Joined
Jun 17, 2002
Messages
62
I am working with a .CSV file that is generated upstream from me. It contains 1500 rows of data, each row has 6 columns. The first column is "Company Name", followed by three columns of dates and then some performance numbers.

The problem I am having is that the file is comma delimited and some of the company names include commas. As a result, when I open the .CSV file in Excel about 10% of the rows are misaligned.

I have been manually correcting this but there has to be a better way. I cannot make changes to the processing upstream (change the delimiter or to forbid commas in the company name).

Suggestions appreciated - Cheers, NASG
 

Excel Facts

Difference between two dates
Secret function! Use =DATEDIF(A2,B2,"Y")&" years"&=DATEDIF(A2,B2,"YM")&" months"&=DATEDIF(A2,B2,"MD")&" days"
Partial solution was the edit the file in Notepad. Doing a find/replace for ", " - that is, comma and then a space - removed all but a few. Not perfect but good enough. Viva la Notepad.
 
Upvote 0
Most programs that generate CSV files have an option to include text delimiters, if they are any good! (see here for details: http://en.wikipedia.org/wiki/Comma-separated_values). Otherwise they are pretty worthless. So you may want to check with the "upstream" process, and ask them to include text delimiters.

Otherwise, how can you differentiate commas that are delimiters from commas that are actually data? If you can't come up with a "hard-and-fast" rule logically, you sure aren't going to be able to program anything (programming isn't magic, it can only do what you tell it to do).

Now, if only this company name can have commas in it (and none of the other fields), you may be able to do something with it by importing each row with no delimiters so that all data gets put in column A. Then you may be able to work backwards and use string functions like FIND, LEFT, RIGHT, and MID to pull out the data you need, i.e.

The data after the last comma is Field6.
The data between the last comma and second to last comma is Field5.
...
and so on until you get to Field2.
Then you know everything left over is Field1.
 
Upvote 0
All right, assuming what I said it true, the only legitimate commas which are data are in the first field (company name), here is a macro that I wrote that will import the data into Excel and put everything in the correct column.

All you do is run the macro, select the file name, and it does the rest!
Code:
Sub MySplitCSV()
 
    Dim myFileName
    Dim myConnection As String
    Dim myNumFields As Long
    Dim myLastRow As Long
    Dim i As Long
    Dim j As Long
    Dim myValue As String
    Dim myLen As Long
    Dim myLastComma As Long
    Dim myCurComma As Long
    
    Application.ScreenUpdating = False
        
'   Enter number of fields
    myNumFields = 6
    
'   Browse to file to import
    myFileName = Application.GetOpenFilename("All CSV Files,*.CSV")
    If myFileName = False Then
        Exit Sub
    End If
    
'   Import file
    myConnection = "TEXT;" & myFileName
    Workbooks.Open Filename:=myFileName
    ActiveWindow.Close
    Range("A1").Select
    With ActiveSheet.QueryTables.Add(Connection:=myConnection, _
        Destination:=Range("$A$1"))
        .Name = "records3"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .TextFilePromptOnRefresh = False
        .TextFilePlatform = 437
        .TextFileStartRow = 1
        .TextFileParseType = xlDelimited
        .TextFileTextQualifier = xlTextQualifierNone
        .TextFileConsecutiveDelimiter = False
        .TextFileTabDelimiter = False
        .TextFileSemicolonDelimiter = False
        .TextFileCommaDelimiter = False
        .TextFileSpaceDelimiter = False
        .TextFileColumnDataTypes = Array(1)
        .TextFileTrailingMinusNumbers = True
        .Refresh BackgroundQuery:=False
    End With
   
'   Find last row
    myLastRow = Range("A1").End(xlDown).Row
    
'   Loop through each row
    For i = 1 To myLastRow
'       Get length of field
        myValue = Cells(i, "A")
        myLen = Len(myValue)
        myLastComma = myLen
'       Loop through fields
        For j = 1 To 6
            myCurComma = InStrRev(myValue, ",")
'           Populate cells
            If j < myNumFields Then
                Cells(i, myNumFields + 2 - j) = Mid(myValue, myCurComma + 1, myLastComma - myCurComma)
                myLastComma = myCurComma
                myValue = Left(myValue, myCurComma - 1)
            Else
                Cells(i, myNumFields + 2 - j) = Left(myValue, myLastComma)
            End If
        Next j
    Next i
    
'   Delete column A
    Columns("A:A").Delete Shift:=xlToLeft
    
    Application.ScreenUpdating = True
    
End Sub
 
Upvote 0
Joe - Thank you. Works like a charm. Hadn't thought about approacing the problem from the right. Quick question, the CSV file has 2 rows of data (4 columns) preceeding the problem data. How would I tweak your macro to account for this?

Cheers - NASG
 
Upvote 0
Are you saying that the problem column is not the left-most one, but in the middle somewhere?

Then you will probably want to come from the right for however many fields there are to the right of this problem field, come from the left for however columns there are to the left of this field, and dump the leftover in as this field.
 
Upvote 0
Another thought is insteading of breaking it up piece-by-piece, use the same logic coming from the left and right and insert your own Text Qualifer in it. Then you can use Text-to-Columns to split the columns easily once you have the appropriate text qualifer.
 
Upvote 0
Sorry, difficult to convey. The head of the file looks like

Code:
Added,Inactive,Updated,Queries
1602,28,34,83
Company Name,Record #,Xref,Created,Modified,Hits
18 Miles, LLC,5721,N/A,11/10/2008 10:10:46,11/10/2008 10:10:46,0
Elite Ltd,4546,N/A,07/21/2008 09:51:45,07/21/2008 09:51:45,0
MG COMPANY,9996,3525,05/24/2005 02:55:11,05/24/2005 02:55:11,0
ALL Red, Inc.,9590,N/A,03/29/2006 09:11:36,03/29/2006 09:11:36,0
BIG GROUP, INC.,0400,N/A,05/24/2005 10:34:28,05/26/2005 10:18:12,0
AGR INC.,8100,9251,05/25/2005 09:05:11,05/25/2005 09:05:11,0
TT Technologies, Inc.,5686,N/A,12/15/2008 02:09:06,12/15/2008 02:09:06,0
 
Upvote 0
You could skip the first two lines by updating the loop that cycles through the rows, i.e.
Code:
    For i = 3 To myLastRow
Or if you want to include them, and want them separated by columns, just add two commas to the end of each before the loops, i.e.

Code:
Range("A1")=Range("A1") & ",,"
Range("A2")=Range("A2") & ",,"
...
Then the code should work fine against it.
 
Upvote 0

Forum statistics

Threads
1,213,489
Messages
6,113,953
Members
448,535
Latest member
alrossman

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top