[Help] Open a huge csv then save it into XLSX with every 1M row split into different worksheets

SevenDP

New Member
Joined
Mar 10, 2016
Messages
21
Hi Mr Excel Expert,
Please help ...
I have huge CSV file with about 4 Million rows.
I need macro to open that CSV then split it every 1 Million rows into different worksheets but still in one XLSX file.
So with 4M rows the result is one file XLSX with 4 worksheets each got 1M rows.
Because XLSX's worksheets only support about 1 Million rows.

Thanks in advance
 

Some videos you may like

Excel Facts

Remove leading & trailing spaces
Save as CSV to remove all leading and trailing spaces. It is faster than using TRIM().

shknbk2

Board Regular
Joined
Mar 5, 2016
Messages
246
Here's a try. The code is pasted below. The VB editor needs to have a reference to the Microsoft Scripting Runtime in order to make this work (Tools->References).

I didn't want the array holding the data in the CSV file to be too big, so I've capped it at 10,000 items. That array repeats until 1,000,000 lines have been read, and then a new sheet is added.

At the end of each sheet, row A is converted to columns. If it doesn't behave properly based on the data you have, it can be tweaked.

Code:
Sub openCSV()
    Dim ws As Worksheet
    Dim a As Variant
    Dim filename As String
    Dim fso As FileSystemObject
    Dim t As TextStream
    Dim l As Long, lMil As Long, aCount As Long

    
    filename = Application.GetOpenFilename("CSV File (*.csv),*.csv")
    If filename = "False" Then Exit Sub
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set t = fso.OpenTextFile(filename, ForReading)
    
    Application.ScreenUpdating = False
    aCount = 0
    Do
        ReDim a(10000)
        If aCount >= 999999 Then
            Range("A1", Cells(Rows.Count, 1).End(xlUp)).TextToColumns Range("A1"), xlDelimited, xlTextQualifierDoubleQuote, , , , True
            Set ws = Sheets.Add(, Sheets(Sheets.Count))
            aCount = 0
        End If
        l = 0
        Do While Not t.AtEndOfStream And l < 10000
            a(l) = t.ReadLine
            l = l + 1
        Loop
        Range("A" & aCount + 1).Resize(UBound(a)).Value = Application.Transpose(a)
        aCount = aCount + 10000
    Loop While Not t.AtEndOfStream
    Range("A1", Cells(Rows.Count, 1).End(xlUp)).TextToColumns Range("A1"), xlDelimited, xlTextQualifierDoubleQuote, , , , True
    t.Close
    Set fso = Nothing
    Set t = Nothing
    Application.ScreenUpdating = True
End Sub
 
Last edited:

SevenDP

New Member
Joined
Mar 10, 2016
Messages
21
Hi shknbk2

Thanks for help. I already tried it but when i running it and choose a csv file... suddenly I got a Run-time error '13' Type Mismatch
in line (bold red):

Rich (BB code):
    .....
        Do While Not t.AtEndOfStream And l < 10000
            a(l) = t.ReadLine
            l = l + 1
        Loop
        Range("A" & aCount + 1).Resize(UBound(a)).Value = Application.Transpose(a)
        aCount = aCount + 10000
    ......


But when i remove the header row, the script running for some time then stop several times with error Run-time error '1004' TextToColumns method of Range class failed
in line (bold red). then after the first error when i press F5 to continue the script got this prompt: Do you want to replace the contents of the destination cells?
I still can continue with that error ... got several 1004 errors, and when i check the result got some blank rows ....
Rich (BB code):
        .....
        ReDim a(10000)
        If aCount >= 999999 Then
            Range("A1", Cells(Rows.Count, 1).End(xlUp)).TextToColumns Range("A1"), xlDelimited, xlTextQualifierDoubleQuote, , , , True
            Set ws = Sheets.Add(, Sheets(Sheets.Count))
            aCount = 0
        End If
        ......


Can my data caused that error ? Here is sample of data:
Rich (BB code):
"Snapshot Date";"Snapshot Time";"Plant";"Material - Key";"Material";"Segment";"Lines";"Planning segment num";"Element id";"Element Data";"Element";"Planned date";"Plus/minus";"Exception Message";"Rescheduling Date";"Location";"UoM";"Rec/Req Qty";"Available Qty"
"2016/03/10 00:00:00";"02:00:17";"100";"X0-ACRDNH-000";"ACRYLIC DECORATIVE";"2";"1";"#";"WB";"#";"STOCK";"2016/03/10 00:00:00";"B";"Not assigned";"";"1200/Not assigned";"each";"1";"1"
"2016/03/10 00:00:00";"02:00:17";"100";"X0-BRD001-0K61K5";"BOARD UK.1X0.6X1.5M";"2";"1";"#";"WB";"#";"STOCK";"2016/03/10 00:00:00";"B";"Not assigned";"";"1200/Not assigned";"UNIT";"0";"0"

The bold fonts is header, and there got 2 rows example data.
And FYI I was running it on Excel 2007. (Got Excel 2010 in office but I can try it in office on Monday)

Thanks in advance~
 

shknbk2

Board Regular
Joined
Mar 5, 2016
Messages
246
Ok. New macro to handle the semicolons and modify the array handling:

Code:
Sub openCSV()
    Dim ws As Worksheet
    Dim a() As Variant, aT() As Variant
    Dim filename As String
    Dim fso As FileSystemObject
    Dim t As TextStream
    Dim l As Long, lMil As Long, aCount As Long
    Dim boundA As Long


    filename = Application.GetOpenFilename("CSV File (*.csv),*.csv")
    If filename = "False" Then Exit Sub
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set t = fso.OpenTextFile(filename, ForReading)
    
    Application.ScreenUpdating = False
    aCount = 0
    boundA = 0
    Do
        If aCount >= 999999 Then
            Range("A1", Cells(Rows.Count, 1).End(xlUp)).TextToColumns Range("A1"), xlDelimited, xlTextQualifierDoubleQuote, , , , True
            Set ws = Sheets.Add(, Sheets(Sheets.Count))
            aCount = 0
            boundA = 0
        End If
        Do While Not t.AtEndOfStream And boundA < 10000
            ReDim Preserve a(boundA)
            a(boundA) = Replace(t.ReadLine, ";", ",")
            boundA = boundA + 1
        Loop
        ReDim aT(boundA - 1, 0)
        For l = 0 To boundA - 1
            aT(l, 0) = a(l)
        Next
        Range("A" & aCount + 1).Resize(boundA).Value = aT
        aCount = aCount + boundA
    Loop While Not t.AtEndOfStream
    Range("A1", Cells(Rows.Count, 1).End(xlUp)).TextToColumns Range("A1"), xlDelimited, xlTextQualifierDoubleQuote, , , , True
    t.Close
    Set fso = Nothing
    Set t = Nothing
    Application.ScreenUpdating = True
End Sub

Edit: After this gets working, we can get the header on each page. This code won't do that yet.
 
Last edited:

SevenDP

New Member
Joined
Mar 10, 2016
Messages
21

ADVERTISEMENT

Hi shknbk2

I already tried the new script on my office PC with Excel 2013 32bit.
And I get this error after few minutes run
error_2.png

My office PC is Win 7 Profesional 64bit with 4GB ram only

Thanks for the update

Cheers!
 

shknbk2

Board Regular
Joined
Mar 5, 2016
Messages
246
Change "boundA < 10000" to a lower and lower number until it stops. Try 1000. It will take a little longer, though.
 

SevenDP

New Member
Joined
Mar 10, 2016
Messages
21

ADVERTISEMENT

Hi shknbk2
I tried with "boundA < 1000" still no luck .... get same error message like picture above ... even if i change 999.999 to 499.999 (500.000 rows each worksheet) I still got that error message too ....
tried with "boundA < 500" after some time still got that error too ...
but if i tried with csv with about 200.000 rows your script is doing well ...
maybe if 1M rows each worksheet was too much (with only 4GB ram and 32bit Excel&Windows), i wish i can split that 4M rows csv to 500.000 rows each worksheet ....
is there any way to release the memory after write it to worksheet ? :)

thanks!
 

shknbk2

Board Regular
Joined
Mar 5, 2016
Messages
246
Here is modified code below. However, see if your Excel can handle that much info. Since the 200,000 rows works, what happens if you copy that sheet multiple times within the workbook? Does is copy? Does it save and re-open? If you can handle 20 sheets copied, then you should be able to handle the equivalent of 4M lines.

In the code, I've changed where to change the value to reduce memory size.
  1. readLimit<strike></strike> is now what you were changing before with boundA < 10000. readLimit is the 10000 or whatever number, so the code now checks to see if boundA < readLimit.<strike></strike>
  2. sheetLimit is a new number setting the maximum number of rows per page.

readLimit is 10 and sheetLimit is 200000 in this code below, but you can change them.

Code:
Sub openCSV()
    Dim ws As Worksheet
    Dim a() As Variant, aT() As Variant
    Dim filename As String
    Dim fso As FileSystemObject
    Dim t As TextStream
    Dim l As Long, lMil As Long, aCount As Long
    Dim boundA As Long
    
    '*************************************************************
    '* This sets how many CSV lines are read into memory before
    '* putting the values in the sheet
    '*************************************************************
    Const readLimit As Integer = 10
    
    '*************************************************************
    '* This sets how many rows there are on each sheet
    '*************************************************************
    Const sheetLimit As Long = 200000


    filename = Application.GetOpenFilename("CSV File (*.csv),*.csv")
    If filename = "False" Then Exit Sub
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set t = fso.OpenTextFile(filename, ForReading)
    
    Application.ScreenUpdating = False
    aCount = 0
    Do
        boundA = 0
        ReDim a(boundA)
        If aCount >= sheetLimit - 1 Then
            Range("A1", Cells(Rows.Count, 1).End(xlUp)).TextToColumns Range("A1"), xlDelimited, xlTextQualifierDoubleQuote, , , , True
            Set ws = Sheets.Add(, Sheets(Sheets.Count))
            aCount = 0
        End If
        Do While Not t.AtEndOfStream And boundA < readLimit
            ReDim Preserve a(boundA)
            a(boundA) = Replace(t.ReadLine, ";", ",")
            boundA = boundA + 1
        Loop
        ReDim aT(boundA - 1, 0)
        For l = 0 To boundA - 1
            aT(l, 0) = a(l)
        Next
        Range("A" & aCount + 1).Resize(boundA).Value = aT
        aCount = aCount + boundA
        DoEvents
    Loop While Not t.AtEndOfStream
    Range("A1", Cells(Rows.Count, 1).End(xlUp)).TextToColumns Range("A1"), xlDelimited, xlTextQualifierDoubleQuote, , , , True
    t.Close
    Set fso = Nothing
    Set t = Nothing
    Application.ScreenUpdating = True
End Sub
 

SevenDP

New Member
Joined
Mar 10, 2016
Messages
21
Hi ....
Already test new script with default value
Const readLimit As Integer = 10

Const sheetLimit As Long = 200000

but after some minutes run I got TextToColumns method of Range error again ....

Thanks for the update

Cheers!
 

shknbk2

Board Regular
Joined
Mar 5, 2016
Messages
246
Comment out the lines that have TextToColumns and re-run. This will see if the code will at least put in all of the data into the cells. If it works, you can go back and manually convert to columns.
 

Watch MrExcel Video

Forum statistics

Threads
1,122,564
Messages
5,596,863
Members
414,106
Latest member
Tigretto

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top