Reduce processing time for splitting huge file into 25 smaller files

Factotum

Board Regular
Joined
May 14, 2015
Messages
118
I've got a massive file with as many as 70,000 rows of data and four pivot tables. The initial file will probably be around 35-40 MB. I need to split this report into 25 different files based on a department identifier. Given the size of the file, there will be considerable time waiting for Excel to keep up.

I've done several projects like this on a smaller scale, so my question is: what is the fastest method to use to keep processing time to a minimum? The options I have used in the past include:


  1. Filter to one department, copy visible rows, paste to an existing template, save and close template, repeat
  2. Open original file, Save As, filter out department to keep, delete all remaining rows, clear filter, save and close, repeat

Both options have their drawbacks - it will take forever to copy and paste that many rows, but it also takes a long time to reopen such a big file 25 times.

Does anyone have any suggestions for a quicker method for splitting up a report like this? Thanks for any ideas!
 

Excel Facts

Links? Where??
If Excel says you have links but you can't find them, go to Formulas, Name Manager. Look for old links to dead workbooks & delete.
Have you considered loading the entire worksheet into a variant array and then doing all the processing in memory by copying from one variant array to another variant array then writing out the output variant aray. This technique is usually 100 to 1000 faster than any method that does the work on the worksheet. This is because the main reason for slow VBA is the time accessing the worksheet, so any steps you take to minimise the number of accesses to the worksheet is benificial tiemwise.
 
Upvote 0
You've switched to a language that I don't yet speak - teach me please! I love the idea of what you're proposing. Any recommendations on how to get started or any sources you could refer me to?
 
Upvote 0
Hi Fatso

I second @offthelip - for large data sets, once you understand how to process data in arrays (and also with the dictionary object), you will never go back to processing data in the worksheet.

The most comprehensive resource I have come across is by Paul Kelly at https://excelmacromastery.com/excel-vba-array/

Have a look initially at this subsection of it: https://excelmacromastery.com/excel-vba-array/#How_To_Make_Your_Macros_Run_at_Super_Speed

I was called in at work to see what I could do about working on a large data set that had to be manipulated daily. Because it was both huge and used worksheet formulae, it took about 4 hours (yes: 240 minutes) to calculate (let alone, opening, closing and saving times). With arrays incorporating the dictionary object, I had it down to less than one minute (60 seconds). It was a saving of about 0.5 FTE per annum.

They take a bit of understanding, but learn to love them and they will give back in spades !

Cheers

pvr928

PS Paul Kelly's entire site is the most comprehensive, user friendly, deep resource on Excel I have come across.
 
Last edited:
Upvote 0
Thank you both! I'm not sure how quickly I'll pick up on this concept, but any chance to learn this stuff is very exciting to me. What you're suggesting could possibly eliminate 12-15 hours per month, and in our understaffed and overworked office - this will be a huge help! Thank you again!
 
Upvote 0
Hi @Fatso

Kudos for being willing to take the challenge to learn. Perhaps think of arrays as simply recreating or creating a table in memory, either one dimensional (but really two) as a single column (x rows, 1 column), or two dimensional (x rows, y columns).

I'm no expert, but here is the way I tend to go about creating arrays (populate a table in Range(A1:D10) in Sheet1 with numbers to demonstrate):

Code:
Option Explicit
Sub ArrayExample()
Dim ws As Worksheet
Dim rRange As Range
Dim aArray()
Set ws = Sheet1
Set rRange = ws.Range("A1:D10")
'Method 1
aArray = rRange 'Directly passes the range to the array
Debug.Print "Direct " & LBound(aArray, 1) 'Returns 1 (1 'row')
Debug.Print "Direct " & UBound(aArray, 1) 'Returns 10 (10 'rows')
Debug.Print "Direct " & LBound(aArray, 2) 'Returns 1 (1 'column')
Debug.Print "Direct " & UBound(aArray, 2) 'Returns 4 (4 'columns')
Debug.Print "Value in Array(1,1) (Direct Method): " & aArray(1, 1)
Debug.Print "Value in Array(10,4) (Direct Method): " & aArray(10, 4)
'Method 2
Dim lLastRow As Integer
Dim lLastCol As Integer
Dim i As Integer
Dim j As Integer
lLastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
lLastCol = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column
ReDim aArray(1 To lLastRow, 1 To lLastCol)
'You must declare the 'size' of the array using this method.
'If you do not know the size of the array prior to initialising it, use the 'ReDim Preserve' statement. However, you can only alter the second dimension in this case
For i = 1 To lLastRow
    For j = 1 To lLastCol
    
        aArray(i, j) = rRange.Cells(i, j)
        
    Next j
    
Next i
Debug.Print "Loop " & LBound(aArray, 1) 'Returns 1 (1 'row')
Debug.Print "Loop " & UBound(aArray, 1) 'Returns 10 (10 'rows')
Debug.Print "Loop " & LBound(aArray, 2) 'Returns 1 (1 'column')
Debug.Print "Loop " & UBound(aArray, 2) 'Returns 4 (4 'columns')
Debug.Print "Value in Array(1,1) (Loop Method): " & aArray(1, 1)
Debug.Print "Value in Array(10,4) (Loop Method): " & aArray(10, 4)
'Multiply each element of array by 2
For i = 1 To lLastRow
    For j = 1 To lLastCol
    
        aArray(i, j) = aArray(i, j) * 2
        
    Next j
    
Next i
'Print out result back to worksheet
ws.Range("F1").Resize(lLastRow, lLastCol) = aArray
End Sub

Cheers

pvr928
 
Upvote 0
@pvr928 I do note that you are say you are not an expert, and that is shown be your method 2 example, this is NOT the way to use arrays: You define an array and the load it by individually accessing every cell on the worksheet , ugh horrible.
One of the main reasons that Vba is slow is the time taken to access the worksheet from VBa is a relatively long time.
To speed up vba the easiest way is to minimise the number of accesses to the worksheet. What is interesting is that the time taken to access a single cell on the worksheet in vba is almost identical as the time taken to access a large range if it is done in one action.
So instead of writing a loop which loops down a range copying one row at a time which will take along time if you have got 50000 rows it is much quicker to load the 50000 lines into a variant array ( one worksheet access), then copy the lines to a variant array and then write the array back to the worksheet, ( one worksheet access for each search that you are doing),
I have a simple rule for fast VBA: NEVER ACCESS THE WORKSHEET IN A LOOP.

I have written these two routines both of which do the same (meaningless) calculation, try them on your own machine to see the difference in speed.
Code:
'The two subroutines below take about the same amount of time on my machine ( about 2.2 seconds) however the one using variant arrays does exactly the same thing 500 times. i.e it is 500 times faster!!! I think everybody will agree that is a significant improvement.
Sub slow()
tt = Timer()
'initialise
 For j = 1 To 10
  Cells(j, 1) = 0
 Next j
For i = 1 To 1000
 For j = 1 To 10
  Cells(j, 1) = Cells(j, 1) + 1
 Next j
Next i
MsgBox (Timer() - tt)




End Sub




Sub fast()
tt = Timer()
For k = 1 To 500
'initialise
 For j = 1 To 10
  Cells(j, 1) = 0
 Next j
inarr = Range(Cells(1, 1), Cells(10, 1))
For i = 1 To 1000
 For j = 1 To 10
  inarr(j, 1) = inarr(j, 1) + 1
 Next j
Next i
Range(Cells(1, 1), Cells(10, 1)) = inarr
Next k


MsgBox (Timer() - tt)


End Sub
@Fatso
I have written the following code that loads data from sheet1 and then searches through it for a string in column A and copies everyline to sheet 2, all done in a flash

Code:
Sub test()
Dim outarr As Variant


With Worksheets("sheet1")
lastrow = .Cells(Rows.Count, "A").End(xlUp).Row
lastcol = .Cells(1, .Columns.Count).End(xlToLeft).Column
inarr = Range(.Cells(1, 1), .Cells(lastrow, lastcol))
End With
indi = 1
ReDim outarr(1 To lastrow, 1 To lastcol)
For i = 1 To lastrow
 If inarr(i, 1) = "as" Then
  ' copy row to outarr
   For j = 1 To lastcol
    outarr(indi, j) = inarr(i, j)
   Next j
   indi = indi + 1
 End If
Next i
With Worksheets("sheet2")
Range(.Cells(1, 1), .Cells(lastrow, lastcol)) = outarr
End With




End Sub
 
Upvote 0
Fatso

Is all the data that's to be split up on one sheet?

How is the data organised?

How are you splitting the data up?

Is the department identifier in a specific column?

Is the template sheet formatted the same as the sheet with the source data?
 
Upvote 0
@offthelip, I appreciate all the help, but I'm having a hard time applying what I'm reading. Paul Kelly's articles have been very informative, and I think your Sub test() code is really close to what I need, but I'm having trouble understanding and applying it. Could you help me tailor it a bit to my project?

My file goes out to Column BI and the length varies each time.
Column AY contains the criteria (department) used to split up the report.

I'd like to do the following:
  • Load any lines containing "Pacific Area" in column AY to an array or collection (not sure which is best in this case)
  • Copy them to an existing worksheet/template that has the same structure as the one the data is pulled from

This is essentially a copy/paste from one sheet to another, but it sounds like using an array will get this done much faster. Any tips?
 
Upvote 0
This is all you need to modify to do the copy from Sheet 1 to sheet 2
there is a useful excel function : Column() I use this all the time with variant arrays., Because you have to address variant arrays with a number index ( rather that letters for columns), It gives you the column number in the cell you put it.
using this I find Column AY is column 51,
So the modification to do what you want is very minor:
Code:
Sub test()
Dim outarr As Variant
With Worksheets("sheet1")
lastrow = .Cells(Rows.Count, "A").End(xlUp).Row
lastcol = 61  ' you could leave this as it was it would still work but column 61 is column BI
inarr = Range(.Cells(1, 1), .Cells(lastrow, lastcol))
End With
indi = 1
ReDim outarr(1 To lastrow, 1 To lastcol)
For i = 1 To lastrow
 If inarr(i, 51) = "Pacific Area" Then  ' column 51 is column AY  ' copy row to outarr
   For j = 1 To lastcol
    outarr(indi, j) = inarr(i, j)
   Next j
   indi = indi + 1
 End If
Next i
With Worksheets("sheet2")
Range(.Cells(1, 1), .Cells(indi, lastcol)) = outarr
End With
End Sub
 
Last edited:
Upvote 0

Forum statistics

Threads
1,213,544
Messages
6,114,239
Members
448,555
Latest member
RobertJones1986

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top