Speedier Alternatives to SUMIFS

Tarver

Board Regular
Joined
Nov 15, 2012
Messages
109
Office Version
  1. 365
Platform
  1. Windows
  2. MacOS
I have a fairly large dataset (350,000 rows) for which I need to sum data for 12,000 customers. Using SUMIFS is a long process and can take up to 10 minutes, because the same calculations are being repeated over and over again.

Is there a faster way?

Attached is a look at a sample of my data and a small table of the output I'm attempting to create. Your ideas on a faster way to calculate this result are welcomed! Thanks!

Predictive AR Model v1.xlsx
ABCDEFG
1Account noPosting dateDocument TypeAmount
210000967/1/2021SI$955.00Start Date7/5/2021
310000847/2/2021SI$960.00End Date7/16/2021
410000847/2/2021SI$1,395.00
510000847/2/2021SI$793.95CustomerAmount
610000847/2/2021SI$1,700.001000052$0.00
710000847/2/2021SI$375.001000084$20,428.00
810000847/2/2021SI$360.001000096$0.00
910000847/2/2021SI$1,235.00
1010000847/2/2021SI$960.00
1110000847/9/2021SI$740.35
1210000847/13/2021SI$618.00
1310000847/13/2021SI$1,728.00
1410000847/13/2021SI$713.00
1510000847/16/2021SI$424.00
1610000847/16/2021SI$470.00
1710000847/16/2021SI$2,610.00
1810000847/16/2021SI$572.00
1910000847/16/2021SI$1,055.00
2010000847/16/2021SI$25.00
2110000847/16/2021SI$3,745.00
2210000847/9/2021SI$465.65
2310000847/16/2021SI$6,560.00
2410000847/16/2021SI$702.00
2510000527/19/2021SI$822.00
2610000847/23/2021SI$3,190.00
2710000847/23/2021SI$835.00
2810000847/23/2021SI$1,070.00
2910000847/23/2021SI$4,480.00
3010000527/23/2021SI$358.25
31
Sheet1
Cell Formulas
RangeFormula
F6:F8F6=SORT(UNIQUE(Table4[Account no]))
G6:G8G6=SUMIFS(Table4[Amount],Table4[Account no],F6#,Table4[Posting date],">="&$G$2,Table4[Posting date],"<="&$G$3)
Dynamic array formulas.
 

Excel Facts

Excel Can Read to You
Customize Quick Access Toolbar. From All Commands, add Speak Cells or Speak Cells on Enter to QAT. Select cells. Press Speak Cells.
Trying to aggregate that amount of data, you are really pushing the limits of Excel. What you really have is a relational database, and a relational database program like SQL, Microsoft Access, or Oracle handles this type of things much more efficiently.

If you have to keep it in Excel, you may more luck using Power Query within Excel to do this.
There is a whole forum here devoted to "Power Tools" products, and some to the stickies give some good information on Power BI.
 
Upvote 0
Try this on a copy of your data. It assumes that your sheet is laid out exactly as your sample shows. Change sheet name/cell locations if they don't match your actual data exactly. On a test of your data replicated to 360k rows, it completed in under 2 seconds for me.

VBA Code:
Option Explicit
Sub Faster_Than_Sumifs()
    Dim t As Double: t = Timer
    Dim ws As Worksheet
    Set ws = Worksheets("Sheet1")   '<< Change to actual sheet name
    
    Dim Start As Double, Finish As Double
    Start = CDbl(ws.Range("G2").Value2)
    Finish = CDbl(ws.Range("G3").Value2)
    
    Dim ar, i As Long, n As Long
    ar = Sheet1.Range("A2", Cells(Rows.Count, "D").End(xlUp))
    
    With CreateObject("Scripting.Dictionary")
        For i = 1 To UBound(ar, 1)
            If CDbl(ar(i, 2)) >= Start And CDbl(ar(i, 2)) <= Finish Then
                .Item(ar(i, 1)) = .Item(ar(i, 1)) + ar(i, 4)
            Else
                .Item(ar(i, 1)) = .Item(ar(i, 1)) + 0
            End If
        Next i
        ar = Array(.keys, .Items)
        n = .Count
    End With
    
    With ws.Range("F5").CurrentRegion
        .Offset(1).ClearContents
        .Offset(1).Resize(n, 2).Value2 = Application.Transpose(ar)
        .Sort Key1:=ws.Range("F5"), order1:=xlAscending, Header:=xlYes
    End With
    
    MsgBox "Completed in " & Timer - t & " seconds"
End Sub
 
Upvote 0
Edit - change this line:
Rich (BB code):
.Sort Key1:=ws.Range("F5"), order1:=xlAscending, Header:=xlYes


to this
Rich (BB code):
.Sort Key1:=ws.Range("F6"), order1:=xlAscending, Header:=xlYes
 
Upvote 0
Kevin, this is great, and I really appreciate you taking the time to do this.

I've made adjustments to your code to update for my actual sheet names and cells.

In addtion, I've updated line
ar = Sheet1.Range("A2", Cells(Rows.Count, "D").End(xlUp))
to change Sheet1.Range to be ws.Range. That seemed to work.

However, I'm getting a runtime error 13, type mismatch in this line:
If CDbl(ar(i, 2)) >= Start And CDbl(ar(i, 2)) <= Finish Then
I've spent the better part of the morning trying to understand what the code is doing and what the error is, but I've thus far been unable to work it out.

If you (or anyone else) can help, I'd appreciate the nudge in the right direction.

Thanks again!

BT
 
Upvote 0
Hi BT, thank you for the feedback & comprehensive explanation of the issue. Moving forward, a few things would help to get to the bottom of the issue. Firstly, I’ll explain what this line of code does. (It’s easier to explain if you think of things in terms of a range).

If CDbl(ar(i, 2)) >= Start And CDbl(ar(i, 2)) <= Finish Then

The data in your table is loaded into an array (ar) which is then looped through one row at a time – where i is the row number, and the number following the comma after the i is the column number. The value is confirmed as a Double data type via the CDbl(). Therefore, that line of code asks – is the value in the second column greater than or equal to the value stored in the variable Start and is it less than or equal to the value stored in the variable Finish, and if so Then this line:

.Item(ar(i, 1)) = .Item(ar(i, 1)) + ar(i, 4)

stores that value as a Dictionary key – and the value in column 4 of the same row ar(i, 4) as the Dictionary item for that key – unless that key already exists, in which case it increments the value to the value already stored in the item for that key (the equivalent of summing the value – which is what ultimately returns the Sumifs() total once the entire data has been processed.)

What would help would be if you could tell me what values you are getting for the variables when the code errors. So when the code stops, put the mouse cursor over the following variables on the highlighted line and tell me what you get for Start & Finish (in both cases it should be a number, something like 44382 or similar); over ar (should appear as a date, something like 1/7/2021) and i (should be a number – if it’s 1 then the code doesn’t like anything about that line in relation to the data, if it’s greater than 1 then whatever number it is suggests there’s a problem on that particular row in your data).

It would also help if you could provide an XL2BB sample of your actual data so I can see the true layout – you can disguise any sensitive data & it doesn’t need to be a large sample, just something similar to what you provided in post #1. Also, could you show the full actual code you used once you had made the adjustments for sheet name/cell locations etc.

I have made a small edit to the code I posted in post #3 because I wasn’t happy with the final Sort of the returned table of values.
VBA Code:
Option Explicit
Sub Faster_Than_Sumifs_V2()
    Dim t As Double: t = Timer
    Dim ws As Worksheet
    Set ws = Worksheets("Sheet1")   '<< Change to actual sheet name
    
    Dim Start As Double, Finish As Double
    Start = CDbl(ws.Range("G2").Value2)
    Finish = CDbl(ws.Range("G3").Value2)
    
    Dim ar, i As Long, n As Long
    ar = ws.Range("A2", Cells(Rows.Count, "D").End(xlUp))
    
    With CreateObject("Scripting.Dictionary")
        For i = 1 To UBound(ar, 1)
            If CDbl(ar(i, 2)) >= Start And CDbl(ar(i, 2)) <= Finish Then
                .Item(ar(i, 1)) = .Item(ar(i, 1)) + ar(i, 4)
            Else
                .Item(ar(i, 1)) = .Item(ar(i, 1)) + 0
            End If
        Next i
        ar = Array(.keys, .Items)
        n = .Count
    End With
    
    With ws.Range("F5").CurrentRegion
        .Offset(1).ClearContents
        .Offset(1).Resize(n, 2).Value2 = Application.Transpose(ar)
    End With
    
    With ws.Range("F5").CurrentRegion
        .Sort Key1:=ws.Range("F6"), order1:=xlAscending, Header:=xlYes
    End With
    
    MsgBox "Completed in " & Timer - t & " seconds"
End Sub

When I ran the code, after extrapolating the data you provided down to 360K rows (bearing in mind only 1000084 had any data that falls within the date range) it took just over 1 second to turn this:
sumifs.xlsm
ABCDEFG
1Account noPosting dateDocument TypeAmount
210000967/01/2021SI$955.00Start Date7/05/2021
310000847/02/2021SI$960.00End Date7/12/2021
410000847/02/2021SI$1,395.00
510000847/02/2021SI$793.95CustomerAmount
610000847/02/2021SI$1,700.00
710000847/02/2021SI$375.00
810000847/02/2021SI$360.00
910000847/02/2021SI$1,235.00
1010000847/02/2021SI$960.00
1110000847/09/2021SI$740.35
1210000847/13/2021SI$618.00
1310000847/13/2021SI$1,728.00
1410000847/13/2021SI$713.00
1510000847/16/2021SI$424.00
1610000847/16/2021SI$470.00
1710000847/16/2021SI$2,610.00
1810000847/16/2021SI$572.00
1910000847/16/2021SI$1,055.00
2010000847/16/2021SI$25.00
Sheet1


Into this:
sumifs.xlsm
ABCDEFG
1Account noPosting dateDocument TypeAmount
210000967/01/2021SI$955.00Start Date7/05/2021
310000847/02/2021SI$960.00End Date7/12/2021
410000847/02/2021SI$1,395.00
510000847/02/2021SI$793.95CustomerAmount
610000847/02/2021SI$1,700.001000052$0.00
710000847/02/2021SI$375.001000084$14,472,000.00
810000847/02/2021SI$360.001000096$0.00
910000847/02/2021SI$1,235.00
1010000847/02/2021SI$960.00
1110000847/09/2021SI$740.35
1210000847/13/2021SI$618.00
1310000847/13/2021SI$1,728.00
1410000847/13/2021SI$713.00
1510000847/16/2021SI$424.00
1610000847/16/2021SI$470.00
1710000847/16/2021SI$2,610.00
1810000847/16/2021SI$572.00
1910000847/16/2021SI$1,055.00
2010000847/16/2021SI$25.00
Sheet1
 
Upvote 0
Solution
OUTSTANDING!

I cannot thank you enough for your help. The simple explanation you provided made sense to me, and I was able to edit the code to match my actual data set. I had it up and running in about 20 minutes, and I've verified that the answers are correct vs. a slow-moving SUMIFS.

The key for me was understanding the offsets of the columns. Your tip on mousing over the variable in the code helped me determine what the values were, and that helped me get to the right column.

Excellent work on your part, and I'm very grateful!

BT
 
Upvote 0
Happy to help BT and thanks for the feedback 👍
Out of curiosity, I'd be keen to know how long the code took to run with your actual data?
 
Upvote 0
Happy to help BT and thanks for the feedback 👍
One more point of follow-up for you: I've now updated my sheet to run your macro code when the desired date range is updated and I'm using it twice to get an average historical sales trend and then again to get a recent sales trend to compare the two. Ultimately the project is identify customers who may have financial issues, who have either slowed the rate at which they pay invoices or have suddenly spiked in new orders. Your contribution was a big help!
 
Upvote 0

Forum statistics

Threads
1,215,241
Messages
6,123,824
Members
449,127
Latest member
Cyko

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top