COUNTIFS returning wrong result by 1

RachelN76

New Member
Joined
Oct 24, 2016
Messages
15
Hi all,

I'm new here, so I hope I'm following all the rules! I've tried searching to see if anyone has had this problem, but I haven't come up with anything that fixes my problem.

I'm not a particularly advanced Excel user. I used it in a job a long time ago, but now I'm a nurse, however I'm the techiest person in my team, so I get these jobs!

We are running some studies, and I need to track the number of people we recruit each month, to each separate study.
We record all participants on a spreadsheet in different tabs, and for each study I make sure that they have a unique study ID and a date of recruitment.

I then have a separate tab to track these.
I set it up for January, then drag it across (making sure I've put $ in the right places) so it just carries on doing the same thing all year.
Here it is:
=COUNTIFS('PETRA Screening log'!$H$2:$H$1048576,">="&Tracker!S$3,'PETRA Screening log'!$H$2:$H$1048576,"<="&Tracker!S$4,'PETRA Screening log'!$G$2:$G$1048576,"<>"&"")

Where Petra screening log is the tab with all the participants on. Column H has the date recruited, column G is their unique ID (so I just need that to not be empty), and S3 and S4 are the dates I want it to count inbetween.

It has worked all year - but this month it counts 21 when there are only 20. And I can't for the life of my figure out why!

I've checked that there are no hidden rows.
I've checked the formatting on the column with the dates and they are all formatted as dates. I've checked that there are no extra numbers hidden anywhere.

I'm pretty certain that I will have done something silly - but if someone could tell me what it might be, I'd be really grateful.

Thanks you so much
Rachel
 

Excel Facts

Excel Wisdom
Using a mouse in Excel is the work equivalent of wearing a lanyard when you first get to college
Well it's not a syntax error, since it is giving a result at all.
It would be impossible to really say what's wrong without seeing the data.

I'd suggest using autofilter to filter that column H for >= date in S3 and <=Date in S4, and column G deselect blanks.
 
Upvote 0
=COUNTIFS('PETRA Screening log'!$H$2:$H$1048576,">="&Tracker!S$3,
'PETRA Screening log'!$H$2:$H$1048576,"<="&Tracker!S$4,
'PETRA Screening log'!$G$2:$G$1048576,"<>"&"")
[....]
It has worked all year - but this month it counts 21 when there are only 20. And I can't for the life of my figure out why!

The obvious answer is: you miscounted manually. Barring that....

First, FYI, there is no difference between COUNTIFS(...,"<>"&"") and COUNTIFS(...,"<>").

Second, that condition is true for any cell that ISBLANK returns FALSE; that is, any cell that appears blank, but it is not truly empty (no formula and no constant value).

So perhaps the value of one of the cells in column G that appears blank (and you expect not to be counted) actually contains the null string ("") or a string of one or more spaces (" ") or non-breaking spaces (HTML &nbsp).

Use ISBLANK to find it.

BTW, a really tricky situation is: the cell contains an invisible null string (!). That can happen when the cell value is an explicit null string, and you copy-and-paste-value to overwrite the cell.

PS.... If you never expect to fill nearly 1+ million cells, I suggest that you change the range 1048576 to something much smaller. Otherwise, your COUNTIFS is making 3+ million comparisons; and that will degrade worksheet performance noticeably when the formula is replicated many times.
 
Last edited:
Upvote 0
PS.... If you never expect to fill nearly 1+ million cells, I suggest that you change the range 1048576 to something much smaller. Otherwise, your COUNTIFS is making 3+ million comparisons; and that will degrade worksheet performance noticeably when the formula is replicated many times.

Actually the IF(S) family of functions (COUNTIF(S), SUMIF(S), AVERAGEIF(S)) are unique in that they use implicit detection of the last-used cell of the range passed, and thus calculate over the relevant reduced range.

As such. assuming for example that you only have data extending as far as row 2, there is no difference in worksheet performance in using, for example:

=SUMIF(A1:A2,"X",B1:B2)

and:

=SUMIF(A:A,"X",B:B)

despite the latter referencing approximately one million times as many cells as the former.

What you say does apply, however, to all other array-processing functions, e.g. SUMPRODUCT, AGGREGATE and any function requiring CSE. So, in parallel with the above example, the difference between:

=SUMPRODUCT(0+(A1:A2)="X",B1:B2)

and:

=SUMPRODUCT(0+(A:A)="X",B:B)

is enormous.

Regards
 
Last edited:
Upvote 0
PS.... IMHO, this is a distraction that is not helpful to Rachel at all. I hope this is the end of it here. Any further discussion of this point should be put into a separate thread, IMHO.

Actually the IF(S) family of functions (COUNTIF(S), SUMIF(S), AVERAGEIF(S)) are unique in that they use implicit detection of the last-used cell of the range passed, and thus calculate over the relevant reduced range. [....] there is no difference in worksheet performance in using, for example:
=SUMIF(A1:A2,"X",B1:B2)
and:
=SUMIF(A:A,"X",B:B)

That is correct for SUMIF(S) and probably for AVERAGEIF(S).

But it is not true of all IF(S) functions; notably, not for COUNTIFS, which Rachel is using.

And even if you were correct, the point is, as you noted: there are many other formulas where the use of whole-column references (or nearly so) is a bad idea. My guess: it is the leading root cause for many complaints about Excel performance and "hangs" (non-responding).

So IMHO, it is a bad habit that should be avoided whenever it is unnecessary, whether or not it actually makes a difference.

On my computer (YMMV), COUNTIFS(A2:A1048576,...) is about 200 times slower than COUNTIFS(A2:A5,...).

In contrast, the performance of SUMIFS(B2:B5,...) and SUMIFS(B2:B1048576,...) is about the same.

The reason for the difference: for SUMIFS, Excel knows or can know that it needs to look only up to the row with the last numeric value in column B, since it is summing numeric values.

The difference between SUMIFS and COUNTIFS is demonstrated by the macro below. On my computer (YMMV), the times are (in microseconds):

COUNTIFS(A2:A5,"x",B2:B5,"<>")
min 32.504, avg 44.351, max 65.008
min 26,089, avg 52.990, max 141.563

COUNTIFS(A2:A1048576,"x",B2:B1048576,"<>")
min 9518.930, avg 9845.850, max 9932.927

SUMIFS(B2:B5,A2:A5,"x",B2:B5,"<>")
min 27.799, avg 44.222, max 63.725

SUMIFS(B2:B1048576,A2:A1048576,"x",B2:B1048576,"<>")
min 24.378, avg 41.357, max 65.435
min 26.089, avg 57.566, max 164.230

I show two min/avg/max lines in some cases to make a point: the large variation probably reflects variable system overhead, and perhaps Excel or VBA overhead.

But that variable overhead is small relative to the much greater performance time for COUNTIFS(A2:A1048576,...). So its performance is more "stable".

I've taken care to avoid the system clock interrupt. But I cannot avoid other interrupts, even when I start Windows in safe mode without networking, which is difficult, if not impossible, to do starting with Office 2010.

-----

If you're curious, the following is the macro that I use. See the explanation below.

Code:
Option Explicit

Public Declare Function QueryPerformanceFrequency Lib "kernel32" _
   (ByRef freq As Currency) As Long
Public Declare Function QueryPerformanceCounter Lib "kernel32" _
   (ByRef cnt As Currency) As Long

Sub doit()
Const n As Long = 10
Dim r As Range, s As String
Dim t As Single, t0 As Single, clk(1 To n) As Single
Dim i As Long, lr As Long
Dim st As Currency, et As Currency, dt(1 To n) As Double
ActiveSheet.UsedRange.Clear
Range("a2:a3") = "x"
Range("a4:a5") = "y"
Range("b3") = 1
Range("b5") = 20
lr = Rows.Count
Set r = Range("c1")
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
#If 1 Then
    r.Formula = "=countifs(a2:a5,""x"",b2:b5,""<>"")"
#ElseIf 1 Then
    r.Formula = "=countifs(a2:a" & lr & ",""x"",b2:b" & lr & ",""<>"")"
#ElseIf 1 Then
    r.Formula = "=sumifs(b2:b5, a2:a5,""x"",b2:b5,""<>"")"
#Else
    r.Formula = "=sumifs(b2:b" & lr & ",a2:a" & lr & ",""x"",b2:b" & lr & ",""<>"")"
#End If
For i = 0 To n
    t = Timer: Do: t0 = Timer: Loop Until t0 <> t
    QueryPerformanceCounter st
    r.Calculate
    QueryPerformanceCounter et
    If i <> 0 Then
        clk(i) = Timer - t0: If clk(i) < 0 Then clk(i) = clk(i) + 86400
        dt(i) = convertmytimer(et - st)
    End If
Next
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
s = r.Formula
For i = 1 To n
    s = s & vbNewLine & Format(i, "00") & _
        vbTab & Format(dt(i), "0.000000\,000") & _
        vbTab & Format(CDbl(clk(i)), "0.000000\,000")
Next
s = s & vbNewLine & "min:" & vbTab & Format(WorksheetFunction.Min(dt), "0.000000\,000") & _
    vbNewLine & "avg:" & vbTab & Format(WorksheetFunction.Average(dt), "0.000000\,000") & _
    vbNewLine & "max:" & vbTab & Format(WorksheetFunction.Max(dt), "0.000000\,000")
Debug.Print "------" & vbNewLine & s
MsgBox s
End Sub

Function convertmytimer(ByVal dt As Currency) As Double
Static freq As Currency, df As Double
If freq = 0 Then QueryPerformanceFrequency freq: df = freq
convertmytimer = dt / df
End Function

Change #IF 1 to #IF 0 to select the next formula.

The form of the output is:
Code:
=COUNTIFS(A2:A1048576,"x",B2:B1048576,"<>")
01  0.009913,681    0.000000,000
02  0.009902,561    0.000000,000
03  0.009894,435    0.000000,000
04  0.009885,881    0.000000,000
05  0.009997,507    0.000000,000
06  0.010667,257    0.000000,000
07  0.010041,130    0.000000,000
08  0.009883,743    0.000000,000
09  0.009966,286    0.000000,000
10  0.009885,454    0.000000,000
min:    0.009883,743
avg:    0.010003,794
max:    0.010667,257

The times are in seconds. I use period as the decimal point and comma to demarcate nanoseconds (like a thousands separator).

I try to avoid system clock interrupts by "syncing" with the next system clock update. That is the purpose of the ``Do: t0 = Timer: Loop`` statements before each timing interval.

But we cannot guarantee there are no system clock interrupts. So the third column is the difference in the system clock, which typically has a resolution of 15.625 microseconds. Thus, 0.000000,000 demonstrates that there was no system clock interrupt; ergo, no system clock overhead.

Nevertheless, there is plenty of other overhead that is included in each time, for example: other system interrupts (networking, disk, etc); and VBA and Excel interthread communication.

I try to minimize Excel overhead by disabling ScreenUpdating and by setting xlCalculationManual mode.

Arguably, I could reduce Excel overhead further by calculating 10 formulas in Excel instead of looping over 10 calculate cycles. Indeed, that is my usual modus operandi.

But that has the potential for creating different Excel overhead. I decided it was not necessary because the relative difference between COUNTIFS(A2:A5,...) and COUNTIFS(A2:A1048576,...) is so significant.
 
Last edited:
Upvote 0
@joeu2004

Sincere apologies - I stand corrected by your excellent analysis.

I must have done my own basic testing using SUMIF some time ago, and made the idle and careless assumption that COUNTIF, etc. would behave similarly.

So IMHO, it is a bad habit that should be avoided whenever it is unnecessary, whether or not it actually makes a difference.

Agreed. A not insignificant proportion of my posts each week are along the lines of informing someone of the dangers inherent in using larger ranges than are actually necessary (usually entire column references, e.g. A:A). However, I usually include somewhere within those posts a few lines to the effect that this "does not apply to COUNTIF(S), SUMIF(S), etc.". I will be sure to remove that portion now in light of your analysis.

Regards
 
Upvote 0
And I can't for the life of my figure out why!

Me too! I tried using the SumProduct formula below in a column I needed to count unique instead, which should have fixed the problem since I only need 1 column checked and it's still wrong.

=SUMPRODUCT(--(FREQUENCY(MATCH(A$3:A$102,A$3:A$102,0),ROW(A$3:A$102)-ROW(A$3))>0))
 
Upvote 0

Forum statistics

Threads
1,214,911
Messages
6,122,198
Members
449,072
Latest member
DW Draft

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top