Minimum sample size for 'n' allowed errors, given required accuracy and confidence (statistics)

SerenityNetworks

Board Regular
Joined
Aug 13, 2009
Messages
131
Office Version
  1. 365
Platform
  1. Windows
I use the workbook referenced below to provide me with the minimum sample sizes needed for a range of allowed errors, given a required accuracy and required confidence.

In a round-about way, I fill-down the BINOMDIST function in a column to provide me with the probability/confidence that my accuracy requirement will be met for a given sample size and given number of errors. This allows me to then look up the sample size where the probability/confidence I'm looking has been met (by the BINOMDIST formula).

The solution works perfectly. The results have been validated many times. However, it produces a huge (100+ mb) Excel file that is slow to calculate. In the example file provided I'm only giving 5 columns of formulas that are filled-down 200 rows. To accommodate the ranges I need to work with, I must fill-down 50k or more rows and across 75 or more columns. That makes for huge workbooks that are slow to calculate.

I'm wondering if anyone can show me a more elegant method to accomplish my goal. I've tried, but haven't been able to get away from using a massive number of columns and rows. It seems there should be a single formula that will do what I need, but I'm not mathematically inclined enough to be able to figure it out. Any help will be appreciated.

Thanks in advance,
Andrew

Dropbox link to workbook directory

or link to workbook directly if above doesn't work.
 
Last edited:
How about these apples?

A​
B​
C​
1​
Accuracy
99.90%​
2​
Confidence
80.00%​
3​
4​
Allowed Errors
NumTrials4
5​
0​
1,609​
B5: =NumTrials4($A5, $B$2, 1-$B$1)
6​
1​
2,994​
7​
2​
4,278​
8​
5​
7,905​
9​
10​
13,649​
10​
20​
24,726​
11​
50​
56,890​
12​
100​
109,342​
13​
200​
212,819​
14​
500​
519,725​
15​
1,000​
1,027,513​
16​
2,000​
2,038,529​
17​
5,000​
5,060,389​
18​
10,000​
10,085,026​
19​
20,000​
20,119,868​
20​
50,000​
50,189,002​
21​
100,000​
100,266,915​
22​
200,000​
200,377,100​
23​
500,000​
500,595,722​
24​
1,000,000​
1,000,842,103​

That last result (the 1,000,842,103) requires only 36 calculations; it calculates instantly.

Code:
Function NumTrials4(k As Long, Conf As Double, p As Double) As Long
  ' shg 2017
  ' Rock solid for values < ~ 1 billion
  Dim cdf           As Double   ' cumulative distribution function
  Dim n             As Long     ' trials
  Dim nLo           As Long
  Dim nHi           As Long
  Dim i             As Long

  If k < 0 Then Exit Function
  If Conf <= 0# Or Conf >= 1# Then Exit Function
  If p <= 0# Then Exit Function
  
  nLo = k
  n = k

  With WorksheetFunction
    Do
      cdf = .Binom_Dist(k, n, p, True)
      i = i + 1
      'Debug.Print Right(Space(10) & Format(i, "  #,##0"), 16) & _
                  Right(Space(16) & Format(nLo, "  #,##0"), 16) & _
                  Right(Space(16) & Format(n, "  #,##0"), 16) & _
                  Right(Space(16) & Format(nHi, "  #,##0"), 16) & _
                  Format(cdf, "  0.0000000000")

      Select Case Sgn(1 - cdf - Conf)
        Case -1
          nLo = n
          If n = 0 Then n = 1 Else n = 2 * n
        Case 0
          NumTrials4 = n
          Exit Function
        Case 1
          nHi = n
          Exit Do
      End Select
    Loop

    Do
      If nHi = nLo + 1 Then
        NumTrials4 = nHi
        Exit Function
      End If

      n = (nLo + nHi) \ 2
      i = i + 1
      cdf = .Binom_Dist(k, n, p, True)
      'Debug.Print Right(Space(10) & Format(i, "  #,##0"), 16) & _
                  Right(Space(16) & Format(nLo, "  #,##0"), 16) & _
                  Right(Space(16) & Format(n, "  #,##0"), 16) & _
                  Right(Space(16) & Format(nHi, "  #,##0"), 16) & _
                  Format(cdf, "  0.0000000000")

      Select Case Sgn(1 - cdf - Conf)
        Case -1
          nLo = n
        Case 0
          NumTrials4 = n
          Exit Function
        Case 1
          nHi = n
      End Select
    Loop
  End With

End Function
 
Upvote 0

Excel Facts

Move date out one month or year
Use =EDATE(A2,1) for one month later. Use EDATE(A2,12) for one year later.
Well, I just saw your post above. I was playing with NumTrials2. I'll take a look at NumTrials4 above momentarily. Anyway, here is what I was going to comment on NumTrials2.
Yes, it is faster. I'm trying, but the math and UDFs are getting beyond my ability to follow. What is the PMF and how do I determine the boundary of when the PMF falls below the level at which it can be subtracted from the CDF? I can see that it's around one-hundred for the required values I typically use, but I not understanding the terms leaves me guessing.

I attempted to replace your 'Stop' with a call to your original NumTrials, in order to continue the calculation once PMF equaled zero. I wasn't successful (not sure why yet). But then I noticed that the calculation actually dies/differs from the original NumTrials before PMF equals zero. Of course, I have no idea why.

Thanks again,
Andrew

I'm going to take a look at NumTrials4 shortly. But I may not be able to reply until morning.
 
Upvote 0
Dang! That's amazing!

I'll have to look more closely, but I think I even can follow what you are doing with the UDF. Thank you! This far exceeds my needs. It even lets me calculate ridiculous (but sometimes encountered requirements of 99.99% accuracy with 90% confidence. Wow. (It dies at 87,382 allowed errors at 99.99% accuracy and 90% confidence, but I can live with that :) )

Any thoughts on the best ways to pursue my other ultimate goals to (2) calculate the confidence level given the count of errors, required accuracy, and sample size, and (3) calculate the number of errors allowed given the sample size, required accuracy, and required confidence? I don't mean to presume or suppose, but this has been such an incredibly helpful thread that I have to ask.

Anyway, again, thank you very much,
Andrew
 
Upvote 0
A little cleanup:

A​
B​
1​
Accuracy
99.99%​
2​
Confidence
80.00%​
3​
4​
Allowed Errors
NumTrials4
5​
0​
16,094​
6​
1​
29,943​
7​
2​
42,790​
8​
5​
79,059​
9​
10​
136,506​
10​
20​
247,278​
11​
50​
568,925​
12​
100​
1,093,460​
13​
200​
2,128,239​
14​
500​
5,197,333​
15​
1,000​
10,275,245​
16​
2,000​
20,385,455​
17​
5,000​
50,604,153​
18​
10,000​
100,850,634​
19​
20,000​
201,199,220​
20​
50,000​
501,890,868​
21​
100,000​
1,002,670,343​
22​
131,071​
1,313,765,866​
23​
131,072​
#VALUE!​

Code:
Function NumTrials4(k As Long, Conf As Double, p As Double) As Long
  ' shg 2017
  Dim cdf           As Double   ' cumulative distribution function
  Dim n             As Long     ' number of trials
  Dim nLo           As Long     ' low limit for n
  Dim nHi           As Long     ' high limit for n
  Dim nCalc         As Long     ' calculation counter

  If k < 0& Then Exit Function
  If Conf <= 0# Or Conf >= 1# Then Exit Function
  If p <= 0# Or p >= 1# Then Exit Function

  nLo = k
  If k = 0& Then nHi = 1 Else nHi = k

  With WorksheetFunction
    ' find the high limit
    Do
      nHi = nHi * 2&
      cdf = .Binom_Dist(k, nHi, p, True)
      nCalc = nCalc + 1&
      'Debug.Print Right(Format(nCalc, "   0"), 3) & _
                  Right(Format(nLo, "               #,##0"), 15) & _
                  Right(Format(n, "               #,##0"), 15) & _
                  Right(Format(nHi, "               #,##0"), 15) & _
                  Format(cdf, "  0.000 000 000 000 000")
    Loop While 1 - cdf < Conf

    ' binary search for n
    Do
      If nHi = nLo + 1& Then
        NumTrials4 = nHi
        Exit Function
      End If

      n = nLo \ 2& + nHi \ 2& + (nLo And nHi And 1&)
      nCalc = nCalc + 1&
      cdf = .Binom_Dist(k, n, p, True)
      'Debug.Print Right(Format(nCalc, "   0"), 3) & _
                  Right(Format(nLo, "               #,##0"), 15) & _
                  Right(Format(n, "               #,##0"), 15) & _
                  Right(Format(nHi, "               #,##0"), 15) & _
                  Format(cdf, "  0.000 000 000 000 000")

      Select Case Sgn(1# - cdf - Conf)
        Case -1
          nLo = n
        Case 0
          NumTrials4 = n
          Exit Function
        Case 1
          nHi = n
      End Select
    Loop
  End With
End Function
 
Upvote 0
(2) calculate the confidence level given the count of errors, required accuracy, and sample size, and (3) calculate the number of errors allowed given the sample size, required accuracy, and required confidence?

How do you do those now?
 
Upvote 0
(2) calculate the confidence level given the count of errors, required accuracy, and sample size, and (3) calculate the number of errors allowed given the sample size, required accuracy, and required confidence?
How do you do those now?

I do it the hard way. I enter the required accuracy and then keep changing the confidence until I hit the number of trials and sample sizes I encountered. It's not terribly hard or time consuming (and it will be even less so with your speedy UDF), but it would be nice to have a single formula that did it.

Thanks,
Andrew
 
Last edited:
Upvote 0
Oh gee, do I feel stupid! I was so into obtaining the sample sizes that I totally forgot that I was using the derived confidence level to achieve it.

A
BC
1Accuracy99.90%
2
3Allowed ErrorsTrialsConfidence Level
40120470.02%
51243970.01%
6261570.01%

<tbody>
</tbody>

C4: =1-BINOM.DIST(A4,B4,1-$B$1,TRUE)

Or am I having yet another brain cramp?

Thanks,
Andrew
 
Last edited:
Upvote 0
Oh, and for accuracy...

A
BC
1Accuracy99.90%
2
3Allowed ErrorsTrialsAccuracy
401204100.00%
512439
99.96%
6261599.94%

<tbody>
</tbody>


C4: =1-A4/B4

Doh!

This is what happens when I get too focused on one problem (and I'm overly tired). I miss the obvious.

Thanks for all your help on this topic. It has been above and beyond all expectations.

Andrew

PS. I guess now I just need to work on finding out the number of allowed errors when I know the required accuracy, required confidence level, and number of trials.
 
Upvote 0
BTW, I came up with my own solution (here) to calculating for allowed errors. Thank you for the example in coding for a UDF. I don't know that my solution is very elegant, but it works - and it's not terribly slow despite being a brute force method. Later I'll look at it more closely to see if I can optimize it as you did with your UDF.

Andrew
 
Upvote 0

Forum statistics

Threads
1,215,361
Messages
6,124,500
Members
449,166
Latest member
hokjock

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top