I've got a sampling workbook that uses the following formula (example from one cell in a sampling matrix) to sample with replacement:
=IF(AND('1'!$B6<>"",'1'!C$3<>""),HLOOKUP(Main!$C$13,Group01!$B$2:$AC$2002 ,RANDBETWEEN(2,Main!$C$12+1),TRUE),"")
What I need is a way to create a function where I can sample without replacement. The reason I need this is that at a very low prevalence of positive cells (e.g., 1% is very low vs 10% is low; and "positive" is a cell with a value of 1, vs "negative" being a cell with a value of 0), when I use a sample size of 100% (e.g., 100 sampled of 100 total potential), I find less than 100% "positive" where I need to find 100% "positive" for the model to be realistic. The problem is due to the sampling with replacement nature of the above function, whereas if I had a function that sampled without replacement, this would not happen.
Any help is appreciated.
Dale
I'll try this again, and hopefully be a bit clearer: I've got a simulation model that I'm using to simulate detection of ones (i.e., positives) from a sample taken from a population. For the sake of the discussion, please assume that the detection tool is 100% sensitive (i.e., if there is a one (positive) present in the sample, the detection tool will know it) and 100% specific (i.e., it won't call any of the points in the sample positive if they are not really a one). Sorry if that's a bit confusing, but if you want to understand sensitivity and specificity better, I can suggest some epidemiology texts to look at.
First, I have a sheet that creates a column of 0's or 1's based on two user inputs (N=population size, p=expected prevalence of 1's). For example, if N=20 and p=5% then the column contains 1 one and 19 zeros.
Second, I have another sheet in the workbook that uses RANDBETWEEN in conjunction with HLOOKUP, AND, & IF to randomly select & display the contents of one of the 20 cells from the first sheet, something like this:
=IF(AND($C27<>"",D$7<>""),HLOOKUP($D$4,$DC$7:$DC$27,RANDBETWEEN(2,$D$2+1),TRUE),"")
Where:
C27 = the 20th member of the population of 20
D7 = the 1st run in my set of "X" runs
D4 = the percent of cells from my population (of 20, in this case) in the first sheet that contain a "1" (5% or 1 of 20 cells, in this case)
DC7:DC27 = the range containing the 19 zeros and 1 one for my example population of 20
D2 = the size of my population (20, in this case)
I'm trying to simulate detection of ones (i.e., a "positive", where positive=1 and negative=0) with this model, and using the model to, among other things, demonstrate the relationship between the confidence for detection and: [1] sample size, [2] population size, and [3] true prevalence of positives (ones) in that population.
Excel seems to use a "sample with replacement" method, which can result in the same cell(s) sometimes being sampled multiple times in the same simulation run. When I use a percent of "positives" greater than around 5%-10% and/or when the population and sample size are large (500 & 500 or higher), this isn't much of a a problem, and the model stays right on the expected detecton (of ones, or positives) frequency (i.e., "confidence", or probability of detection) described in the statistical sample size tables (like those described by Cannon and Roe, 1982). However, when I use a sample size equal to the size of the entire population at smaller populations (300 & 300 or smaller) and especially where I use relatively low prevalence values (2% or less), I don't get the expected confidence probabilities (again, we're assuming 100% sensitive detection tool).
For example, I ran a simulation of 1000 replicates using the model, and the frequency of sampling the same cell containing the "1" out of the 20 in the example population (where the other 19 cells were zero) was:
1 = 371 of 1000
2 = 214 of 1000
3 = 64 of 1000
4 = 14 of 1000
5 = 1 of 1000
6 = 0 of 1000
7 = 1 of 1000
This problem occurs because Excel seems to use the "sample with replacement" method using the above equation and sheet structure. Using a "sample without replacement" method would resolve this problem, and what I'm looking for is a way to do "sampling without replacement" using Excel equations & functions, or macros/VBA.
I'd appreciate any wisdom from anyone on how to get Excel set up to do "sampling without replacement".
Dale
Like this thread? Share it with others