MrExcel Message Board

Go Back   MrExcel Message Board > Question Forums > Excel Questions

Excel Questions All Excel/VBA questions - formulas, macros, pivot tables, general help, etc. Please post to this forum in English only.

Reply
 
Thread Tools Display Modes
Old Apr 11th, 2002, 10:37 PM   #1
dpolson
New Member
 
Join Date: Apr 2002
Posts: 17
Default

I've got a simulation model that I'm using to simulate detection of ones (i.e., positives) from a sample taken from a population. For the sake of the discussion, please assume that the detection tool is 100% sensitive (i.e., if there is a one (positive) present in the sample, the detection tool will know it) and 100% specific (i.e., it won't call any of the points in the sample positive if they are not really a one). Sorry if that's a bit confusing, but if you want to understand sensitivity and specificity better, I can suggest some epidemiology texts to look at.

First, I have a sheet that creates a column of 0's or 1's based on two user inputs (N=population size, p=expected prevalence of 1's). For example, if N=20 and p=5% then the column contains 1 one and 19 zeros.

Second, I have another sheet in the workbook that uses RANDBETWEEN in conjunction with HLOOKUP, AND, & IF to randomly select & display the contents of one of the 20 cells from the first sheet, something like this:

=IF(AND($C27<>"",D$7<>""),HLOOKUP($D$4,$DC$7:$DC$27,RANDBETWEEN(2,$D$2+1) ,TRUE),"")

Where:
C27 = the 20th member of the population of 20
D7 = the 1st run in my set of "X" runs
D4 = the percent of cells from my population (of 20, in this case) in the first sheet that contain a "1" (5% or 1 of 20 cells, in this case)
DC7:DC27 = the range containing the 19 zeros and 1 one for my example population of 20
D2 = the size of my population (20, in this case)

I'm trying to simulate detection of ones (i.e., a "positive", where positive=1 and negative=0) with this model, and using the model to, among other things, demonstrate the relationship between the confidence for detection and: [1] sample size, [2] population size, and [3] true prevalence of positives (ones) in that population.

Excel seems to use a "sample with replacement" method, which can result in the same cell(s) sometimes being sampled multiple times in the same simulation run. When I use a percent of "positives" greater than around 5%-10% and/or when the population and sample size are large (500 & 500 or higher), this isn't much of a problem, and the model stays right on the expected detecton (of ones, or positives) frequency (i.e., "confidence", or probability of detection) described in the statistical sample size tables (like those described by Cannon and Roe, 1982). However, when I use a sample size equal to the size of the entire population at smaller populations (300 & 300 or smaller) and especially where I use relatively low prevalence values (2% or less), I don't get the expected confidence probabilities (again, we're assuming 100% sensitive detection tool).

For example, I ran a simulation of 1000 replicates using the model, and the frequency of sampling the same cell containing the "1" out of the 20 in the example population (where the other 19 cells were zero) was:

1 = 371 of 1000
2 = 214 of 1000
3 = 64 of 1000
4 = 14 of 1000
5 = 1 of 1000
6 = 0 of 1000
7 = 1 of 1000

This problem occurs because Excel seems to use the "sample with replacement" method using the above equation and sheet structure. Using a "sample without replacement" method would resolve this problem, and what I'm looking for is a way to do "sampling without replacement" using Excel equations & functions, or macros/VBA.

I'd appreciate any wisdom from anyone on how to get Excel set up to do "sampling without replacement".

Dale
dpolson is offline   Reply With Quote
Old Apr 11th, 2002, 11:00 PM   #2
Jay Petrulis
MrExcel MVP
 
Jay Petrulis's Avatar
 
Join Date: Mar 2002
Location: Chicago, IL USA
Posts: 2,042
Default

Hi Dale,

Posted below are two user-defined functions created by Dave Braden, an Excel MVP many times over, for sampling without replacement.

The posts can be found at this link:
http://groups.google.com/groups?hl=e...D%40fiastl.net

Dave is awesome. He is leading the charge to fix the statistical functions in the Analysis ToolPak. He, together with Jerry W. Lewis, have posted some amazing code. (I worked on the Hypergeometric function for them, but put it away for awhile.)

Anyway, please post back if this isn't what you need.

First function uses a sheet range. Second is more flexible. Please see Dave's instructions at the end.

-------------------------------
Function HGSample(data As Range, ByVal SampleSize As Integer) As Variant
' Returns a hypergeometric sample of Samplesize from a range, assumed to be columnar
'PLEASE retain all comments: Posted to microsoft.public.excel.misc
'Written 1999/4/1 David J. Braden tmy@fiastl.net

Dim hiP1 As Long, i As Long, j As Long
Dim ret() As Variant, temp As Variant

'Application.Volatile 'to make this a variate generator, uncomment this line

temp = data
hiP1 = data.Rows.Count + 1
If SampleSize > UBound(temp) Then SampleSize = UBound(temp)
ReDim ret(1 To SampleSize, 1 To 1)
For i = 1 To SampleSize
j = i + Int(Rnd * (hiP1 - i))
ret(i, 1) = temp(j, 1): temp(j, 1) = temp(i, 1)
Next i
HGSample = ret
End Function



Function HGSample2a(ByVal lo As Long, ByVal hi As Long, ByVal SampleSize As Long) As Variant
'Returns a sample, without replacement, of SampleSize from the range lo to hi
'Written 1999/7/7 David J. Braden tmy@fiastl.net
'PLEASE retain all comments: Posted to microsoft.public.excel.*

Dim hiP1 As Long, i As Long, j As Long
Dim ret() As Variant, temp As Variant

Application.Volatile 'to make this a variate generator, uncomment this line

'The following allows more flexible specification of the support.
If lo > hi Then temp = lo: lo = hi - 1: hi = temp Else lo = lo - 1

ReDim temp(1 To hi - lo)
For i = hi - lo To 1 Step -1
temp(i) = i
Next
hiP1 = UBound(temp) + 1
If SampleSize > UBound(temp) Then SampleSize = UBound(temp)
ReDim ret(1 To SampleSize, 1 To 1)
For i = 1 To SampleSize
j = i + Int(Rnd * (hiP1 - i))
ret(i, 1) = temp(j) + lo: temp(j) = temp(i)
Next i
HGSample2a = ret
End Function

'As with the earlier function, this returns an array, so you need to do a
'ctrl-shift-enter, no matter how you use it (in another function, or for explicit
'display of the sample, or whatever). To see a draw, select B1:B25, array-enter
'=HGsample2a(1,50,25), and you have it. This function allows flexible
'specification of the set of integers being sampled from. It could be more
'robust still by specifying lo and hi as variants, then checking that they are in
'fact integers, but I'll leave that overhead to others.
---------------------------------

HTH,
Jay


Jay Petrulis is offline   Reply With Quote
Old Apr 12th, 2002, 07:39 AM   #3
dpolson
New Member
 
Join Date: Apr 2002
Posts: 17
Default

Thanks Jay.

I'll digest this and let you know. Appreciate the help.

Dale
dpolson is offline   Reply With Quote
Old Apr 14th, 2002, 07:01 AM   #4
dpolson
New Member
 
Join Date: Apr 2002
Posts: 17
Default

I set up the HGSample & HGSample2a functions, debugged, and compiled.

When I use the HGSample function =HGSample(data,samplesize) I get duplication of values, which was expected.

But when I use the HGSample2a function =HGSample2a(lo,hi,samplesize) I still get duplication.

I've rechecked the code and am pretty sure I got it plugged in right -

Here is the HGSample code I put in the function pasted right from the edit screen:


Function HGSample(Data As Range, ByVal SampleSize As Integer) As Variant
'Written 4/1/99 (no fooling!) ny David J. Braden
'Plz retain the following comment:
' Posted to microsoft.public.excel.misc
Dim hiP1 As Long, i As Long, j As Long
Dim ret() As Variant, temp As Variant

'Application.Volatile 'to make this a variate generator, uncomment this line

temp = Data
hiP1 = Data.Rows.Count + 1
SampleSize = Application.Min(Data.Rows.Count, SampleSize)
ReDim ret(1 To SampleSize, 1 To 1)
For i = 1 To SampleSize
j = i + Int(Rnd * (hiP1 - i))
ret(i, 1) = temp(j, 1): temp(j, 1) = temp(i, 1)
Next i
HGSample = ret

End Function



Here is the HGSample2a code I put in the function pasted right from the edit screen:

Function HGSample2a(ByVal lo As Long, ByVal hi As Long, ByVal SampleSize As Long) As Variant
'Returns a sample, without replacement, of SampleSize from the range lo to hi
'Written 1999/7/7 David J. Braden tmy@fiastl.net
'Please retain all comments: Posted to microsoft.public.excel.*

Dim hiP1 As Long, i As Long, j As Long
Dim ret() As Variant, temp As Variant

Application.Volatile 'to make this a variate generator, uncomment this line

'The following allows more flexible specification of the support.
If lo > hi Then temp = lo: lo = hi - 1: hi = temp Else lo = lo - 1

ReDim temp(1 To hi - lo)
For i = hi - lo To 1 Step -1
temp(i) = i
Next
hiP1 = UBound(temp) + 1
If SampleSize > UBound(temp) Then SampleSize = UBound(temp)
ReDim ret(1 To SampleSize, 1 To 1)
For i = 1 To SampleSize
j = i + Int(Rnd * (hiP1 - i))
ret(i, 1) = temp(j) + lo: temp(j) = temp(i)
Next i
HGSample2a = ret

End Function


Also, I'm not seeing what reason the samplesize input is needed in either function. If I copy the function down a column, specifying sample size doesn't seem to matter - only the number of cells containing the HGSample2a function equation seems to matter (along with the lo and hi bounds).

Any thought on what I'm missing here, or am doing wrong?


Also, with the HGSample function, F-9 doesn't seem to generate a new set of values, but if I change any value in the data range it does...this doen't make sense to me. I was expecting that F-9 would generate a new set of random values from the same data range values. Same question...any thoughts on what I am missing here, or am doing wrong?

Thanks for your help.

Dale
dpolson is offline   Reply With Quote
Old Apr 14th, 2002, 09:19 AM   #5
Jay Petrulis
MrExcel MVP
 
Jay Petrulis's Avatar
 
Join Date: Mar 2002
Location: Chicago, IL USA
Posts: 2,042
Default

Hi Dale,

Comments nested within your post

Quote:
On 2002-04-14 06:01, dpolson wrote:
I set up the HGSample & HGSample2a functions, debugged, and compiled.

When I use the HGSample function =HGSample(data,samplesize) I get duplication of values, which was expected.

----------------
JP: Expected? Why? How so? I cannot replicate this duplication. No matter how many times I sample this, my returned set has unique values.
----------------

But when I use the HGSample2a function =HGSample2a(lo,hi,samplesize) I still get duplication.

-------------
JP: I cannot replicate this, either.
-------------

I've rechecked the code and am pretty sure I got it plugged in right -

Here is the HGSample code I put in the function pasted right from the edit screen:


-------------
JP: You changed the comments Dave would like retained. Not an issue, but why?
-------------

---> SNIP functions previously posted <---

Also, I'm not seeing what reason the samplesize input is needed in either function. If I copy the function down a column, specifying sample size doesn't seem to matter - only the number of cells containing the HGSample2a function equation seems to matter (along with the lo and hi bounds).

------------
JP: This might be the problem. You have to group the results in a worksheet array, which I do not think you have done (I could be wrong here).
------------

Any thought on what I'm missing here, or am doing wrong?

---------------------
JP: How are you entering this? Suppose you have the following:
1. In A1:A150 enter =ROW() for each cell.
2. Highlight B1:B10, with B1 the active cell
3. In the formula box type =HGSAMPLE(A1:A50,10) and enter it with Ctrl+Shift+Enter. All 10 highlighted cells should fill up with values from the list.
4. In C1 enter =COUNTIF($B$1:$B$10,B1) and copy down the list.

All the values in column C should be 1.
-----------------------


Also, with the HGSample function, F-9 doesn't seem to generate a new set of values, but if I change any value in the data range it does...this doen't make sense to me. I was expecting that F-9 would generate a new set of random values from the same data range values. Same question...any thoughts on what I am missing here, or am doing wrong?

------------------
JP: Uncomment (delete the single quote ') the Application.Volatile line.
-------------------

Thanks for your help.

Dale
--------------------
JP: These unique random numbers/sampling without replacement seem to be causing a bit of confusion in a number of threads these days. I cannot replicate these duplications in functions/procedures which were designed to return a set of unique elements.

What am I missing? Could it be that the duplicates are from a resampled run, meaning that you have 100 data points and take 10, then you want another 10 from the remaining 90, and so on? If that is the case, then I understand how you are getting duplication.
---------------------

Regards,
Jay

[ This Message was edited by: Jay Petrulis on 2002-04-14 08:25 ]
Jay Petrulis is offline   Reply With Quote
Old Apr 15th, 2002, 02:30 PM   #6
dpolson
New Member
 
Join Date: Apr 2002
Posts: 17
Default

Thanks, Jay. You've helped me recognize some gaps in my understanding. I've inserted some comments below following yours. Hopefully others can learn to avoid the mistakes I made not grouping the results of these functions as worksheet arrays. Obviously I've got some more learning to do.


Hi Dale,
Comments nested within your post

On 2002-04-14 06:01, dpolson wrote:
I set up the HGSample & HGSample2a functions, debugged, and compiled.

When I use the HGSample function =HGSample(data,samplesize) I get duplication of values, which was expected.

----------------
JP: Expected? Why? How so? I cannot replicate this duplication. No matter how many times I sample this, my returned set has unique values.

Dale: I think I incorrectly understood the HGSample function to be sampling with replacement and the HGSample2a to be sampling without replacement. I get the duplication when I treat the HGSample and HGSample2a like a regular function (instead of an array). My mistake.
----------------

But when I use the HGSample2a function =HGSample2a(lo,hi,samplesize) I still get duplication.

-------------
JP: I cannot replicate this, either.

Dale: I think I'm not using the arrays correctly (see further comments below).
-------------

I've rechecked the code and am pretty sure I got it plugged in right -

Here is the HGSample code I put in the function pasted right from the edit screen:


-------------
JP: You changed the comments Dave would like retained. Not an issue, but why?

Dale: I went to the Google site to read the context for the HGSample and HGSample2a code you posted, and I used the HGSample code from message 5 from David Braden dated 1999/04/01 that was posted in the Google thread, and also the Google code for HGSample2a code from Dave's response to Anne on 2000/02/17. I went back and looked again and compared it to what you sent me and it looks like the comments are a bit different. I'll change the comments in the HGSample and HGSample2a function code I input into my Excel to the most recent ones in the code that you sent me. It also looks like one line of the HGSample code you sent me has changed sowewhat from what I found at Google in message 5 (1999/04/01): was... SampleSize = Application.Min(Data.Rows.Count, SampleSize) but looks to now be... If SampleSize > UBound(temp) Then SampleSize = UBound(temp) I'll make those changes also to be consistent with what you posted.
-------------

---> SNIP functions previously posted <---

Also, I'm not seeing what reason the samplesize input is needed in either function. If I copy the function down a column, specifying sample size doesn't seem to matter - only the number of cells containing the HGSample2a function equation seems to matter (along with the lo and hi bounds).

------------
JP: This might be the problem. You have to group the results in a worksheet array, which I do not think you have done (I could be wrong here).
------------

Any thought on what I'm missing here, or am doing wrong?

---------------------
JP: How are you entering this? Suppose you have the following:
1. In A1:A150 enter =ROW() for each cell.
2. Highlight B1:B10, with B1 the active cell
3. In the formula box type =HGSAMPLE(A1:A50,10) and enter it with Ctrl+Shift+Enter. All 10 highlighted cells should fill up with values from the list.
4. In C1 enter =COUNTIF($B$1:$B$10,B1) and copy down the list.
All the values in column C should be 1.

Dale: I followed steps 1-4 and got what you described. In doing so, I've recognized that I wasn't correctly using the HGSample and HGSample2a functions or control-shift-enter. Thanks for pointing that out. I haven't worked enough with arrays (only a bit with conditional sums), so the problem was my ingnorance. Thanks for the help.
-----------------------

Also, with the HGSample function, F-9 doesn't seem to generate a new set of values, but if I change any value in the data range it does...this doen't make sense to me. I was expecting that F-9 would generate a new set of random values from the same data range values. Same question...any thoughts on what I am missing here, or am doing wrong?

------------------
JP: Uncomment (delete the single quote ') the Application.Volatile line.

Dale: I understand, did the uncomment, and that took care of it. Thanks.
-------------------

Thanks for your help.

Dale




--------------------
JP: These unique random numbers/sampling without replacement seem to be causing a bit of confusion in a number of threads these days. I cannot replicate these duplications in functions/procedures which were designed to return a set of unique elements.

What am I missing? Could it be that the duplicates are from a resampled run, meaning that you have 100 data points and take 10, then you want another 10 from the remaining 90, and so on? If that is the case, then I understand how you are getting duplication.
---------------------

Regards,
Jay

[ This Message was edited by: Jay Petrulis on 2002-04-14 08:25 ]
dpolson is offline   Reply With Quote
Old Apr 15th, 2002, 02:44 PM   #7
Jay Petrulis
MrExcel MVP
 
Jay Petrulis's Avatar
 
Join Date: Mar 2002
Location: Chicago, IL USA
Posts: 2,042
Default

Hi Dale,

No need to do anything regarding the comments. Dave posted these in a few posts, and they changed.

The real substantive comments were intact, so that's the important part for others to use.

Glad everything is working for you. Please post back if you have any other troubles.

In addition, when you get really, really, really good with arrays on the worksheet, you can begin to get into the realm of Alan Beban and Harlan Grove. Search google for their posts on MMULT and Alan's array functions (he has a download from his website which is awesome for this). I don't pretend to understand the details of what they argue about.

They are *the* array giants, IMO.

Bye,
Jay
Jay Petrulis is offline   Reply With Quote
Reply

Bookmarks

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On

Forum Jump


All times are GMT -4. The time now is 02:01 PM.


Powered by vBulletin® Version 3.8.7
Copyright ©2000 - 2012, vBulletin Solutions, Inc.
All contents Copyright 1998-2012 by MrExcel Consulting.
diabetic desserts recipes recipes Diabetic Soups Holiday Pizza Recipes Popcorn Recipes Recipes For Microwave Pasta Recipes Casserole Recipes Chili Recipes Curry Recipes Crockpot Recipes Apples Recipes Bread Recipes Vegetarian Recipes Vegetable recipes Desserts Recipes Appetizers Ethnic Recipes Meat Dishes Barbecue Recipes Sauces Recipes Marinade Recipes Low Fat Recipes Frugal Gourmet Kitchen Classics Recipes On The Grill Cook Books Seafood Recipes Cajun Recipes Breads Low Fat Low Fat Breads Bread Machine Recipes Yeast Breads Quick Breads Fat Free Vegetarian Salad Recipes Eggplant Recipes Radish Recipes Tomato Recipes Jalapeno Recipes Potato Recipes Lettuce Recipes Cabbage Recipes Beans Ambrosia Recipes Biscotti Recipes Desserts Low Fat Cookie Recipes Cheesecake Recipes Cake Recipes Pie Recipes Muffin Recipes Custard Recipes Best Appetizers Appetizers Low Fat Salsa Recipes Dip Recipes International Recipes Afghan Recipes Alaska Recipes French Recipes German Recipes Greek Recipes Italian Recipes Spanish Recipes Thai Recipes Korean Recipes Chinese Recipes Mexican Recipes Indian Recipes Beef Recipes Pork Pork & Ham Pork Butts Pork Chop Recipes Pork Ribs Rulled Pork Poultry Recipes Stews Recipes Ground Beef Barbecue Grill Barbecue Smoker All Purpose Sauce BBQ Sauce Barbecue Sauce Carolina BBQ Sauce Pickle Recipes Marinades Smoking Low Fat Appetizers & Dips Low Fat Breakfast Low Fat Cakes Low Fat Cheesecakes Low Fat Cookies Low Fat Desserts Low Fat Fish & Seafood Low Fat Meats Low Fat Pasta Low Fat Pies Low Fat Salads Low Fat Sandwiches Low Fat Sauces & Condiments Low Fat Sides Low Fat Soups Low Fat Vegetarian Baker's Dozen Taste of Home Recipe Book Bon Appetit Cookbook Blacktie Cookbook Buster Cook Book Cookbook USA Cook Book Cook Book Sara's Cookbook Sara's Cookbook Appetizers and Dips Poultry recipes Diabetic recipes Holiday recipes Miscellaneous recipes 110 recipes 1986 Usenet cookbook 2900 recipes Cyberrealm recipes Great sysops of world Specialty recipes Ceideburg recipes Cheese recipes Chili recipes Fruits recipes Garlic recipes Great chefs of NY Londontowne recipes Raisins recipes Recipes for kids US Food Vegetarian recipes Bread recipes Drinks Meat Dishes Brisket recipes Caribou recipes Chicken recipes Filet mignons recipes Pork recipes Swordfish recipes Turkey recipes Pasta recipes Uncategorized recipes Ethnic recipes Canada recipes English recipes Ethiopia recipes Germany recipes Greece recipes Mexican recipes Philippines recipes Welsh recipes Microwave recipes Soups recipes Vegetable recipes Asparagus recipes Barley recipes Brown rice recipes Lentil recipes Mushrooms recipes Salads recipes Wild rice Desserts recipes Cakes recipes Chocolate recipes Cookies recipes Ice cream recipes