20% Random Sample

ARW17

Board Regular
Joined
Oct 31, 2016
Messages
109
I need to create a sample of "work that needs to be graded". For each person, in each month, I want to pull a 20% random sample.

"Person" "Month" "Number of Total Workloads"
1 5 10
1 6 5
2 5 20
2 6 5

For person 1, I need 2 random workloads for May and 1 for June to review.
For person 2, I need 4 random workloads for May and 1 for June to review.

I know I can add a random number and sort on this and then I could take the top N records, but as far as I see, I can't figure out how to take the top X%. I can calculate what would be 20% for each person for each month, but I don't see how to use that number as the selection for how many records to pull per person per month.
 

Excel Facts

Do you hate GETPIVOTDATA?
Prevent GETPIVOTDATA. Select inside a PivotTable. In the Analyze tab of the ribbon, open the dropown next to Options and turn it off
I'm not following. For one thing, "random" and top X% are mutually exclusive ideas to me. And random from what? A particular person? The whole data set? "how many records to pull per month" isn't what's random, right? It's the randomness of the sample, and it's not clear what that is. In the end, I suspect you'll need to define a recordset, determine how many are in it, then calculate how many records x% represents, then randomly choose a record from that set and repeat until you reach that number and build a second recordset as you go. That is based on what might be a flawed interpretation by me.
 
Upvote 0
I'm not following. For one thing, "random" and top X% are mutually exclusive ideas to me. And random from what? A particular person? The whole data set? "how many records to pull per month" isn't what's random, right? It's the randomness of the sample, and it's not clear what that is. In the end, I suspect you'll need to define a recordset, determine how many are in it, then calculate how many records x% represents, then randomly choose a record from that set and repeat until you reach that number and build a second recordset as you go. That is based on what might be a flawed interpretation by me.
I was going to give each record a random number and then sort by it and take the top X%. Random workloads per person per month. Correct, it'll always be 20% per person per month, I just don't want to always take the ones done at the beginning of the month (or middle or end). When you say "define a recordset, determine how many are in it, then calculate how many records x% represents, then randomly choose a record from that set and repeat until you reach that number and build a second recordset as you go" do you mean that I'll do that in VBA? Some kind of LOOP function? Thanks for your time!
 
Upvote 0
I was going to give each record a random number and then sort by it and take the top X%
Got it, thanks. Yes you understood the notion but the solution you have is simpler I think.
 
Upvote 0
Exactly what I need! Thank you!
See this link for an example.
JACKD: This works perfectly and gives me the 20% sample I need. The only tweak I need is to add something like a ceiling function. If there are 1-4 pieces of work for someone, I want to round up to 1 piece for review for each. The way I have it set up now (exactly like the sample in the link), it's rounding down. Do you have another tip that would get me this extra bit? I'm not used to using SQL and I'm not sure where to put this. Thanks so much! I'm so close with what you already provided!
 
Upvote 0
Not sure I understand. The sample gets the 20% random by group and in your case group is employee. Please post the code you are using, and an example of what you want the output to be/look like.
Sounds like you're saying you don't want 20%, but it's not clear to me.
 
Upvote 0
Sorry, what I need is to get 20% or 2 samples of work to grade (2 records) when there are 10 total pieces of work. This was accomplished by following Selecting Random TOP N from a Group exactly. The problem is that when there's only 2 pieces of work, I still need to grade 1. We were doing this in Excel previously and used the function CEILING() so that we always round up, never down. If the person has 1-5 total pieces of work/records, I want to grade 1, if they have 6-10, I want to grade 2, etc. So yes, you could say I don't really want 20% exactly.

Step 1: I added a randomized number.
1605017235453.png


Step 2: I was able to sort the random number, by actor, by month.
1605018550155.png

Step 3: Isn't working. I'm trying to follow this:
1605018638654.png

This is my current Step 3 code (this is the first time I'm writing in SQL so mistakes are not at all obvious to me):
SELECT ACTOR, ENTRYDATE_MONTH, REGBASE
FROM (SELECT TOP 20 PERCENT ENTRY_CRED_GRD_STEP1.REGBASE, ENTRY_CRED_GRD_STEP1.ACTOR, ENTRY_CRED_GRD_STEP1.ENTRYDATE_MONTH, ENTRY_CRED_GRD_STEP1.RAND_CRED FROM ENTRY_CRED_GRD_STEP1 GROUP BY ENTRY_CRED_GRD_STEP1.REGBASE, ENTRY_CRED_GRD_STEP1.ACTOR, ENTRY_CRED_GRD_STEP1.ENTRYDATE_MONTH, ENTRY_CRED_GRD_STEP1.RAND_CRED ORDER BY ENTRY_CRED_GRD_STEP1.RAND_CRED) AS [%$##@_alias]
ORDER BY ENTRY_CRED_GRD_STEP1.actor, ENTRY_CRED_GRD_STEP1.ENTRYDATE_MONTH;

I am getting a 20% sample, but it's not 20% of each Actor for each month. What I want it to do is give me the first 20% (always rounded up) by person/actor, by month. I manually selected the results I'm looking for:


REGBASEACTORENTRYDATE_MONTHRAND_CRED
7925723​
T3ABP
7​
0.02​

ENTRY_CRED_GRD_STEP2 ENTRY_CRED_GRD_STEP2

REGBASEACTORENTRYDATE_MONTHRAND_CRED
1399837​
T3ABP
8​
0.00​
1511765​
T3ABP
8​
0.29​
ENTRY_CRED_GRD_STEP2 ENTRY_CRED_GRD_STEP2

REGBASEACTORENTRYDATE_MONTHRAND_CRED
7955312​
T3CMM
7​
0.07​
7788038​
T3CMM
7​
0.08​
8085997​
T3CMM
7​
0.13​
 
Upvote 0
Just to be clear. I said a couple times that it worked perfectly. I was wrong about that. I was getting 20%, but not the right 20%. So, I have 2 problems. I need to get the right 20% by actor, by month, and then I need to always round up the 20% result.
 
Upvote 0

Forum statistics

Threads
1,213,501
Messages
6,114,010
Members
448,543
Latest member
MartinLarkin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top