20% Random Sample

ARW17

Board Regular
Joined
Oct 31, 2016
Messages
99
I need to create a sample of "work that needs to be graded". For each person, in each month, I want to pull a 20% random sample.

"Person" "Month" "Number of Total Workloads"
1 5 10
1 6 5
2 5 20
2 6 5

For person 1, I need 2 random workloads for May and 1 for June to review.
For person 2, I need 4 random workloads for May and 1 for June to review.

I know I can add a random number and sort on this and then I could take the top N records, but as far as I see, I can't figure out how to take the top X%. I can calculate what would be 20% for each person for each month, but I don't see how to use that number as the selection for how many records to pull per person per month.
 

jackd

Well-known Member
Joined
Oct 19, 2006
Messages
1,310
Office Version
  1. 365
Can you post a copy of your database with enough data to illustrate the issue? I need some data to work with. It's been 6+ years since I posted the example.
 

Some videos you may like

Excel Facts

What is the shortcut key for Format Selection?
Ctrl+1 (the number one) will open the Format dialog for whatever is selected.

xenou

MrExcel MVP
Joined
Mar 2, 2007
Messages
16,824
Office Version
  1. 2019
Platform
  1. Windows
There's no way to configure or otherwise adjust SELECT TOP X PERCENT in SQL that I have ever heard of (i.e., to round it up or down).
I think (assuming you have your groups) you can solve the "at least one" problem with:
SQL:
select top 1 from MyGroup order by x
union
select top 20 percent from MyGroup order by x
This would give the first record and the first 20 percent of records and because union removes duplicates it would not repeat the first record if the two sets overlap.

As far as getting the result by actor by month, that's exactly what you will need to do - get the results by actor by month and from that select the top 20% from each of those groups. Generally (for MSAccess) you will have to combine some sql and vba to accomplish your goal - I don't think you will be able to write a straight up SQL query in MSAccess for this.
 

Micron

Well-known Member
Joined
Jun 3, 2015
Messages
2,058
Office Version
  1. 365
Platform
  1. Windows
If you can't post a db (I don't even know if it's allowed here?) then post raw data as a table. Pics don't help too much. I'm not going to create and fill my own tables based on pictures, but I might take a whack at it if I had data I could copy. I suspect that at least some of us are of that opinion.
 

jackd

Well-known Member
Joined
Oct 19, 2006
Messages
1,310
Office Version
  1. 365
Agree totally with Micron and xenou.
You will have to try xenou's suggestion and let us know the outcome.
Good luck.
 

ARW17

Board Regular
Joined
Oct 31, 2016
Messages
99
I solved it by first adding the random number:
SELECT *
FROM (SELECT b.ACTOR, b.ENTRYDATE_MONTH, b.regbase, (select count(b.random_number) from entry_errors a where a.random_number >= b.random_number and a.actor=b.actor and a.entrydate_month=b.entrydate_month) AS ranking FROM (ENTRY_TINITIALS AS a LEFT JOIN entry_errors AS b ON a.TInitials = b.actor) LEFT JOIN archetypes_list AS c ON b.actor = c.[t-initials] WHERE b.cred_days <30 and c.archetype in (1,3) ORDER BY b.actor, b.entrydate_month) AS x
ORDER BY actor, entrydate_month, ranking;

then used a sort of ceiling function to get the NumberOfRowsDesired
SELECT actor, entrydate_month, -int(max(ranking)*-.2) AS numberofrowsdesired
FROM step1
GROUP BY actor, entrydate_month;

then
SELECT c.*
FROM (step1 AS a LEFT JOIN step2 AS b ON (a.actor = b.actor) AND (a.entrydate_month = b.entrydate_month)) LEFT JOIN entry_errors AS c ON (a.actor = c.actor) AND (a.regbase = c.regbase)
WHERE a.ranking <= b.numberofrowsdesired;

I appreciate the ideas!
 

jackd

Well-known Member
Joined
Oct 19, 2006
Messages
1,310
Office Version
  1. 365
Glad you have a solution.

Here's a free sql formatter that may help with readability.

Code:
SELECT *
FROM (
    SELECT b.ACTOR
        ,b.ENTRYDATE_MONTH
        ,b.regbase
        ,(
            SELECT count(b.random_number)
            FROM entry_errors a
            WHERE a.random_number >= b.random_number
                AND a.actor = b.actor
                AND a.entrydate_month = b.entrydate_month
            ) AS ranking
    FROM (
        ENTRY_TINITIALS AS a LEFT JOIN entry_errors AS b
            ON a.TInitials = b.actor
        )
    LEFT JOIN archetypes_list AS c
        ON b.actor = c.[t-initials]
    WHERE b.cred_days < 30
        AND c.archetype IN (
            1
            ,3
            )
    ORDER BY b.actor
        ,b.entrydate_month
    ) AS x
ORDER BY actor
    ,entrydate_month
    ,ranking;
 

Watch MrExcel Video

Forum statistics

Threads
1,127,605
Messages
5,625,762
Members
416,136
Latest member
senthil_sk

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top