Results 1 to 9 of 9

Thread: Get random top n percent by group
Thanks Thanks: 0 Likes Likes: 0

  1. #1
    Board Regular Darren Bartrup's Avatar
    Join Date
    Mar 2006
    Location
    Nottingham (UK)
    Posts
    1,258
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Get random top n percent by group

    Hi all,

    I need to extract a percentage of records for each team member from an original raw data table.

    My raw data doesn't have a natural key, but can add an auto-number if needed.

    All the fields, except date, are text fields in a table called tbl_PrimaryData:
    dDate, sProcess, sPropID, sPropAddress, sSurname and sForename.

    The sSurname & sForename combination are pushed to a separate table called tbl_TeamMember which has these fields: ID, Surname, Forename and SampleSize. SampleSize is a double formatted as a percent and is manually entered - usually set at 5%.
    The SQL for this is:
    Code:
    INSERT INTO tbl_TeamMember ( Surname, Forename )
    SELECT DISTINCT sSurname, sForename
    FROM tbl_PrimaryData
    WHERE sSurname & sForename NOT IN (SELECT sSurname & sForename FROM tbl_TeamMember);
    So now I want to extract the SampleSize of records from tbl_PrimaryData for each team member and place the results in a table called tbl_Audits.

    Both team member names and process text will be pushed out to separate tables to normalise the data.

    If anyone has any thoughts on this please - I'm not having much luck.

    Edit:
    SQL like the below will give me random ordering of the records grouped by team member, but I can't seem to restrict it to the first n percentage.
    Code:
    SELECT      dDate,
                sProcess,
                sPropID,
                sPropAddress,
                sSurname,
                sForename
    FROM        tbl_PrimaryData
    ORDER BY    sSurname, sForename, RND(dDate)
    Last edited by Darren Bartrup; Oct 13th, 2015 at 09:29 AM.
    Using Office 2003 & 2010,

    I'm 1 of the 10 people that don't understand binary. Guess that means the other 1001 do.

    No answer to your post?
    Get someone to read it - does it make sense or does it sound like gibberish?

  2. #2
    MrExcel MVP
    Moderator
    xenou's Avatar
    Join Date
    Mar 2007
    Location
    Clev. OH, USA
    Posts
    16,469
    Post Thanks / Like
    Mentioned
    11 Post(s)
    Tagged
    1 Thread(s)

    Default Re: Get random top n percent by group

    In general, you can usually use a MOD function to get every k-th record where k is some constant:

    Select A.* FROM
    (
    --your query--
    ) AS A
    WHERE A mod K = 0

    If the ordering within the subquery can be also randomized then much better (like by day of week, or last name, or third letter of backwards last name or anything else that randomizes best. K should be chosen to get the number of records you want in the results (larger K equals fewer records, smaller K equals more records).

    Note:
    So, that was pseudocode. So MOD is probably a function in Access-world, and probably an operator in other worlds. Use the correct syntax as needed. Also this is a generic answer to the problem of randomization. Please clarify as needed!!
    Last edited by xenou; Oct 13th, 2015 at 08:46 PM.

    Using: Office 2010/Win7 (work) Office 2010/Win7 (home)

    You are rich in proportion to the number of things you can let alone.
    -- Henry David Thoreau

  3. #3
    Board Regular Darren Bartrup's Avatar
    Join Date
    Mar 2006
    Location
    Nottingham (UK)
    Posts
    1,258
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Get random top n percent by group

    Thanks Xenou. I'll have a play around and post back if I get the solution - think the correct syntax is A MOD K as in SELECT 5 MOD 10 returns 5.
    Using Office 2003 & 2010,

    I'm 1 of the 10 people that don't understand binary. Guess that means the other 1001 do.

    No answer to your post?
    Get someone to read it - does it make sense or does it sound like gibberish?

  4. #4
    MrExcel MVP
    Moderator
    xenou's Avatar
    Join Date
    Mar 2007
    Location
    Clev. OH, USA
    Posts
    16,469
    Post Thanks / Like
    Mentioned
    11 Post(s)
    Tagged
    1 Thread(s)

    Default Re: Get random top n percent by group

    Good. I think I forgot to mention that in some cases I might try to number the records in some way (i.e., if necessary insert into a temp table, then use an autonumbered field as the one to push into the mod function). I guess I've gotten used to this with SQL Server where I can create the numbering on the fly in the query ... starting to lose my old Access expertise. If there is a candidate field (with sufficient randomness) then possibly an existing field (or partof an existing field) can be used for that.

    Using: Office 2010/Win7 (work) Office 2010/Win7 (home)

    You are rich in proportion to the number of things you can let alone.
    -- Henry David Thoreau

  5. #5
    Board Regular Darren Bartrup's Avatar
    Join Date
    Mar 2006
    Location
    Nottingham (UK)
    Posts
    1,258
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Get random top n percent by group

    Slowly getting there - the requirements have changed a little since I realised I was forgetting one small point:
    I need to extract a percentage of records for each team member, for each week commencing date from an original raw data table.

    My original raw data table (tbl_PrimaryData) has:
    • ID - AutoNumber
    • dDate - Date
    • sProcess - Text
    • sPropCoID - Text
    • sPropertyAddress - Text
    • sSurname - Text
    • sForename - Text


    A couple of Append queries execute to push sProcess out to tbl_Process and sSurname & SForename to tbl_TeamMember.

    I then use this query to create a secondary primary data table (tbl_PrimaryData_Step1):
    Code:
    SELECT      CLng(PD.ID) AS ID, 
                IIf(Weekday([dDate],2)=1,[dDate],[dDate]-Weekday([dDate],3)) AS dWeekCommencing, 
                PD.dDate, 
                CLng(PR.ID) AS ProcessID, 
                PD.sPropCoID, 
                PD.sPropertyAddress, 
                CLng(TM.ID) AS TeamMemberID 
    INTO        tbl_PrimaryData_Step1
    FROM        (tbl_PrimaryData AS PD LEFT JOIN tbl_Process AS PR ON PD.sProcess = PR.Process) 
                    LEFT JOIN tbl_TeamMember AS TM ON (PD.sForename = TM.Forename) AND 
                                                      (PD.sSurname = TM.Surname)
    Another query updates a table (tbl_TeamMemberSampleSize) holding the total number of records per week per team member, and the required sample size.
    These columns are:
    • TMID - Team Member ID (L Integer)
    • WeekCommencing - date
    • TotalSample - (L Integer)
    • SampleSize - (Dbl Percentage)
    • AuditSize - (L Integer)


    For example, the first record here is:
    • 1 = TMID
    • 24/08/2015 = WeekCommencing
    • 56 = TotalSample
    • 5.00% - SampleSize
    • 3 - AuditSize


    My final query so far is:
    Code:
    SELECT         ID, 
                dWeekCommencing, 
                TeamMemberID
    FROM         tbl_PrimaryData_Step1 AS PD INNER JOIN tbl_TeamMemberSampleSize AS TMSS 
                    ON (PD.dWeekCommencing = TMSS.WeekCommencing) AND (PD.TeamMemberID = TMSS.TMID)
    WHERE         ID MOD AuditSize = 0
    This doesn't return the correct number of records (and I wasn't expecting it to).
    Based on the example above it gives 3, 6, 9, 12, etc - i.e. where the audit size fits exactly into the ID number.

    So this gives me a random (enough for my requirements) set of records which I think will always exceed the number of records required.

    So now I just need to return the first x records per group - based on the audit size. I've seen a query somewhere which does a self-join and counts how many records to return, but can't find it now.
    Using Office 2003 & 2010,

    I'm 1 of the 10 people that don't understand binary. Guess that means the other 1001 do.

    No answer to your post?
    Get someone to read it - does it make sense or does it sound like gibberish?

  6. #6
    MrExcel MVP
    Moderator
    xenou's Avatar
    Join Date
    Mar 2007
    Location
    Clev. OH, USA
    Posts
    16,469
    Post Thanks / Like
    Mentioned
    11 Post(s)
    Tagged
    1 Thread(s)

    Default Re: Get random top n percent by group

    first x should be easy (right?).

    Just:

    Code:
    Select TOP 100 A.* 
    FROM
        (
            your query here
        ) AS A

    Using: Office 2010/Win7 (work) Office 2010/Win7 (home)

    You are rich in proportion to the number of things you can let alone.
    -- Henry David Thoreau

  7. #7
    Board Regular Darren Bartrup's Avatar
    Join Date
    Mar 2006
    Location
    Nottingham (UK)
    Posts
    1,258
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Get random top n percent by group

    I wish it was that easy - but can't use a parameter in the SQL to select the top x.

    It would have to use something like SELECT TOP (SELECT AuditSize FROM tbl_TeamMemberSampleSize WHERE....).* FROM (......) which Access can't do.
    I've seen the solution somewhere - just wish I could remember where.
    Using Office 2003 & 2010,

    I'm 1 of the 10 people that don't understand binary. Guess that means the other 1001 do.

    No answer to your post?
    Get someone to read it - does it make sense or does it sound like gibberish?

  8. #8
    Board Regular Darren Bartrup's Avatar
    Join Date
    Mar 2006
    Location
    Nottingham (UK)
    Posts
    1,258
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Get random top n percent by group

    Finally figured it out. Found an example that worked here: SQL(DML) help: How to select TOP N rows from a table for each group?

    I've had to update the previous query to it - ID MOD AuditSize = 0 wasn't returning enough records, so I've updated it to ID MOD 2 = 0 to return every other record - might still need to work on that bit.

    So my main table for this query has 3 columns - ID, dWeekCommencing and TeamMemberID.
    tbl_TeamMemberSampleSize tells me how many records to return for each team member per week so this is referenced in the HAVING clause.

    I can now link the ID back to the rest of the data I need for each team member.

    Code:
    SELECT      T1.ID, 
    			T1.dWeekCommencing, 
    			T1.TeamMemberID
    FROM        qry_AuditSample_Step1 T1 INNER JOIN qry_AuditSample_Step1 T2
                    ON T1.dWeekCommencing = T2.dWeekCommencing AND
                       T1.TeamMemberID = T2.TeamMemberID AND
                       T1.ID >= T2.ID
    GROUP BY    T1.ID,
                T1.dWeekCommencing,
                T1.TeamMemberID
    HAVING      COUNT(*) <= (
                                SELECT  AuditSize
                                FROM    tbl_TeamMemberSampleSize
                                WHERE   TMID = T1.TeamMemberID AND
                                        WeekCommencing = T1.dWeekCommencing
                            )
    ORDER BY    T1.ID, T1.dWeekCommencing, T1.TeamMemberID
    Using Office 2003 & 2010,

    I'm 1 of the 10 people that don't understand binary. Guess that means the other 1001 do.

    No answer to your post?
    Get someone to read it - does it make sense or does it sound like gibberish?

  9. #9
    MrExcel MVP
    Moderator
    xenou's Avatar
    Join Date
    Mar 2007
    Location
    Clev. OH, USA
    Posts
    16,469
    Post Thanks / Like
    Mentioned
    11 Post(s)
    Tagged
    1 Thread(s)

    Default Re: Get random top n percent by group

    Sounds good. Overall, it also seems like using a little vba code could also smooth out the rough spots (if getting away from pure SQL is not a problem). Still, sounds like problem is getting solved quickly now!

    Using: Office 2010/Win7 (work) Office 2010/Win7 (home)

    You are rich in proportion to the number of things you can let alone.
    -- Henry David Thoreau

Some videos you may like

User Tag List

Like this thread? Share it with others

Like this thread? Share it with others

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •