Removing duplicates based on multiple criteria

Gollum9

New Member
Joined
Feb 10, 2011
Messages
31
Office Version
  1. 365
Platform
  1. Windows
Hi all,

I'm not even sure if this is possible but I just wanted to see if there was a better solution to something I am struggling with.

I am trying to remove duplicate instances from a list of training attendance. People are free to book on a course themselves but the problem from a reporting perspective is that the data export then has several rows for the same person with 'booked', then 'absent', then another 'booked' and finally a 'attended' for example. What I want to do is set something up to pick out the most recent 'attended' and delete the others. If they haven't attended, the most recent 'booked' and finally if none of these then the most recent 'absent'.

The data looks like this if it helps to explain better:

Name Date Course Status
Bob 10/2/18 A Booked
Bob 12/2/18 A Absent
Bob 13/2/18 A Booked
Bob 15/2/18 A Attended

The code/formula would then strip out the first 3 rows leaving just the last one.

Any suggestions would be much appreciated as at the moment I am using a number of sorts and filters but it is very manual due to the amount of data I have.

Many thanks,
 

Excel Facts

Did you know Excel offers Filter by Selection?
Add the AutoFilter icon to the Quick Access Toolbar. Select a cell containing Apple, click AutoFilter, and you will get all rows with Apple
Try this:-
Code:
[COLOR="Navy"]Sub[/COLOR] MG19Feb04
[COLOR="Navy"]Dim[/COLOR] Rng [COLOR="Navy"]As[/COLOR] Range, Dn [COLOR="Navy"]As[/COLOR] Range, n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Q [COLOR="Navy"]As[/COLOR] Variant, K [COLOR="Navy"]As[/COLOR] Variant, a [COLOR="Navy"]As[/COLOR] Range
[COLOR="Navy"]Dim[/COLOR] b [COLOR="Navy"]As[/COLOR] Range, c [COLOR="Navy"]As[/COLOR] Range, col [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Dic [COLOR="Navy"]As[/COLOR] Object, nRng [COLOR="Navy"]As[/COLOR] Range
[COLOR="Navy"]Set[/COLOR] Rng = Range("A2", Range("A" & Rows.Count).End(xlUp))
[COLOR="Navy"]With[/COLOR] CreateObject("scripting.dictionary")
 .CompareMode = vbTextCompare
   [COLOR="Navy"]Set[/COLOR] Dic = CreateObject("scripting.dictionary")
     Dic.CompareMode = vbTextCompare
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] Dn [COLOR="Navy"]In[/COLOR] Rng
    [COLOR="Navy"]Select[/COLOR] [COLOR="Navy"]Case[/COLOR] Dn.Offset(, 3).Value
        [COLOR="Navy"]Case[/COLOR] "Attended": col = 0
        [COLOR="Navy"]Case[/COLOR] "Booked": col = 1
        [COLOR="Navy"]Case[/COLOR] "Absent": col = 2
    [COLOR="Navy"]End[/COLOR] Select
    [COLOR="Navy"]If[/COLOR] Not .exists(Dn.Value) [COLOR="Navy"]Then[/COLOR]
        .Add Dn.Value, Array(a, b, c)
        Q = .Item(Dn.Value)
            [COLOR="Navy"]Set[/COLOR] Q(col) = Dn
        .Item(Dn.Value) = Q
    [COLOR="Navy"]Else[/COLOR]
    Q = .Item(Dn.Value)
        [COLOR="Navy"]If[/COLOR] Q(col) [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR]
            [COLOR="Navy"]Set[/COLOR] Q(col) = Dn
        [COLOR="Navy"]Else[/COLOR]
            [COLOR="Navy"]If[/COLOR] Dn.Offset(, 1).Value > Q(col).Offset(, 1).Value [COLOR="Navy"]Then[/COLOR] [COLOR="Navy"]Set[/COLOR] Q(col) = Dn
        [COLOR="Navy"]End[/COLOR] If
    .Item(Dn.Value) = Q
[COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR]
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] K [COLOR="Navy"]In[/COLOR] .keys
[COLOR="Navy"]For[/COLOR] n = 0 To 2
    [COLOR="Navy"]If[/COLOR] Not .Item(K)(n) [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR]
         Dic(.Item(K)(n).Address) = Empty
        [COLOR="Navy"]Exit[/COLOR] For
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR] n
[COLOR="Navy"]Next[/COLOR] K
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] Dn [COLOR="Navy"]In[/COLOR] Rng
    [COLOR="Navy"]If[/COLOR] Not Dic.exists(Dn.Address) [COLOR="Navy"]Then[/COLOR]
        [COLOR="Navy"]If[/COLOR] nRng [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR] [COLOR="Navy"]Set[/COLOR] nRng = Dn Else [COLOR="Navy"]Set[/COLOR] nRng = Union(nRng, Dn)
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR] Dn
[COLOR="Navy"]End[/COLOR] With
[COLOR="Navy"]If[/COLOR] Not nRng [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR] nRng.EntireRow.Delete
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 
Upvote 0
Thanks so much for your help Mick. Let me try to apply it to my file and I will let you know how I get on.

Thanks again,
 
Upvote 0
Hi @MickG

This worked great with the test data, did just what I needed.

Since then the data export has changed and expanded quite a lot and i'm struggling to modify the code to work with it. The columns are now from A to AB and the relevant ones are Employee Number (col A), Date (col R), Course Name (col W) and Status (col Z). The status options are also slightly different and are now (in order of priority to keep), Attended / Confirmed / Awaiting Response / Absent.

If you are able to help modify this for the changes to the data it would be much appreciated.

Thanks,
Adam
 
Upvote 0
Try this (Lightly tested)
Code:
[COLOR="Navy"]Sub[/COLOR] MG05Mar31
[COLOR="Navy"]Dim[/COLOR] Rng [COLOR="Navy"]As[/COLOR] Range, Dn [COLOR="Navy"]As[/COLOR] Range, n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Q [COLOR="Navy"]As[/COLOR] Variant, K [COLOR="Navy"]As[/COLOR] Variant, a [COLOR="Navy"]As[/COLOR] Range
[COLOR="Navy"]Dim[/COLOR] b [COLOR="Navy"]As[/COLOR] Range, c [COLOR="Navy"]As[/COLOR] Range, d [COLOR="Navy"]As[/COLOR] Range, col [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Dic [COLOR="Navy"]As[/COLOR] Object, nRng [COLOR="Navy"]As[/COLOR] Range
[COLOR="Navy"]Set[/COLOR] Rng = Range("A2", Range("A" & Rows.Count).End(xlUp))
'[COLOR="Green"][B]Attended / Confirmed / Awaiting Response / Absent.[/B][/COLOR]
[COLOR="Navy"]With[/COLOR] CreateObject("scripting.dictionary")
 .CompareMode = vbTextCompare
   [COLOR="Navy"]Set[/COLOR] Dic = CreateObject("scripting.dictionary")
     Dic.CompareMode = vbTextCompare
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] Dn [COLOR="Navy"]In[/COLOR] Rng
    [COLOR="Navy"]Select[/COLOR] [COLOR="Navy"]Case[/COLOR] Dn.Offset(, 25).Value
        [COLOR="Navy"]Case[/COLOR] "Attended": col = 0
        [COLOR="Navy"]Case[/COLOR] "Confirmed": col = 1
        [COLOR="Navy"]Case[/COLOR] "Awaiting Response": col = 2
        [COLOR="Navy"]Case[/COLOR] "Absent": col = 3
    [COLOR="Navy"]End[/COLOR] Select
    [COLOR="Navy"]If[/COLOR] Not .exists(Dn.Value) [COLOR="Navy"]Then[/COLOR]
        .Add Dn.Value, Array(a, b, c, d)
        Q = .Item(Dn.Value)
            [COLOR="Navy"]Set[/COLOR] Q(col) = Dn
        .Item(Dn.Value) = Q
    [COLOR="Navy"]Else[/COLOR]
    Q = .Item(Dn.Value)
        [COLOR="Navy"]If[/COLOR] Q(col) [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR]
            [COLOR="Navy"]Set[/COLOR] Q(col) = Dn
        [COLOR="Navy"]Else[/COLOR]
            [COLOR="Navy"]If[/COLOR] Dn.Offset(, 17).Value > Q(col).Offset(, 17).Value [COLOR="Navy"]Then[/COLOR] [COLOR="Navy"]Set[/COLOR] Q(col) = Dn
        [COLOR="Navy"]End[/COLOR] If
    .Item(Dn.Value) = Q
[COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR]
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] K [COLOR="Navy"]In[/COLOR] .keys
[COLOR="Navy"]For[/COLOR] n = 0 To 2
    [COLOR="Navy"]If[/COLOR] Not .Item(K)(n) [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR]
         Dic(.Item(K)(n).Address) = Empty
        [COLOR="Navy"]Exit[/COLOR] For
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR] n
[COLOR="Navy"]Next[/COLOR] K
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] Dn [COLOR="Navy"]In[/COLOR] Rng
    [COLOR="Navy"]If[/COLOR] Not Dic.exists(Dn.Address) [COLOR="Navy"]Then[/COLOR]
        [COLOR="Navy"]If[/COLOR] nRng [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR] [COLOR="Navy"]Set[/COLOR] nRng = Dn Else [COLOR="Navy"]Set[/COLOR] nRng = Union(nRng, Dn)
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR] Dn
[COLOR="Navy"]End[/COLOR] With
[COLOR="Navy"]If[/COLOR] Not nRng [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR] nRng.EntireRow.Delete
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 
Upvote 0
Thanks for getting back to me so quickly!

I have tested this and it seems to delete a lot more rows than I was expecting. It has removed a lot more Attended's than I was after and I have also tested for someone with 3 'absent' status' for the same course and it deletes out all 3 of them rather than leaving the most recent. There are also some with one unique row that is attended but that gets deleted.

I guess it's hard to explain exactly but I don't know if you can see why it would be deleting out more than it should?

Thanks again
 
Upvote 0
No problem. I have checked and can't attach a file to this but hopefully this will give you a better idea. If you need it in a different format please just let me know.

Emp No (col A) Date (col R) Course (col W) Status (col Z) Expected Result
1 12/2/18 Induction Attended Delete as not the most recent attend
1 13/2/18 Induction Attended Keep as most recent attend
1 10/2/18 Induction Absent Delete as there is an attend
1 11/2/18 Induction Confirmed Delete as there is an attend
1 15/2/18 Soft Skills Confirmed Keep as there is no duplicate for this
2 10/2/18 Induction Absent Delete as there is a confirmed
2 11/2/18 Induction Absent Delete as there is a confirmed
2 13/2/18 Induction Confirmed Keep as best result for this person
3 15/2/18 Soft Skills Absent Keep as no duplicate
4 12/2/18 Mandatory Awaiting Response Delete as not the best result
4 10/2/18 Mandatory Absent Delete as not the best result
4 14/2/18 Mandatory Confirmed Keep as best result for course/person

I hope this gives a better, more practical example. As a rule, my description would be:

1) If there is only one row for the person with a course keep the row as it's the only instance.
2) If there are multiple rows of the same person with the same course, delete out the extras keeping only the 'best' status using the priority - Attended > Confirmed > Awaiting Response > Absent
3) If there are duplicate 'best' entries for the same person with the same course, delete all but the best status with the most recent date.

Hope this helps. Thanks very much for all your help with this.
 
Upvote 0
We did not take account of there being different courses in the previous code !!
I have altered the code accordingly and at the moment it returns the correct results on your data.
Give it a try to see if it stands Up !!
Code:
[COLOR="Navy"]Sub[/COLOR] MG06Mar34
[COLOR="Navy"]Dim[/COLOR] Rng [COLOR="Navy"]As[/COLOR] Range, Dn [COLOR="Navy"]As[/COLOR] Range, n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Q [COLOR="Navy"]As[/COLOR] Variant, K [COLOR="Navy"]As[/COLOR] Variant, a [COLOR="Navy"]As[/COLOR] Range
[COLOR="Navy"]Dim[/COLOR] b [COLOR="Navy"]As[/COLOR] Range, c [COLOR="Navy"]As[/COLOR] Range, d [COLOR="Navy"]As[/COLOR] Range, col [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Dic [COLOR="Navy"]As[/COLOR] Object, nRng [COLOR="Navy"]As[/COLOR] Range, Txt [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]String[/COLOR]
[COLOR="Navy"]Set[/COLOR] Rng = Range("A2", Range("A" & Rows.Count).End(xlUp))
'[COLOR="Green"][B]Attended / Confirmed / Awaiting Response / Absent.[/B][/COLOR]
[COLOR="Navy"]With[/COLOR] CreateObject("scripting.dictionary")
 .CompareMode = vbTextCompare
   [COLOR="Navy"]Set[/COLOR] Dic = CreateObject("scripting.dictionary")
     Dic.CompareMode = vbTextCompare
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] Dn [COLOR="Navy"]In[/COLOR] Rng
    Txt = Dn.Value & Dn.Offset(, 22)
    [COLOR="Navy"]Select[/COLOR] [COLOR="Navy"]Case[/COLOR] Dn.Offset(, 25).Value
        [COLOR="Navy"]Case[/COLOR] "Attended": col = 0
        [COLOR="Navy"]Case[/COLOR] "Confirmed": col = 1
        [COLOR="Navy"]Case[/COLOR] "Awaiting Response": col = 2
        [COLOR="Navy"]Case[/COLOR] "Absent": col = 3
    [COLOR="Navy"]End[/COLOR] Select
    [COLOR="Navy"]If[/COLOR] Not .exists(Txt) [COLOR="Navy"]Then[/COLOR]
        .Add Txt, Array(a, b, c, d)
        Q = .Item(Txt)
            [COLOR="Navy"]Set[/COLOR] Q(col) = Dn
        .Item(Txt) = Q
    [COLOR="Navy"]Else[/COLOR]
    Q = .Item(Txt)
        [COLOR="Navy"]If[/COLOR] Q(col) [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR]
            [COLOR="Navy"]Set[/COLOR] Q(col) = Dn
        [COLOR="Navy"]Else[/COLOR]
            [COLOR="Navy"]If[/COLOR] Dn.Offset(, 17).Value > Q(col).Offset(, 17).Value [COLOR="Navy"]Then[/COLOR] [COLOR="Navy"]Set[/COLOR] Q(col) = Dn
        [COLOR="Navy"]End[/COLOR] If
    .Item(Txt) = Q
[COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR]
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] K [COLOR="Navy"]In[/COLOR] .keys
[COLOR="Navy"]For[/COLOR] n = 0 To 3
    [COLOR="Navy"]If[/COLOR] Not .Item(K)(n) [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR]
         Dic(.Item(K)(n).Address) = Empty
        [COLOR="Navy"]Exit[/COLOR] For
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR] n
[COLOR="Navy"]Next[/COLOR] K
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] Dn [COLOR="Navy"]In[/COLOR] Rng
    [COLOR="Navy"]If[/COLOR] Not Dic.exists(Dn.Address) [COLOR="Navy"]Then[/COLOR]
        [COLOR="Navy"]If[/COLOR] nRng [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR] [COLOR="Navy"]Set[/COLOR] nRng = Dn Else [COLOR="Navy"]Set[/COLOR] nRng = Union(nRng, Dn)
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR] Dn
[COLOR="Navy"]End[/COLOR] With
[COLOR="Navy"]If[/COLOR] Not nRng [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR] nRng.EntireRow.Delete
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 
Upvote 0

Forum statistics

Threads
1,215,241
Messages
6,123,823
Members
449,127
Latest member
Cyko

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top