Here's a good one! Delete duplicates with macro keeping only the newest by date

rkol297

Board Regular
Joined
Nov 12, 2010
Messages
131
Office Version
  1. 365
  2. 2019
Platform
  1. Windows
I have a spreedsheet with 17,000 rows in it. In column D I have ID numbers and in column B I have the dates. There are multiple entries for each ID in column D which gives multiples dates each entry was given in column B. I need a VBA Macro that will identify duplicate entries in Column D and keep only the most recent entry by date in column B. Any help would be greatly appreciated. I'm banging my head on the desk to get this to work thanks!
 
Yes, this part is in addition to the macro that were working on.

Actually, the maximum value is determined on a row by row basis, between the columns J through V, in using the unique id.

Thanks for the code. :)
 
Upvote 0

Excel Facts

Ambidextrous Undo
Undo last command with Ctrl+Z or Alt+Backspace. If you use the Undo icon in the QAT, open the drop-down arrow to undo up to 100 steps.
Here's a modified version of the macro, which uses a custom function. Hopefully, it should be faster...

Code:
[COLOR=darkblue]Option[/COLOR] [COLOR=darkblue]Explicit[/COLOR]

[COLOR=darkblue]Sub[/COLOR] KeepLastEntryAndMaxVals()

    [COLOR=darkblue]Dim[/COLOR] varSortedData [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] varUniqueVals [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] LastRow [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    
    Application.ScreenUpdating = [COLOR=darkblue]False[/COLOR]
    
    LastRow = Cells(Rows.Count, "C").End(xlUp).Row
    
    [COLOR=darkblue]If[/COLOR] LastRow = 1 [COLOR=darkblue]Then[/COLOR]
        MsgBox "No data found.", vbInformation
        [COLOR=darkblue]Exit[/COLOR] [COLOR=darkblue]Sub[/COLOR]
    [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
    
    [COLOR=darkblue]With[/COLOR] Range("C1:V" & LastRow)
        .Sort _
            key1:=.Cells(1), order1:=xlAscending, _
            key2:=.Cells(1, 5), order2:=xlDescending, _
            Header:=xlYes, _
            OrderCustom:=1, _
            MatchCase:=False, _
            Orientation:=xlTop[COLOR=darkblue]To[/COLOR]Bottom
        varSortedData = .Offset(1, 0).Resize(.Rows.Count - 1).Value
        varUniqueVals = GetUniqueAndMaxVals(varSortedData)
        .Offset(1, 0).ClearContents
        .Offset(1, 0).Resize(UBound(varUniqueVals, 1), [COLOR=darkblue]UBound[/COLOR](varUniqueVals, 2)).Value = varUniqueVals
    [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]With[/COLOR]
    
    Application.ScreenUpdating = [COLOR=darkblue]True[/COLOR]
    
    MsgBox "Completed...", vbInformation
        
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Sub[/COLOR]

[COLOR=darkblue]Private[/COLOR] [COLOR=darkblue]Function[/COLOR] GetUniqueAndMaxVals([COLOR=darkblue]ByVal[/COLOR] varData [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR])

    [COLOR=darkblue]Dim[/COLOR] objDic [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Object[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] arrUniqueVals() [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] UnqIndx [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] i [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] j [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    
    [COLOR=darkblue]Set[/COLOR] objDic = CreateObject("Scripting.Dictionary")
    objDic.CompareMode = 1 [COLOR=green]'vbTextCompare[/COLOR]
    
    [COLOR=darkblue]ReDim[/COLOR] arrUniqueVals(1 [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](varData, 1), 1 [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](varData, 2))
    
    UnqIndx = 0
    [COLOR=darkblue]For[/COLOR] i = 1 [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](varData, 1)
        [COLOR=darkblue]If[/COLOR] [COLOR=darkblue]Not[/COLOR] objDic.Exists(varData(i, 1)) [COLOR=darkblue]Then[/COLOR]
            UnqIndx = UnqIndx + 1
            [COLOR=darkblue]For[/COLOR] j = 1 [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](varData, 2)
                arrUniqueVals(UnqIndx, j) = varData(i, j)
            [COLOR=darkblue]Next[/COLOR] j
            objDic.Add varData(i, 1), UnqIndx
        [COLOR=darkblue]Else[/COLOR]
            [COLOR=darkblue]For[/COLOR] j = 8 To UBound(varData, 2)
                [COLOR=darkblue]If[/COLOR] Application.IsNumber(varData(i, j)) [COLOR=darkblue]Then[/COLOR]
                    arrUniqueVals(objDic.Item(varData(i, 1)), j) = _
                        Application.Max(varData(i, j), arrUniqueVals(objDic.Item(varData(i, 1)), j))
                [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
            [COLOR=darkblue]Next[/COLOR] j
        [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
    [COLOR=darkblue]Next[/COLOR] i
    
    GetUniqueAndMaxVals = arrUniqueVals
    
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Function[/COLOR]

Hope this helps!
 
Last edited:
Upvote 0
Hi Domenic,

So, now I'm not using 'Sub
KeepLastEntryAndMaxVals()' and using 'Private
Function GetUniqueAndMaxVals(ByVal varData As Variant)', right?

So if I am, I really do appreciate you using the 'Dictionary Script', I am familar with that concept, from the begining of this thread.

I can't wait to try it out!

Thanks,

Pin
 
Upvote 0
Actually, copy both of them (KeepLastEntryAndMaxVals and GetUniqueAndMaxVals) into a regular module, and then run KeepLastEntryAndMaxVals. When it runs, it will call the function GetUniqueAndMaxVals, which uses the Dictionary object.
 
Upvote 0
Hi Domenic,


I tested it out and it's working great! :) & Yes, it is much faster!

But can you adjust your code to start placing the data beginning on row 2 at C2, not in beginning on row 3 at cell C3?

Also, can you please tell it to arrange the data produced, according to Column G from the lowest to its highest value?

Thank you!


R/
Pinaceous
 
Upvote 0
I tested it out and it's working great! :) & Yes, it is much faster!

That's great, glad to hear it...

But can you adjust your code to start placing the data beginning on row 2 at C2, not in beginning on row 3 at cell C3?

Actually, it should start on Row 2, but I'll take another look...

Also, can you please tell it to arrange the data produced, according to Column G from the lowest to its highest value?

Can you post some sample data before the macro runs, and the results you expect after it runs?
 
Upvote 0
But can you adjust your code to start placing the data beginning on row 2 at C2, not in beginning on row 3 at cell C3?


It still is doing this? The first row is hidden on the sheet, but when I unhide it, it still produces the data starting in cell c3. I'm guessing its in the code, somewhere...


Also, can you please tell it to arrange the data produced, according to Column G from the lowest to its highest value?


Okay, after the macro code:

It produces the data information in the following way:


Column C
D
E
F
G (24hr time)
H
I
J - V
D2 402
D2
402
2200
OGE
143
Max Value #'s
D5 414
D5
414
1410
YMT
95
Max Value #'s
IA 101
IA
101
735
LDE
339
Max Value #'s
RA 1300
RA
1300
905
EEZ
256
Max Value #'s
VA 20
VA
20
540
GBO
250
Max Value #'s
VA 244
VA
244
2200
GBO
137
Max Value #'s
VA 670
VA
670
212
LSA
153
Max Value #'s
WB 11
WB
11
2230
JMB
82
Max Value #'s
WB 17
WB
17
10
NKI
141
Max Value #'s

<tbody>
</tbody>


What, I'd like, is for the data to post the information in ascending order according to column G; represented in the following way:



Column C
D
E
F
G (24hr time)
H
I
J - V
WB 17
WB
17
10
NKI
141
Max Value #'s
VA 670
VA
670
212
LSA
153
Max Value #'s
VA 20
VA
20
540
GBO
250
Max Value #'s
IA 101
IA
101
735
LDE
339
Max Value #'s
RA 1300
RA
1300
905
EEZ
256
Max Value #'s
D5 414
D5
414
1410
YMT
95
Max Value #'s
D2 402
D2
402
2200
OGE
143
Max Value #'s
VA 244
VA
244
2200
GBO
137
Max Value #'s
WB 11
WB
11
2230
JMB
82
Max Value #'s

<tbody>
</tbody>


Thank you!


=}
 
Upvote 0
I think we can add the following lines in red, which sorts the unique list at the end...

Code:
Sub KeepLastEntryAndMaxVals()

    Dim varSortedData As Variant
    Dim varUniqueVals As Variant
    Dim LastRow As Long
    
    Application.ScreenUpdating = False
    
    LastRow = Cells(Rows.Count, "C").End(xlUp).Row
    
    If LastRow = 1 Then
        MsgBox "No data found.", vbInformation
        Exit Sub
    End If
    
    With Range("C1:V" & LastRow)
        .Sort _
            key1:=.Cells(1), order1:=xlAscending, _
            key2:=.Cells(1, 5), order2:=xlDescending, _
            Header:=xlYes, _
            OrderCustom:=1, _
            MatchCase:=False, _
            Orientation:=xlTopToBottom
        varSortedData = .Offset(1, 0).Resize(.Rows.Count - 1).Value
        varUniqueVals = GetUniqueAndMaxVals(varSortedData)
        .Offset(1, 0).ClearContents
        .Offset(1, 0).Resize(UBound(varUniqueVals, 1), UBound(varUniqueVals, 2)).Value = varUniqueVals
    End With
    
    [COLOR=#ff0000]With Range("C1:V" & Cells(Rows.Count, "C").End(xlUp).Row)
        .Sort _
            key1:=.Cells(1, 5), order1:=xlAscending, _
            Header:=xlYes, _
            OrderCustom:=1, _
            MatchCase:=False, _
            Orientation:=xlTopToBottom
    End With[/COLOR]
    
    Application.ScreenUpdating = True
    
    MsgBox "Completed...", vbInformation
        
End Sub
 
Upvote 0
Hi Domenic,


The macro is working really great! I really do appreciate your continued help on this.




I’d like to describe the behavior of the sheet a little more in detail…




Basically, the macro is deleting duplicate value, across the 3 separate data rows ranges' through the use of a unique id supplied by column C, in producing a final outcome. Therefore, there are no more than 3 unique ids represented on the sheet at any 1 time between the range of rows 2-573.



Where on the sheet it can be broken down into 3 separate row ranges, being;




Data 2: Rows Range (2-172)
Data 3: Rows Range (173-273)
Data 1: Rows Range (274-573)




Behavior of the sheet: The data is dependent on time, where, it posts data by order from first to Data 1 to Data 2 than to finally to Data 3, independent of the macro and user executing it.



An example will follow, of which I’d like to explain an additional concept in building on what we already have accomplished.


Just let me know you understand what I'm explaining here, first.


R/
Pin
 
Upvote 0
Hello,
Can I please get some help with a macro, looks like mine would need a little modification from what you guys have been doing but I can't seem to figure it out. I put the columns and rows in to better define what I'm looking for. The keys for me are column B and column D.

Row 2 and row 3 have the same unit number whenever this is identified look at column D and only keep the row with the most current time. In this example row 3 would be kept and row 2 deleted from the spreadsheet.

My spreadsheet will have up to 100000 rows of data. Thank you all.

Column AColumn BColumn cColumn DColumn EColumn F
Row 1Green JacketCreated OnCreated AtProductGrams Weight
Row 2Unit 15/1/201517:00:40A71.907
Row 3Unit 15/1/201517:23:56A264.216
Row 4Unit 25/1/201517:02:48A71.907
Row 5Unit 25/1/201517:24:04A264.216
Row 6Unit 35/1/201517:03:02A71.907
Row 7Unit 35/1/201517:24:12A264.216
Row 8Unit 45/1/201517:03:17A71.907
Row 9Unit 45/1/201517:24:23A264.216
Row 10Unit 55/1/201517:03:30A126.255
Row 11Unit 55/1/201517:24:35A464.887
Row 12Unit 65/1/201517:03:44A71.907
Row 13Unit 65/1/201517:24:47A264.216
Row 14Unit 75/1/201517:03:56A71.907
Row 15Unit 75/1/201517:24:58A264.216
Row 16Unit 85/1/201517:04:14A71.907
Row 17Unit 85/1/201517:25:10A264.216

<tbody>
</tbody><colgroup><col><col><col><col><col><col></colgroup>
 
Upvote 0

Forum statistics

Threads
1,215,181
Messages
6,123,508
Members
449,101
Latest member
mgro123

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top