Here's a good one! Delete duplicates with macro keeping only the newest by date

rkol297

Board Regular
Joined
Nov 12, 2010
Messages
131
Office Version
  1. 365
  2. 2019
Platform
  1. Windows
I have a spreedsheet with 17,000 rows in it. In column D I have ID numbers and in column B I have the dates. There are multiple entries for each ID in column D which gives multiples dates each entry was given in column B. I need a VBA Macro that will identify duplicate entries in Column D and keep only the most recent entry by date in column B. Any help would be greatly appreciated. I'm banging my head on the desk to get this to work thanks!
 
Yes, this part is in addition to the macro that were working on.

Actually, the maximum value is determined on a row by row basis, between the columns J through V, in using the unique id.

Thanks for the code. :)
 
Upvote 0

Excel Facts

Workdays for a market open Mon, Wed, Friday?
Yes! Use "0101011" for the weekend argument in NETWORKDAYS.INTL or WORKDAY.INTL. The 7 digits start on Monday. 1 means it is a weekend.
Here's a modified version of the macro, which uses a custom function. Hopefully, it should be faster...

Code:
[COLOR=darkblue]Option[/COLOR] [COLOR=darkblue]Explicit[/COLOR]

[COLOR=darkblue]Sub[/COLOR] KeepLastEntryAndMaxVals()

    [COLOR=darkblue]Dim[/COLOR] varSortedData [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] varUniqueVals [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] LastRow [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    
    Application.ScreenUpdating = [COLOR=darkblue]False[/COLOR]
    
    LastRow = Cells(Rows.Count, "C").End(xlUp).Row
    
    [COLOR=darkblue]If[/COLOR] LastRow = 1 [COLOR=darkblue]Then[/COLOR]
        MsgBox "No data found.", vbInformation
        [COLOR=darkblue]Exit[/COLOR] [COLOR=darkblue]Sub[/COLOR]
    [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
    
    [COLOR=darkblue]With[/COLOR] Range("C1:V" & LastRow)
        .Sort _
            key1:=.Cells(1), order1:=xlAscending, _
            key2:=.Cells(1, 5), order2:=xlDescending, _
            Header:=xlYes, _
            OrderCustom:=1, _
            MatchCase:=False, _
            Orientation:=xlTop[COLOR=darkblue]To[/COLOR]Bottom
        varSortedData = .Offset(1, 0).Resize(.Rows.Count - 1).Value
        varUniqueVals = GetUniqueAndMaxVals(varSortedData)
        .Offset(1, 0).ClearContents
        .Offset(1, 0).Resize(UBound(varUniqueVals, 1), [COLOR=darkblue]UBound[/COLOR](varUniqueVals, 2)).Value = varUniqueVals
    [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]With[/COLOR]
    
    Application.ScreenUpdating = [COLOR=darkblue]True[/COLOR]
    
    MsgBox "Completed...", vbInformation
        
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Sub[/COLOR]

[COLOR=darkblue]Private[/COLOR] [COLOR=darkblue]Function[/COLOR] GetUniqueAndMaxVals([COLOR=darkblue]ByVal[/COLOR] varData [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR])

    [COLOR=darkblue]Dim[/COLOR] objDic [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Object[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] arrUniqueVals() [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Variant[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] UnqIndx [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] i [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] j [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    
    [COLOR=darkblue]Set[/COLOR] objDic = CreateObject("Scripting.Dictionary")
    objDic.CompareMode = 1 [COLOR=green]'vbTextCompare[/COLOR]
    
    [COLOR=darkblue]ReDim[/COLOR] arrUniqueVals(1 [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](varData, 1), 1 [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](varData, 2))
    
    UnqIndx = 0
    [COLOR=darkblue]For[/COLOR] i = 1 [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](varData, 1)
        [COLOR=darkblue]If[/COLOR] [COLOR=darkblue]Not[/COLOR] objDic.Exists(varData(i, 1)) [COLOR=darkblue]Then[/COLOR]
            UnqIndx = UnqIndx + 1
            [COLOR=darkblue]For[/COLOR] j = 1 [COLOR=darkblue]To[/COLOR] [COLOR=darkblue]UBound[/COLOR](varData, 2)
                arrUniqueVals(UnqIndx, j) = varData(i, j)
            [COLOR=darkblue]Next[/COLOR] j
            objDic.Add varData(i, 1), UnqIndx
        [COLOR=darkblue]Else[/COLOR]
            [COLOR=darkblue]For[/COLOR] j = 8 To UBound(varData, 2)
                [COLOR=darkblue]If[/COLOR] Application.IsNumber(varData(i, j)) [COLOR=darkblue]Then[/COLOR]
                    arrUniqueVals(objDic.Item(varData(i, 1)), j) = _
                        Application.Max(varData(i, j), arrUniqueVals(objDic.Item(varData(i, 1)), j))
                [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
            [COLOR=darkblue]Next[/COLOR] j
        [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
    [COLOR=darkblue]Next[/COLOR] i
    
    GetUniqueAndMaxVals = arrUniqueVals
    
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Function[/COLOR]

Hope this helps!
 
Last edited:
Upvote 0
Hi Domenic,

So, now I'm not using 'Sub
KeepLastEntryAndMaxVals()' and using 'Private
Function GetUniqueAndMaxVals(ByVal varData As Variant)', right?

So if I am, I really do appreciate you using the 'Dictionary Script', I am familar with that concept, from the begining of this thread.

I can't wait to try it out!

Thanks,

Pin
 
Upvote 0
Actually, copy both of them (KeepLastEntryAndMaxVals and GetUniqueAndMaxVals) into a regular module, and then run KeepLastEntryAndMaxVals. When it runs, it will call the function GetUniqueAndMaxVals, which uses the Dictionary object.
 
Upvote 0
Hi Domenic,


I tested it out and it's working great! :) & Yes, it is much faster!

But can you adjust your code to start placing the data beginning on row 2 at C2, not in beginning on row 3 at cell C3?

Also, can you please tell it to arrange the data produced, according to Column G from the lowest to its highest value?

Thank you!


R/
Pinaceous
 
Upvote 0
I tested it out and it's working great! :) & Yes, it is much faster!

That's great, glad to hear it...

But can you adjust your code to start placing the data beginning on row 2 at C2, not in beginning on row 3 at cell C3?

Actually, it should start on Row 2, but I'll take another look...

Also, can you please tell it to arrange the data produced, according to Column G from the lowest to its highest value?

Can you post some sample data before the macro runs, and the results you expect after it runs?
 
Upvote 0
But can you adjust your code to start placing the data beginning on row 2 at C2, not in beginning on row 3 at cell C3?


It still is doing this? The first row is hidden on the sheet, but when I unhide it, it still produces the data starting in cell c3. I'm guessing its in the code, somewhere...


Also, can you please tell it to arrange the data produced, according to Column G from the lowest to its highest value?


Okay, after the macro code:

It produces the data information in the following way:


[TABLE="class: grid, width: 500, align: center"]
<tbody>[TR]
[TD]Column C
[/TD]
[TD]D
[/TD]
[TD]E
[/TD]
[TD]F
[/TD]
[TD]G (24hr time)
[/TD]
[TD]H
[/TD]
[TD]I
[/TD]
[TD]J - V
[/TD]
[/TR]
[TR]
[TD]D2 402
[/TD]
[TD][/TD]
[TD]D2
[/TD]
[TD]402
[/TD]
[TD]2200
[/TD]
[TD]OGE
[/TD]
[TD]143
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]D5 414
[/TD]
[TD][/TD]
[TD]D5
[/TD]
[TD]414
[/TD]
[TD]1410
[/TD]
[TD]YMT
[/TD]
[TD]95
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]IA 101
[/TD]
[TD][/TD]
[TD]IA
[/TD]
[TD]101
[/TD]
[TD]735
[/TD]
[TD]LDE
[/TD]
[TD]339
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]RA 1300
[/TD]
[TD][/TD]
[TD]RA
[/TD]
[TD]1300
[/TD]
[TD]905
[/TD]
[TD]EEZ
[/TD]
[TD]256
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]VA 20
[/TD]
[TD][/TD]
[TD]VA
[/TD]
[TD]20
[/TD]
[TD]540
[/TD]
[TD]GBO
[/TD]
[TD]250
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]VA 244
[/TD]
[TD][/TD]
[TD]VA
[/TD]
[TD]244
[/TD]
[TD]2200
[/TD]
[TD]GBO
[/TD]
[TD]137
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]VA 670
[/TD]
[TD][/TD]
[TD]VA
[/TD]
[TD]670
[/TD]
[TD]212
[/TD]
[TD]LSA
[/TD]
[TD]153
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]WB 11
[/TD]
[TD][/TD]
[TD]WB
[/TD]
[TD]11
[/TD]
[TD]2230
[/TD]
[TD]JMB
[/TD]
[TD]82
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]WB 17
[/TD]
[TD][/TD]
[TD]WB
[/TD]
[TD]17
[/TD]
[TD]10
[/TD]
[TD]NKI
[/TD]
[TD]141
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
</tbody>[/TABLE]


What, I'd like, is for the data to post the information in ascending order according to column G; represented in the following way:



[TABLE="class: grid, width: 500, align: center"]
<tbody>[TR]
[TD]Column C
[/TD]
[TD]D
[/TD]
[TD]E
[/TD]
[TD]F
[/TD]
[TD]G (24hr time)
[/TD]
[TD]H
[/TD]
[TD]I
[/TD]
[TD]J - V
[/TD]
[/TR]
[TR]
[TD]WB 17
[/TD]
[TD][/TD]
[TD]WB
[/TD]
[TD]17
[/TD]
[TD]10
[/TD]
[TD]NKI
[/TD]
[TD]141
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]VA 670
[/TD]
[TD][/TD]
[TD]VA
[/TD]
[TD]670
[/TD]
[TD]212
[/TD]
[TD]LSA
[/TD]
[TD]153
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]VA 20
[/TD]
[TD][/TD]
[TD]VA
[/TD]
[TD]20
[/TD]
[TD]540
[/TD]
[TD]GBO
[/TD]
[TD]250
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]IA 101
[/TD]
[TD][/TD]
[TD]IA
[/TD]
[TD]101
[/TD]
[TD]735
[/TD]
[TD]LDE
[/TD]
[TD]339
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]RA 1300
[/TD]
[TD][/TD]
[TD]RA
[/TD]
[TD]1300
[/TD]
[TD]905
[/TD]
[TD]EEZ
[/TD]
[TD]256
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]D5 414
[/TD]
[TD][/TD]
[TD]D5
[/TD]
[TD]414
[/TD]
[TD]1410
[/TD]
[TD]YMT
[/TD]
[TD]95
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]D2 402
[/TD]
[TD][/TD]
[TD]D2
[/TD]
[TD]402
[/TD]
[TD]2200
[/TD]
[TD]OGE
[/TD]
[TD]143
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]VA 244
[/TD]
[TD][/TD]
[TD]VA
[/TD]
[TD]244
[/TD]
[TD]2200
[/TD]
[TD]GBO
[/TD]
[TD]137
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
[TR]
[TD]WB 11
[/TD]
[TD][/TD]
[TD]WB
[/TD]
[TD]11
[/TD]
[TD]2230
[/TD]
[TD]JMB
[/TD]
[TD]82
[/TD]
[TD]Max Value #'s
[/TD]
[/TR]
</tbody>[/TABLE]


Thank you!


=}
 
Upvote 0
I think we can add the following lines in red, which sorts the unique list at the end...

Code:
Sub KeepLastEntryAndMaxVals()

    Dim varSortedData As Variant
    Dim varUniqueVals As Variant
    Dim LastRow As Long
    
    Application.ScreenUpdating = False
    
    LastRow = Cells(Rows.Count, "C").End(xlUp).Row
    
    If LastRow = 1 Then
        MsgBox "No data found.", vbInformation
        Exit Sub
    End If
    
    With Range("C1:V" & LastRow)
        .Sort _
            key1:=.Cells(1), order1:=xlAscending, _
            key2:=.Cells(1, 5), order2:=xlDescending, _
            Header:=xlYes, _
            OrderCustom:=1, _
            MatchCase:=False, _
            Orientation:=xlTopToBottom
        varSortedData = .Offset(1, 0).Resize(.Rows.Count - 1).Value
        varUniqueVals = GetUniqueAndMaxVals(varSortedData)
        .Offset(1, 0).ClearContents
        .Offset(1, 0).Resize(UBound(varUniqueVals, 1), UBound(varUniqueVals, 2)).Value = varUniqueVals
    End With
    
    [COLOR=#ff0000]With Range("C1:V" & Cells(Rows.Count, "C").End(xlUp).Row)
        .Sort _
            key1:=.Cells(1, 5), order1:=xlAscending, _
            Header:=xlYes, _
            OrderCustom:=1, _
            MatchCase:=False, _
            Orientation:=xlTopToBottom
    End With[/COLOR]
    
    Application.ScreenUpdating = True
    
    MsgBox "Completed...", vbInformation
        
End Sub
 
Upvote 0
Hi Domenic,


The macro is working really great! I really do appreciate your continued help on this.




I’d like to describe the behavior of the sheet a little more in detail…




Basically, the macro is deleting duplicate value, across the 3 separate data rows ranges' through the use of a unique id supplied by column C, in producing a final outcome. Therefore, there are no more than 3 unique ids represented on the sheet at any 1 time between the range of rows 2-573.



Where on the sheet it can be broken down into 3 separate row ranges, being;




Data 2: Rows Range (2-172)
Data 3: Rows Range (173-273)
Data 1: Rows Range (274-573)




Behavior of the sheet: The data is dependent on time, where, it posts data by order from first to Data 1 to Data 2 than to finally to Data 3, independent of the macro and user executing it.



An example will follow, of which I’d like to explain an additional concept in building on what we already have accomplished.


Just let me know you understand what I'm explaining here, first.


R/
Pin
 
Upvote 0
Hello,
Can I please get some help with a macro, looks like mine would need a little modification from what you guys have been doing but I can't seem to figure it out. I put the columns and rows in to better define what I'm looking for. The keys for me are column B and column D.

Row 2 and row 3 have the same unit number whenever this is identified look at column D and only keep the row with the most current time. In this example row 3 would be kept and row 2 deleted from the spreadsheet.

My spreadsheet will have up to 100000 rows of data. Thank you all.

[TABLE="width: 464"]
<tbody>[TR]
[TD]Column A[/TD]
[TD]Column B[/TD]
[TD]Column c[/TD]
[TD]Column D[/TD]
[TD]Column E[/TD]
[TD]Column F[/TD]
[/TR]
[TR]
[TD]Row 1[/TD]
[TD]Green Jacket[/TD]
[TD]Created On[/TD]
[TD]Created At[/TD]
[TD]Product[/TD]
[TD]Grams Weight[/TD]
[/TR]
[TR]
[TD]Row 2[/TD]
[TD]Unit 1[/TD]
[TD]5/1/2015[/TD]
[TD]17:00:40[/TD]
[TD]A[/TD]
[TD]71.907[/TD]
[/TR]
[TR]
[TD]Row 3[/TD]
[TD]Unit 1[/TD]
[TD]5/1/2015[/TD]
[TD]17:23:56[/TD]
[TD]A[/TD]
[TD]264.216[/TD]
[/TR]
[TR]
[TD]Row 4[/TD]
[TD]Unit 2[/TD]
[TD]5/1/2015[/TD]
[TD]17:02:48[/TD]
[TD]A[/TD]
[TD]71.907[/TD]
[/TR]
[TR]
[TD]Row 5[/TD]
[TD]Unit 2[/TD]
[TD]5/1/2015[/TD]
[TD]17:24:04[/TD]
[TD]A[/TD]
[TD]264.216[/TD]
[/TR]
[TR]
[TD]Row 6[/TD]
[TD]Unit 3[/TD]
[TD]5/1/2015[/TD]
[TD]17:03:02[/TD]
[TD]A[/TD]
[TD]71.907[/TD]
[/TR]
[TR]
[TD]Row 7[/TD]
[TD]Unit 3[/TD]
[TD]5/1/2015[/TD]
[TD]17:24:12[/TD]
[TD]A[/TD]
[TD]264.216[/TD]
[/TR]
[TR]
[TD]Row 8[/TD]
[TD]Unit 4[/TD]
[TD]5/1/2015[/TD]
[TD]17:03:17[/TD]
[TD]A[/TD]
[TD]71.907[/TD]
[/TR]
[TR]
[TD]Row 9[/TD]
[TD]Unit 4[/TD]
[TD]5/1/2015[/TD]
[TD]17:24:23[/TD]
[TD]A[/TD]
[TD]264.216[/TD]
[/TR]
[TR]
[TD]Row 10[/TD]
[TD]Unit 5[/TD]
[TD]5/1/2015[/TD]
[TD]17:03:30[/TD]
[TD]A[/TD]
[TD]126.255[/TD]
[/TR]
[TR]
[TD]Row 11[/TD]
[TD]Unit 5[/TD]
[TD]5/1/2015[/TD]
[TD]17:24:35[/TD]
[TD]A[/TD]
[TD]464.887[/TD]
[/TR]
[TR]
[TD]Row 12[/TD]
[TD]Unit 6[/TD]
[TD]5/1/2015[/TD]
[TD]17:03:44[/TD]
[TD]A[/TD]
[TD]71.907[/TD]
[/TR]
[TR]
[TD]Row 13[/TD]
[TD]Unit 6[/TD]
[TD]5/1/2015[/TD]
[TD]17:24:47[/TD]
[TD]A[/TD]
[TD]264.216[/TD]
[/TR]
[TR]
[TD]Row 14[/TD]
[TD]Unit 7[/TD]
[TD]5/1/2015[/TD]
[TD]17:03:56[/TD]
[TD]A[/TD]
[TD]71.907[/TD]
[/TR]
[TR]
[TD]Row 15[/TD]
[TD]Unit 7[/TD]
[TD]5/1/2015[/TD]
[TD]17:24:58[/TD]
[TD]A[/TD]
[TD]264.216[/TD]
[/TR]
[TR]
[TD]Row 16[/TD]
[TD]Unit 8[/TD]
[TD]5/1/2015[/TD]
[TD]17:04:14[/TD]
[TD]A[/TD]
[TD]71.907[/TD]
[/TR]
[TR]
[TD]Row 17[/TD]
[TD]Unit 8[/TD]
[TD]5/1/2015[/TD]
[TD]17:25:10[/TD]
[TD]A[/TD]
[TD]264.216[/TD]
[/TR]
</tbody><colgroup><col><col><col><col><col><col></colgroup>[/TABLE]
 
Upvote 0

Forum statistics

Threads
1,223,099
Messages
6,170,108
Members
452,302
Latest member
TaMere

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top