Here's a good one! Delete duplicates with macro keeping only the newest by date

rkol297

Board Regular
Joined
Nov 12, 2010
Messages
131
Office Version
  1. 365
  2. 2019
Platform
  1. Windows
I have a spreedsheet with 17,000 rows in it. In column D I have ID numbers and in column B I have the dates. There are multiple entries for each ID in column D which gives multiples dates each entry was given in column B. I need a VBA Macro that will identify duplicate entries in Column D and keep only the most recent entry by date in column B. Any help would be greatly appreciated. I'm banging my head on the desk to get this to work thanks!
 

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
Try...

<font face=Calibri><br><SPAN style="color:#00007F">Sub</SPAN> test()<br><br>    <SPAN style="color:#00007F">Dim</SPAN> Rng <SPAN style="color:#00007F">As</SPAN> Range<br>    <SPAN style="color:#00007F">Dim</SPAN> LastRow <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">Long</SPAN><br>    <SPAN style="color:#00007F">Dim</SPAN> i <SPAN style="color:#00007F">As</SPAN> <SPAN style="color:#00007F">Long</SPAN><br>    <br>    Application.ScreenUpdating = <SPAN style="color:#00007F">False</SPAN><br>    <br>    LastRow = Cells(Rows.Count, "D").End(xlUp).Row<br>    <br>    <SPAN style="color:#00007F">Set</SPAN> Rng = Range("B1:D" & LastRow)<br>    <br>    <SPAN style="color:#00007F">With</SPAN> Rng<br>        .Sort key1:=Range("D1"), order1:=xlAscending, key2:=Range("B1"), order2:=xlDescending, _<br>            Header:=xlYes, OrderCustom:=1, MatchCase:=False, _<br>            Orientation:=xlTop<SPAN style="color:#00007F">To</SPAN>Bottom<br>    <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN><br>    <br>    <SPAN style="color:#00007F">For</SPAN> i = LastRow To 2 <SPAN style="color:#00007F">Step</SPAN> -1<br>        <SPAN style="color:#00007F">If</SPAN> WorksheetFunction.CountIf(Range(Cells(2, "D"), Cells(i, "D")), Cells(i, "D")) > 1 <SPAN style="color:#00007F">Then</SPAN><br>            Rows(i).Delete<br>        <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">If</SPAN><br>    <SPAN style="color:#00007F">Next</SPAN> i<br>    <br>    Application.ScreenUpdating = <SPAN style="color:#00007F">True</SPAN><br>    <br>    MsgBox "Completed...", vbInformation<br>        <br><SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">Sub</SPAN><br></FONT>
 
Upvote 0
I have a similar problem except the multiple entries (Job #) are in Column B and the dates are in Column G. Data begins in Row 4.

What needs to be changed in the macro for it to work properly in my case?
 
Upvote 0
I have a similar problem except the multiple entries (Job #) are in Column B and the dates are in Column G. Data begins in Row 4.

What needs to be changed in the macro for it to work properly in my case?
You can have Domenic's code modified if you like.

But you may like to try this one, which is somewhat faster
Code:
Sub test_jimH()
Dim d As Object, a As Variant, i As Long
Dim u As Long, v As String

Set d = CreateObject("scripting.dictionary")
a = Intersect(ActiveSheet.UsedRange, Columns("B:G"))

For i = 4 To UBound(a, 1)
    u = a(i, 6) * 1: v = a(i, 1)
    If Len(v) > 0 Then
        If Not d.exists(v) Then
            d(v) = u
        Else
            If u > d(v) Then d(v) = u
        End If
    End If
Next i

Columns("B:B").ClearContents
Columns("G:G").ClearContents
Range("B4").Resize(d.Count) = Application.Transpose(d.keys)
Range("G4").Resize(d.Count) = Application.Transpose(d.items)

End Sub
 
Upvote 0
Mirabeau. Could you give a quick explanation of what your code is doing (I haven't got a clue)

For example, what does the scripting.dictionary do in the scheme of things ?
 
Upvote 0
Mirabeau. Could you give a quick explanation of what your code is doing (I haven't got a clue)

For example, what does the scripting.dictionary do in the scheme of things ?
Michael,

The code uses the scripting dictionary object to obtain the unique ID's and the latest date associated with them.

You can look up scripting dictionary on Google or Excel or VBA help. It's explained there much better than I could do it.

But basically it's a kind of associative array, associating various items (such as dates) with unique values of something such as a string or a number.

I used it as a useful device to quickly and easily obtain unique values (called keys) in the VBA code for this problem, and then to find the latest date (called items) associated with each key.

I also like scripting dictionary because it tends to stay pretty robust and reliable even with large problems such as 100,000 rows or more.
 
Upvote 0
It seems to work and it is fast, but at the bottom of the worksheet there are a bunch of rows where the Job# and Date Invoiced entries are gone but the entries in the rest of the columns are intact. Also, it deleted the column labels at the top.

For those Job#s with duplicates, I was hoping that it would find the Job# with the most recent date and delete all the others. It should delete all the duplicate rows. It shouldn't touch any Job# that doesn't have a duplicate entry.
 
Upvote 0
It seems to work and it is fast, but at the bottom of the worksheet there are a bunch of rows where the Job# and Date Invoiced entries are gone but the entries in the rest of the columns are intact. Also, it deleted the column labels at the top.

For those Job#s with duplicates, I was hoping that it would find the Job# with the most recent date and delete all the others. It should delete all the duplicate rows. It shouldn't touch any Job# that doesn't have a duplicate entry.
I tried it out, before posting it, on some test data and it seemed to do just as you asked.

Without knowing how my guessed-at test data differs from your own data it's hard to know just how to proceed.

Where is "at the bottom of the worksheet"? Do you mean around rows(1000000) or so (in Excel 2007) or rows(65500) in Excel 2003, etc?

Where is "at the top". Does it mean in rows(4), in rows(1), or somewhere else?

Perhaps best if you posted a before and after example showing just where your data are located, and how you envisage the result should be.
 
Upvote 0
After the macro is finished running, the data associated with a particular Job# does not match the original data. I need all the other column data to stay attached to the original row.
 
Upvote 0
After the macro is finished running, the data associated with a particular Job# does not match the original data. I need all the other column data to stay attached to the original row.

"Perhaps best if you posted a before and after example showing just where your data are located, and how you envisage the result should be."
 
Upvote 0

Forum statistics

Threads
1,213,536
Messages
6,114,215
Members
448,554
Latest member
Gleisner2

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top