Remove duplicates based on multiple criteria (ID and Date)

johnnytominaga

New Member
Joined
Apr 27, 2018
Messages
19
Hey guys!

I'm working on a project that requires merging data from multiple workbooks into a single "master" workbook.
All files have the same number of columns and have been merged properly. That means all data is in the "master" workbook.

I couldn't manage to remove the duplicates though. They need to be removed based on multiple criteria and so that cells with value are copied even if the rest of the row is removed. The criteria that defines which duplicate is to be kept is:
a) ID No. (numerical)
b) Update Date (the higher is kept)

Dataset sample:

IDNameLocationPhoneLast updated on
1SaraMiami9999999919/07/2018
3Brian8888888815/03/2015
4JoshSeattle03/02/2015
7PeterNew York30/09/2016
7Peter6666666601/10/2016
3BrianLos Angeles8888888820/06/2017
7Peter20/01/2017
9Nicole5555555518/11/2016
47777777704/01/2017

<tbody>
</tbody>

Desired result:

IDNameLocationPhoneLast updated on
1SaraMiami9999999919/07/2018
3BrianLos Angeles8888888820/06/2017
4JoshSeattle7777777704/01/2017
7PeterNew York6666666620/01/2017
9Nicole5555555518/11/2016

<tbody>
</tbody>

I tried by using the Dictionary object, but couldn't get anywhere close to what I'm looking to achieve. Also, the script is going to be used in multiple PCs, so if I could avoid needing to activate the Microsoft Scripting Runtime everytime, that would be appreciated.

I'm using Excel 2016 on Windows.

Any ideas on how I could accomplish that?

Thanks a lot in advance.


Johnny
 

Excel Facts

Why does 9 mean SUM in SUBTOTAL?
It is because Sum is the 9th alphabetically in Average, Count, CountA, Max, Min, Product, StDev.S, StDev.P, Sum, VAR.S, VAR.P.
Assuming data in Sheet1, columns A:E, headers in row 1, maybe this

Results in columns G:K
Code:
Sub aTest()
    Dim dic As Object, vData As Variant
    Dim i As Long, arr(1 To 4) As Variant, vKey As Variant
    Dim vAux As Variant, vResult As Variant
    
    Set dic = CreateObject("Scripting.dictionary")
    With Sheets("Sheet1")
        vData = .Range("A2:E" & .Cells(.Rows.Count, "A").End(xlUp).Row)
        For i = LBound(vData, 1) To UBound(vData, 1)
            If dic.exists(vData(i, 1)) Then
                If vData(i, 5) > dic(vData(i, 1))(4) Then
                    vAux = dic(vData(i, 1))
                    vAux(4) = vData(i, 5)
                    If vData(i, 2) <> "" Then vAux(1) = vData(i, 2)
                    If vData(i, 3) <> "" Then vAux(2) = vData(i, 3)
                    If vData(i, 4) <> "" Then vAux(3) = vData(i, 4)
                    dic(vData(i, 1)) = vAux
                End If
            Else
                arr(1) = vData(i, 2)
                arr(2) = vData(i, 3)
                arr(3) = vData(i, 4)
                arr(4) = vData(i, 5)
                dic(vData(i, 1)) = arr
            End If
        Next i
        i = 0
        .Range("G1:K1").Value = .Range("A1:E1").Value
        vResult = .Range("G2").Resize(dic.Count, 5)
        For Each vKey In dic.keys
            i = i + 1
            vResult(i, 1) = vKey
            vResult(i, 2) = dic(vKey)(1)
            vResult(i, 3) = dic(vKey)(2)
            vResult(i, 4) = dic(vKey)(3)
            vResult(i, 5) = dic(vKey)(4)
        Next vKey
        .Range("G2").Resize(dic.Count, 5) = vResult
        .Columns("H:K").AutoFit
    End With
End Sub

Hope this helps

M.
 
Last edited:
Upvote 0
Hey guys!

I'm working on a project that requires merging data from multiple workbooks into a single "master" workbook.
All files have the same number of columns and have been merged properly. That means all data is in the "master" workbook.

I couldn't manage to remove the duplicates though. They need to be removed based on multiple criteria and so that cells with value are copied even if the rest of the row is removed. The criteria that defines which duplicate is to be kept is:
a) ID No. (numerical)
b) Update Date (the higher is kept)

Dataset sample:

IDNameLocationPhoneLast updated on
1SaraMiami9999999919/07/2018
3Brian8888888815/03/2015
4JoshSeattle03/02/2015
7PeterNew York30/09/2016
7Peter6666666601/10/2016
3BrianLos Angeles8888888820/06/2017
7Peter20/01/2017
9Nicole5555555518/11/2016
47777777704/01/2017

<tbody>
</tbody>

Desired result:

IDNameLocationPhoneLast updated on
1SaraMiami9999999919/07/2018
3BrianLos Angeles8888888820/06/2017
4JoshSeattle7777777704/01/2017
7PeterNew York6666666620/01/2017
9Nicole5555555518/11/2016

<tbody>
</tbody>

I tried by using the Dictionary object, but couldn't get anywhere close to what I'm looking to achieve. Also, the script is going to be used in multiple PCs, so if I could avoid needing to activate the Microsoft Scripting Runtime everytime, that would be appreciated.

I'm using Excel 2016 on Windows.

Any ideas on how I could accomplish that?

Thanks a lot in advance.


Johnny

Oh, I see now - that an MVP had answered your query....better try it.
thanks
 
Last edited:
Upvote 0
Thanks Marcelo! You're a genius.
It worked perfectly (you were right on the assumption around the headers btw).

Couple of questions:

- How could I make it work with a larger dataset though? - Meaning if I have 30, 40 columns to be merged - I tried it and I'm getting a VBA Runtime error '6' error.

- Also, what need to be changed if the ID (key) isn't in the first column?

Thanks again.


Assuming data in Sheet1, columns A:E, headers in row 1, maybe this

Results in columns G:K
Code:
Sub aTest()
    Dim dic As Object, vData As Variant
    Dim i As Long, arr(1 To 4) As Variant, vKey As Variant
    Dim vAux As Variant, vResult As Variant
    
    Set dic = CreateObject("Scripting.dictionary")
    With Sheets("Sheet1")
        vData = .Range("A2:E" & .Cells(.Rows.Count, "A").End(xlUp).Row)
        For i = LBound(vData, 1) To UBound(vData, 1)
            If dic.exists(vData(i, 1)) Then
                If vData(i, 5) > dic(vData(i, 1))(4) Then
                    vAux = dic(vData(i, 1))
                    vAux(4) = vData(i, 5)
                    If vData(i, 2) <> "" Then vAux(1) = vData(i, 2)
                    If vData(i, 3) <> "" Then vAux(2) = vData(i, 3)
                    If vData(i, 4) <> "" Then vAux(3) = vData(i, 4)
                    dic(vData(i, 1)) = vAux
                End If
            Else
                arr(1) = vData(i, 2)
                arr(2) = vData(i, 3)
                arr(3) = vData(i, 4)
                arr(4) = vData(i, 5)
                dic(vData(i, 1)) = arr
            End If
        Next i
        i = 0
        .Range("G1:K1").Value = .Range("A1:E1").Value
        vResult = .Range("G2").Resize(dic.Count, 5)
        For Each vKey In dic.keys
            i = i + 1
            vResult(i, 1) = vKey
            vResult(i, 2) = dic(vKey)(1)
            vResult(i, 3) = dic(vKey)(2)
            vResult(i, 4) = dic(vKey)(3)
            vResult(i, 5) = dic(vKey)(4)
        Next vKey
        .Range("G2").Resize(dic.Count, 5) = vResult
        .Columns("H:K").AutoFit
    End With
End Sub

Hope this helps

M.
 
Upvote 0
Try to understand the code above and adapt it for more columns.
The ID column is used as the Keys of the dictionary.
The values of other columns are stored in an array that is placed as the Item correspondent to each key.

M.
 
Upvote 0
Right. I've made a few adjustments so that I can have, for example, 27 (A to AA) columns with IDs being in column B.

This is how it now looks like:

Sub aTest()

Dim dic As Object


Dim vData As Variant
Dim i As Long
Dim arr(1 To 25) As Variant
Dim vKey As Variant
Dim vAux As Variant
Dim vResult As Variant

Set dic = CreateObject("Scripting.dictionary")

With Sheets("Sheet1")
vData = .Range("A2:Z" & .Cells(.Rows.Count, "A").End(xlUp).Row)
For i = LBound(vData, 1) To UBound(vData, 1)
If dic.exists(vData(i, 1)) Then
If vData(i, 26) > dic(vData(i, 1))(25) Then
vAux = dic(vData(i, 1))
vAux(26) = vData(i, 27)
If vData(i, 2) <> "" Then vAux(1) = vData(i, 2)
If vData(i, 3) <> "" Then vAux(2) = vData(i, 3)
If vData(i, 4) <> "" Then vAux(3) = vData(i, 4)
If vData(i, 5) <> "" Then vAux(4) = vData(i, 5)
If vData(i, 6) <> "" Then vAux(5) = vData(i, 6)
If vData(i, 7) <> "" Then vAux(6) = vData(i, 7)
If vData(i, 8) <> "" Then vAux(7) = vData(i, 8)
If vData(i, 9) <> "" Then vAux(8) = vData(i, 9)
If vData(i, 10) <> "" Then vAux(9) = vData(i, 10)
If vData(i, 11) <> "" Then vAux(10) = vData(i, 11)
If vData(i, 12) <> "" Then vAux(11) = vData(i, 12)
If vData(i, 13) <> "" Then vAux(12) = vData(i, 13)
If vData(i, 14) <> "" Then vAux(13) = vData(i, 14)
If vData(i, 15) <> "" Then vAux(14) = vData(i, 15)
If vData(i, 16) <> "" Then vAux(15) = vData(i, 16)
If vData(i, 17) <> "" Then vAux(16) = vData(i, 17)
If vData(i, 18) <> "" Then vAux(17) = vData(i, 18)
If vData(i, 19) <> "" Then vAux(18) = vData(i, 19)
If vData(i, 20) <> "" Then vAux(19) = vData(i, 20)
If vData(i, 21) <> "" Then vAux(20) = vData(i, 21)
If vData(i, 22) <> "" Then vAux(21) = vData(i, 22)
If vData(i, 23) <> "" Then vAux(22) = vData(i, 23)
If vData(i, 24) <> "" Then vAux(23) = vData(i, 24)
If vData(i, 25) <> "" Then vAux(24) = vData(i, 25)
If vData(i, 26) <> "" Then vAux(25) = vData(i, 26)
dic(vData(i, 1)) = vAux
End If
Else
arr(1) = vData(i, 2)
arr(2) = vData(i, 3)
arr(3) = vData(i, 4)
arr(4) = vData(i, 5)
arr(5) = vData(i, 6)
arr(6) = vData(i, 7)
arr(7) = vData(i, 8)
arr(8) = vData(i, 9)
arr(9) = vData(i, 10)
arr(10) = vData(i, 11)
arr(11) = vData(i, 12)
arr(12) = vData(i, 13)
arr(13) = vData(i, 14)
arr(14) = vData(i, 15)
arr(15) = vData(i, 16)
arr(16) = vData(i, 17)
arr(17) = vData(i, 18)
arr(18) = vData(i, 19)
arr(19) = vData(i, 20)
arr(20) = vData(i, 21)
arr(21) = vData(i, 22)
arr(22) = vData(i, 23)
arr(23) = vData(i, 24)
arr(24) = vData(i, 25)
arr(25) = vData(i, 26)
dic(vData(i, 1)) = arr
End If
Next i
i = 0
.Range("AG1:BF1").Value = .Range("A1:Z1").Value
vResult = .Range("AG2").Resize(dic.Count, 26)
For Each vKey In dic.keys
i = i + 1
vResult(i, 1) = vKey
vResult(i, 2) = dic(vKey)(1)
vResult(i, 3) = dic(vKey)(2)
vResult(i, 4) = dic(vKey)(3)
vResult(i, 5) = dic(vKey)(4)
vResult(i, 6) = dic(vKey)(5)
vResult(i, 7) = dic(vKey)(6)
vResult(i, 8) = dic(vKey)(7)
vResult(i, 9) = dic(vKey)(8)
vResult(i, 10) = dic(vKey)(9)
vResult(i, 11) = dic(vKey)(10)
vResult(i, 12) = dic(vKey)(11)
vResult(i, 13) = dic(vKey)(12)
vResult(i, 14) = dic(vKey)(13)
vResult(i, 15) = dic(vKey)(14)
vResult(i, 16) = dic(vKey)(15)
vResult(i, 17) = dic(vKey)(16)
vResult(i, 18) = dic(vKey)(17)
vResult(i, 19) = dic(vKey)(18)
vResult(i, 20) = dic(vKey)(19)
vResult(i, 21) = dic(vKey)(20)
vResult(i, 22) = dic(vKey)(21)
vResult(i, 23) = dic(vKey)(22)
vResult(i, 24) = dic(vKey)(23)
vResult(i, 25) = dic(vKey)(24)
vResult(i, 26) = dic(vKey)(25)

Next vKey
.Range("AG2").Resize(dic.Count, 26) = vResult
' .Columns("H:K").AutoFit
End With
End Sub


The problem I'm facing is that I'm getting an Run-time Error 6 'Overflow', when run the line below:

vData = .Range("B2:I" & .Cells(.Rows.Count, "B").End(xlUp).Row)

I can't seem to get passed that to test the rest of the code.
Do you have a suggestions around how to solve that error?

Thanks a lot.



Try to understand the code above and adapt it for more columns.
The ID column is used as the Keys of the dictionary.
The values of other columns are stored in an array that is placed as the Item correspondent to each key.

M.
 
Upvote 0
Some adjustments

1. You have 27 columns, A from AA, being one column (B) reserved for the key then you need an array with 26 elements to store the values of the other columns
Dim arr(1 to 26)

2, vData should comprise all the columns so
vData = .Range("A2:AA" & .Cells(.Rows.Count, "A").End(xlUp).Row)

3. The key is column B so
If dic.exists(vData(i, 2)) Then

4. Store the other values in arr
arr(1) = vdata(i,1)
'skip ID column
arr(2) = vdata(i,3)
arr(3) = vdata(i,4)
....
arr(26)= vdata(i,27)

5. vResult must have 27 columns so
vResult = .Range("AG2").Resize(dic.Count, 27)

Hope this helps

M.
 
Upvote 0
Edit

2, vData should comprise all the columns so
vData = .Range("A2:AA" & .Cells(.Rows.Count, "B").End(xlUp).Row)

M.
 
Upvote 0
Another adjustment

If vData(i, 27) > dic(vData(i, 1))(26) Then
vAux = dic(vData(i, 2))
vAux(26) = vData(i, 27)
If vData(i, 1) <> "" Then vAux(1) = vData(i, 1)
'skip ID column
If vData(i, 3) <> "" Then vAux(2) = vData(i, 3)
If vData(i, 4) <> "" Then vAux(3) = vData(i, 4)
...
If vData(i, 26) <> "" Then vAux(25) = vData(i, 26)

M.
 
Last edited:
Upvote 0
It worked. Thanks for going through it line by line. Truly appreciate it.
I'm running a few tests and depending on the number of rows or columns, I'm still getting a Error 6 Overflow message (line: vData = .Range("A2:AA" & .Cells(.Rows.Count, "B").End(xlUp).Row).

Is there any way or other method to avoid that?

Thanks.



Edit

2, vData should comprise all the columns so
vData = .Range("A2:AA" & .Cells(.Rows.Count, "B").End(xlUp).Row)

M.
 
Upvote 0

Forum statistics

Threads
1,215,549
Messages
6,125,473
Members
449,233
Latest member
Deardevil

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top