VBA Question: How to find duplicate rows, and sum values after finding duplicates?

rolamento

New Member
Joined
Apr 20, 2023
Messages
8
Office Version
  1. 2016
Platform
  1. Windows
Hello friends, I'm new to the forum but always try to get references for code here.
I'm currently trying to develop a VBA code to loop through a dataset, comparing the values from some rows, and if I find that all of them are equal, I have to sum values from some last rows to the first match, and delete the duplicate.
I have done it, but it's extremely slow, and considering the size of the dataset, it could that up to 7 hours to loop. I have seem some examples using COUNTIF or some kind of Find value. What would you guys recommend?
I'm posting my sample code below:


VBA Code:
Dim n As Long, j As Long, k As Long, count As Long, LastRow As Long, x As Long, y As Long
    Dim Arryk(), Arrnum() As Variant
    LastRow = Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
    
    
    Arryk = Array(5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32)
    Arrnum = Array(33, 34, 35, 37)
    
    
    For x = 2 To LastRow
        LastRow = Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
        For y = LastRow To x Step -1
            If (Not x = y) And (Not IsEmpty(Cells(y, Arryk(0)))) And (Not IsEmpty(Cells(x, Arryk(0)))) Then
                
                count = 0
                For k = 0 To 27
                    If Cells(x, Arryk(k)) = Cells(y, Arryk(k)) Then
                        count = count + 1
                End If
                Next
                
                If count = 28 Then
                    If IsEmpty(Cells(x, 2)) Then
                        Cells(x, 2).Value = Cells(x, 1).Value & ", " & Cells(y, 1).Value
                    Else
                        Cells(x, 2).Value = Cells(x, 2).Value & ", " & Cells(y, 1).Value
                    End If
                    For j = 0 To 3
                        Cells(x, Arrnum(j)).Value = Round(WorksheetFunction.Sum(Cells(x, Arrnum(j)).Value, Cells(y, Arrnum(j)).Value), 2)
                    Next
                    Rows(y).Delete Shift:=xlUp
                End If
            End If
        Next y
    Next x
 
I wasn't happy with my first effort - it was taking too long to run. Please ignore post #8 and try this code instead.

VBA Code:
Option Explicit
Sub rolamento_V2()
    Application.ScreenUpdating = False
    Dim t As Double: t = Timer
    Dim ws As Worksheet
    Set ws = Worksheets("Sheet1")   '<~~ Change to actual sheet name
   
    'Get the consolidated IDs
    Dim Lrow As Long
    Lrow = Cells.Find("*", , xlFormulas, , xlByRows, xlPrevious).Row
    With ws.Range("B2:B" & Lrow)
        .Formula2R1C1 = "=Concat(RC5:RC32&"" "")"
        .Value2 = .Value2
    End With
    ws.Range("A1").CurrentRegion.Sort Key1:=ws.Range("B1"), order1:=xlAscending, Header:=xlYes
   
    Dim a, b, temp, i As Long, j As Long, n As Long, s As String, ar
    a = ws.Range("A2:B" & Lrow)
    ReDim b(1 To UBound(a, 1), 1 To 1)
   
    For i = 2 To UBound(a, 1)
        If i = 2 Then b(i - 1, 1) = a(i - 1, 1)
        If a(i, 2) = a(i - 1, 2) And a(i, 1) = a(i - 1, 1) Then
            b(i, 1) = b(i - 1, 1)
        ElseIf a(i, 2) = a(i - 1, 2) And a(i, 1) <> a(i - 1, 1) Then
            b(i, 1) = b(i - 1, 1) & ", " & a(i, 1)
        Else
            b(i, 1) = a(i, 1)
        End If
    Next i
       
    For i = UBound(a, 1) To 2 Step -1
        If a(i, 2) = a(i - 1, 2) Then
            b(i - 1, 1) = b(i, 1)
        Else
            b(i - 1, 1) = b(i - 1, 1)
        End If
    Next i
    ws.Range("B2").Resize(UBound(b, 1)).Value = b
       
    'Do the Dictionary part
    Dim rng As Range, R As Range, txt As String
    Set rng = ws.Range(Range("B1"), Range("B" & Rows.Count).End(xlUp))
    ar = Sheet1.Range("A1").CurrentRegion.Offset(, 1).Resize(Columns.Count - 1)
   
    With CreateObject("scripting.dictionary")
        For Each R In rng
            txt = R.Value
            If Not .exists(txt) Then
                n = n + 1
                .Add txt, n
                For j = 1 To UBound(ar, 2)
                    ar(n, j) = R.Offset(, j - 1)
                Next j
            Else
                For i = 32 To 34
                    ar(.Item(txt), i) = ar(.Item(txt), i) + R.Offset(, i - 1)
                Next i
                ar(.Item(txt), 36) = ar(.Item(txt), 36) + R.Offset(, 35)
            End If
        Next R
    End With
   
    For i = 1 To UBound(ar)
        For j = 2 To UBound(ar, 2) - 1
            ar(i, j) = ar(i, j)
        Next j
    Next i
 
    With ws
        .UsedRange.ClearContents
        .Range("A1").Resize(n, UBound(ar, 2)) = ar
    End With
    Application.ScreenUpdating = True
    MsgBox "Completed in " & Timer - t & " Seconds"
End Sub

Hey kevin, thanks for the responses! First of, my dataset has 65000+ rows. The table I posted earlier is a good representation of the data I'm dealing with. I tried running your code, but it's not working and it gives me an error for the first concatenate part. However, I shouldn't touch the concatenated IDs early on. Lemme try to re-explain what the script must do.
1) I must loop though all rows, and compare all rows agains all other rows, so that I can find duplicates.
2) Compare the values from columns 5 to 32 for each row. If they are all equal, I consider them a duplicate row (this means I have to ignore columns 1 to 4, and columns 33+)
3) Let's say it found a duplicate, for example row 6 and row 18. It then does:
3a) Add values from columns 33, 34, 35 and 37 (which are numeric) to row 6.
3b) At column 2, concatenate the IDs (that are sequential numbers present at column 1). In this case, it would write "6, 18".
3c) Delete row 18, using something like (Rows(18).EntireRow.Delete
4) Continue looking for more duplicates. If more are found, I have to repeat steps 3.
If you run the script on post #1 with the dataset on post #4 you can see that it sum values, deletes duplicates and we are left with 2 rows only. We don't mess with the ID's and adding values until we find duplicates.
 
Upvote 0

Excel Facts

Copy a format multiple times
Select a formatted range. Double-click the Format Painter (left side of Home tab). You can paste formatting multiple times. Esc to stop
Give this a try. I collaborated with @kevin9999 on this and it uses some of your original code to test what columns determine a duplicate and what columns need to be summed.

VBA Code:
Sub UniqueSum_v02()

    Dim shtData As Worksheet
    Dim srcRngWhole As Range
    Dim srcArrWhole As Variant, outArr As Variant
    Dim LastRow As Long, LastCol As Long
    Dim i As Long, iOut As Long, j As Long, iCols As Long
    Dim dict As Object, dictKey As String
    Dim Arrnum() As Variant
    Dim Arryk() As Variant
       
    Dim t As Double: t = Timer
    Application.ScreenUpdating = False
    
    Arryk = Array(5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32)
    Arrnum = Array(33, 34, 35, 37)
    
    Set shtData = ActiveSheet
    With shtData
        LastRow = .Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
        LastCol = .Cells.Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
        Set srcRngWhole = .Range(.Cells(2, "A"), .Cells(LastRow, LastCol))
        srcArrWhole = srcRngWhole
    End With
    
    ReDim outArr(1 To UBound(srcArrWhole), 1 To UBound(srcArrWhole, 2))
    
    Set dict = CreateObject("Scripting.dictionary")
    
    ' Load details range into Dictionary
    For i = 1 To UBound(srcArrWhole)
        dictKey = ""
        For j = 0 To UBound(Arryk)
            dictKey = dictKey + srcArrWhole(i, Arryk(j))
        Next j
        
        If Not dict.exists(dictKey) Then
            iOut = iOut + 1
            dict(dictKey) = iOut
            For iCols = 1 To UBound(srcArrWhole, 2)
                outArr(dict(dictKey), iCols) = srcArrWhole(i, iCols)
                outArr(dict(dictKey), 2) = srcArrWhole(i, 1)
            Next iCols
            
        Else
            For iCols = 0 To UBound(Arrnum)
                outArr(dict(dictKey), Arrnum(iCols)) = outArr(dict(dictKey), Arrnum(iCols)) + srcArrWhole(i, Arrnum(iCols))
            Next iCols
            outArr(dict(dictKey), 2) = outArr(dict(dictKey), 2) & ", " & srcArrWhole(i, 1)
        End If
        
    Next i
    
    srcRngWhole.ClearContents
    srcRngWhole.Resize(iOut, UBound(outArr, 2)) = outArr
    
    Application.ScreenUpdating = True
    MsgBox "Completed in " & Timer - t & " Seconds"

End Sub
 
Upvote 2
Solution
Give this a try. I collaborated with @kevin9999 on this and it uses some of your original code to test what columns determine a duplicate and what columns need to be summed.

VBA Code:
Sub UniqueSum_v02()

    Dim shtData As Worksheet
    Dim srcRngWhole As Range
    Dim srcArrWhole As Variant, outArr As Variant
    Dim LastRow As Long, LastCol As Long
    Dim i As Long, iOut As Long, j As Long, iCols As Long
    Dim dict As Object, dictKey As String
    Dim Arrnum() As Variant
    Dim Arryk() As Variant
      
    Dim t As Double: t = Timer
    Application.ScreenUpdating = False
   
    Arryk = Array(5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32)
    Arrnum = Array(33, 34, 35, 37)
   
    Set shtData = ActiveSheet
    With shtData
        LastRow = .Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
        LastCol = .Cells.Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
        Set srcRngWhole = .Range(.Cells(2, "A"), .Cells(LastRow, LastCol))
        srcArrWhole = srcRngWhole
    End With
   
    ReDim outArr(1 To UBound(srcArrWhole), 1 To UBound(srcArrWhole, 2))
   
    Set dict = CreateObject("Scripting.dictionary")
   
    ' Load details range into Dictionary
    For i = 1 To UBound(srcArrWhole)
        dictKey = ""
        For j = 0 To UBound(Arryk)
            dictKey = dictKey + srcArrWhole(i, Arryk(j))
        Next j
       
        If Not dict.exists(dictKey) Then
            iOut = iOut + 1
            dict(dictKey) = iOut
            For iCols = 1 To UBound(srcArrWhole, 2)
                outArr(dict(dictKey), iCols) = srcArrWhole(i, iCols)
                outArr(dict(dictKey), 2) = srcArrWhole(i, 1)
            Next iCols
           
        Else
            For iCols = 0 To UBound(Arrnum)
                outArr(dict(dictKey), Arrnum(iCols)) = outArr(dict(dictKey), Arrnum(iCols)) + srcArrWhole(i, Arrnum(iCols))
            Next iCols
            outArr(dict(dictKey), 2) = outArr(dict(dictKey), 2) & ", " & srcArrWhole(i, 1)
        End If
       
    Next i
   
    srcRngWhole.ClearContents
    srcRngWhole.Resize(iOut, UBound(outArr, 2)) = outArr
   
    Application.ScreenUpdating = True
    MsgBox "Completed in " & Timer - t & " Seconds"

End Sub

Hey Alex, that worked like a charm!!! I ran it on the entire dataset, and it correctly found all duplicates in 6.5 seconds, down from 7+ hours!!! That's nuts. I guess the only thing that it does a little too well is the consolidated IDs. It's supposed to stay empty if the row in question doesn't have a duplicate, but I can fix that.
I would also like to thank kevin for the responses and for the code. You guys are awesome.
In any case, I'm kinda new to VBA, I started coding with it like 4 months ago, so there's a lot of learning I have to do. I don't quite understand how those dictionary comparisons work on the code, do you guys have any reference, guide or tutorial that could help me learn more about it? In my original code you guys can see that I was comparing every value individually, and it was very poor and inneficient.
 
Upvote 0
Paul Kelly of Macro Mastery also has a 4 part series on the dictionary on youtube with the 1st part being 7 mins long.

 
Upvote 0

Forum statistics

Threads
1,213,567
Messages
6,114,342
Members
448,570
Latest member
rik81h

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top