VBA remove duplicate not work on large data

BanhdaTr0n

New Member
Joined
Jun 16, 2021
Messages
7
Office Version
  1. 365
  2. 2019
Platform
  1. Windows
Hi,
I have this VBA that find duplicate for example in Column A from Sheet1 and delete entire row that Column A has that value in Sheet2.
It used to work fine with data that has about 200K rows but when it comes up with 1M rows. It only delete the first row and ignore the other.
Can someone take a look and explain what's the problem in my VBA.
VBA Code:
Option Explicit
Sub CleanDupes()
    Dim targetArray, searchArray
    Dim targetRange As Range
    Dim x As Long

    'Update these 4 lines if your target and search ranges change
    Dim TargetSheetName As String: TargetSheetName = "Sheet2"
    Dim TargetSheetColumn As String: TargetSheetColumn = "A"
    Dim SearchSheetName As String: SearchSheetName = "Sheet1"
    Dim SearchSheetColumn As String: SearchSheetColumn = "A"

    'Load target array
    With Sheets(TargetSheetName)
        Set targetRange = .Range(.Range(TargetSheetColumn & "1"), _
                .Range(TargetSheetColumn & Rows.Count).End(xlUp))
        targetArray = targetRange
    End With
    'Load Search Array
    With Sheets(SearchSheetName)
        searchArray = .Range(.Range(SearchSheetColumn & "1"), _
                .Range(SearchSheetColumn & Rows.Count).End(xlUp))
    End With


    Dim dict As Object
    Set dict = CreateObject("Scripting.Dictionary")
    'Populate dictionary from search array
    If IsArray(searchArray) Then
        For x = 1 To UBound(searchArray)
            If Not dict.exists(searchArray(x, 1)) Then
                dict.Add searchArray(x, 1), 1
            End If
        Next
    Else
        If Not dict.exists(searchArray) Then
            dict.Add searchArray, 1
        End If
    End If

    'Delete rows with values found in dictionary
    If IsArray(targetArray) Then
        'Step backwards to avoid deleting the wrong rows.
        For x = UBound(targetArray) To 1 Step -1
            If dict.exists(targetArray(x, 1)) Then
                targetRange.Cells(x).EntireRow.Delete
            End If
        Next
    Else
        If dict.exists(targetArray) Then
            targetRange.EntireRow.Delete
        End If
    End If
End Sub
 

Excel Facts

Select all contiguous cells
Pressing Ctrl+* (asterisk) will select the "current region" - all contiguous cells in all directions.
Welcome to the board, not sure what the cause is, can you upload an example of the file?

Try below but it's not that different to above
VBA Code:
Sub Clean_Dupes_Banhda()

    Dim s As Variant
    Dim t As Variant
    Dim vars As Variant
    Dim x As Long
   
    'Adjust if needed:
    vars = Split("Sheet1|A|Sheet2|A", "|")
   
    With Sheets(vars(0))
        s = .Cells(1, vars(1)).Resize(.Cells(.Rows.Count, vars(1)).End(xlUp).Row).Value
    End With
   
    With Sheets(vars(2))
        t = .Cells(1, vars(3)).Resize(.Cells(.Rows.Count, vars(3)).End(xlUp).Row).Value
    End With
   
    With CreateObject("Scripting.Dictionary")
        For x = LBound(s, 1) To UBound(s, 1)
            If Not .Exists(s(x, 1)) Then .Add Trim(s(x, 1)), 1
        Next x
        Erase s
       
        For x = LBound(t, 1) To UBound(t, 1)
            If .Exists(Trim(t(x, 1))) Then t(x, 1) = ""
        Next x
    End With
   
    Application.ScreenUpdating = False
   
    With Sheets(vars(2)).Cells(1, vars(3)).Resize(UBound(t, 1))
        .Value = t
        .SpecialCells(xlCellTypeBlanks).EntireRow.Delete
    End With
   
    Application.ScreenUpdating = True
   
    Erase vars: Erase t
End Sub
 
Upvote 0
It used to work fine with data that has about 200K rows but when it comes up with 1M rows. It only delete the first row and ignore the other.
By 1M rows do you mean you have data in the very last row of the sheet (i.e row 1,048,576)?
If yes, then the part with ... & Rows.Count).End(xlUp)) will not work correctly.

How many duplicate are there? more than 10K?
If yes, then the code could be slow in this part:
VBA Code:
        For x = UBound(targetArray) To 1 Step -1
            If dict.exists(targetArray(x, 1)) Then
                targetRange.Cells(x).EntireRow.Delete
            End If
        Next
 
Upvote 0
Welcome to the board, not sure what the cause is, can you upload an example of the file?

Try below but it's not that different to above
VBA Code:
Sub Clean_Dupes_Banhda()

    Dim s As Variant
    Dim t As Variant
    Dim vars As Variant
    Dim x As Long
  
    'Adjust if needed:
    vars = Split("Sheet1|A|Sheet2|A", "|")
  
    With Sheets(vars(0))
        s = .Cells(1, vars(1)).Resize(.Cells(.Rows.Count, vars(1)).End(xlUp).Row).Value
    End With
  
    With Sheets(vars(2))
        t = .Cells(1, vars(3)).Resize(.Cells(.Rows.Count, vars(3)).End(xlUp).Row).Value
    End With
  
    With CreateObject("Scripting.Dictionary")
        For x = LBound(s, 1) To UBound(s, 1)
            If Not .Exists(s(x, 1)) Then .Add Trim(s(x, 1)), 1
        Next x
        Erase s
      
        For x = LBound(t, 1) To UBound(t, 1)
            If .Exists(Trim(t(x, 1))) Then t(x, 1) = ""
        Next x
    End With
  
    Application.ScreenUpdating = False
  
    With Sheets(vars(2)).Cells(1, vars(3)).Resize(UBound(t, 1))
        .Value = t
        .SpecialCells(xlCellTypeBlanks).EntireRow.Delete
    End With
  
    Application.ScreenUpdating = True
  
    Erase vars: Erase t
End Sub
Sorry for late reply. But I have been outside all day and did not test your VBA yet. And my file can't be public so I will find some demo as soon as I can get back.
By 1M rows do you mean you have data in the very last row of the sheet (i.e row 1,048,576)?
If yes, then the part with ... & Rows.Count).End(xlUp)) will not work correctly.

How many duplicate are there? more than 10K?
If yes, then the code could be slow in this part:
VBA Code:
        For x = UBound(targetArray) To 1 Step -1
            If dict.exists(targetArray(x, 1)) Then
                targetRange.Cells(x).EntireRow.Delete
            End If
        Next
Yes. My sheet has data to the very last row. My duplicate is quite a lot. About 30K+ and probaly more in future. Will try your solution and update the situation as soon as I can. Thank you.
 
Upvote 0
Yes. My sheet has data to the very last row. My duplicate is quite a lot. About 30K+ and probaly more in future.
Can you sort both columns in ascending order?
If so, I think I can write a code to remove the duplicates fast.
 
Upvote 0
Can you sort both columns in ascending order?
If so, I think I can write a code to remove the duplicates fast.
It's possible. Some of my files don't need to be in any order. Also, I was just found out that my VBA only remove the duplicate header when data reach to the last row.
 
Upvote 0
It's possible. Some of my files don't need to be in any order.
Try this:

VBA Code:
Sub a1173925a()
'https://www.mrexcel.com/board/threads/vba-remove-duplicate-not-work-on-large-data.1173925/
Dim i As Long, j As Long, n As Long
Dim sh1 As String, sh2 As String
Dim va, vb
Dim d As Object

t = Timer
Application.ScreenUpdating = True
sh1 = "Sheet1"
sh2 = "Sheet2"
With Sheets(sh2)
    If .Cells(.Rows.Count, "A") <> "" Then
        vb = .Range("A1:A" & .Rows.Count)
        Else
        vb = .Range("A1", .Cells(.Rows.Count, "A").End(xlUp))
    End If
End With

With Sheets(sh1)
    va = .Range("A1", .Cells(.Rows.Count, "A").End(xlUp))
End With


Set d = CreateObject("scripting.dictionary")
d.CompareMode = vbTextCompare
    
    For Each x In va
        d(x) = Empty
    Next

For i = 1 To UBound(vb, 1)
    If d.Exists(vb(i, 1)) Then
            vb(i, 1) = ""
    End If
Next

With Sheets(sh2)
.Range("A1").Resize(UBound(vb, 1), 1) = vb
.Range("A:A").Sort Key1:=.Range("A1"), Order1:=xlAscending, Header:=xlYes
End With

Application.ScreenUpdating = False
Debug.Print "It's done in: " & Timer - t & " seconds"


End Sub

the code will sort data in sheet2 col A in this part (to remove the empty row):
.Range("A:A").Sort Key1:=.Range("A1"), Order1:=xlAscending, Header:=xlYes

The search is case insensitive, if in fact you want the search to be case sensitive & the above code isn't fast enough then we can amend the code to make it faster.
 
Upvote 0
It seems that the above code does not delete entire row but only duplicates in column A.
But wow! Your code is considerably faster than my current one.
 
Upvote 0
Sorry, I forgot that part.
You don't have blank cells in sheet2 col A inside your data set, right?
If that's not the case then you need to sort the data by col A by before running the code. Because in the end the code will delete entire row where col A is blank.
Try this one:
VBA Code:
Sub a1173925b()
'https://www.mrexcel.com/board/threads/vba-remove-duplicate-not-work-on-large-data.1173925/
Dim i As Long, j As Long, n As Long
Dim sh1 As String, sh2 As String
Dim va, vb
Dim d As Object

t = Timer
Application.ScreenUpdating = True
sh1 = "Sheet1"
sh2 = "Sheet2"
With Sheets(sh2)
    If .Cells(.Rows.Count, "A") <> "" Then
        vb = .Range("A1:A" & .Rows.Count)
        Else
        vb = .Range("A1", .Cells(.Rows.Count, "A").End(xlUp))
    End If
End With

With Sheets(sh1)
    va = .Range("A1", .Cells(.Rows.Count, "A").End(xlUp))
End With


Set d = CreateObject("scripting.dictionary")
d.CompareMode = vbTextCompare
    
    For Each x In va
        d(x) = Empty
    Next

For i = 1 To UBound(vb, 1)
    If d.Exists(vb(i, 1)) Then
            vb(i, 1) = ""
    End If
Next

With Sheets(sh2)
.Range("A1").Resize(UBound(vb, 1), 1) = vb
.UsedRange.Sort Key1:=.Range("A1"), Order1:=xlAscending, Header:=xlYes
n = .Range("A" & .Rows.Count).End(xlUp).Row + 1
.Range("A" & n & ":A" & .Rows.Count).EntireRow.Delete
End With

Application.ScreenUpdating = False
Debug.Print "It's done in: " & Timer - t & " seconds"


End Sub
 
Upvote 0
Solution
Yeah. None of my col A is blank.
And it works perfecly on my 3 columns and maximum row in 10 seconds.
However, I dont know why it also remove the header in A1. But it's not the matter since your code is 6 times faster than my last one.
 
Upvote 0

Forum statistics

Threads
1,214,830
Messages
6,121,834
Members
449,051
Latest member
excelquestion515

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top