Fast Duplicate Check For Large Data Sets Using VBA?

CaliKidd

Board Regular
Joined
Feb 16, 2011
Messages
173
Hey Experts,

I am using the following logic to check for and highlight duplicate entries. The way the logic works, it starts off slow and speeds up as it goes because the number of cells it has to check for duplicates becomes less and less. It works OK as long as the data set is relatively small (hundreds to low thousands), but I decided to load-test it on 1,000,000 cells and it crawled! It took about 30 minutes to just get through the first 15 duplicate checks (and my PC has a quad-core processor overclocked to 4Ghz and 4Gb RAM). At this rate, it would probably take months to finish... :eek:

Here's the code I borrowed from another online site:
Code:
Sub DupsGreen()
[INDENT]Application.ScreenUpdating = False
Rng = Selection.Rows.Count
For i = Rng To 1 Step -1
[INDENT]myCheck = ActiveCell
ActiveCell.Offset(1, 0).Select
For j = 1 To i
[INDENT]If ActiveCell = myCheck Then
[INDENT]Selection.Font.Bold = True
Selection.Font.ColorIndex = 4
[/INDENT]End If
ActiveCell.Offset(1, 0).Select
[/INDENT]Next j
ActiveCell.Offset(-i, 0).Select
[/INDENT]Next i
Application.ScreenUpdating = True
[/INDENT]End Sub
Does anyone know of a slicker, faster way?
 

Excel Facts

Select a hidden cell
Somehide hide payroll data in column G? Press F5. Type G1. Enter. Look in formula bar while you arrow down through G.
Hi

There's an advanced data filter which filters a column, removing any duplicate entries, pasting the results into a new column. Will this suffice?

If so - I'll post up instructions of how to do this


If you like my answer, please help me grow my own community and visit my website and forum www.excel-lence.info

One day I hope to be as large as this forum, and work in harmony with it.

Cheers

EXCEL-lence
 
Upvote 0
advancedfilter.png
 
Last edited:
Upvote 0
Hi,

Thanks for your help. I don't want the code to remove the duplicates (or extract unique records elsewhere), just find and highlight them in place. I do appreciate your help, however.
 
Upvote 0
One immediate way to speed the code is to not use select
Code:
Sub DupsGreen()
Application.ScreenUpdating = False
Rng = Selection.Rows.Count
With Range("A:A")
For i = 1 To .Cells(.Rows.Count,1).End(xlup).Row - 1
    myCheck = CStr(.Cells(i, 1).Value)
    For j = i + 1 To .Cells(.Rows.Count,1).End(xlup).Row
        With .Cells(j, 1)
            If CStr(.Value) = myCheck then
                With .Font
                    .Bold = True
                    .ColorIndex = 4
                End With
            End If
        End With
    Next j
Next i
Application.ScreenUpdating = True
End Sub
 
Upvote 0
i would use the in built excel functions to sort the data first by the key you are testing, then compare succesive entries it should be quicker as they are grouped together
 
Upvote 0
Give this macro a try... it should do what you want and do it quite quickly I would think...

Code:
Sub HighlightDuplicates()
  Dim DataColumn As Long, LastRow As Long, UnusedColumn As Long, Diff As Long
  DataColumn = ActiveCell.Column
  LastRow = Cells(Rows.Count, ActiveCell.Column).End(xlUp).Row
  UnusedColumn = Cells.Find(What:="*", SearchOrder:=xlByColumns, _
                 SearchDirection:=xlPrevious, LookIn:=xlFormulas).Column + 1
  Diff = UnusedColumn - DataColumn
  Application.ScreenUpdating = False
  With Columns(UnusedColumn)
    .Resize(LastRow).FormulaR1C1 = "=IF(COUNTIF(R1C[-" & Diff & "]:RC[-" & Diff & "],RC[-" & Diff & "])-1,""X"","""")"
    .Value = .Value
    Intersect(.Offset(, -Diff).EntireColumn, .SpecialCells(xlCellTypeConstants).EntireRow).Interior.ColorIndex = 6
    .Clear
  End With
  Application.ScreenUpdating = True
End Sub
Your original code assumed the ActiveCell would be in the column with your data, so the above is defaulted to that same requirement; however, if your data column is always the same, you can assign its column number to the DataColumn variable in the first line of code after the Dim statement (simply replace the ActiveCell.Column with the column number holding your data).
 
Upvote 0
Hi CaliKidd,

You might like to try this one. Just select any cell in the column you want checked for duplicates and run the code.
Code:
Sub markdups()
Dim d As Object, e
Set d = CreateObject("scripting.dictionary")
For Each e In Intersect(Columns(ActiveCell.Column), ActiveSheet.UsedRange)
If e.Value <> vbNullString Then
    If Not d.exists(e.Value) Then d(e.Value) = 1 Else _
        e.Font.ColorIndex = 4
End If
Next
End Sub
 
Upvote 0
Thanks, everyone. Your help is appreciated!

Mike, your code was 4x faster (able to get through 60 duplicate checks in 30 minutes), thank you, but still not in the order of magnitude I was hoping for large data sets.

Rick and Mirabeau, I like the entirely different approaches you both used. I modified Mirabeau's solution and I was able to scan all 1,000,000 records for duplicates in about 6 minutes. HUGE improvement! :)
 
Upvote 0

Forum statistics

Threads
1,217,375
Messages
6,136,192
Members
449,997
Latest member
satyam7054

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top