Remove Duplicates from a Range

Caleeco

Well-known Member
Joined
Jan 9, 2016
Messages
899
Hello,

I am attempting to remove duplicates from a set range. However, I need it to be only the range defined and not the entire column.

I read an example online whereby a unique list is created by adding each row of data to a collection (resume next on errors).

My code below correctly identifies the number of unique values in the data set stored in range A1:A7.

However, I cant work out how to then paste each item of the collection in cell D5 onwards (D6, D7, D8, etc)

Any help is much appreciated!

Code:
Sub RemoveDuplicate()


Dim NoDuplicates As New Collection
Dim rData As Range
Dim Cell As Range


Dim lrow As Long
Dim Item As Object


Set rData = ActiveSheet.Range("A1", Range("A" & Rows.Count).End(xlUp))


On Error Resume Next
For Each Cell In rData
    NoDuplicates.Add Cell.Value, CStr(Cell.Value)
Next


MsgBox "Number of Unique Values: " & NoDuplicates.Count


lrow = 5
For Each Item In NoDuplicates
    ActiveSheet.Range("D" & lrow) = Item.Value
    lrow = lrow + 1
Next Item


End Sub
 
Last edited:

Caleeco

Well-known Member
Joined
Jan 9, 2016
Messages
899
Ok, got it working after some research. Needed to define the items of the collection as VARIANT. Also I wasn't calling the data correctly. The code below does what I want, if anyone knows a better way to do it, I'm open to suggestions!

Code:
Sub RemoveDuplicate()


Dim NoDuplicates As New Collection
Dim rData As Range
Dim Cell As Range


Dim lrow As Long
Dim varItem As Variant


Set rData = ActiveSheet.Range("A1", Range("A" & Rows.Count).End(xlUp))


On Error Resume Next
For Each Cell In rData
    NoDuplicates.Add Cell.Value, CStr(Cell.Value)
Next


MsgBox "Number of Unique Values: " & NoDuplicates.Count


lrow = 5
For Each varItem In NoDuplicates
    ActiveSheet.Range("D" & lrow) = NoDuplicates(varItem)
    lrow = lrow + 1
Next varItem


End Sub
 

JoeMo

MrExcel MVP
Joined
May 26, 2009
Messages
16,691
Ok, got it working after some research. Needed to define the items of the collection as VARIANT. Also I wasn't calling the data correctly. The code below does what I want, if anyone knows a better way to do it, I'm open to suggestions!

Code:
Sub RemoveDuplicate()


Dim NoDuplicates As New Collection
Dim rData As Range
Dim Cell As Range


Dim lrow As Long
Dim varItem As Variant


Set rData = ActiveSheet.Range("A1", Range("A" & Rows.Count).End(xlUp))


On Error Resume Next
For Each Cell In rData
    NoDuplicates.Add Cell.Value, CStr(Cell.Value)
Next


MsgBox "Number of Unique Values: " & NoDuplicates.Count


lrow = 5
For Each varItem In NoDuplicates
    ActiveSheet.Range("D" & lrow) = NoDuplicates(varItem)
    lrow = lrow + 1
Next varItem


End Sub
I prefer using a dictionary object over collections. The code below uses a dictionary to extract the unique values from column A and lists them starting in D5.
Code:
Sub RemoveDupsWithDictionary()
Dim dataRng As Range, V As Variant
Dim d As Object
Set dataRng = Range("A1:A" & Cells(Rows.Count, "A").End(xlUp).Row)
V = dataRng.Value  'places values in data range into an array for faster processing
Set d = CreateObject("Scripting.dictionary")
For i = 1 To UBound(V, 1)
    If Not d.exists(V(i, 1)) Then
        d.Add V(i, 1), d.Count
    End If
Next i
Application.ScreenUpdating = False
Range("D5").Resize(d.Count, 1).Value = Application.Transpose(d.keys)
Application.ScreenUpdating = True
End Sub
 

Caleeco

Well-known Member
Joined
Jan 9, 2016
Messages
899
I prefer using a dictionary object over collections. The code below uses a dictionary to extract the unique values from column A and lists them starting in D5.
Code:
Sub RemoveDupsWithDictionary()
Dim dataRng As Range, V As Variant
Dim d As Object
Set dataRng = Range("A1:A" & Cells(Rows.Count, "A").End(xlUp).Row)
V = dataRng.Value  'places values in data range into an array for faster processing
Set d = CreateObject("Scripting.dictionary")
For i = 1 To UBound(V, 1)
    If Not d.exists(V(i, 1)) Then
        d.Add V(i, 1), d.Count
    End If
Next i
Application.ScreenUpdating = False
Range("D5").Resize(d.Count, 1).Value = Application.Transpose(d.keys)
Application.ScreenUpdating = True
End Sub
Hi JoeMo,

Thanks for the slick code! I was reading it through (trying to understand this approach & what you've done), but am stuck on the following 2 bits:

Code:
If Not d.exists(V(i, 1)) Then
        d.Add V(i, 1), d.Count
End If
Code:
Range("D5").Resize(d.Count, 1).Value = Application.Transpose(d.keys)
Would you be able to elaborate in layman's terms what these two bits of code are doing?

Many Thanks!
 

Caleeco

Well-known Member
Joined
Jan 9, 2016
Messages
899
Think it should be

ActiveSheet.Range("D" & lrow) = varItem
Thanks Sericom. This is the first time I've used custom collections, so I'm still learning how to implement them. Your suggestion works great :)
 

Zack Barresse

MrExcel MVP
Joined
Dec 9, 2003
Messages
10,724
Both a collection object (collection) and a scripting dictionary object (dictionary) provide the ability to store values and objects with string keys. Both objects allow an unlimited number of values or objects to be stored with keys and accessed and removed using those keys. Beyond that each object has certain benefits the other does not. Below is a summary of the pros and cons of each.

Types of Data
Both a collection and a dictionary can store any type of data besides strings including objects.

Determining if a Key Exists
A collection provides no easy method for checking if a key exists. A dictionary provides the Exists method. For example, to check if a key exists in a collection error handling must be used as illustrated with the sample code below.

Dim Value As Variant
On Error Resume Next
Value = MyCollection("Key")
If Err.Number = 5 Then
' Key does not exist
End If
On Error GoTo 0

Note that not only does the scripting dictionary provide the Exists method, it also allows setting the compare method using the CompareMode property by setting it to BinaryCompare, TextCompare, or DatabaseCompare. Also note that an existing key can be changed by setting the Key item value.

Accessing Items by Index
A collection allows accessing items by index as well as keys. A dictionary only allows accessing by key. To access items by index the dictionary items collection has to be copied to a variant array as illustrated with the sample code below.

Dim Values As Variant
Values = MyDictionary.Items
MsgBox Values(0)

Removing Items
A collection allows items to be removed by index and by key. A dictionary allows removal only by key.

Performance
For sets smaller than a few thousand items there is no perceptible difference in performance. A dictionary offers measurably faster performance with sets less than 40 or 50 thousand entries but never more than half the performance of a collection. If the collection or dictionary will be used created once and then used repeatedly for random checks for whether or not a key exists then a dictionary offers better performance with fewer than about 15,000 elements with the one caveat that with fewer than 1,000 elements there is no perceptible difference. Below are sample measures of adding and reading various set sizes.

Adding 1,000 entries: No perceptible difference in performance.
Adding 5,000 entries: A dictionary takes about 75% of the time that a collection does.
Adding 25,000 entries: A dictionary takes about 80% of the time that a collection does.
Adding 50,000 entries: Both the dictionary and collection offer about the same performance.
Adding 100,000 entries: A collection takes about 70% of the time that a dictionary does.

Reading 1,000 entries: No perceptible difference in performance.
Reading 5,000 entries: A dictionary takes about 60% of the time that a collection does.
Reading 25,000 entries: A collection takes about 80% of the time that a dictionary does.
Reading 50,000 entries: A collection takes about 55% of the time that a dictionary does.
Reading 100,000 entries: A collection takes about 35% of the time that a dictionary does.

There is one very important exception to the above. If error handling is used to determine whether or not a key exists in a collection then a collection can start performing very badly in a compiled VB application. This is because capturing errors with error handling takes anywhere from four to 30 times longer in a compiled VB application than when running the same code in the debugger. So if it is necessary to check if keys exist and the application is written in VB then the scripting dictionary is by far the superior choice. This is not true with a VBA implementation as VBA is always interpreted and error handling seems to perform better in that environment.
 

Zack Barresse

MrExcel MVP
Joined
Dec 9, 2003
Messages
10,724
Very welcome. In general I prefer Collections, but mostly because they're native and I don't need an external reference call. Largely it just comes down to preference.

As a side note, a wise programmer once taught me that it ends up being more expensive (in terms of efficiency and CPU cost) to declare the variable as New, as opposed to declaring it as New when setting it. I've always been in the habit ever since.
 

JoeMo

MrExcel MVP
Joined
May 26, 2009
Messages
16,691
Hi JoeMo,

Thanks for the slick code! I was reading it through (trying to understand this approach & what you've done), but am stuck on the following 2 bits:

Code:
If Not d.exists(V(i, 1)) Then
        d.Add V(i, 1), d.Count
End If
Code:
Range("D5").Resize(d.Count, 1).Value = Application.Transpose(d.keys)
Would you be able to elaborate in layman's terms what these two bits of code are doing?

Many Thanks!
I see Zack has posted extensive info on collections and dictionaries, but in case you have not extracted answers to your questions from Zack's post, here are simple explanations.

The first bit of code uses the Exists method to see if the value being tested is already in the dictionary. If so, if is skipped (so a duplicate is ignored). If not, it added to the dictionary as a key (V(i,1)) and a companion Item, in this case just a sequential number, d.count that reveals how many entries are in the dictionary thus far.

The second bit writes the unique values (i.e. those in the dictionary) to the desired range by transposing the keys (which are contained in the horizontal array d.keys so must be transposed as you want them written to a vertical range). The size of the range is just the number of cells required to hold all the keys in the dictionary which is exactly d.count.
 

Forum statistics

Threads
1,086,084
Messages
5,387,709
Members
402,075
Latest member
COwen

Some videos you may like

This Week's Hot Topics

Top