Finding unknown duplicate entries in multiple worksheets

Tpynckel

New Member
Joined
Sep 23, 2002
Messages
5
I have a file with some 100 worksheets, each containing up to 200 rows and each having identical header rows and number of columns.
I wish to search for duplicate entries in 4 of the columns (initials, DOB, Race, Sex)across all the worksheets and have the dups displayed. Only two of the columns might have matches, but I want all the duplicates displayed.
I have used the micro$oft macro for this but it is really cumbersome and time consuming.
I am not an expert but know my way around basic formulas etc.
 

Excel Facts

Format cells as time
Select range and press Ctrl+Shift+2 to format cells as time. (Shift 2 is the @ sign).
Hi,

Try the following code and report your results. It is not very efficient as it loops a lot, but it seemed to work on my test dataset.

<pre>Sub test()
Dim ws As Worksheet, ws2 As Worksheet
Dim DupArray, UniqueCollection As New Collection
Dim Counter As Long, i As Long, LastRow As Long, j As Long
Dim TestString As String
Dim TestArray

Dim InitCol As Integer
Dim DOBCol As Integer
Dim RaceCol As Integer
Dim SexCol As Integer

Dim fn As WorksheetFunction
Set fn = Application.WorksheetFunction

With ActiveSheet
InitCol = Application.Match("Initials", .Rows(1), 0)
DOBCol = Application.Match("DOB", .Rows(1), 0)
RaceCol = Application.Match("Race", .Rows(1), 0)
SexCol = Application.Match("Sex", .Rows(1), 0)
End With

For Each ws In ThisWorkbook.Worksheets
LastRow = ws.Cells.SpecialCells(xlCellTypeLastCell).Row
For i = 2 To LastRow
TestString = ws.Cells(i, InitCol) & " | " _
& ws.Cells(i, DOBCol) & " | " _
& ws.Cells(i, RaceCol) & " | " _
& ws.Cells(i, SexCol)

On Error Resume Next
UniqueCollection.Add TestString, TestString
Err.Clear
On Error GoTo 0

Next i
Next ws



For Each ws In ThisWorkbook.Worksheets
LastRow = ws.Cells.SpecialCells(xlCellTypeLastCell).Row
For i = 2 To LastRow
TestString = ws.Cells(i, InitCol) & " | " _
& ws.Cells(i, DOBCol) & " | " _
& ws.Cells(i, RaceCol) & " | " _
& ws.Cells(i, SexCol)

If UniqueCollection.Count Then
ReDim TestArray(1 To UniqueCollection.Count)
For j = 1 To UniqueCollection.Count
TestArray(j) = UniqueCollection(j)
Next j
End If

If IsError(Application.Match(TestString, TestArray, 0)) Then
Counter = Counter + 1
If Counter = 1 Then
ReDim DupArray(1 To Counter)
Else
ReDim Preserve DupArray(1 To Counter)
End If
DupArray(Counter) = TestString
Else
If UniqueCollection.Count Then
UniqueCollection.Remove Application.Match(TestString, TestArray, 0)
End If
End If
Next i
Next ws

Set ws2 = Worksheets.Add
ws2.Range("A1").Resize(UBound(DupArray) - LBound(DupArray) + 1, 1) = fn.Transpose(DupArray)

End Sub</pre>
 
Upvote 0
Jay:
Thanks for your quick response. Your solution is a bit over my head. Could you please take me step by step, starting with where and how to insert this code solution of yours. Sorry to be such a dummy, but the grey hair has started to grow into the grey matter.
 
Upvote 0
Could someone please tell me why this board is so dreadfully slow? Other forums, and even the dreaded AOL are much much faster. Yes I have broadband. Is it my computer configuration?
 
Upvote 0
Hi,

The slow board is a relatively recent thing, and I believe that Bill J (MrExcel) is working on a solution. At certain times during the day it is much slower than at others, although I do not believe that it has much to do with the number of people accessing the site (I think it is at the host provider, but don't hold me to that as it is just speculation).

I think that with the way that your data is structured, it would be extremely difficult to get a worksheet solution using native Excel functions. That is why VBA was the route to take.

What it does is the following:

1. It cycles through all the worksheets and reads all the unique items into a collection (one instance of each duplicate is also added).

2. It then recycles through the list and compares each item again to whatever is in the collection. If it is found then the item is removed from the collection and the search continues.

3. When a duplicate item is reached, it is no longer in the collection (having already been removed at the first instance) so is read into the array of duplicates -- DupArray is the variable I used.

4. After this cycle through the data is complete, a new worksheet is added and the array is loaded into the first column.

If you are not familiar with VBA arrays and collections this may seem a bit difficult. It is more advanced than other aspects of VBA programming, but these are quite powerful techniques and it is well worth the time and effort to learn them, IMO.

Another way to do this would be to collate all the workshets into one master sheet and then determine the duplicates from there. I wanted make an attempt without having to do any data manipulation on the worksheet except for the last step.

Possibly Pivot Tables would be another option, too.
 
Upvote 0
Jay:
Thanks again for the quick response. I have been dumping all the spreadsheets into one and sorting for dups that way. I just knew there had to be a better way but not being a programmer I was lost. As I mentioned in my last note, I changed the column titles in the macro to match my spreadsheet, but got an error. If you are willing to babysit me a bit, could we try to work this out? I really know nada about VBA programming or pivot tables....kinda lacking on time to learn the details. Let me know. Here is what I changed...just the column headers. Oh I have a space in the spreadsheets between "patient" and "initials"....actually the words are on two separate rows. Does this make a difference?
Sub test()
Dim ws As Worksheet, ws2 As Worksheet
Dim DupArray, UniqueCollection As New Collection
Dim Counter As Long, i As Long, LastRow As Long, j As Long
Dim TestString As String
Dim TestArray

Dim PatientInitialsCol As Integer
Dim DOBCol As Integer
Dim RaceCol As Integer
Dim SexCol As Integer

Dim fn As WorksheetFunction
Set fn = Application.WorksheetFunction

With ActiveSheet
InitCol = Application.Match("Initials", .Rows(1), 0)
DOBCol = Application.Match("DOB", .Rows(1), 0)
RaceCol = Application.Match("Race", .Rows(1), 0)
SexCol = Application.Match("Sex", .Rows(1), 0)
End With

For Each ws In ThisWorkbook.Worksheets
LastRow = ws.Cells.SpecialCells(xlCellTypeLastCell).Row
For i = 2 To LastRow
TestString = ws.Cells(i, PatientInitialsCol) & " | " _
& ws.Cells(i, DOBCol) & " | " _
& ws.Cells(i, RaceCol) & " | " _
& ws.Cells(i, SexCol)

On Error Resume Next
UniqueCollection.Add TestString, TestString
Err.Clear
On Error GoTo 0

Next i
Next ws



For Each ws In ThisWorkbook.Worksheets
LastRow = ws.Cells.SpecialCells(xlCellTypeLastCell).Row
For i = 2 To LastRow
TestString = ws.Cells(i, PatientInitialsCol) & " | " _
& ws.Cells(i, DOBCol) & " | " _
& ws.Cells(i, RaceCol) & " | " _
& ws.Cells(i, SexCol)

If UniqueCollection.Count Then
ReDim TestArray(1 To UniqueCollection.Count)
For j = 1 To UniqueCollection.Count
TestArray(j) = UniqueCollection(j)
Next j
End If

If IsError(Application.Match(TestString, TestArray, 0)) Then
Counter = Counter + 1
If Counter = 1 Then
ReDim DupArray(1 To Counter)
Else
ReDim Preserve DupArray(1 To Counter)
End If
DupArray(Counter) = TestString
Else
If UniqueCollection.Count Then
UniqueCollection.Remove Application.Match(TestString, TestArray, 0)
End If
End If
Next i
Next ws

Set ws2 = Worksheets.Add
ws2.Range("A1").Resize(UBound(DupArray) - LBound(DupArray) + 1, 1) = fn.Transpose(DupArray)

End Sub


thanks again.
tp
 
Upvote 0

Forum statistics

Threads
1,224,352
Messages
6,178,068
Members
452,822
Latest member
MtC

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top