Removing Duplicates

privxls

Board Regular
Joined
Nov 22, 2016
Messages
55
Office Version
  1. 2021
Platform
  1. Windows
Hi all,

Good day again!

I would like to ask for your expertise on the following:

I am currently working on an excel sheet where it contains 1000+ data, the problem is, the data contains A LOT of duplicates. Now, I tried removing the duplicates, it removed the duplicates as expected, however, the lines have been adjusted -- which I do not want to happen (lines being moved). I just want to remove the duplicate without affecting the lines.

Here's an example of the list (changed the name for privacy reasons):
XS8gofV.png



There are over 15 columns per line, and there are over 1000+ lines. I tried the conditional formatting where it would highlight the duplicate values, but removing them manually would take me weeks or possibly months. Would there be a better way to sort this?

I really hope you can provide me your guidance on this.

Best,
Priv

P.S - Edit, I've tried searching for other posts and even through Google, but none of the answers seem to work for me. (Or I might just be doing it wrong [for the CountIf function which I saw as an answer])
 
Last edited:

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
Why don't you show us what the list you posted should look like after the duplicates have been removed (that way, we will not have to ask you several questions in order for you do clarify exactly what you want).
 
Upvote 0
Can you define what constitutes a duplicate? Is it the name or a combination of cell values? What data should not be shown if duplicated?
 
Upvote 0
Hi,
@Crystalyzer, I am only looking for duplicates in the 'Full Name' field. Thanks for the replies so here goes a more detailed post:

Here's the example file:
lZdVKtI.png


Now, as you can see above, it shows that on some duplicates have matching details, while some, don't.

Here's what happens every time I select Data -> Remove Duplicates:
wWK6wx1.png


What I wanted was, for every instance of duplicate value in the B:B, the line along with it gets removed.

Now, I've tried to select remove duplicates and expand selection, it just returns with 'No Duplicate Values Found' like the image below:
YPU3BbO.png



Transaction Date

<tbody>
</tbody>
Full Name

<tbody>
</tbody>
First Name

<tbody>
</tbody>
Last Name

<tbody>
</tbody>
Status

<tbody>
</tbody>
Phone Number

<tbody>
</tbody>
Address Line 1

<tbody>
</tbody>
Address Line 2

<tbody>
</tbody>
City

<tbody>
</tbody>
State

<tbody>
</tbody>
Zip

<tbody>
</tbody>
4/1/2019 11:44

<tbody>
</tbody>
John Smith

<tbody>
</tbody>
John

<tbody>
</tbody>
Smith

<tbody>
</tbody>
failed

<tbody>
</tbody>
123-456-7890

<tbody>
</tbody>
123 W St

<tbody>
</tbody>
East

<tbody>
</tbody>
FL

<tbody>
</tbody>
28530

<tbody>
</tbody>
4/1/2019 12:31

<tbody>
</tbody>
Anna Heim

<tbody>
</tbody>
Anna

<tbody>
</tbody>
Heim

<tbody>
</tbody>
failed

<tbody>
</tbody>
200-123-4567

<tbody>
</tbody>
250 Vivian Avenue

<tbody>
</tbody>
Apt 2

<tbody>
</tbody>
West

<tbody>
</tbody>
AZ

<tbody>
</tbody>
86320

<tbody>
</tbody>
4/2/2019 11:43

<tbody>
</tbody>
John Smith

<tbody>
</tbody>
John

<tbody>
</tbody>
Smith

<tbody>
</tbody>
failed

<tbody>
</tbody>
123-456-7890

<tbody>
</tbody>
123 W St

<tbody>
</tbody>
EastFL28530
4/2/2019 13:35

<tbody>
</tbody>
John Peterson

<tbody>
</tbody>
JohnPetersonfailed
758 239 0123

<tbody>
</tbody>
500 George Avenue

<tbody>
</tbody>
Apt 5B

<tbody>
</tbody>
SouthCA90706
4/2/2019 13:49

<tbody>
</tbody>
Arnold Walker

<tbody>
</tbody>
Arnold

<tbody>
</tbody>
Walkerfailed
503 658 4964

<tbody>
</tbody>
89 Cause Route

<tbody>
</tbody>
WashingtonOR97089
4/2/2019 15:58

<tbody>
</tbody>
Anna Celia

<tbody>
</tbody>
AnnaCeliafailed
783-293-0399

<tbody>
</tbody>
Complex Building, 25 East Avenue

<tbody>
</tbody>
Empysh

<tbody>
</tbody>
CA90706
4/3/2019 9:48

<tbody>
</tbody>
Anna Heim

<tbody>
</tbody>
AnnaHeimfailed
200-123-4567

<tbody>
</tbody>
250 Vivian Avenue

<tbody>
</tbody>
Apt 2WestAZ86320
4/3/2019 13:35

<tbody>
</tbody>
Tim Master

<tbody>
</tbody>
TimMasterfailed
234-567-0001

<tbody>
</tbody>
50 Cent Way

<tbody>
</tbody>
*******GA30722
4/3/2019 13:52

<tbody>
</tbody>
Noli M. Tangerine

<tbody>
</tbody>
Noli M.Tangerinefailed
900-253-860

<tbody>
</tbody>
99 Per Ctr.SpeakerGA31502
4/4/2019 10:21

<tbody>
</tbody>
John Smith

<tbody>
</tbody>
JohnSmithfailed123 456 7890123 W StEastFL28530
4/4/2019 11:50

<tbody>
</tbody>
Noli M. Tangerine

<tbody>
</tbody>
Noli M.Tangerinefailed
900253860

<tbody>
</tbody>
99 Per Ctr.

<tbody>
</tbody>
SpeakerGA31502

<tbody>
</tbody>


I really hope to get more tips and guides from ya'll.

Thank you,
Priv
200-123-4567

<tbody>
</tbody>

<tbody>
</tbody>
 
Last edited:
Upvote 0
If I am understanding you correctly, then this code will do the job. It will remove the entire line for duplicates found in column B.

Code:
Option Explicit


Sub DeleteDuplicateRows()
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' DeleteDuplicateRows
' This will delete duplicate records, based on the Active Column. That is,
' if the same value is found more than once in the Active Column, all but
' the first (lowest row number) will be deleted.
'
' To run the macro, select the entire column you wish to scan for
' duplicates, and run this procedure.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''


    Dim r As Long
    Dim n As Long
    Dim v As Variant
    Dim rng As Range


    On Error GoTo EndMacro
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual




    Set rng = Application.Intersect(ActiveSheet.UsedRange, _
                                    ActiveSheet.Columns(ActiveCell.Column))


    Application.StatusBar = "Processing Row: " & Format(rng.Row, "#,##0")


    n = 0
    For r = rng.Rows.Count To 2 Step -1
        If r Mod 500 = 0 Then
            Application.StatusBar = "Processing Row: " & Format(r, "#,##0")
        End If


        v = rng.Cells(r, 1).Value
        '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
        ' Note that COUNTIF works oddly with a Variant that is equal to vbNullString.
        ' Rather than pass in the variant, you need to pass in vbNullString explicitly.
        '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
        If v = vbNullString Then
            If Application.WorksheetFunction.CountIf(rng.Columns(1), vbNullString) > 1 Then
                rng.Rows(r).EntireRow.Delete
                'rng.Rows(r).EntireRow.Copy Sheets("Sheet2").Range("A" & Rows.Count).End(xlUp).Row + 1
                n = n + 1
            End If
        Else
            If Application.WorksheetFunction.CountIf(rng.Columns(1), v) > 1 Then
                rng.Rows(r).EntireRow.Delete
                'rng.Rows(r).EntireRow.Copy Sheets("Sheet2").Range("A" & Rows.Count).End(xlUp).Row + 1
                n = n + 1
            End If
        End If
    Next r


EndMacro:


    Application.StatusBar = False
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    MsgBox "Duplicate Rows Deleted: " & CStr(n)




End Sub
 
Upvote 0
If I am understanding you correctly, then this code will do the job. It will remove the entire line for duplicates found in column B.

Code:
Option Explicit


Sub DeleteDuplicateRows()
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' DeleteDuplicateRows
' This will delete duplicate records, based on the Active Column. That is,
' if the same value is found more than once in the Active Column, all but
' the first (lowest row number) will be deleted.
'
' To run the macro, select the entire column you wish to scan for
' duplicates, and run this procedure.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''


    Dim r As Long
    Dim n As Long
    Dim v As Variant
    Dim rng As Range


    On Error GoTo EndMacro
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual




    Set rng = Application.Intersect(ActiveSheet.UsedRange, _
                                    ActiveSheet.Columns(ActiveCell.Column))


    Application.StatusBar = "Processing Row: " & Format(rng.Row, "#,##0")


    n = 0
    For r = rng.Rows.Count To 2 Step -1
        If r Mod 500 = 0 Then
            Application.StatusBar = "Processing Row: " & Format(r, "#,##0")
        End If


        v = rng.Cells(r, 1).Value
        '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
        ' Note that COUNTIF works oddly with a Variant that is equal to vbNullString.
        ' Rather than pass in the variant, you need to pass in vbNullString explicitly.
        '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
        If v = vbNullString Then
            If Application.WorksheetFunction.CountIf(rng.Columns(1), vbNullString) > 1 Then
                rng.Rows(r).EntireRow.Delete
                'rng.Rows(r).EntireRow.Copy Sheets("Sheet2").Range("A" & Rows.Count).End(xlUp).Row + 1
                n = n + 1
            End If
        Else
            If Application.WorksheetFunction.CountIf(rng.Columns(1), v) > 1 Then
                rng.Rows(r).EntireRow.Delete
                'rng.Rows(r).EntireRow.Copy Sheets("Sheet2").Range("A" & Rows.Count).End(xlUp).Row + 1
                n = n + 1
            End If
        End If
    Next r


EndMacro:


    Application.StatusBar = False
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    MsgBox "Duplicate Rows Deleted: " & CStr(n)




End Sub

Oh my... what kind of sorcery is this code? It worked like how I expected it to be. I really appreciate your help alansidman.
I would really need to learn more about Excel and VBA as it would make life a whole lot easier. I'd like to say I love you, lol.

P.S - I'll try to understand how your code works and see what I'll learn from that.

And thanks to Crystalyzer and Rick Rothstein for your time and interest in helping me out, if you guys still have other methods that you would like to share, I would highly appreciate them :)

Best,
Priv
 
Last edited:
Upvote 0
Now, I've tried to select remove duplicates and expand selection, it just returns with 'No Duplicate Values Found' like the image below:

I really hope to get more tips and guides from ya'll.
If you want to do it manually
After you have expanded the selection, in the Remove Duplicates dialog you need to 'Unselect All' and then just select 'Full Name' in the list of columns then OK (but also note my comment below about sorting)

If you want it as a macro
I think all it should need is this.

Rich (BB code):
Sub RemoveDupesFullName()
    ActiveSheet.UsedRange.RemoveDuplicates Columns:=2, Header:=xlYes
End Sub

Note, however that Remove Duplicates can produce inaccurate results if the data is not sorted first (example here) so if you want to use Remove Duplicates and be confident of the results then best to sort first.

Rich (BB code):
Sub RemoveDupesFullName_v2()
    With ActiveSheet.UsedRange
      .Sort Key1:=Range("B2"), Order1:=xlAscending, Header:=xlYes
      .RemoveDuplicates Columns:=2, Header:=xlYes
    End With
End Sub
 
Last edited:
Upvote 0

Forum statistics

Threads
1,213,489
Messages
6,113,954
Members
448,535
Latest member
alrossman

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top