Removing Duplicates

privxls

New Member
Joined
Nov 22, 2016
Messages
47
Hi all,

Good day again!

I would like to ask for your expertise on the following:

I am currently working on an excel sheet where it contains 1000+ data, the problem is, the data contains A LOT of duplicates. Now, I tried removing the duplicates, it removed the duplicates as expected, however, the lines have been adjusted -- which I do not want to happen (lines being moved). I just want to remove the duplicate without affecting the lines.

Here's an example of the list (changed the name for privacy reasons):



There are over 15 columns per line, and there are over 1000+ lines. I tried the conditional formatting where it would highlight the duplicate values, but removing them manually would take me weeks or possibly months. Would there be a better way to sort this?

I really hope you can provide me your guidance on this.

Best,
Priv

P.S - Edit, I've tried searching for other posts and even through Google, but none of the answers seem to work for me. (Or I might just be doing it wrong [for the CountIf function which I saw as an answer])
 
Last edited:

Some videos you may like

Excel Facts

Best way to learn Power Query?
Read M is for (Data) Monkey book by Ken Puls and Miguel Escobar. It is the complete guide to Power Query.

Rick Rothstein

MrExcel MVP
Joined
Apr 18, 2011
Messages
35,926
Office Version
2010
Platform
Windows
Why don't you show us what the list you posted should look like after the duplicates have been removed (that way, we will not have to ask you several questions in order for you do clarify exactly what you want).
 

Crystalyzer

Board Regular
Joined
Oct 18, 2011
Messages
177
Can you define what constitutes a duplicate? Is it the name or a combination of cell values? What data should not be shown if duplicated?
 

privxls

New Member
Joined
Nov 22, 2016
Messages
47
Hi,
@Crystalyzer, I am only looking for duplicates in the 'Full Name' field. Thanks for the replies so here goes a more detailed post:

Here's the example file:


Now, as you can see above, it shows that on some duplicates have matching details, while some, don't.

Here's what happens every time I select Data -> Remove Duplicates:


What I wanted was, for every instance of duplicate value in the B:B, the line along with it gets removed.

Now, I've tried to select remove duplicates and expand selection, it just returns with 'No Duplicate Values Found' like the image below:



Transaction Date

<tbody>
</tbody>
Full Name

<tbody>
</tbody>
First Name

<tbody>
</tbody>
Last Name

<tbody>
</tbody>
Status

<tbody>
</tbody>
Phone Number

<tbody>
</tbody>
Address Line 1

<tbody>
</tbody>
Address Line 2

<tbody>
</tbody>
City

<tbody>
</tbody>
State

<tbody>
</tbody>
Zip

<tbody>
</tbody>
4/1/2019 11:44

<tbody>
</tbody>
John Smith

<tbody>
</tbody>
John

<tbody>
</tbody>
Smith

<tbody>
</tbody>
failed

<tbody>
</tbody>
123-456-7890

<tbody>
</tbody>
123 W St

<tbody>
</tbody>
East

<tbody>
</tbody>
FL

<tbody>
</tbody>
28530

<tbody>
</tbody>
4/1/2019 12:31

<tbody>
</tbody>
Anna Heim

<tbody>
</tbody>
Anna

<tbody>
</tbody>
Heim

<tbody>
</tbody>
failed

<tbody>
</tbody>
200-123-4567

<tbody>
</tbody>
250 Vivian Avenue

<tbody>
</tbody>
Apt 2

<tbody>
</tbody>
West

<tbody>
</tbody>
AZ

<tbody>
</tbody>
86320

<tbody>
</tbody>
4/2/2019 11:43

<tbody>
</tbody>
John Smith

<tbody>
</tbody>
John

<tbody>
</tbody>
Smith

<tbody>
</tbody>
failed

<tbody>
</tbody>
123-456-7890

<tbody>
</tbody>
123 W St

<tbody>
</tbody>
EastFL28530
4/2/2019 13:35

<tbody>
</tbody>
John Peterson

<tbody>
</tbody>
JohnPetersonfailed
758 239 0123

<tbody>
</tbody>
500 George Avenue

<tbody>
</tbody>
Apt 5B

<tbody>
</tbody>
SouthCA90706
4/2/2019 13:49

<tbody>
</tbody>
Arnold Walker

<tbody>
</tbody>
Arnold

<tbody>
</tbody>
Walkerfailed
503 658 4964

<tbody>
</tbody>
89 Cause Route

<tbody>
</tbody>
WashingtonOR97089
4/2/2019 15:58

<tbody>
</tbody>
Anna Celia

<tbody>
</tbody>
AnnaCeliafailed
783-293-0399

<tbody>
</tbody>
Complex Building, 25 East Avenue

<tbody>
</tbody>
Empysh

<tbody>
</tbody>
CA90706
4/3/2019 9:48

<tbody>
</tbody>
Anna Heim

<tbody>
</tbody>
AnnaHeimfailed
200-123-4567

<tbody>
</tbody>
250 Vivian Avenue

<tbody>
</tbody>
Apt 2WestAZ86320
4/3/2019 13:35

<tbody>
</tbody>
Tim Master

<tbody>
</tbody>
TimMasterfailed
234-567-0001

<tbody>
</tbody>
50 Cent Way

<tbody>
</tbody>
*******GA30722
4/3/2019 13:52

<tbody>
</tbody>
Noli M. Tangerine

<tbody>
</tbody>
Noli M.Tangerinefailed
900-253-860

<tbody>
</tbody>
99 Per Ctr.SpeakerGA31502
4/4/2019 10:21

<tbody>
</tbody>
John Smith

<tbody>
</tbody>
JohnSmithfailed123 456 7890123 W StEastFL28530
4/4/2019 11:50

<tbody>
</tbody>
Noli M. Tangerine

<tbody>
</tbody>
Noli M.Tangerinefailed
900253860

<tbody>
</tbody>
99 Per Ctr.

<tbody>
</tbody>
SpeakerGA31502

<tbody>
</tbody>


I really hope to get more tips and guides from ya'll.

Thank you,
Priv
200-123-4567

<tbody>
</tbody>

<tbody>
</tbody>
 
Last edited:

alansidman

Well-known Member
Joined
Feb 26, 2007
Messages
5,581
Office Version
2019
Platform
Windows
If I am understanding you correctly, then this code will do the job. It will remove the entire line for duplicates found in column B.

Code:
Option Explicit


Sub DeleteDuplicateRows()
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' DeleteDuplicateRows
' This will delete duplicate records, based on the Active Column. That is,
' if the same value is found more than once in the Active Column, all but
' the first (lowest row number) will be deleted.
'
' To run the macro, select the entire column you wish to scan for
' duplicates, and run this procedure.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''


    Dim r As Long
    Dim n As Long
    Dim v As Variant
    Dim rng As Range


    On Error GoTo EndMacro
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual




    Set rng = Application.Intersect(ActiveSheet.UsedRange, _
                                    ActiveSheet.Columns(ActiveCell.Column))


    Application.StatusBar = "Processing Row: " & Format(rng.Row, "#,##0")


    n = 0
    For r = rng.Rows.Count To 2 Step -1
        If r Mod 500 = 0 Then
            Application.StatusBar = "Processing Row: " & Format(r, "#,##0")
        End If


        v = rng.Cells(r, 1).Value
        '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
        ' Note that COUNTIF works oddly with a Variant that is equal to vbNullString.
        ' Rather than pass in the variant, you need to pass in vbNullString explicitly.
        '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
        If v = vbNullString Then
            If Application.WorksheetFunction.CountIf(rng.Columns(1), vbNullString) > 1 Then
                rng.Rows(r).EntireRow.Delete
                'rng.Rows(r).EntireRow.Copy Sheets("Sheet2").Range("A" & Rows.Count).End(xlUp).Row + 1
                n = n + 1
            End If
        Else
            If Application.WorksheetFunction.CountIf(rng.Columns(1), v) > 1 Then
                rng.Rows(r).EntireRow.Delete
                'rng.Rows(r).EntireRow.Copy Sheets("Sheet2").Range("A" & Rows.Count).End(xlUp).Row + 1
                n = n + 1
            End If
        End If
    Next r


EndMacro:


    Application.StatusBar = False
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    MsgBox "Duplicate Rows Deleted: " & CStr(n)




End Sub
 

privxls

New Member
Joined
Nov 22, 2016
Messages
47
If I am understanding you correctly, then this code will do the job. It will remove the entire line for duplicates found in column B.

Code:
Option Explicit


Sub DeleteDuplicateRows()
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' DeleteDuplicateRows
' This will delete duplicate records, based on the Active Column. That is,
' if the same value is found more than once in the Active Column, all but
' the first (lowest row number) will be deleted.
'
' To run the macro, select the entire column you wish to scan for
' duplicates, and run this procedure.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''


    Dim r As Long
    Dim n As Long
    Dim v As Variant
    Dim rng As Range


    On Error GoTo EndMacro
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual




    Set rng = Application.Intersect(ActiveSheet.UsedRange, _
                                    ActiveSheet.Columns(ActiveCell.Column))


    Application.StatusBar = "Processing Row: " & Format(rng.Row, "#,##0")


    n = 0
    For r = rng.Rows.Count To 2 Step -1
        If r Mod 500 = 0 Then
            Application.StatusBar = "Processing Row: " & Format(r, "#,##0")
        End If


        v = rng.Cells(r, 1).Value
        '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
        ' Note that COUNTIF works oddly with a Variant that is equal to vbNullString.
        ' Rather than pass in the variant, you need to pass in vbNullString explicitly.
        '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
        If v = vbNullString Then
            If Application.WorksheetFunction.CountIf(rng.Columns(1), vbNullString) > 1 Then
                rng.Rows(r).EntireRow.Delete
                'rng.Rows(r).EntireRow.Copy Sheets("Sheet2").Range("A" & Rows.Count).End(xlUp).Row + 1
                n = n + 1
            End If
        Else
            If Application.WorksheetFunction.CountIf(rng.Columns(1), v) > 1 Then
                rng.Rows(r).EntireRow.Delete
                'rng.Rows(r).EntireRow.Copy Sheets("Sheet2").Range("A" & Rows.Count).End(xlUp).Row + 1
                n = n + 1
            End If
        End If
    Next r


EndMacro:


    Application.StatusBar = False
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    MsgBox "Duplicate Rows Deleted: " & CStr(n)




End Sub
Oh my... what kind of sorcery is this code? It worked like how I expected it to be. I really appreciate your help alansidman.
I would really need to learn more about Excel and VBA as it would make life a whole lot easier. I'd like to say I love you, lol.

P.S - I'll try to understand how your code works and see what I'll learn from that.

And thanks to Crystalyzer and Rick Rothstein for your time and interest in helping me out, if you guys still have other methods that you would like to share, I would highly appreciate them :)

Best,
Priv
 
Last edited:

Peter_SSs

MrExcel MVP, Moderator
Joined
May 28, 2005
Messages
44,700
Office Version
365
Platform
Windows
Now, I've tried to select remove duplicates and expand selection, it just returns with 'No Duplicate Values Found' like the image below:

I really hope to get more tips and guides from ya'll.
If you want to do it manually
After you have expanded the selection, in the Remove Duplicates dialog you need to 'Unselect All' and then just select 'Full Name' in the list of columns then OK (but also note my comment below about sorting)

If you want it as a macro
I think all it should need is this.

Rich (BB code):
Sub RemoveDupesFullName()
    ActiveSheet.UsedRange.RemoveDuplicates Columns:=2, Header:=xlYes
End Sub
Note, however that Remove Duplicates can produce inaccurate results if the data is not sorted first (example here) so if you want to use Remove Duplicates and be confident of the results then best to sort first.

Rich (BB code):
Sub RemoveDupesFullName_v2()
    With ActiveSheet.UsedRange
      .Sort Key1:=Range("B2"), Order1:=xlAscending, Header:=xlYes
      .RemoveDuplicates Columns:=2, Header:=xlYes
    End With
End Sub
 
Last edited:

Watch MrExcel Video

Forum statistics

Threads
1,099,473
Messages
5,468,829
Members
406,612
Latest member
pedad

This Week's Hot Topics

Top