Removing Duplicates in Excel


December 26, 2006

In today's episode, I compared how to find or remove duplicates. Excel 2007 offers cool new ways to do this. The first 3 tips work in any version of Excel. The last 2 methods work only in Excel 2007.

Method 1:

Use the Unique Option in Advanced Filter

  1. To the right of your data, copy the heading from the column where you want to find unique values.
  2. Select a cell in your data set.
  3. In Excel 97-2003, choose Data - Filter - Advanced Filter. In Excel 2007, choose the Advanced icon from the Sort & Filter group of the Data ribbon.
  4. Choose Copy to another Location
  5. In the Copy To box, specify the copy of your heading. In the Figure, this is cell D1
  6. Click the box for Unique Records Only

  7. Click OK

Excel will provide you a unique list of customers in column D.

Method 2:

Use a Formula to Determine if This Record is Unique

The COUNTIF function can count how many records above the current record match the current record. The trick to making this work is to use a single dollar sign in the reference. If you are entering a formula in C2 and you reference A$1:A1, this is saying, "Start from the absolute reference of A1 and go down to the record above the current record". When you copy this formula down, the first A$1 will stay the same. The second A1 will change. In Row 17, the formula in C2 will read: =COUNTIF(A$1:A16,A17)=0.



Once you have entered the formula in C2 and copied it down to all rows, you should copy C2:C15 and then use Edit - Paste Special Values to convert the formulas to values. You can now sort descending by column C and the unique values will be at the top of the list.

Method 3:

Use a Pivot Table to get Unique Customers

A pivot table is great at finding unique values. This is the fastest way in Excel 2000-2003.

  1. Select a cell in your data set.
  2. Choose Data - Pivot Table and Pivot Chart Report.
  3. Click Finish.
  4. In the Pivot Table Field List, click on the Customer Field. Click the Add To button.

Excel will show you a unique list of customers.

Method 4:

New in Excel 2007 - Use Conditional Formatting to Mark Duplicates

Excel 2007 offers new methods for finding duplicates. Select the range of customers. From the Home ribbon, choose Conditional Formatting - Highlight Cells Rules - Duplicate Values and click OK.

If a name is found twice, Excel will highlight both occurences of the name. You would then want to sort all of the highlighted cells to the top.

  1. Click any field in the customer column. Click the AZ button in the Data ribbon.
  2. Find a cell that has the red highlighting. Right click the cell. Choose Sort - Put Selected Cell Color on Top.

Method 5:

New in Excel 2007 - Use Remove Duplicates icon

Warning!

This method is highly destructive! Make a copy of your dataset before you do this!

  1. Copy your range of data to a blank section of the worksheet
  2. Select a cell in your data set.
  3. From the Data ribbon, choose Remove Duplicates.
  4. The Remove Duplicates dialog will give you a list of columns. Choose the columns which should be considered. For example, if you needed to remove records where both the customer and invoice were identical, check the box for both fields.

    In this case, you are trying to get a unique list of customers, so choose only the Customer field.

  5. Click OK.

Excel will delete records from your dataset. It will report that n duplicates were removed and nn records remain.

As you can see, there are many methods for dealing with duplicates. Excel 2007 adds two new tools to your arsenal.

In my experience, auditors are frequently trying to locate duplicates to detect if a report has been overstated. When I wrote Excel for Auditors, I cover both Excel 2003 and Excel 2007.