Duplicate entry help

Chrix5

New Member
Joined
Sep 8, 2002
Messages
8
Essentially, my dilemma is as follows: I have a "database" of roughly 30,000 rows and six columns of Data. Unfortunately, there are about 5,000 rows of duplicate entries. What I am trying to do "copy" *only* the duplicates from the first column and “paste” them into a separate excel file... and then sort that excel file according to criteria in the other columns. I can handle the later part of this, it's just exporting the duplicates, without erasing them, is over my head. Any help would be greatly appreciated; and, one final thing: I am a complete novice when it comes to Excel, so please try and be specific as to how I should go about accomplishing this. Just to clarify one last thing, the other 5 columns of data in the duplicates do not always match...I am looking to take out the entire row based only on the data in the first column being the same as other rows. Thanks,
--Chris
This message was edited by Chrix5 on 2002-09-09 17:32
 

Excel Facts

Can Excel fill bagel flavors?
You can teach Excel a new custom list. Type the list in cells, File, Options, Advanced, Edit Custom Lists, Import, OK

PaddyD

MrExcel MVP
Joined
May 1, 2002
Messages
14,234
On 2002-09-09 17:29, Chrix5 wrote:
Essentially, my dilemma is as follows: I have a "database" of roughly 30,000 rows and six columns of Data. Unfortunately, there are about 5,000 rows of duplicate entries. What I am trying to do "copy" *only* the duplicates from the first column and “paste” them into a separate excel file... and then sort that excel file according to criteria in the other columns. I can handle the later part of this, it's just exporting the duplicates, without erasing them, is over my head. Any help would be greatly appreciated; and, one final thing: I am a complete novice when it comes to Excel, so please try and be specific as to how I should go about accomplishing this. Just to clarify one last thing, the other 5 columns of data in the duplicates do not always match...I am looking to take out the entire row based only on the data in the first column being the same as other rows. Thanks,
--Chris

Hi - welcome to the board.

One way (I assume your 'first column' to be column A and that there are column headings in the first row):

1) sort your data by column A
2) in a spare column, put a suitable lable in row 1 (e.g. "Dups"), put this formula in row 2:

=if(a2<>a1,"","Duplicate")

3) Copy formula down to the nd of your data.
4) Got to data | filter | autofilter
5) Filter for records in the new column with the text "Duplicate"
6) Do you copy paste stuff on the filtered records.

Does that do it?
Paddy
 

Chrix5

New Member
Joined
Sep 8, 2002
Messages
8
This worked Great! Thanks for the help. The only other question I have is: how would I go about adding columns (in addition to column A) to find duplicates... in say colums A, B, C, and D. Thanks,
--Chris
 

PaddyD

MrExcel MVP
Joined
May 1, 2002
Messages
14,234
could you clarify your situation. In your first post you said "...the other 5 columns of data in the duplicates do not always match". What are we looking for here? Post back with a few examples...

Paddy
 

Chrix5

New Member
Joined
Sep 8, 2002
Messages
8

ADVERTISEMENT

All right, I’ll try my best here... this is the *exact* situation (you'll have to excuse my lack of exact details yesterday; I was posting from home, not from work). I have 9 columns of data as follows:

Name 1;
Address 1;
Address 2;
City;
State;
ZIP;
Name 2;
E-mail address;
Rep;

I basically need to find all the duplicates from the first *6* columns. The duplicates will have the exact same data in *all* six columns. However, the duplicates *may* be different in the final column (#9), as well as the two that precede it. One last thing to add a little more chaos, I was recently informed that if there are duplicates, all of them must be placed into a new excel file (including the original). For example, if there turns out to be 4 of the *exact* same entry (based on the first 6 columns of data), then all 4 of them must be extracted and placed into a separate file.

Also, in case I haven't said this already, I appreciate your help more than you could ever imagine... my boss originally told me that this would need to be done by hand (can you imagine how much fun that would be?!) Once again, thanks,
--Chris
 

PaddyD

MrExcel MVP
Joined
May 1, 2002
Messages
14,234
How about this - it's a bit 'brute force' but it should work:

1) Concatenate the six colmns in question into a single value
2) Count how many times each value turns up
3) Filter for records with a count > 1

See the example:
Book7
ABCDEFGHI
1abcdefabcdefabcdef2
2ghIjklghIjklghIjkl1
3mnopqrmnopqrmnopqr2
4stuvwxstuvwxstuvwx1
5abcdefabcdefabcdef2
6mnopqrmnopqrmnopqr2
7
Sheet1


Notes:

1) MCONCAT is a function from the morefunc add in (search this site for the download address). Used here to demonstrate how you can concatenate multiple cells without having to do a lot of typing.

2) The success of this method depends on what you mean by 'exact'. For example, if one record has "smith" and another "smith " (note the extra space), it wont work. You could deal with this issue using trim().

3) Depending on how the data has been generated (manual entry?), there are likely to be cases that you'd consider an exact match that we miss here (see (2), e.g.). Getting a complete solution may take a couple more refinements as the 'exceptions' are identified.

Post back with results...

Paddy
 

Chrix5

New Member
Joined
Sep 8, 2002
Messages
8

ADVERTISEMENT

Any way to dumb this down a little?
 

PaddyD

MrExcel MVP
Joined
May 1, 2002
Messages
14,234
step-by-step:

Assumption -
1) the 6 columns of interest are col's A to F
2) there are column headings in row 1

(It does not matter whether the data's been sorted).


1) In a spare column, concatenate the values of the 6 columns into a single value - cell G2 in the example below:

=CONCATENATE(A2,B2,C2,D2,E2,F2)

Give the new column a heading - E.g. "concat"

2) Copy the formula down to the end of your data
3) In the next column, put in a formula that counts how many times each of the concatenated values turn up:

=COUNTIF($H$2:$H$6,H2)

change $h:$6 here for the last row of your data

give the column a heading - e.g. "count"

4) Go to data | filter | autofilter to set up a filter for your columns

5) Select the filter for the new Count column - select the custom... option.

6) Select 'is greater than' 1 as the criteria.

This will leave you with all the recordss that have more than one occurance on the list. You can then copy them elsewhere, delete them etc.

Paddy
This message was edited by PaddyD on 2002-09-10 16:39
 

Chrix5

New Member
Joined
Sep 8, 2002
Messages
8
Hopefully this will be the last time I have to bother you, and, god willing, this problem should be an easy one:

Can you further explain the variables in the countif() formula? I will be placing the new column of data in colum J, if that helps... I guess i'm wondering what the $ signs are for?

Also, is there a way to expand the colum after using "concat"?

Thanks,
--chris
 

PaddyD

MrExcel MVP
Joined
May 1, 2002
Messages
14,234
Can you further explain the variables in the countif() formula?

The synatx for counif is as follows:

=countif(count_where,count_what)

where 'count_where' is the range of cells from which you want to count and 'count_what' is the item you want to count. e.g. if cells a1:a4 contained the values {1;1;2;1} and cell b1 contained the vaule we wanted to count - say 1, then both

=countif(a1:a4,b1)
and
=countif(a1:a4,1)

would return 3.

I guess i'm wondering what the $ signs are for?
Check out the excel help file for absolute and relative references. A1 is a relative reference. $A$1 is an absolute reference. A$1 and $A1 are mixed. You use the $ to anchor part or all of a cell reference. This is especially useful when copy formulas from one cell to another. With the data from the last example, put the formula

=countif(a1:a4,b1) in cell c1 & copy it down to cell c2. You will see that it has changed to =countif(a2:a5,b2). If you put =countif($A$1:$A$4,b1) in c1 & copy down to c2, you get =countif($A$1:$A$4,b2) - so we're stiull counting just the range a1:a4. Have a play about copying formulas with a mix of absolute & relative references - you'll soon get the hang of it.

Also, is there a way to expand the colum after using "concat"?

why do you want to - you've still got the original data in cols a to f. I'd just delete it once you've finished with it...

Paddy
 

Forum statistics

Threads
1,144,217
Messages
5,723,063
Members
422,477
Latest member
pete101

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top