Finding Duplicates and Storing Results

amerifax-heather

New Member
Joined
May 29, 2011
Messages
3
I am going to try my best to explain my situation. I am using Excel 2007. I have an extremely large data file (1,010,165 records to be exact). It is a combination of 2 different data files (2010 and 2005) with the same exact structure. Between the 2 files I need to know if I have any duplicates in 2 different scenarios, Columns A-F and Columns B-E. I want to keep the record from 2010 and mark it as MST and delete the record from 2005 (eventually), before it is deleted I need to have a record of that duplicate in the MST record as either 2005(A-F) or 2005(B-E) in a field called "ORG".

Columns A-F contain - Name, Addr, City, St, Zip, Phn
Columns B-E contain - Addr, City, St, Zip, Phn

What I need to determine is this...Each of the records have an identifier in a Field called "YR-Q" which contains either 2010 or 2005. I need to search the records and find out where there is a duplicate using columns A-F. When a duplicate is found, I need to mark the 1st rec (if YR-Q=2010 as MST or if YR-Q=2005 as DUP). I also need to record the result in the MST record as "2005(A-F)" in a field called "ORG". And then run the same scenario using columns B-E.

The other thing I am going to need to know is if the Address information (Columns B-E) is unique (not duplicated) where YR-Q=2005. If this is the case I would like to put the value "2005(ADDR)" in the "ORG" field.

I should also tell you that before the files are joined together each individual file is checked for duplicates within the file and all duplicates removed. So the only occurrence of a duplicate will occur between the 2010 file and the 2005 file.

I hope I've explained myself well enough. Thank you in advance for any help or guidance you can give.

Heather
 
Last edited:

Excel Facts

Formula for Yesterday
Name Manager, New Name. Yesterday =TODAY()-1. OK. Then, use =YESTERDAY in any cell. Tomorrow could be =TODAY()+1.
Hi Heather,
I'm going to assume here that you want to solve this question with formula only in Excel? (As in: with macros other solutions would be possible, Access would be an other option.)
What I would do:
-Create for each line one cell with a unique identifier, which in your case would be Column B-E, so the formula for that identifier column would be "=B1&C1&D1&E1" (for the first row). This is also assuming that all your addresses are formatted the same (capitals, spaces, etc. If e.g. one address is "Main street" and another "Mainstr." and another "Mainstreet", those are 3 different names and really hard to match...)
-Okay, now each address has a unique identifier (let's call it ID). With your amount of addresses I would copy the formula down to your last datarow, let Excel calculate and then copy-paste that new column as values.
-Next, in an extra column you can count the number of times an ID is in the list: =COUNTIF(F$2:F$6;F2) (change the ranges for your sheet). Copy paste that formula down your list, let excel calculate and copy-paste as values.
-Then, you know for each line whether it is from 2005 or 2010 and how many times the address is in the list. So 2005 and 1 time in the list means it's only in 2005, not in 2010. A 2 and 2010 means that it's both in 2005 and 2010... So one extra formula for each line should get you your result: =IF(F1=2005;IF(G1=1;"ONLY 2005";"2005 and 2010");IF(G1=1;"ONLY 2010";"2010 and 2005")) (change ranges and names in the formula).

Would that do the trick?

Cheers,

Koen
 
Upvote 0

Forum statistics

Threads
1,224,566
Messages
6,179,553
Members
452,928
Latest member
101blockchains

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top