How do I look for "incorrect duplicates"?

kmprice710

Board Regular
Joined
Jan 8, 2014
Messages
87
Office Version
  1. 2019
Platform
  1. Windows
I have a list of sales. Each customer has an assigned Customer ID. Each row has the Customer ID. But I have found out that the same Customer ID has been incorrectly assigned to more than one customer.

How do I look for duplicate Customer IDs that are incorrectly assigned on a sheet in which some "correct" duplicates appear?

DateCustomer NameAddressCustomer_ID
1/1/2001Jim Jones2565 Gross PointJONJI01
1/2/2001Jim Johnson1024 AveryJONJI02
1/3/2001Jim JamesP.O. Box 32JAMJI01
1/4/2001Jim Mason2754 CentralMASJI01
1/5/2001Jim Newton1520 CosgroveNEWJI01
1/6/2001Jim Newsome502 BoystonNEWJI01
1/7/2001Jim Newsome502 BoystonNEWJI01
1/7/2001Jim SmithPO Box 510SMIJI01
1/8/2001Jim Smith
902 BirchSMIJI02
1/9/2001Jim Smith3254 N. Lake ShoreSMIJI01
1/10/2001Jim Jones2565 Gross PointJONJI01
1/11/2001Jim Jones4891 4th St.JONJI01
1/12/2001Jim Jones2565 Gross PointJONJI01

<tbody>
</tbody>

For example, in this dummy example, SMIJI01 has been assigned to multiple customers --which is incorrect --but Jim Smith on 2565 Gross Point, the one who was first assigned SMIJI01, CORRECTLY appears three times on this sheet. The ID are supposed to be assigned based on the first 3 letters of the customer's last name and first two letters of the first name and then based on the address. If the ID is taken, a number 02, 03 ... etc. should be assigned.

How do I write a conditional statement that will look at the customer name and the address fields and find the "incorrect" duplicates only?
 
Last edited:

Excel Facts

Will the fill handle fill 1, 2, 3?
Yes! Type 1 in a cell. Hold down Ctrl while you drag the fill handle.
Hi, do you want just "incorrect" or would you like as well the corrected code?
Given in A1:
DateCustomer NameAddressCustomer_IDCheck
1/1/01Jim Jones2565 Gross PointJONJI01
1/2/01Jim Johnson1024 AveryJONJI02incorrect JOHJI
1/3/01Jim JamesP.O. Box 32JAMJI01
1/4/01Jim Mason2754 CentralMASJI01
1/5/01Jim Newton1520 CosgroveNEWJI01
1/6/01Jim Newsome502 BoystonNEWJI01
1/7/01Jim Newsome502 BoystonNEWJI01
1/7/01Jim SmithPO Box 510SMIJI01
1/8/01Jim Smith902 BirchSMIJI02
1/9/01Jim Smith3254 N. Lake ShoreSMIJI01
1/10/01Jim Jones2565 Gross PointJONJI01
1/11/01Jim Jones4891 4th St.JONJI01
1/12/01Jim Jones2565 Gross PointJONJI01

<tbody>
</tbody>

Formula in E2 could be either =IF(LEFT(D2,5)=UPPER(LEFT(TRIM(MID(SUBSTITUTE($B2," ",REPT(" ",999)),COLUMN(B2)*999-998,999)),3)&LEFT(TRIM(MID(SUBSTITUTE($B2," ",REPT(" ",999)),COLUMN(A2)*999-998,999)),2)),"","incorrect")

or formula could be
=IF(LEFT(D2,5)=UPPER(LEFT(TRIM(MID(SUBSTITUTE($B2," ",REPT(" ",999)),COLUMN(B2)*999-998,999)),3)&LEFT(TRIM(MID(SUBSTITUTE($B2," ",REPT(" ",999)),COLUMN(A2)*999-998,999)),2)),"","incorrect"& UPPER(LEFT(TRIM(MID(SUBSTITUTE($B2," ",REPT(" ",999)),COLUMN(B2)*999-998,999)),3)&LEFT(TRIM(MID(SUBSTITUTE($B2," ",REPT(" ",999)),COLUMN(A2)*999-998,999)),2)))

would that be close to what you needed?
 
Upvote 0
What about the last 3 entries for Jim Jones, shouldn't the cus_ID be JONJI01, JONJI02 and JONJI03?
if yes then the data would be as follows:
DateCustomer NameAddressCustomer_IDCheck
1/1/01Jim Jones2565 Gross PointJONJI01
1/2/01Jim Johnson1024 AveryJONJI02incorrect
1/3/01Jim JamesP.O. Box 32JAMJI01
1/4/01Jim Mason2754 CentralMASJI01
1/5/01Jim Newton1520 CosgroveNEWJI01
1/6/01Jim Newsome502 BoystonNEWJI01
1/7/01Jim Newsome502 BoystonNEWJI01incorrect
1/7/01Jim SmithPO Box 510SMIJI01
1/8/01Jim Smith902 BirchSMIJI02
1/9/01Jim Smith3254 N. Lake ShoreSMIJI01incorrect
1/10/01Jim Jones2565 Gross PointJONJI01incorrect
1/11/01Jim Jones4891 4th St.JONJI01incorrect
1/12/01Jim Jones2565 Gross PointJONJI01incorrect

<tbody>
</tbody>
and the formula in E2 changed to =IF(D2=UPPER(LEFT(TRIM(MID(SUBSTITUTE($B2," ",REPT(" ",999)),COLUMN(B2)*999-998,999)),3)&LEFT(TRIM(MID(SUBSTITUTE($B2," ",REPT(" ",999)),COLUMN(A2)*999-998,999)),2))&TEXT(COUNTIF($B$2:B2,B2),"00"),"","incorrect")

Note data must be sorted per date and per customer name. Thanks to clarify.
 
Last edited:
Upvote 0
What about the last 3 entries for Jim Jones, shouldn't the cus_ID be JONJI01, JONJI02 and JONJI03?
if yes then the data would be as follows:
DateCustomer NameAddressCustomer_IDCheck
1/1/01Jim Jones2565 Gross PointJONJI01
1/2/01Jim Johnson1024 AveryJONJI02incorrect
1/3/01Jim JamesP.O. Box 32JAMJI01
1/4/01Jim Mason2754 CentralMASJI01
1/5/01Jim Newton1520 CosgroveNEWJI01
1/6/01Jim Newsome502 BoystonNEWJI01
1/7/01Jim Newsome502 BoystonNEWJI01incorrect
1/7/01Jim SmithPO Box 510SMIJI01
1/8/01Jim Smith902 BirchSMIJI02
1/9/01Jim Smith3254 N. Lake ShoreSMIJI01incorrect
1/10/01Jim Jones2565 Gross PointJONJI01incorrect
1/11/01Jim Jones4891 4th St.JONJI01incorrect
1/12/01Jim Jones2565 Gross PointJONJI01incorrect

<tbody>
</tbody>
and the formula in E2 changed to =IF(D2=UPPER(LEFT(TRIM(MID(SUBSTITUTE($B2," ",REPT(" ",999)),COLUMN(B2)*999-998,999)),3)&LEFT(TRIM(MID(SUBSTITUTE($B2," ",REPT(" ",999)),COLUMN(A2)*999-998,999)),2))&TEXT(COUNTIF($B$2:B2,B2),"00"),"","incorrect")

Note data must be sorted per date and per customer name. Thanks to clarify.


Ah, but the bottom 2 "2565 Gross Point" Jim Joneses are NOT incorrect. It is the same guy, so the customer is CORRECTLY duplicated. I just need a check that says "Oh you already have a JONJI01 assigned to another guy" for the Jim Jones on 4891 4th St., for example.

Does this make sense?
 
Upvote 0
1-Row 2: Jim Johnson is incorrect as it should be JOHJI01 instead of JONJI02 right?
2-Row 8: Jim Newsome is incorrect as it should be NEWJI02 instead of NEWJI01 right?
3-Row 10: Jim Smith is incorrect as it should be SMIJI01 instead of SMIJI02, the address is different, it could be a different person right?
4-Row 12: Jim Jones is incorrect as it should be JONJI02 instead of JONJI01 right?
and lastly 5- Row 14: Jim Jones is incorrect as it should be JONJI03 instead of JONJI01 right?

Or is the address inconsequential.
In which case how do you know if a client is correct or not. Thanks for the clarification.
 
Upvote 0
1-Row 2: Jim Johnson is incorrect as it should be JOHJI01 instead of JONJI02 right?
2-Row 8: Jim Newsome is incorrect as it should be NEWJI02 instead of NEWJI01 right?
3-Row 10: Jim Smith is incorrect as it should be SMIJI01 instead of SMIJI02, the address is different, it could be a different person right?
4-Row 12: Jim Jones is incorrect as it should be JONJI02 instead of JONJI01 right?
and lastly 5- Row 14: Jim Jones is incorrect as it should be JONJI03 instead of JONJI01 right?

Or is the address inconsequential.
In which case how do you know if a client is correct or not. Thanks for the clarification.

1. Yes, you are right
2. Right.
3. See, this is the problem. Row 10 should be SMIJI03
4. Right.
5. No, row 1, 11 & 13 are the same guy and should have the same ID (exact same name and same address = same person).

You can see why this is so complicated, because of the multiple rows of the same person.
 
Upvote 0
why would the second Jim Newsome be NEWJI02 while all three Jim Jones remain JONJI01 You said 2. right and 5. wrong, this is contradictory.

same for Jim smith, 3 occurrences but 3 different addresses.
Sorry to bug you, just trying to identify a clear pattern.
 
Upvote 0
why would the second Jim Newsome be NEWJI02 while all three Jim Jones remain JONJI01 You said 2. right and 5. wrong, this is contradictory.

same for Jim smith, 3 occurrences but 3 different addresses.
Sorry to bug you, just trying to identify a clear pattern.

Okay... Jim Newton gets NEWJI01 so when Jim Newsome comes along, he should be NEWJI02. When he comes along again in the next row, you're right, he's the same guy, but remember, I made him NEWJI02 so as not to conflict with Jim Newton, NEWJI01.

Make sense? (Yeah, this is very confusing, I know, but it's the way the database is set up.)
 
Upvote 0
Please confirm last column:
DateCustomer NameAddressCust_IDcorrected
1/1/01Jim Jones2565 Gross PointJONJI01JONJI1
1/2/01Jim Johnson1024 AveryJONJI02JOHJI1
1/3/01Jim JamesP.O. Box 32JAMJI01JAMJI1
1/4/01Jim Mason2754 CentralMASJI01MASJI1
1/5/01Jim Newton1520 CosgroveNEWJI01NEWJI1
1/6/01Jim Newsome502 BoystonNEWJI01NEWJI2
1/7/01Jim Newsome502 BoystonNEWJI01NEWJI2
1/7/01Jim SmithPO Box 510SMIJI01SMIJI1
1/8/01Jim Smith902 BirchSMIJI02SMIJI2
1/9/01Jim Smith3254 N. Lake ShoreSMIJI01SMIJI3
1/10/01Jim Jones2565 Gross PointJONJI01JONJI1
1/11/01Jim Jones4891 4th St.JONJI01JONJI2
1/12/01Jim Jones2565 Gross PointJONJI01JONJI1

<colgroup><col><col><col><col><col></colgroup><tbody>
</tbody>
 
Upvote 0

Forum statistics

Threads
1,214,791
Messages
6,121,611
Members
449,038
Latest member
apwr

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top