MrExcel Publishing
Your One Stop for Excel Tips & Solutions

I would like to compare 2 customer list and find any duplicates


Posted by Jim Reesor on December 31, 2000 4:10 PM

I have 2 customer lists consiting of 5 columns (Name, Address, City, State & Zip). As you know, people have different ways of entering data. Some will enter the Name as JD Edwards or J D Edwards or J.D. Edwards. Having the data like this won't let me do a vlookup. Is there a way to look at the characters and find one or two strings that appear to be the same and output DUP if it's considered a possible match?


Posted by Ed on December 31, 2000 5:31 PM


When using VLOOKUP change the Range_Look up Option from False to True, this will cause an approximate match to be returned.

=VLOOKUP(A1,E:G,3,TRUE)

Posted by Aladin Akyurek on December 31, 2000 7:12 PM

Would you care to answer following questions:

A) Do Address+Zip together uniquely identify the customers? (That is, there are never 2 different customers who have the same Address and the same Zip.)

B) Am I understanding you correctly that you have 2 customer lists which you want to compare in order to establish whether "the same customer" appears in both lists?

C) Do these 2 lists contain Name, Address, City, State, and Zip info?

Aladin

Posted by Dave on December 31, 2000 11:10 PM

Hi Jim

Assuming you names are in Column A and State & Zip codes are in Column D.

Paste list 2 underneath list 2.
Sort the list by "Names" i.e Column A
Place this rather lengthy formula in cell E1:

=IF(EXACT(A1,A2),"A Match",IF(AND(RIGHT(A1,4)=RIGHT(A2,4),D1=D2),"B Match",IF(AND(RIGHT(A1,3)=RIGHT(A2,3),D1=D2),"C Match",IF(AND(RIGHT(A1,2)=RIGHT(A2,2),D1=D2),"D Match",IF(AND(LEFT(A1,1)=LEFT(A2,1),D1=D2),"E Match")))))

Copy it down are far as needed.
Highlight the entire of Column E and Copy, then PasteSpecial as Values over the top of itself.

Now sort the entire list including Column E by Column E.

You can now use AutoFilters on Column E to show only "A Match". Then with you entire list highlighted Push F5 Then click Special/Visible cells only. The Edit>Clear>All.

Now sort again by column E to remove all blank rows.

You could then use Data>Autofilter>Custom on the other Matches to remove the less obvious.

Hope that helps

Dave


  • OzGrid Business Applications

Posted by Jim Reesor on January 01, 2001 1:05 PM


Follow up
A: There could be duplicate address for different customers. (ex- Large office building has many suites).Same for zip codes. Suites not always a part of the address.

B: Yes, I have 2 different list, one from my data base and the other is a Purchased data list. I want to remove any of my current customers from the Purchased list. I would be using the purchased list for leads for my sales department. I don't want to look foolish by calling or mailing special promo to my current customers.

C: My data base has Name, Address, City, State, Zip and other customer information. Normally the purhcased list would contain the same basic info along with phone number, contact & bussiness size.

I could use a vlookup or match but it would only work on exact dups, not with almost matches.

Posted by Aladin Akyurek on January 01, 2001 2:36 PM

Jim,

This is a tough problem, specifically because there is no certainty about uniquely identifying a customer. You might be able to refine the following scheme to help you out.

I'll asume a current customers list, the one from your database, occupying a range
in columns A thru E where the first contains the labels Name, Address, City, State, and Zip. Similarly, a Purchased list occupying a range in columns G thru K.

Enter the following array formula (remember to hit control+shift+enter)

L2 =IF(ISNA(MATCH(H2&K2,$B$2:$B$12&$E$2:$E$12,0)),"",INDEX($A$2:$A$12,MATCH(H2&K2,$B$2:$B$12&$E$2:$E$12,0))) [ copy down as far as needed ]


M2 =IF(LEN(L2)=0,G2&" is a new customer",IF(AND(LEN(L2)>0,L2<>G2),"DUP: "&G2&" is a current customer",G2&" is a current customer")) [ copy down as needed ]

You may use more criteria (I used Address & Zip above) in the MATCH-part to decrease the likelihood of seeing two different customers as one. If your lists are not too big, you may just inspect the results in M and get away with it.

Given the following customers list

name, address, city, state, zip
jd edwards, a1, Detroit, MI, 44235
A. AKY, a2, Den Haag, NL, 2582
Zox JD, a3, Den Haag, NL, 2657

and a purchased list

name, address, city, state, zip
j d edwards, a1, Detroit, MI, 44235
A AKY, a2, Den Haag, NL, 2582
Zox JD, a3, Den Haag, NL, 2657
D. Gohan, a4, New York, NY, 12345

we get:

jd edwards, DUP: j d edwards is a current customer
A. AKY, DUP: A AKY is a current customer
Zox JD, Zox JD is a current customer
<blank>,D. Gohan is a new customer

Hope this helps.

Aladin