Approximate vlookup


Posted by Joel Horowitz on December 04, 2001 1:22 PM

Hye,

I am currently trying to match two databases of company names, but the problem is that their names do not always appear exactly the same.

Who has a suggestion on how to tackle the problem of 'approximate vlookup'? I mean looking names in a database when they're not written exactly the same way. I'm aware of the TRUE attribute for vlookup, but that doesn't really solve the problem.

Let me define properly what I would like to do: I want to give as input a name + a list, and get as output the first entry in the list that contains that name, or alternatively, the entry that has the biggest chunk of letter in common (or any other criteria that would maximize the chance that the match is the correct one)

Thanks

Posted by Duane Kennerson on December 04, 2001 1:25 PM

see the "Unlock Mr. Excel Challenge" banner at the top of the page. That may help your situation. Its a similar problem with some solutions.:

Hye,

Posted by IML on December 04, 2001 2:07 PM

formatting?

If one list has
Public, John Q
and another is
John Public
you may want to consider a formula that will put the names in a similar format. Is this is what is going in, please describe the format the name is currently in, and what format you would like to get it to.


Posted by Joel Horowitz on December 05, 2001 10:55 AM

Re: formatting?


Well thanks for the advise, but the names are companies, and the typical things I have are like this (they are often in French)

'Procter & Gamble' vs. 'Procter and Gamble'
'Unilever Europe' vs. 'Unilever S.A.'
'Banque Degroof' vs. 'Degroof'

So I cannot speak about a format...

Thanks



Posted by Aladin Akyurek on December 06, 2001 10:48 AM

Joel --

What follows might be of some help.

Lets say that A1:A4 houses the 1st list, which is:

{"Procter and Gamble";
"Unilever S.A.";
"Degroof";
"Excellence"}

and C1:C4 the 2nd list, which is:

{"Procter & Gamble";
"Unilever Europe";
"Banque Degroof";
"Excellence"}

Note 1. The lists need not be in the same sheet or of the same length.

In D1 enter: =IF(LEN(G1),IF(SUMPRODUCT((ISNUMBER(SEARCH(G1,$A$1:$A$4)))+0)>0,INDEX($A$1:$A$4,SUMPRODUCT((ISNUMBER(SEARCH(G1,$A$1:$A$4)))*(ROW($A$1:$A$4))))),F1)

In E1 enter: =IF(LEN(F1),F1,IF(SUMPRODUCT((ISNUMBER(SEARCH(LEFT(C1,SEARCH(" ",C1)-1),$A$1:$A$4)))+0)>0,INDEX($A$1:$A$4,SUMPRODUCT((ISNUMBER(SEARCH(LEFT(C1,SEARCH(" ",C1)-1),$A$1:$A$4)))*(ROW($A$1:$A$4)))),""))

In F1 enter: =IF(COUNTIF($A$1:$A$4,C1),C1,"")

In G1 enter: =IF(LEN(E1),"",RIGHT(C1,LEN(C1)-SEARCH("@",SUBSTITUTE(C1," ","@",LEN(C1)-LEN(SUBSTITUTE(C1," ",""))))))

Select D1:G1 and copy down as far as needed.

I've put the result list expressly next to the second list. What this system of formulas does is: (A) Determine if an item in the 2nd list is also in the 1st list-if so, return that item; (B) If A fails, determine if the first word (delimited by a space) of an item in the 2nd list occurs somewhere in the 1st list-if so, return the item from the 1st list that contains the target word; (C) If B fails, determine if the last word of an item in the 2nd list occurs somewhere in the 1st list-if so, return the item from the 1st list that contains the last target word.

Caveat. If there are more than 1 item that contains the target word, the formula will spawn erroneous results. So becareful.

Note 2. If you like to have a copy of the workbook containing the above, just drop me a line.


Aladin

===========