"Fuzzy" VLOOKUP question

Shumande

New Member
Joined
Apr 14, 2002
Messages
7
Hi,

I have a list of five hundred companies on a spreadsheet called "Companies" and another spreadsheet called "Public" with all 7,000+ public companies and their associated ticker symbols. I'm trying to match on the company names of both lists (column A of both spreadsheets) and return the associated ticker symbol. However, the names don't match precisely. Some names have ", Inc." (with a comma and space) or ", Corp." at the end, while some don't have any suffix or have “Inc.” without a comma. To make matters more complicated, some have the word "Corporation" spelled out completely instead of the ", Corp." abbreviation. For example, on one list Microsoft is simply "Microsoft", while on the other list it's called "Microsoft Corporation".

Using “true” or 1 in the fourth parameter of VLOOKUP doesn’t help. The matches I get are close, but no cigar. Excel spits out a match but not the right one.

Any ideas on how to solve this without resorting to brute force, manual lookups?

Thanks
 

Excel Facts

Create a Pivot Table on a Map
If your data has zip codes, postal codes, or city names, select the data and use Insert, 3D Map. (Found to right of chart icons).
Hi,

Clean your lists. Copy and then combine them, sort by ticker symbol and choose one asset description per ticker.

Then, do a vlookup by ticker on the original lists. When you are satisfied, copy and pastespecial (values) on the original columns and you should be OK.

Possibly a pivot table can be used.

If not, you will be using Vlookup or Index/Match with something like "LEFT(cell,9)="Microsoft" which will become a big mess very quickly.

Take the time to clean the data and you will be much happier in the long run, I believe.

Bye,
Jay
 
Upvote 0
I recall use of soundex feature in such cases ... but I am fuzzy on this at the moment. Perhaps some Excel Gurus will shed some light on this!
 
Upvote 0
On 2002-04-22 18:16, Yogi Anand wrote:
I recall use of soundex feature in such cases ... but I am fuzzy on this at the moment. Perhaps some Excel Gurus will shed some light on this!

Hi Yogi,

Nice suggestion. I just searched google.com and found the following:

http://www.myatt.demon.co.uk/soundex.htm

Has the soundex application in Access VBA

Also, Harlan Grove posted the following

-----------------------------------------
Subject: Re: 'fuzzy' string comparison?
Date: Wed, 21 Jun 2000 12:36:05 -0700
From: Harlan Grove<hrlngrvNOhrSPAM@aol.com.invalid>
Newsgroups: comp.apps.spreadsheets

I had to try. This gives a score. Maybe too deterministic, and definitely too dumb with respect to common abbreviations.

Function StrSimilar(s1 As String, s2 As String) As Double
Dim i As Long, j As Long, k As Long, n(2) As Long
Dim c1 As String, c2 As String
Const alphanum As String = "123456789abcdefghijklmnopqrstuvwxyz"

s1 = LCase(s1)
For i = 1 To Len(s1)
If Not InStr(alphanum, Mid(s1, i, 1)) > 0 Then Mid(s1, i, 1) = " "
Next i
s1 = Application.WorksheetFunction.Trim(s1)

s2 = LCase(s2)
For j = 1 To Len(s2)
If Not InStr(alphanum, Mid(s2, j, 1)) > 0 Then Mid(s2, j, 1) = " "
Next j
s2 = Application.WorksheetFunction.Trim(s2)

j = 1
n(1) = 0
For i = 1 To Len(s1)
c1 = LCase(Mid(s1, i, 1))

k = 0
Do
c2 = LCase(Mid(s2, j + k, 1))
k = k + 1
Loop Until j + k > Len(s2) Or c1 = c2

If c1 = c2 Then
n(1) = n(1) + 1
If j< Len(s2) Then j = j + 1 Else Exit For
End If
Next i

i = 1
n(2) = 0
For j = 1 To Len(s2)
c2 = LCase(Mid(s2, j, 1))

k = 0
Do
c1 = LCase(Mid(s1, i + k, 1))
k = k + 1
Loop Until i + k > Len(s1) Or c1 = c2

If c1 = c2 Then
n(2) = n(2) + 1
If i< Len(s1) Then i = i + 1 Else Exit For
End If
Next j

StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) / _
CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2)))
End Function
-------------------------------------

Don't think this will particularly work for the OP, but it may be worth a shot.

Bye,
Jay
This message was edited by Jay Petrulis on 2002-04-22 18:29
 
Upvote 0
Oh my goodness Jay:
You are on top of things buddy! ... looks like you have got things on your finger tips.

Regards!

Yogi Anand
 
Upvote 0
Hi Yogi,

No credit to me. I have looked at a lot of Harlan's posts in great detail. He is awesome.

He is the only one I know that can hang with the elite of the computer scientists, the engineers, the finance gurus and the statisticians. I have no idea about his background but he knows his stuff. (He also fights with everybody, but nobody's perfect!) :biggrin:

Bye,
Jay
 
Upvote 0
On 2002-04-22 16:37, Shumande wrote:
Hi,

I have a list of five hundred companies on a spreadsheet called "Companies" and another spreadsheet called "Public" with all 7,000+ public companies and their associated ticker symbols. I'm trying to match on the company names of both lists (column A of both spreadsheets) and return the associated ticker symbol. However, the names don't match precisely. Some names have ", Inc." (with a comma and space) or ", Corp." at the end, while some don't have any suffix or have “Inc.” without a comma. To make matters more complicated, some have the word "Corporation" spelled out completely instead of the ", Corp." abbreviation. For example, on one list Microsoft is simply "Microsoft", while on the other list it's called "Microsoft Corporation".

Using “true” or 1 in the fourth parameter of VLOOKUP doesn’t help. The matches I get are close, but no cigar. Excel spits out a match but not the right one.

Any ideas on how to solve this without resorting to brute force, manual lookups?

Thanks

Also have a look at:

http://www.mrexcel.com/wwwboard/messages/8546.html

for a possible approach.
 
Upvote 0

Forum statistics

Threads
1,214,641
Messages
6,120,693
Members
448,979
Latest member
DET4492

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top