Error Checking in Excel
Thanks Thanks:  0
Likes Likes:  0
Results 1 to 8 of 8

Thread: "Fuzzy" VLOOKUP question

  1. #1
    New Member
    Join Date
    Apr 2002
    Posts
    7
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default

    Hi,

    I have a list of five hundred companies on a spreadsheet called "Companies" and another spreadsheet called "Public" with all 7,000+ public companies and their associated ticker symbols. I'm trying to match on the company names of both lists (column A of both spreadsheets) and return the associated ticker symbol. However, the names don't match precisely. Some names have ", Inc." (with a comma and space) or ", Corp." at the end, while some don't have any suffix or have “Inc.” without a comma. To make matters more complicated, some have the word "Corporation" spelled out completely instead of the ", Corp." abbreviation. For example, on one list Microsoft is simply "Microsoft", while on the other list it's called "Microsoft Corporation".

    Using “true” or 1 in the fourth parameter of VLOOKUP doesn’t help. The matches I get are close, but no cigar. Excel spits out a match but not the right one.

    Any ideas on how to solve this without resorting to brute force, manual lookups?

    Thanks

  2. #2
    MrExcel MVP Jay Petrulis's Avatar
    Join Date
    Mar 2002
    Location
    Chicago, IL USA
    Posts
    2,040
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default

    Hi,

    Clean your lists. Copy and then combine them, sort by ticker symbol and choose one asset description per ticker.

    Then, do a vlookup by ticker on the original lists. When you are satisfied, copy and pastespecial (values) on the original columns and you should be OK.

    Possibly a pivot table can be used.

    If not, you will be using Vlookup or Index/Match with something like "LEFT(cell,9)="Microsoft" which will become a big mess very quickly.

    Take the time to clean the data and you will be much happier in the long run, I believe.

    Bye,
    Jay


  3. #3
    MrExcel MVP
    Join Date
    Mar 2002
    Location
    Michigan USA
    Posts
    11,454
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default

    I recall use of soundex feature in such cases ... but I am fuzzy on this at the moment. Perhaps some Excel Gurus will shed some light on this!

  4. #4
    MrExcel MVP Jay Petrulis's Avatar
    Join Date
    Mar 2002
    Location
    Chicago, IL USA
    Posts
    2,040
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default

    On 2002-04-22 18:16, Yogi Anand wrote:
    I recall use of soundex feature in such cases ... but I am fuzzy on this at the moment. Perhaps some Excel Gurus will shed some light on this!
    Hi Yogi,

    Nice suggestion. I just searched google.com and found the following:

    http://www.myatt.demon.co.uk/soundex.htm

    Has the soundex application in Access VBA

    Also, Harlan Grove posted the following

    -----------------------------------------
    Subject: Re: 'fuzzy' string comparison?
    Date: Wed, 21 Jun 2000 12:36:05 -0700
    From: Harlan Grove
    Newsgroups: comp.apps.spreadsheets

    I had to try. This gives a score. Maybe too deterministic, and definitely too dumb with respect to common abbreviations.

    Function StrSimilar(s1 As String, s2 As String) As Double
    Dim i As Long, j As Long, k As Long, n(2) As Long
    Dim c1 As String, c2 As String
    Const alphanum As String = "123456789abcdefghijklmnopqrstuvwxyz"

    s1 = LCase(s1)
    For i = 1 To Len(s1)
    If Not InStr(alphanum, Mid(s1, i, 1)) > 0 Then Mid(s1, i, 1) = " "
    Next i
    s1 = Application.WorksheetFunction.Trim(s1)

    s2 = LCase(s2)
    For j = 1 To Len(s2)
    If Not InStr(alphanum, Mid(s2, j, 1)) > 0 Then Mid(s2, j, 1) = " "
    Next j
    s2 = Application.WorksheetFunction.Trim(s2)

    j = 1
    n(1) = 0
    For i = 1 To Len(s1)
    c1 = LCase(Mid(s1, i, 1))

    k = 0
    Do
    c2 = LCase(Mid(s2, j + k, 1))
    k = k + 1
    Loop Until j + k > Len(s2) Or c1 = c2

    If c1 = c2 Then
    n(1) = n(1) + 1
    If j < Len(s2) Then j = j + 1 Else Exit For
    End If
    Next i

    i = 1
    n(2) = 0
    For j = 1 To Len(s2)
    c2 = LCase(Mid(s2, j, 1))

    k = 0
    Do
    c1 = LCase(Mid(s1, i + k, 1))
    k = k + 1
    Loop Until i + k > Len(s1) Or c1 = c2

    If c1 = c2 Then
    n(2) = n(2) + 1
    If i < Len(s1) Then i = i + 1 Else Exit For
    End If
    Next j

    StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) / _
    CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2)))
    End Function
    -------------------------------------

    Don't think this will particularly work for the OP, but it may be worth a shot.

    Bye,
    Jay

    [ This Message was edited by: Jay Petrulis on 2002-04-22 18:29 ]

  5. #5
    MrExcel MVP
    Join Date
    Mar 2002
    Location
    Michigan USA
    Posts
    11,454
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default

    Oh my goodness Jay:
    You are on top of things buddy! ... looks like you have got things on your finger tips.

    Regards!

    Yogi Anand

  6. #6
    MrExcel MVP Jay Petrulis's Avatar
    Join Date
    Mar 2002
    Location
    Chicago, IL USA
    Posts
    2,040
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default

    Hi Yogi,

    No credit to me. I have looked at a lot of Harlan's posts in great detail. He is awesome.

    He is the only one I know that can hang with the elite of the computer scientists, the engineers, the finance gurus and the statisticians. I have no idea about his background but he knows his stuff. (He also fights with everybody, but nobody's perfect!)

    Bye,
    Jay

  7. #7
    New Member
    Join Date
    Apr 2002
    Posts
    7
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default

    Jay and Yogi, thanks so much for the ideas. I genuinely appreciate it.

  8. #8
    MrExcel MVP Aladin Akyurek's Avatar
    Join Date
    Feb 2002
    Location
    The Hague, NL
    Posts
    84,026
    Post Thanks / Like
    Mentioned
    37 Post(s)
    Tagged
    7 Thread(s)

    Default

    On 2002-04-22 16:37, Shumande wrote:
    Hi,

    I have a list of five hundred companies on a spreadsheet called "Companies" and another spreadsheet called "Public" with all 7,000+ public companies and their associated ticker symbols. I'm trying to match on the company names of both lists (column A of both spreadsheets) and return the associated ticker symbol. However, the names don't match precisely. Some names have ", Inc." (with a comma and space) or ", Corp." at the end, while some don't have any suffix or have “Inc.” without a comma. To make matters more complicated, some have the word "Corporation" spelled out completely instead of the ", Corp." abbreviation. For example, on one list Microsoft is simply "Microsoft", while on the other list it's called "Microsoft Corporation".

    Using “true” or 1 in the fourth parameter of VLOOKUP doesn’t help. The matches I get are close, but no cigar. Excel spits out a match but not the right one.

    Any ideas on how to solve this without resorting to brute force, manual lookups?

    Thanks
    Also have a look at:

    http://www.mrexcel.com/wwwboard/messages/8546.html

    for a possible approach.

Some videos you may like

User Tag List

Like this thread? Share it with others

Like this thread? Share it with others

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •