How to find the most similar Word to a word? (Using Excel or Vba)

makiwara

Board Regular
Joined
Mar 8, 2018
Messages
171
INPUT:

List1 BASIS (FIX)

all
allow
dog
dumm
elephant
element


List2

alarm
allowance
dogtag
donut
duck
elemental


OUTPUT

List1 BASIS (FIX) List2

all allowance
allow allowance
dog dogtag
dumm duck
elephant elemental
element elemental

I tried to use the FUZZY LOOKUP, but it doesn't return the right values I expect.

Thank you for your help, have a very nice day! :)
 
OK, here's a minor tweak to the code:

VBA Code:
Public Function LevDist(str1 As String, str2 As String, Optional type1 = 0)
Dim d() As Long, m As Long, n As Long, i As Long, j As Long, Cost As Long

    m = Len(str1)
    n = Len(str2)
    ReDim d(0 To m, 0 To n)
   
    For i = 1 To m
        d(i, 0) = i
    Next i
   
    For j = 1 To n
        d(0, j) = j
    Next j
   
    For j = 1 To n
        For i = 1 To m
            Cost = IIf(Mid(str1, i, 1) = Mid(str2, j, 1), 0, 1)
            d(i, j) = WorksheetFunction.Min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + Cost)
        Next i
    Next j
   
    LevDist = d(m, n)
    If type1 > 0 Then LevDist = 1 - (LevDist / WorksheetFunction.Max(m, n))
   
End Function

I added an optional code to the parameters to return the percent match instead of the distance. Add a 1 as a third parameter like so:

Book1
IJ
1
2bananas86%
3banana
4
5John Doe63%
6Jane Doe
7
8Jim Smith57%
9James A. Smith
Sheet15
Cell Formulas
RangeFormula
J2,J8,J5J2=levdist(I2,I3,1)




However, here are a few thoughts. I don't answer a lot of fuzzy matching questions anymore, it's just too hard to do well. Too many false positives or missed positives. I actually do some of this in my fulltime job, and the users are constantly asking me to make it better, but there's very little room for improvement. The Jim Smith example is especially telling. A human would instantly recognize that as a 99% likely match, but this algorithm only 57%. There are much better algorithms. Google "java fuzzy string matching" and you can find a java (or Python) tool that figures out a percentage match based on similarities other than just the Levenshtein distance. Sadly, I can't help much with that though.

But maybe this might work for your situation. I hope so! Good luck!
Many Thanks Eric!! I think you're right might be more options with Python and Java. R also has some options. But for the PH agency, it will have to be in excel. And this already gets us bounds ahead of the exact comparisons that were being used. Many thanks!!! :)
 
Upvote 0

Excel Facts

Best way to learn Power Query?
Read M is for (Data) Monkey book by Ken Puls and Miguel Escobar. It is the complete guide to Power Query.
Everytime i run this, I get an error, ambiguous name detected: GetClosest

ANy edits I should be doing here?
 
Upvote 0
Welcome to MrExcel!

If you're getting the "ambiguous name detected" error, that means you have 2 (or more) procedures with the same name. When you copied and pasted the code, did you maybe copy it twice by accident? Or copy code from multiple posts thinking they were different?
 
Upvote 0

Forum statistics

Threads
1,215,157
Messages
6,123,341
Members
449,097
Latest member
thnirmitha

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top