How to find the most similar Word to a word? (Using Excel or Vba)

makiwara

Board Regular
Joined
Mar 8, 2018
Messages
171
INPUT:

List1 BASIS (FIX)

all
allow
dog
dumm
elephant
element


List2

alarm
allowance
dogtag
donut
duck
elemental


OUTPUT

List1 BASIS (FIX) List2

all allowance
allow allowance
dog dogtag
dumm duck
elephant elemental
element elemental

I tried to use the FUZZY LOOKUP, but it doesn't return the right values I expect.

Thank you for your help, have a very nice day! :)
 

Excel Facts

Fastest way to copy a worksheet?
Hold down the Ctrl key while dragging tab for Sheet1 to the right. Excel will make a copy of the worksheet.
Please explicitly detail the way to identify the right values. (Then it will be possible to work on a formula or code to achieve that.)
 
Upvote 0
Dear Fazza and Niteshnitesh!

I tried that, but unfortunately it doesn't work correctly. Could you help me plese?
I inserted two random wordlists to represent the problem exactly, so maybe with word lists its easier to understand what the problem is:
https://imgur.com/k1Opij2
So I need a VBA code or a function which examines a word by letter to letter, and returns the value from list 2 which is the most similar to the original "root word". Thank fo your your help, it means to me a lot! have a very nice day!


//original list 1 and list 2 if needed:
List1:
abandon
ability
able
about
about
above
above
abroad
absence
absolute
absolutely
absorb
abuse
academic
accept
acceptable
access
accident
accommodation
accompany
according to
account
account
accurate
accuse
achieve
achievement
acid
acknowledge
acquire
across
act
act
action
active
activist
activity
actor
actual
actually
ad
adapt
add

List2:
abandonedly
abasedly
abashedly
abatedly
abatingly
abbreviatedly
aberrantly
abeyantly
abhorrently
abidingly
abjectly
ably
abnormally
abominably
abortedly
about
abrasively
abruptly
absently
absentmindedly
absent-mindedly
absolutely
absorbantly
absorbedly
absorbingly
abstainedly
abstentiously
abstinently
abstractedly
abstractly
abstrusely
absurdly
abundantly
abusedly
abusively
abysmally
academically
acapella
acceptingly
accessibly
accidentally
acclamatorily
accommodatingly

<colgroup><col></colgroup><tbody>
</tbody>

<colgroup><col></colgroup><tbody>
</tbody>
 
Upvote 0
makiwara, I don't see where you ever explicitly described "most similar" as Fazza requested. One way of describing similarity between strings is the Levenshtein distance. See:

https://en.wikipedia.org/wiki/Levenshtein_distance

The distance is roughly defined as how many changes are needed to change one string into the other, adds, deletes, changes. I converted the algorithm on that web page to VBA. Then I wrote another function that compares one word to every word on another list, and returns the word with the smallest Levenshtein distance.

To try it, open a copy of your workbook. Press Alt-F11 to open the VBA editor. From the menu, select Insert > Module. Paste the following code into the window that opens:

Code:
Public Function LevDist(str1 As String, str2 As String)
Dim d() As Long, m As Long, n As Long, i As Long, j As Long, Cost As Long

    m = Len(str1)
    n = Len(str2)
    ReDim d(0 To m, 0 To n)
    
    For i = 1 To m
        d(i, 0) = i
    Next i
    
    For j = 1 To n
        d(0, j) = j
    Next j
    
    For j = 1 To n
        For i = 1 To m
            Cost = IIf(Mid(str1, i, 1) = Mid(str2, j, 1), 0, 1)
            d(i, j) = WorksheetFunction.Min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + Cost)
        Next i
    Next j
    
    LevDist = d(m, n)
    
End Function


Public Function GetClosest(str1 As String, r1 As Range)
Dim d1 As Variant, MinDist As Long, i As Long, j As Long, LD As Long

    d1 = r1.Value
    MinDist = 99999
    GetClosest = ""
    
    For i = 1 To UBound(d1)
        For j = 1 To UBound(d1, 2)
            LD = LevDist(str1, CStr(d1(i, j)))
            If LD < MinDist Then
                MinDist = LD
                GetClosest = d1(i, j)
            End If
        Next j
    Next i
        
End Function
Press Alt-Q to close the editor. Then enter the function like this on your worksheet:


ABCDEFGHI
1List1List2List1List2
2a
bandon
abandonedlyabandonedlyabandonallalarmalarmall
3abilityablyabasedlyabandonallowalarmallowanceallow
4ableablyabashedlyabandondogdogtagdogtagdog
5aboutaboutabatedlyabandondummduckdonutdog
6aboutaboutabatinglyabandonelephantelementalduckdumm
7aboveaboutabbreviatedlyabsolutelyelementelementalelementalelement
8aboveaboutaberrantlyabandon
9ab
road
aboutabeyantlyabandon
10absenceabsentlyabhorrentlyabsolutely
11absoluteabsolutelyabidinglyability>
12absolutelyabsolutelyabjectlyability
13absorbaboutablyable
14abuseablyabnormallyabsolutely
15academicacademicallyabominablyability
16acceptaboutabortedlyabsolutely
17acceptableacceptinglyaboutabout
18accessaccessiblyabra
sively
absolutely
19accidentaccidentallyabruptlyability
20accommodationaccommodatinglyabsentlyabsence
21accompanyabnormallyabsentmindedlyabsence
22according toabsorbinglyabsent-mindedlyabsence
23accountaboutabsolutelyabsolutely
24accountaboutabsorbantlyabsolutely
25accurateaboutabsorbedlyabsolutely
26accuseaboutabsorbinglyabsolutely
27achieveacapellaabstainedlyabsolutely
28achievementabhorrentlyabstentiouslyabsolutely
29acidablyabstinentlyability
30acknowledgeabandonedlyabstractedlyabsolutely
31acquireaboutabstractlyabsolutely
32d>acrossaboutabstruselyabsolutely
33actablyabsurdlyabsolutely
34actablyabundantlyabandon
35actionablyabusedlyabuse
36activeablyabusivelyabuse<
/td>
37activistabatinglyabysmallyactually
38activityabatinglyacademicallyacademic
39actorablyacapellaacademic
40actualaboutacceptinglyacceptable
41actuallyabruptlyaccessiblyacceptable
42adablyaccidentallyaccident
43adaptaboutacclamatorilyacceptable
44addablyaccommodatinglyaccommodation

<tbody>
</tbody>
Sheet1

Worksheet
Formulas
CellFormula
B2=getclosest(A2,$C$2:$C$44)

<tbody>
</tbody>

<tbody>
</tbody>



Note that this does NOT return all the results you expect. For example, on your sheet you show "ability" returning "abasedly", while on my sheet I get "ably". The Levenshtein distance between "ability" and "abasedly" is 5 (change "ilit" to "ased", 4 changes, and insert "l"). The Levenshtein distance between "ability" and "ably" is 3 (delete "i" and "it").

So there you go. If this does not return the results you want, you'll need to describe how to get those results.
 
Upvote 0
You are awesome!!!! Thank you a lot works perfectly!!! :)

makiwara, I don't see where you ever explicitly described "most similar" as Fazza requested. One way of describing similarity between strings is the Levenshtein distance. See:

https://en.wikipedia.org/wiki/Levenshtein_distance

The distance is roughly defined as how many changes are needed to change one string into the other, adds, deletes, changes. I converted the algorithm on that web page to VBA. Then I wrote another function that compares one word to every word on another list, and returns the word with the smallest Levenshtein distance.

To try it, open a copy of your workbook. Press Alt-F11 to open the VBA editor. From the menu, select Insert > Module. Paste the following code into the window that opens:

Code:
Public Function LevDist(str1 As String, str2 As String)
Dim d() As Long, m As Long, n As Long, i As Long, j As Long, Cost As Long

    m = Len(str1)
    n = Len(str2)
    ReDim d(0 To m, 0 To n)
    
    For i = 1 To m
        d(i, 0) = i
    Next i
    
    For j = 1 To n
        d(0, j) = j
    Next j
    
    For j = 1 To n
        For i = 1 To m
            Cost = IIf(Mid(str1, i, 1) = Mid(str2, j, 1), 0, 1)
            d(i, j) = WorksheetFunction.Min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + Cost)
        Next i
    Next j
    
    LevDist = d(m, n)
    
End Function


Public Function GetClosest(str1 As String, r1 As Range)
Dim d1 As Variant, MinDist As Long, i As Long, j As Long, LD As Long

    d1 = r1.Value
    MinDist = 99999
    GetClosest = ""
    
    For i = 1 To UBound(d1)
        For j = 1 To UBound(d1, 2)
            LD = LevDist(str1, CStr(d1(i, j)))
            If LD < MinDist Then
                MinDist = LD
                GetClosest = d1(i, j)
            End If
        Next j
    Next i
        
End Function
Press Alt-Q to close the editor. Then enter the function like this on your worksheet:


ABCDEFGHI
1List1List2List1List2
2a
bandon
abandonedlyabandonedlyabandonallalarmalarmall
3abilityablyabasedlyabandonallowalarmallowanceallow
4ableablyabashedlyabandondogdogtagdogtagdog
5aboutaboutabatedlyabandondummduckdonutdog
6aboutaboutabatinglyabandonelephantelementalduckdumm
7aboveaboutabbreviatedlyabsolutelyelementelementalelementalelement
8aboveaboutaberrantlyabandon
9ab
road
aboutabeyantlyabandon
10absenceabsentlyabhorrentlyabsolutely
11absoluteabsolutelyabidinglyability>
12absolutelyabsolutelyabjectlyability
13absorbaboutablyable
14abuseablyabnormallyabsolutely
15academicacademicallyabominablyability
16acceptaboutabortedlyabsolutely
17acceptableacceptinglyaboutabout
18accessaccessiblyabra
sively
absolutely
19accidentaccidentallyabruptlyability
20accommodationaccommodatinglyabsentlyabsence
21accompanyabnormallyabsentmindedlyabsence
22according toabsorbinglyabsent-mindedlyabsence
23accountaboutabsolutelyabsolutely
24accountaboutabsorbantlyabsolutely
25accurateaboutabsorbedlyabsolutely
26accuseaboutabsorbinglyabsolutely
27achieveacapellaabstainedlyabsolutely
28achievementabhorrentlyabstentiouslyabsolutely
29acidablyabstinentlyability
30acknowledgeabandonedlyabstractedlyabsolutely
31acquireaboutabstractlyabsolutely
32d>acrossaboutabstruselyabsolutely
33actablyabsurdlyabsolutely
34actablyabundantlyabandon
35actionablyabusedlyabuse
36activeablyabusivelyabuse<
/td>
37activistabatinglyabysmallyactually
38activityabatinglyacademicallyacademic
39actorablyacapellaacademic
40actualaboutacceptinglyacceptable
41actuallyabruptlyaccessiblyacceptable
42adablyaccidentallyaccident
43adaptaboutacclamatorilyacceptable
44addablyaccommodatinglyaccommodation

<tbody>
</tbody>
Sheet1

Worksheet
Formulas
CellFormula
B2=getclosest(A2,$C$2:$C$44)

<tbody>
</tbody>

<tbody>
</tbody>



Note that this does NOT return all the results you expect. For example, on your sheet you show "ability" returning "abasedly", while on my sheet I get "ably". The Levenshtein distance between "ability" and "abasedly" is 5 (change "ilit" to "ased", 4 changes, and insert "l"). The Levenshtein distance between "ability" and "ably" is 3 (delete "i" and "it").

So there you go. If this does not return the results you want, you'll need to describe how to get those results.
 
Upvote 0

Forum statistics

Threads
1,213,527
Messages
6,114,150
Members
448,552
Latest member
WORKINGWITHNOLEADER

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top