# Index match or find the maximum matching value

#### peter8848

##### Board Regular
Hi All,

I am trying to design a formular, I have a list of products with its descriptions. Also I developed a function called similarity which determine two cells similartity %.

So Now I have a list of new products and old products list with its functions. Is it possible in B15 and B16 to use a formular like =Index(B2:B7, Match(max(xx,Similarity(A:A)) and then in cell C15 & C16 Showing their percentage? Should we get the max % first and then work on its function?

My similarity function is simple say =Similarity(A2,A15) = 70% or =Similarity(A3,A15) = 65% as an example or should we work out a formular to come up with maximum similarity % first?

 A B C 1 Product Name Function 2 AA Building 3 BB entertainment 4 CC Travel 5 DD Stationery 6 EE Tools 7 FF Phone 8 9 10 11 12 New Product List 13 14 Name Function Max Similarity % 15 GG 16 HH

<tbody>
</tbody>

Thanks,

Peter

Last edited:

### Excel Facts

Fastest way to copy a worksheet?
Hold down the Ctrl key while dragging tab for Sheet1 to the right. Excel will make a copy of the worksheet.

#### DanteAmor

##### Well-known Member
In your example, in the first part, the values ​​in column C were missing.
You can send your complete example again and in the second part put the expected result.

#### peter8848

##### Board Regular
Hi DanteAmor,

The one in red circle should be blank as they are the source data but I am trying to put something in B15 and C15 and the below ones too to determine GG's biggest similarity % to product from AA to FF, say DD, then B15 shows "Stationery" and C15 shows "80%"

It just my current self developed formular for Similarity as below code:

Public Function Similarity(ByVal String1 As String, _
ByVal String2 As String, _
Optional ByRef RetMatch As String, _
Optional min_match = 1) As Single

Dim b1() As Byte, b2() As Byte
Dim lngLen1 As Long, lngLen2 As Long
Dim lngResult As Long

If UCase(String1) = UCase(String2) Then
Similarity = 1
Else:
lngLen1 = Len(String1)
lngLen2 = Len(String2)
If (lngLen1 = 0) Or (lngLen2 = 0) Then
Similarity = 0
Else:
b1() = StrConv(UCase(String1), vbFromUnicode)
b2() = StrConv(UCase(String2), vbFromUnicode)
lngResult = Similarity_sub(0, lngLen1 - 1, _
0, lngLen2 - 1, _
b1, b2, _
String1, _
RetMatch, _
min_match)
Erase b1
Erase b2
If lngLen1 >= lngLen2 Then
Similarity = lngResult / lngLen1
Else
Similarity = lngResult / lngLen2
End If
End If
End If

End Function

Private Function Similarity_sub(ByVal start1 As Long, ByVal end1 As Long, _
ByVal start2 As Long, ByVal end2 As Long, _
ByRef b1() As Byte, ByRef b2() As Byte, _
ByVal FirstString As String, _
ByRef RetMatch As String, _
ByVal min_match As Long, _
Optional recur_level As Integer = 0) As Long
'* CALLED BY: Similarity *(RECURSIVE)

Dim lngCurr1 As Long, lngCurr2 As Long
Dim lngMatchAt1 As Long, lngMatchAt2 As Long
Dim I As Long
Dim lngLongestMatch As Long, lngLocalLongestMatch As Long
Dim strRetMatch1 As String, strRetMatch2 As String

If (start1 > end1) Or (start1 < 0) Or (end1 - start1 + 1 < min_match) _
Or (start2 > end2) Or (start2 < 0) Or (end2 - start2 + 1 < min_match) Then
Exit Function '(exit if start/end is out of string, or length is too short)
End If

For lngCurr1 = start1 To end1
For lngCurr2 = start2 To end2
I = 0
Do Until b1(lngCurr1 + I) <> b2(lngCurr2 + I)
I = I + 1
If I > lngLongestMatch Then
lngMatchAt1 = lngCurr1
lngMatchAt2 = lngCurr2
lngLongestMatch = I
End If
If (lngCurr1 + I) > end1 Or (lngCurr2 + I) > end2 Then Exit Do
Loop
Next lngCurr2
Next lngCurr1

If lngLongestMatch < min_match Then Exit Function

lngLocalLongestMatch = lngLongestMatch
RetMatch = ""

lngLongestMatch = lngLongestMatch _
+ Similarity_sub(start1, lngMatchAt1 - 1, _
start2, lngMatchAt2 - 1, _
b1, b2, _
FirstString, _
strRetMatch1, _
min_match, _
recur_level + 1)
If strRetMatch1 <> "" Then
RetMatch = RetMatch & strRetMatch1 & "*"
Else
RetMatch = RetMatch & IIf(recur_level = 0 _
And lngLocalLongestMatch > 0 _
And (lngMatchAt1 > 1 Or lngMatchAt2 > 1) _
, "*", "")
End If

RetMatch = RetMatch & Mid\$(FirstString, lngMatchAt1 + 1, lngLocalLongestMatch)

lngLongestMatch = lngLongestMatch _
+ Similarity_sub(lngMatchAt1 + lngLocalLongestMatch, end1, _
lngMatchAt2 + lngLocalLongestMatch, end2, _
b1, b2, _
FirstString, _
strRetMatch2, _
min_match, _
recur_level + 1)

If strRetMatch2 <> "" Then
RetMatch = RetMatch & "*" & strRetMatch2
Else
RetMatch = RetMatch & IIf(recur_level = 0 _
And lngLocalLongestMatch > 0 _
And ((lngMatchAt1 + lngLocalLongestMatch < end1) _
Or (lngMatchAt2 + lngLocalLongestMatch < end2)) _
, "*", "")
End If

Similarity_sub = lngLongestMatch

End Function

#### DanteAmor

##### Well-known Member
So, did you solve it yet?
Because I don't understand, how are you going to calculate values ​​without a value base?
In your example, you are not putting the expected result.

#### Eric W

##### MrExcel MVP

You want to perform what's called fuzzy matching, which is tricky. Look at post 4 in this thread:

https://www.mrexcel.com/forum/excel-questions/1086332-vlookup-question.html

There's a link to a Microsoft site with a free download that does what you want, calculates a similarity percentage. There's also a link to a routine I wrote, and another link to another thread with lots of options. You should be able to find something that works for you. Good luck!

#### peter8848

##### Board Regular
Mate you are a champion!!!!!!!

So how will be the best to describe the closest relationship as I guess we can limit it whether there is permission for deletion or replacements or only adding up etc? Are there anything else we can consider?

Cheers,

Peter

#### peter8848

##### Board Regular

Hi Dante,

I am not creating value but like Eric W's example, I have a list but will find the most matching 2nd list from a different data set.

Cheers,

Peter

#### peter8848

##### Board Regular
Hi Eric,

Sorry I saw you codes from another post and this works perfectly, but just wondering how do I change the code that when function select the maximum matching ones will start the last character or number of the selected text and its matching database?

I meant how would we make sure say the program for text 209f80 will select matching 209h79 (2 edit but changing from the last number/characters(s)) rather than 0209f80 or y209f80(1 edit but from the starting number or character)? thanks a lot.

Cheers,

Peter

#### Eric W

##### MrExcel MVP
This is where it gets tricky. Well, MORE tricky! Fuzzy matching is a very inexact science, and the results you want are often very dependent on the type of data you have, plus some specific requirements that won't necessarily apply to other people.

You have a few options here. First, if you look at my code from that other thread, you'll see 2 functions. The first is LevDist, which calculates the distance between 2 strings, and GetClosest, which applies LevDist to every word in a range and picks the closest one. If you are happy with your Similarity function, you can substitute it into the GetClosest function, and see if that works for you.

Next, if that doesn't work well enough, list a few pairs of words, show the result the GetClosest returns, the result you WANT it to return, and explain why you want that one. It's possible I can tweak LevDist a bit. I did for the other poster. But it can get complicated fast, so it might exceed the time I have to devote to it. Also, I'll probably be away for a few days, so it might be a while for me to get back.

#### peter8848

##### Board Regular
Hi Eric,

I used the getclosed function you created but it will try to match any words within the text but is it possible for the function to prioritize the match from the last (further right) word of the text? the reason being with my product/text ranges or types start the same letters/numbers and start to vary from left to the right. therefore we should try to match from the right to the left instead of any position of the text. (Ensure the products are matching within same range or category)

in my previous example, for text 209f80 getclose function will NOT select matching 209h79 (3 edit but changing from the last number/characters(s)) -from right to left - least important word to most important word) INSTEAD it will select y209f80(1 edit but from the starting number or character - in fact very close matching but in my case it is a totally different product range? is there anyway we can avoid this from happening? i am thinking about prioritise on position matching but i am not sure how to do it?

thanks a lot.

peter

Replies
3
Views
30
Replies
11
Views
102
Replies
1
Views
156
Replies
7
Views
402
Replies
1
Views
248

1,108,730
Messages
5,524,493
Members
409,584
Latest member
RedHelp