Index match or find the maximum matching value

peter8848

Board Regular
Joined
Oct 7, 2018
Messages
112
Hi All,

I am trying to design a formular, I have a list of products with its descriptions. Also I developed a function called similarity which determine two cells similartity %.

So Now I have a list of new products and old products list with its functions. Is it possible in B15 and B16 to use a formular like =Index(B2:B7, Match(max(xx,Similarity(A:A)) and then in cell C15 & C16 Showing their percentage? Should we get the max % first and then work on its function?

My similarity function is simple say =Similarity(A2,A15) = 70% or =Similarity(A3,A15) = 65% as an example or should we work out a formular to come up with maximum similarity % first?


ABC
1Product NameFunction
2AABuilding
3BBentertainment
4CCTravel
5DDStationery
6EETools
7FFPhone
8
9
10
11
12New Product List
13
14NameFunctionMax Similarity %
15GG
16HH

<tbody>
</tbody>


Thanks,

Peter
 
Last edited:

Excel Facts

How to total the visible cells?
From the first blank cell below a filtered data set, press Alt+=. Instead of SUM, you will get SUBTOTAL(9,)
In your example, in the first part, the values ​​in column C were missing.
You can send your complete example again and in the second part put the expected result.


609c6e2e2c7f560643f2cca8c7b1dbf4.jpg
 
Upvote 0
Hi DanteAmor,

Thanks for your reply.

The one in red circle should be blank as they are the source data but I am trying to put something in B15 and C15 and the below ones too to determine GG's biggest similarity % to product from AA to FF, say DD, then B15 shows "Stationery" and C15 shows "80%"

It just my current self developed formular for Similarity as below code:



Public Function Similarity(ByVal String1 As String, _
ByVal String2 As String, _
Optional ByRef RetMatch As String, _
Optional min_match = 1) As Single


Dim b1() As Byte, b2() As Byte
Dim lngLen1 As Long, lngLen2 As Long
Dim lngResult As Long


If UCase(String1) = UCase(String2) Then
Similarity = 1
Else:
lngLen1 = Len(String1)
lngLen2 = Len(String2)
If (lngLen1 = 0) Or (lngLen2 = 0) Then
Similarity = 0
Else:
b1() = StrConv(UCase(String1), vbFromUnicode)
b2() = StrConv(UCase(String2), vbFromUnicode)
lngResult = Similarity_sub(0, lngLen1 - 1, _
0, lngLen2 - 1, _
b1, b2, _
String1, _
RetMatch, _
min_match)
Erase b1
Erase b2
If lngLen1 >= lngLen2 Then
Similarity = lngResult / lngLen1
Else
Similarity = lngResult / lngLen2
End If
End If
End If


End Function


Private Function Similarity_sub(ByVal start1 As Long, ByVal end1 As Long, _
ByVal start2 As Long, ByVal end2 As Long, _
ByRef b1() As Byte, ByRef b2() As Byte, _
ByVal FirstString As String, _
ByRef RetMatch As String, _
ByVal min_match As Long, _
Optional recur_level As Integer = 0) As Long
'* CALLED BY: Similarity *(RECURSIVE)


Dim lngCurr1 As Long, lngCurr2 As Long
Dim lngMatchAt1 As Long, lngMatchAt2 As Long
Dim I As Long
Dim lngLongestMatch As Long, lngLocalLongestMatch As Long
Dim strRetMatch1 As String, strRetMatch2 As String


If (start1 > end1) Or (start1 < 0) Or (end1 - start1 + 1 < min_match) _
Or (start2 > end2) Or (start2 < 0) Or (end2 - start2 + 1 < min_match) Then
Exit Function '(exit if start/end is out of string, or length is too short)
End If


For lngCurr1 = start1 To end1
For lngCurr2 = start2 To end2
I = 0
Do Until b1(lngCurr1 + I) <> b2(lngCurr2 + I)
I = I + 1
If I > lngLongestMatch Then
lngMatchAt1 = lngCurr1
lngMatchAt2 = lngCurr2
lngLongestMatch = I
End If
If (lngCurr1 + I) > end1 Or (lngCurr2 + I) > end2 Then Exit Do
Loop
Next lngCurr2
Next lngCurr1


If lngLongestMatch < min_match Then Exit Function


lngLocalLongestMatch = lngLongestMatch
RetMatch = ""


lngLongestMatch = lngLongestMatch _
+ Similarity_sub(start1, lngMatchAt1 - 1, _
start2, lngMatchAt2 - 1, _
b1, b2, _
FirstString, _
strRetMatch1, _
min_match, _
recur_level + 1)
If strRetMatch1 <> "" Then
RetMatch = RetMatch & strRetMatch1 & "*"
Else
RetMatch = RetMatch & IIf(recur_level = 0 _
And lngLocalLongestMatch > 0 _
And (lngMatchAt1 > 1 Or lngMatchAt2 > 1) _
, "*", "")
End If




RetMatch = RetMatch & Mid$(FirstString, lngMatchAt1 + 1, lngLocalLongestMatch)




lngLongestMatch = lngLongestMatch _
+ Similarity_sub(lngMatchAt1 + lngLocalLongestMatch, end1, _
lngMatchAt2 + lngLocalLongestMatch, end2, _
b1, b2, _
FirstString, _
strRetMatch2, _
min_match, _
recur_level + 1)


If strRetMatch2 <> "" Then
RetMatch = RetMatch & "*" & strRetMatch2
Else
RetMatch = RetMatch & IIf(recur_level = 0 _
And lngLocalLongestMatch > 0 _
And ((lngMatchAt1 + lngLocalLongestMatch < end1) _
Or (lngMatchAt2 + lngLocalLongestMatch < end2)) _
, "*", "")
End If


Similarity_sub = lngLongestMatch


End Function
 
Upvote 0
So, did you solve it yet?
Because I don't understand, how are you going to calculate values ​​without a value base?
In your example, you are not putting the expected result.
 
Upvote 0
You want to perform what's called fuzzy matching, which is tricky. Look at post 4 in this thread:

https://www.mrexcel.com/forum/excel-questions/1086332-vlookup-question.html

There's a link to a Microsoft site with a free download that does what you want, calculates a similarity percentage. There's also a link to a routine I wrote, and another link to another thread with lots of options. You should be able to find something that works for you. Good luck!
 
Upvote 0
Eric W,Mate you are a champion!!!!!!!

So how will be the best to describe the closest relationship as I guess we can limit it whether there is permission for deletion or replacements or only adding up etc? Are there anything else we can consider?

Cheers,

Peter
 
Upvote 0
Hi Dante,

I am not creating value but like Eric W's example, I have a list but will find the most matching 2nd list from a different data set.

Cheers,

Peter
 
Upvote 0
Hi Eric,

Sorry I saw you codes from another post and this works perfectly, but just wondering how do I change the code that when function select the maximum matching ones will start the last character or number of the selected text and its matching database?

I meant how would we make sure say the program for text 209f80 will select matching 209h79 (2 edit but changing from the last number/characters(s)) rather than 0209f80 or y209f80(1 edit but from the starting number or character)? thanks a lot.

Cheers,

Peter
 
Upvote 0
This is where it gets tricky. Well, MORE tricky! Fuzzy matching is a very inexact science, and the results you want are often very dependent on the type of data you have, plus some specific requirements that won't necessarily apply to other people.

You have a few options here. First, if you look at my code from that other thread, you'll see 2 functions. The first is LevDist, which calculates the distance between 2 strings, and GetClosest, which applies LevDist to every word in a range and picks the closest one. If you are happy with your Similarity function, you can substitute it into the GetClosest function, and see if that works for you.

Next, if that doesn't work well enough, list a few pairs of words, show the result the GetClosest returns, the result you WANT it to return, and explain why you want that one. It's possible I can tweak LevDist a bit. I did for the other poster. But it can get complicated fast, so it might exceed the time I have to devote to it. Also, I'll probably be away for a few days, so it might be a while for me to get back.
 
Upvote 0
Hi Eric,

thanks for your reply.

I used the getclosed function you created but it will try to match any words within the text but is it possible for the function to prioritize the match from the last (further right) word of the text? the reason being with my product/text ranges or types start the same letters/numbers and start to vary from left to the right. therefore we should try to match from the right to the left instead of any position of the text. (Ensure the products are matching within same range or category)

in my previous example, for text 209f80 getclose function will NOT select matching 209h79 (3 edit but changing from the last number/characters(s)) -from right to left - least important word to most important word) INSTEAD it will select y209f80(1 edit but from the starting number or character - in fact very close matching but in my case it is a totally different product range? is there anyway we can avoid this from happening? i am thinking about prioritise on position matching but i am not sure how to do it?

thanks a lot.

peter
 
Upvote 0

Forum statistics

Threads
1,213,535
Messages
6,114,192
Members
448,554
Latest member
Gleisner2

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top