Closest match

Jon Jagd

Board Regular
Joined
Jul 31, 2002
Messages
72
Hi,

I need a function or macro that helps me in matching manufacturer names from incoming data files with all the values in an internal standard list with valid manufacturer names. After the matching it gives me the internal standard value that comes closest to the incoming value. As an example all the incoming values: Abbott Labs, ABBOTT and Abbott Laboratories A/S must be translated to the internal standard value ABBBOTT LABS. I'm envisaging a function similar to this:

=CLOSESTMATCHES(IncomingValue, standardlist)

Ex. I write the following function in cell C3:
=CLOSESTMATCHES(A2;B2:B8)

When the function finds a value in the standardlist that looks like the incoming value it writes the internal standard value in cell C3. If it doesn't find anything it writes f.ex NEW and if there are several matches it writes all the values, either separated by semicolon or in the following cells in the same row.

I don't know if there is a native function that will meet my requirements and it's not that I don't know how to write a custom function. My problem is that I don't know how approach the problem of comparing and matching the incoming values with standard values. I've been considdering the function:
SEARCH(find_text,within_text,start_num)
but I don't feel that it meets my requirements.

Thanks.

Jon
 

Excel Facts

Get help while writing formula
Click the italics "fx" icon to the left of the formula bar to open the Functions Arguments dialog. Help is displayed for each argument.
Hi,

Following on from that, how about my FuzzyVLookup UDF:
Code:
Option Explicit
Type RankInfo
    Offset As Integer
    Percentage As Single
End Type
Function FuzzyPercent(ByVal String1 As String, _
                      ByVal String2 As String, _
                      Optional Algorithm As Integer = 3) As Single
'*************************************
'** Return a % match on two strings **
'*************************************
Dim intLen1 As Integer
Dim intCurLen As Integer
Dim intTo As Integer
Dim intPos As Integer
Dim intPtr As Integer
Dim intScore As Integer
Dim intTotScore As Integer
Dim intStartPos As Integer
Dim StrWork As String
Dim Str1 As String
Dim Str2 As String

'---------------------------------------------------
'-- Remove surrounding spaces, ensure lower case  --
'-- and remove multiple internal spaces           --
'---------------------------------------------------
Str1 = LCase$(Trim$(String1))
Do
    StrWork = Str1
    Str1 = Application.WorksheetFunction.Substitute _
                                        (StrWork, "  ", " ")
Loop Until StrWork = Str1

Str2 = LCase$(Trim$(String2))
Do
    StrWork = Str2
    Str2 = Application.WorksheetFunction.Substitute _
                                        (StrWork, "  ", " ")
Loop Until StrWork = Str2

'----------------------------------------------
'-- Give 100% match if strings exactly equal --
'----------------------------------------------
If Str1 = Str2 Then
    FuzzyPercent = 1
    Exit Function
End If

intLen1 = Len(Str1)

'----------------------------------------
'-- Give 0% match if string length < 2 --
'----------------------------------------
If intLen1 < 2 Then
    FuzzyPercent = 0
    Exit Function
End If

intTotScore = 0                   'initialise total possible score
intScore = 0                      'initialise current score

'--------------------------------------------------------
'-- If Algorithm = 1 or 3, Search for single characters --
'--------------------------------------------------------
If (Algorithm And 1) <> 0 Then
    intTotScore = intLen1                   'initialise total possible score
    intPos = 0
    For intPtr = 1 To intLen1
        intStartPos = intPos + 1
        intPos = InStr(intStartPos, Str2, Mid$(Str1, intPtr, 1))
        If intPos > 0 Then
            If intPos > intStartPos + 3 Then     'No match if char is > 3 bytes away
                intPos = intStartPos
            Else
                intScore = intScore + 1          'Update current score
            End If
        Else
            intPos = intStartPos
        End If
    Next intPtr
End If

'-----------------------------------------------------------
'-- If Algorithm = 2 or 3, Search for pairs, triplets etc. --
'-----------------------------------------------------------
If (Algorithm And 2) <> 0 Then
    For intCurLen = 2 To intLen1
        StrWork = Str2                          'Get a copy of String2
        intTo = intLen1 - intCurLen + 1
        intTotScore = intTotScore + Int(intLen1 / intCurLen)  'Update total possible score
        For intPtr = 1 To intTo Step intCurLen
            intPos = InStr(StrWork, Mid$(Str1, intPtr, intCurLen))
            If intPos > 0 Then
                Mid$(StrWork, intPos, intCurLen) = String$(intCurLen, &H0) 'corrupt found string
                intScore = intScore + 1     'Update current score
            End If
        Next intPtr
    Next intCurLen
End If

FuzzyPercent = intScore / intTotScore

End Function
Function FuzzyVLookup(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      ByVal IndexNum As Integer, _
                      Optional NFPercent As Single = 0.05, _
                      Optional Rank As Integer = 1, _
                      Optional Algorithm As Integer = 3) As Variant
'********************************************************************************
'** Function to Fuzzy match LookupValue with entries in                        **
'** column 1 of table specified by TableArray.                                 **
'** TableArray must specify the top left cell of the range to be searched      **
'** The function stops scanning the table when an empty cell in column 1       **
'** is found.                                                                  **
'** For each entry in column 1 of the table, FuzzyPercent is called to match   **
'** LookupValue with the Table entry.                                          **
'** 'Rank' is an optional parameter which may take any value > 0               **
'**        (default 1) and causes the function to return the 'nth' best        **
'**         match (where 'n' is defined by 'Rank' parameter)                   **
'** If the 'Rank' match percentage < NFPercent (Default 5%), #N/A is returned. **
'** IndexNum is the column number of the entry in TableArray required to be    **
'** returned, as follows:                                                      **
'** If IndexNum > 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the column entry indicated by IndexNum is     **
'**                 returned.                                                  **
'** if IndexNum = 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the offset row (starting at 1) is returned.   **
'**                 This value can be used directly in the 'Index' function.   **
'**                                                                            **
'** Algorithm can take one of the following values:                            **
'** Algorithm = 1:                                                             **
'**     For each character in 'String1', a search is performed on 'String2'.   **
'**     The search is deemed successful if a character is found in 'String2'   **
'**     within 3 characters of the current position.                           **
'**     A score is kept of matching characters which is returned as a          **
'**     percentage of the total possible score.                                **
'** Algorithm = 2:                                                             **
'**     A count of matching pairs, triplets, quadruplets etc. in 'String1' and **
'**     'String2' is returned as a percentage of the total possible.           **
'** Algorithm = 3: Both Algorithms 1 and 2 are performed.                      **
'********************************************************************************
Dim strLookupValue As String
Dim strListString As String
Dim StrWork As String

Dim sngMinPercent As Single
Dim sngWork As Single
Dim sngCurPercent  As Single
Dim intBestMatchPtr As Integer
Dim intPtr As Integer
Dim intRankPtr As Integer
Dim intRankPtr1 As Integer

Dim udRankData() As RankInfo
'--------------------------------------------------------------
'--    Validation                                            --
'--------------------------------------------------------------
strLookupValue = Trim$(LCase$(LookupValue))

If IsMissing(NFPercent) Then
    sngMinPercent = 0.05
Else
    If (NFPercent <= 0) Or (NFPercent > 1) Then
        FuzzyVLookup = "*** 'NFPercent' must be a percentage > zero ***"
        Exit Function
    End If
    sngMinPercent = NFPercent
End If

If Rank < 1 Then
    FuzzyVLookup = "*** 'Rank' must be an integer > 0 ***"
    Exit Function
End If

ReDim udRankData(1 To Rank)

'---------------
'-- Main loop --
'---------------
intPtr = 1
Do While VarType(TableArray.Cells(intPtr, 1)) <> vbEmpty
    If VarType(TableArray.Cells(intPtr, 1)) = vbString Then
        strListString = Trim$(LCase$(TableArray.Cells(intPtr, 1)))
        
        '------------------------------------------------
        '-- Fuzzy match strings & get percentage match --
        '------------------------------------------------
        sngCurPercent = FuzzyPercent(String1:=strLookupValue, _
                                     String2:=strListString, _
                                     Algorithm:=Algorithm)
        
        If sngCurPercent >= sngMinPercent Then
            '---------------------------
            '-- Store in ranked array --
            '---------------------------
            For intRankPtr = 1 To Rank
                If sngCurPercent > udRankData(intRankPtr).Percentage Then
                    For intRankPtr1 = Rank To intRankPtr + 1 Step -1
                        With udRankData(intRankPtr1)
                            .Offset = udRankData(intRankPtr1 - 1).Offset
                            .Percentage = udRankData(intRankPtr1 - 1).Percentage
                        End With
                    Next intRankPtr1
                    With udRankData(intRankPtr)
                        .Offset = intPtr
                        .Percentage = sngCurPercent
                    End With
                    Exit For
                End If
            Next intRankPtr
        End If
        
    End If
    intPtr = intPtr + 1
Loop

If udRankData(Rank).Percentage < sngMinPercent Then
    '--------------------------------------
    '-- Return '#N/A' if below NFPercent --
    '--------------------------------------
    FuzzyVLookup = CVErr(xlErrNA)
Else
    intBestMatchPtr = udRankData(Rank).Offset
    If IndexNum > 0 Then
        '-----------------------------------
        '-- Return column entry specified --
        '-----------------------------------
        FuzzyVLookup = TableArray.Cells(intBestMatchPtr, IndexNum)
    Else
        '-----------------------
        '-- Return offset row --
        '-----------------------
        FuzzyVLookup = intBestMatchPtr
    End If
End If
End Function
Function FuzzyHLookup(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      ByVal IndexNum As Integer, _
                      Optional NFPercent As Single = 0.05, _
                      Optional Rank As Integer = 1, _
                      Optional Algorithm As Integer = 3) As Variant
'********************************************************************************
'** Function to Fuzzy match LookupValue with entries in                        **
'** row 1 of table specified by TableArray.                                    **
'** TableArray must specify the top left cell of the range to be searched      **
'** The function stops scanning the table when an empty cell in row 1          **
'** is found.                                                                  **
'** For each entry in row 1 of the table, FuzzyPercent is called to match      **
'** LookupValue with the Table entry.                                          **
'** 'Rank' is an optional parameter which may take any value > 0               **
'**        (default 1) and causes the function to return the 'nth' best        **
'**         match (where 'n' is defined by 'Rank' parameter)                   **
'** If the 'Rank' match percentage < NFPercent (Default 5%), #N/A is returned. **
'** IndexNum is the row number of the entry in TableArray required to be       **
'** returned, as follows:                                                      **
'** If IndexNum > 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the row entry indicated by IndexNum is        **
'**                 returned.                                                  **
'** if IndexNum = 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the offset col (starting at 0) is returned.   **
'**                 This value can be used directly in the 'OffSet' function.  **
'**                                                                            **
'** Algorithm can take one of the following values:                            **
'** Algorithm = 1:                                                             **
'**     For each character in 'String1', a search is performed on 'String2'.   **
'**     The search is deemed successful if a character is found in 'String2'   **
'**     within 3 characters of the current position.                           **
'**     A score is kept of matching characters which is returned as a          **
'**     percentage of the total possible score.                                **
'** Algorithm = 2:                                                             **
'**     A count of matching pairs, triplets, quadruplets etc. in 'String1' and **
'**     'String2' is returned as a percentage of the total possible.           **
'** Algorithm = 3: Both Algorithms 1 and 2 are performed.                      **
'********************************************************************************
Dim strLookupValue As String
Dim strListString As String
Dim StrWork As String

Dim sngMinPercent As Single
Dim sngWork As Single
Dim sngCurPercent  As Single
Dim intBestMatchPtr As Integer
Dim intPtr As Integer
Dim intRankPtr As Integer
Dim intRankPtr1 As Integer

Dim udRankData() As RankInfo
'--------------------------------------------------------------
'--    Validation                                            --
'--------------------------------------------------------------
strLookupValue = Trim$(LCase$(LookupValue))

If IsMissing(NFPercent) Then
    sngMinPercent = 0.05
Else
    If (NFPercent <= 0) Or (NFPercent > 1) Then
        FuzzyHLookup = "*** 'NFPercent' must be a percentage > zero ***"
        Exit Function
    End If
    sngMinPercent = NFPercent
End If

If Rank < 1 Then
    FuzzyHLookup = "*** 'Rank' must be an integer > 0 ***"
    Exit Function
End If

ReDim udRankData(1 To Rank)

'---------------
'-- Main loop --
'---------------
intPtr = 1
Do While VarType(TableArray.Cells(1, intPtr)) <> vbEmpty
    If VarType(TableArray.Cells(1, intPtr)) = vbString Then
        strListString = Trim$(LCase$(TableArray.Cells(1, intPtr)))
        
        '------------------------------------------------
        '-- Fuzzy match strings & get percentage match --
        '------------------------------------------------
        sngCurPercent = FuzzyPercent(String1:=strLookupValue, _
                                     String2:=strListString, _
                                     Algorithm:=Algorithm)
        
        If sngCurPercent >= sngMinPercent Then
            '---------------------------
            '-- Store in ranked array --
            '---------------------------
            For intRankPtr = 1 To Rank
                If sngCurPercent > udRankData(intRankPtr).Percentage Then
                    For intRankPtr1 = Rank To intRankPtr + 1 Step -1
                        With udRankData(intRankPtr1)
                            .Offset = udRankData(intRankPtr1 - 1).Offset
                            .Percentage = udRankData(intRankPtr1 - 1).Percentage
                        End With
                    Next intRankPtr1
                    With udRankData(intRankPtr)
                        .Offset = intPtr
                        .Percentage = sngCurPercent
                    End With
                    Exit For
                End If
            Next intRankPtr
        End If
        
    End If
    intPtr = intPtr + 1
Loop

If udRankData(Rank).Percentage < sngMinPercent Then
    '--------------------------------------
    '-- Return '#N/A' if below NFPercent --
    '--------------------------------------
    FuzzyHLookup = CVErr(xlErrNA)
Else
    intBestMatchPtr = udRankData(Rank).Offset
    If IndexNum > 0 Then
        '-----------------------------------
        '-- Return row entry specified --
        '-----------------------------------
        FuzzyHLookup = TableArray.Cells(IndexNum, intBestMatchPtr)
    Else
        '-----------------------
        '-- Return offset col --
        '-----------------------
        FuzzyHLookup = intBestMatchPtr
    End If
End If
End Function

Parameters to FuzzyVLookup are:

LookupValue := Entry to be looked up
TableArray := Lookup Range
IndexNum := Zero to return row number, >0 to return specified column (as in VLookup)
NFPercent := If the %age match is less than this, return #N/A
Rank := Return the 'n'th highest match. Default 1
Algorithm := Must be 1, 2 or 3. 1 = 'Forward scan only', '2' = Match on pairs then triplets then quads etc, '3' = match on both

For your purposes, suggest a NFPercent of about 50% and Algorithm =1, but play!

HTH

Alan
 
Upvote 0
hi!
I come up here with something like character checking. but
at the end i come up with another question on how sensitve
you want it to be?
Please see below
Book3
ABCDE
1comingdata%matchpatternABBOTTLABS
2AbbottLabS100.00%
3ABBOTT54.55%
4AbbottLaboratoriesA/S90.91%
5aabbottlabs100.00%
6aaaabbot45.45%
Sheet1


the code:<font face=Courier New><SPAN style="color:#00007F">Private</SPAN><SPAN style="color:#00007F">Function</SPAN> PatternMatching(s1, s2<SPAN style="color:#00007F">As</SPAN><SPAN style="color:#00007F">String</SPAN>)<SPAN style="color:#00007F">As</SPAN><SPAN style="color:#00007F">Integer</SPAN><SPAN style="color:#00007F">Dim</SPAN> i2<SPAN style="color:#00007F">As</SPAN><SPAN style="color:#00007F">Integer</SPAN>

    Max = 0
    Count = 0
start:
    Count = 0
    <SPAN style="color:#00007F">For</SPAN> i = 1<SPAN style="color:#00007F">To</SPAN> Len(s1)
        i2 = i
        <SPAN style="color:#00007F">For</SPAN> j = 1<SPAN style="color:#00007F">To</SPAN> Len(s2)
            <SPAN style="color:#00007F">If</SPAN> UCase(Mid(s2, j, 1)) = UCase(Mid(s1, i2, 1))<SPAN style="color:#00007F">Then</SPAN>
                i2 = i2 + 1
                Count = Count + 1
                <SPAN style="color:#00007F">If</SPAN> Max< Count<SPAN style="color:#00007F">Then</SPAN>
                    Max = Count
                <SPAN style="color:#00007F">End</SPAN><SPAN style="color:#00007F">If</SPAN>
            <SPAN style="color:#00007F">Else</SPAN>
                Count = 0
                <SPAN style="color:#00007F">Exit</SPAN><SPAN style="color:#00007F">For</SPAN>
            <SPAN style="color:#00007F">End</SPAN><SPAN style="color:#00007F">If</SPAN>
        <SPAN style="color:#00007F">Next</SPAN> j
    <SPAN style="color:#00007F">Next</SPAN> i<SPAN style="color:#00007F">If</SPAN> Max< Len(s2)<SPAN style="color:#00007F">Then</SPAN>
    s2 = Mid(s2, 2, Len(s2))<SPAN style="color:#00007F">GoTo</SPAN> start<SPAN style="color:#00007F">End</SPAN><SPAN style="color:#00007F">If</SPAN>
    PatternMatching = Max<SPAN style="color:#00007F">End</SPAN><SPAN style="color:#00007F">Function</SPAN><SPAN style="color:#00007F">Private</SPAN><SPAN style="color:#00007F">Sub</SPAN> CommandButton1_Click()<SPAN style="color:#00007F">For</SPAN> i = 2<SPAN style="color:#00007F">To</SPAN> Range("A65536").End(xlUp).Row
    Range("b" & i).Value = PatternMatching(Range("e1"), Range("a" & i)) / Len(Range("e1"))<SPAN style="color:#00007F">Next</SPAN> i<SPAN style="color:#00007F">End</SPAN><SPAN style="color:#00007F">Sub</SPAN></FONT>
 
Upvote 0
I just tried out Alan's impressive function but got a compile error in the following line:

Code:
Dim udRankData() As RankInfo

It sais: User-defined type not defined.

Am I missing some lines?

Jon
 
Upvote 0
Jon Jagd said:
I just tried out Alan's impressive function but got a compile error in the following line:

Code:
Dim udRankData() As RankInfo

It sais: User-defined type not defined.

Am I missing some lines?

Jon

Yes, the first 5 lines of the code Alan posted.
 
Upvote 0
Did you copy over these lines as well as the functions --
Code:
Option Explicit 
Type RankInfo 
    Offset As Integer 
    Percentage As Single 
End Type
 
Upvote 0
I did actually. It was lying together with other functions in the same module. It worked when I placed all of the code in a seperate module. The strange thing is, that I only get acces to the FuzzyHLookup and not the FuzzyVLookup in the Functions library even though it is there in the code.

To answer the question of SIXTH SENSE on how sensitive I want it to be I think it would be great to be able to change the sensitivity according to the different needs. More or less like in Alans function.

Jon
 
Upvote 0
Note that you need FuzzyPercent too - it's called buy FuzzyVLookup.


Regarding SIXTH SENSE's comment 'how sensitive do you want it to be' - this is controlled by the NFPercent parameter.

HTH


Alan
 
Upvote 0

Forum statistics

Threads
1,214,583
Messages
6,120,378
Members
448,955
Latest member
BatCoder

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top