MrExcel Message Board


Go Back   MrExcel Message Board > Question Forums > Excel Questions

Excel Questions All Excel/VBA questions - formulas, macros, pivot tables, general help, etc. Please post to this forum in English only.

Reply
 
Thread Tools Display Modes
Old Dec 16th, 2003, 01:59 PM   #1
Jon Jagd
 
Join Date: Aug 2002
Posts: 69
Default Closest match

Hi,

I need a function or macro that helps me in matching manufacturer names from incoming data files with all the values in an internal standard list with valid manufacturer names. After the matching it gives me the internal standard value that comes closest to the incoming value. As an example all the incoming values: Abbott Labs, ABBOTT and Abbott Laboratories A/S must be translated to the internal standard value ABBBOTT LABS. I'm envisaging a function similar to this:

=CLOSESTMATCHES(IncomingValue, standardlist)

Ex. I write the following function in cell C3:
=CLOSESTMATCHES(A2;B2:B8)

When the function finds a value in the standardlist that looks like the incoming value it writes the internal standard value in cell C3. If it doesn't find anything it writes f.ex NEW and if there are several matches it writes all the values, either separated by semicolon or in the following cells in the same row.

I don't know if there is a native function that will meet my requirements and it's not that I don't know how to write a custom function. My problem is that I don't know how approach the problem of comparing and matching the incoming values with standard values. I've been considdering the function:
SEARCH(find_text,within_text,start_num)
but I don't feel that it meets my requirements.

Thanks.

Jon
Jon Jagd is offline   Reply With Quote
Old Dec 16th, 2003, 02:27 PM   #2
Andrew Poulsom
MrExcel MVP
 
Andrew Poulsom's Avatar
 
Join Date: Jul 2002
Posts: 32,030
Default Re: Closest match

See if you can adapt one of the entries here:

http://www.mrexcel.com/pc07.shtml
Andrew Poulsom is offline   Reply With Quote
Old Dec 16th, 2003, 02:44 PM   #3
Jon Jagd
 
Join Date: Aug 2002
Posts: 69
Default Re: Closest match

Thanks! Seems you understood my problem The solutions looks promising.

Jon
Jon Jagd is offline   Reply With Quote
Old Dec 16th, 2003, 02:55 PM   #4
al_b_cnu
 
Join Date: Jul 2003
Location: Manchester (UK)
Posts: 3,497
Default Re: Closest match

Hi,

Following on from that, how about my FuzzyVLookup UDF:
Code:
Option Explicit
Type RankInfo
    Offset As Integer
    Percentage As Single
End Type
Function FuzzyPercent(ByVal String1 As String, _
                      ByVal String2 As String, _
                      Optional Algorithm As Integer = 3) As Single
'*************************************
'** Return a % match on two strings **
'*************************************
Dim intLen1 As Integer
Dim intCurLen As Integer
Dim intTo As Integer
Dim intPos As Integer
Dim intPtr As Integer
Dim intScore As Integer
Dim intTotScore As Integer
Dim intStartPos As Integer
Dim StrWork As String
Dim Str1 As String
Dim Str2 As String

'---------------------------------------------------
'-- Remove surrounding spaces, ensure lower case  --
'-- and remove multiple internal spaces           --
'---------------------------------------------------
Str1 = LCase$(Trim$(String1))
Do
    StrWork = Str1
    Str1 = Application.WorksheetFunction.Substitute _
                                        (StrWork, "  ", " ")
Loop Until StrWork = Str1

Str2 = LCase$(Trim$(String2))
Do
    StrWork = Str2
    Str2 = Application.WorksheetFunction.Substitute _
                                        (StrWork, "  ", " ")
Loop Until StrWork = Str2

'----------------------------------------------
'-- Give 100% match if strings exactly equal --
'----------------------------------------------
If Str1 = Str2 Then
    FuzzyPercent = 1
    Exit Function
End If

intLen1 = Len(Str1)

'----------------------------------------
'-- Give 0% match if string length < 2 --
'----------------------------------------
If intLen1 < 2 Then
    FuzzyPercent = 0
    Exit Function
End If

intTotScore = 0                   'initialise total possible score
intScore = 0                      'initialise current score

'--------------------------------------------------------
'-- If Algorithm = 1 or 3, Search for single characters --
'--------------------------------------------------------
If (Algorithm And 1) <> 0 Then
    intTotScore = intLen1                   'initialise total possible score
    intPos = 0
    For intPtr = 1 To intLen1
        intStartPos = intPos + 1
        intPos = InStr(intStartPos, Str2, Mid$(Str1, intPtr, 1))
        If intPos > 0 Then
            If intPos > intStartPos + 3 Then     'No match if char is > 3 bytes away
                intPos = intStartPos
            Else
                intScore = intScore + 1          'Update current score
            End If
        Else
            intPos = intStartPos
        End If
    Next intPtr
End If

'-----------------------------------------------------------
'-- If Algorithm = 2 or 3, Search for pairs, triplets etc. --
'-----------------------------------------------------------
If (Algorithm And 2) <> 0 Then
    For intCurLen = 2 To intLen1
        StrWork = Str2                          'Get a copy of String2
        intTo = intLen1 - intCurLen + 1
        intTotScore = intTotScore + Int(intLen1 / intCurLen)  'Update total possible score
        For intPtr = 1 To intTo Step intCurLen
            intPos = InStr(StrWork, Mid$(Str1, intPtr, intCurLen))
            If intPos > 0 Then
                Mid$(StrWork, intPos, intCurLen) = String$(intCurLen, &H0) 'corrupt found string
                intScore = intScore + 1     'Update current score
            End If
        Next intPtr
    Next intCurLen
End If

FuzzyPercent = intScore / intTotScore

End Function
Function FuzzyVLookup(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      ByVal IndexNum As Integer, _
                      Optional NFPercent As Single = 0.05, _
                      Optional Rank As Integer = 1, _
                      Optional Algorithm As Integer = 3) As Variant
'********************************************************************************
'** Function to Fuzzy match LookupValue with entries in                        **
'** column 1 of table specified by TableArray.                                 **
'** TableArray must specify the top left cell of the range to be searched      **
'** The function stops scanning the table when an empty cell in column 1       **
'** is found.                                                                  **
'** For each entry in column 1 of the table, FuzzyPercent is called to match   **
'** LookupValue with the Table entry.                                          **
'** 'Rank' is an optional parameter which may take any value > 0               **
'**        (default 1) and causes the function to return the 'nth' best        **
'**         match (where 'n' is defined by 'Rank' parameter)                   **
'** If the 'Rank' match percentage < NFPercent (Default 5%), #N/A is returned. **
'** IndexNum is the column number of the entry in TableArray required to be    **
'** returned, as follows:                                                      **
'** If IndexNum > 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the column entry indicated by IndexNum is     **
'**                 returned.                                                  **
'** if IndexNum = 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the offset row (starting at 1) is returned.   **
'**                 This value can be used directly in the 'Index' function.   **
'**                                                                            **
'** Algorithm can take one of the following values:                            **
'** Algorithm = 1:                                                             **
'**     For each character in 'String1', a search is performed on 'String2'.   **
'**     The search is deemed successful if a character is found in 'String2'   **
'**     within 3 characters of the current position.                           **
'**     A score is kept of matching characters which is returned as a          **
'**     percentage of the total possible score.                                **
'** Algorithm = 2:                                                             **
'**     A count of matching pairs, triplets, quadruplets etc. in 'String1' and **
'**     'String2' is returned as a percentage of the total possible.           **
'** Algorithm = 3: Both Algorithms 1 and 2 are performed.                      **
'********************************************************************************
Dim strLookupValue As String
Dim strListString As String
Dim StrWork As String

Dim sngMinPercent As Single
Dim sngWork As Single
Dim sngCurPercent  As Single
Dim intBestMatchPtr As Integer
Dim intPtr As Integer
Dim intRankPtr As Integer
Dim intRankPtr1 As Integer

Dim udRankData() As RankInfo
'--------------------------------------------------------------
'--    Validation                                            --
'--------------------------------------------------------------
strLookupValue = Trim$(LCase$(LookupValue))

If IsMissing(NFPercent) Then
    sngMinPercent = 0.05
Else
    If (NFPercent <= 0) Or (NFPercent > 1) Then
        FuzzyVLookup = "*** 'NFPercent' must be a percentage > zero ***"
        Exit Function
    End If
    sngMinPercent = NFPercent
End If

If Rank < 1 Then
    FuzzyVLookup = "*** 'Rank' must be an integer > 0 ***"
    Exit Function
End If

ReDim udRankData(1 To Rank)

'---------------
'-- Main loop --
'---------------
intPtr = 1
Do While VarType(TableArray.Cells(intPtr, 1)) <> vbEmpty
    If VarType(TableArray.Cells(intPtr, 1)) = vbString Then
        strListString = Trim$(LCase$(TableArray.Cells(intPtr, 1)))
        
        '------------------------------------------------
        '-- Fuzzy match strings & get percentage match --
        '------------------------------------------------
        sngCurPercent = FuzzyPercent(String1:=strLookupValue, _
                                     String2:=strListString, _
                                     Algorithm:=Algorithm)
        
        If sngCurPercent >= sngMinPercent Then
            '---------------------------
            '-- Store in ranked array --
            '---------------------------
            For intRankPtr = 1 To Rank
                If sngCurPercent > udRankData(intRankPtr).Percentage Then
                    For intRankPtr1 = Rank To intRankPtr + 1 Step -1
                        With udRankData(intRankPtr1)
                            .Offset = udRankData(intRankPtr1 - 1).Offset
                            .Percentage = udRankData(intRankPtr1 - 1).Percentage
                        End With
                    Next intRankPtr1
                    With udRankData(intRankPtr)
                        .Offset = intPtr
                        .Percentage = sngCurPercent
                    End With
                    Exit For
                End If
            Next intRankPtr
        End If
        
    End If
    intPtr = intPtr + 1
Loop

If udRankData(Rank).Percentage < sngMinPercent Then
    '--------------------------------------
    '-- Return '#N/A' if below NFPercent --
    '--------------------------------------
    FuzzyVLookup = CVErr(xlErrNA)
Else
    intBestMatchPtr = udRankData(Rank).Offset
    If IndexNum > 0 Then
        '-----------------------------------
        '-- Return column entry specified --
        '-----------------------------------
        FuzzyVLookup = TableArray.Cells(intBestMatchPtr, IndexNum)
    Else
        '-----------------------
        '-- Return offset row --
        '-----------------------
        FuzzyVLookup = intBestMatchPtr
    End If
End If
End Function
Function FuzzyHLookup(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      ByVal IndexNum As Integer, _
                      Optional NFPercent As Single = 0.05, _
                      Optional Rank As Integer = 1, _
                      Optional Algorithm As Integer = 3) As Variant
'********************************************************************************
'** Function to Fuzzy match LookupValue with entries in                        **
'** row 1 of table specified by TableArray.                                    **
'** TableArray must specify the top left cell of the range to be searched      **
'** The function stops scanning the table when an empty cell in row 1          **
'** is found.                                                                  **
'** For each entry in row 1 of the table, FuzzyPercent is called to match      **
'** LookupValue with the Table entry.                                          **
'** 'Rank' is an optional parameter which may take any value > 0               **
'**        (default 1) and causes the function to return the 'nth' best        **
'**         match (where 'n' is defined by 'Rank' parameter)                   **
'** If the 'Rank' match percentage < NFPercent (Default 5%), #N/A is returned. **
'** IndexNum is the row number of the entry in TableArray required to be       **
'** returned, as follows:                                                      **
'** If IndexNum > 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the row entry indicated by IndexNum is        **
'**                 returned.                                                  **
'** if IndexNum = 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the offset col (starting at 0) is returned.   **
'**                 This value can be used directly in the 'OffSet' function.  **
'**                                                                            **
'** Algorithm can take one of the following values:                            **
'** Algorithm = 1:                                                             **
'**     For each character in 'String1', a search is performed on 'String2'.   **
'**     The search is deemed successful if a character is found in 'String2'   **
'**     within 3 characters of the current position.                           **
'**     A score is kept of matching characters which is returned as a          **
'**     percentage of the total possible score.                                **
'** Algorithm = 2:                                                             **
'**     A count of matching pairs, triplets, quadruplets etc. in 'String1' and **
'**     'String2' is returned as a percentage of the total possible.           **
'** Algorithm = 3: Both Algorithms 1 and 2 are performed.                      **
'********************************************************************************
Dim strLookupValue As String
Dim strListString As String
Dim StrWork As String

Dim sngMinPercent As Single
Dim sngWork As Single
Dim sngCurPercent  As Single
Dim intBestMatchPtr As Integer
Dim intPtr As Integer
Dim intRankPtr As Integer
Dim intRankPtr1 As Integer

Dim udRankData() As RankInfo
'--------------------------------------------------------------
'--    Validation                                            --
'--------------------------------------------------------------
strLookupValue = Trim$(LCase$(LookupValue))

If IsMissing(NFPercent) Then
    sngMinPercent = 0.05
Else
    If (NFPercent <= 0) Or (NFPercent > 1) Then
        FuzzyHLookup = "*** 'NFPercent' must be a percentage > zero ***"
        Exit Function
    End If
    sngMinPercent = NFPercent
End If

If Rank < 1 Then
    FuzzyHLookup = "*** 'Rank' must be an integer > 0 ***"
    Exit Function
End If

ReDim udRankData(1 To Rank)

'---------------
'-- Main loop --
'---------------
intPtr = 1
Do While VarType(TableArray.Cells(1, intPtr)) <> vbEmpty
    If VarType(TableArray.Cells(1, intPtr)) = vbString Then
        strListString = Trim$(LCase$(TableArray.Cells(1, intPtr)))
        
        '------------------------------------------------
        '-- Fuzzy match strings & get percentage match --
        '------------------------------------------------
        sngCurPercent = FuzzyPercent(String1:=strLookupValue, _
                                     String2:=strListString, _
                                     Algorithm:=Algorithm)
        
        If sngCurPercent >= sngMinPercent Then
            '---------------------------
            '-- Store in ranked array --
            '---------------------------
            For intRankPtr = 1 To Rank
                If sngCurPercent > udRankData(intRankPtr).Percentage Then
                    For intRankPtr1 = Rank To intRankPtr + 1 Step -1
                        With udRankData(intRankPtr1)
                            .Offset = udRankData(intRankPtr1 - 1).Offset
                            .Percentage = udRankData(intRankPtr1 - 1).Percentage
                        End With
                    Next intRankPtr1
                    With udRankData(intRankPtr)
                        .Offset = intPtr
                        .Percentage = sngCurPercent
                    End With
                    Exit For
                End If
            Next intRankPtr
        End If
        
    End If
    intPtr = intPtr + 1
Loop

If udRankData(Rank).Percentage < sngMinPercent Then
    '--------------------------------------
    '-- Return '#N/A' if below NFPercent --
    '--------------------------------------
    FuzzyHLookup = CVErr(xlErrNA)
Else
    intBestMatchPtr = udRankData(Rank).Offset
    If IndexNum > 0 Then
        '-----------------------------------
        '-- Return row entry specified --
        '-----------------------------------
        FuzzyHLookup = TableArray.Cells(IndexNum, intBestMatchPtr)
    Else
        '-----------------------
        '-- Return offset col --
        '-----------------------
        FuzzyHLookup = intBestMatchPtr
    End If
End If
End Function
Parameters to FuzzyVLookup are:

LookupValue := Entry to be looked up
TableArray := Lookup Range
IndexNum := Zero to return row number, >0 to return specified column (as in VLookup)
NFPercent := If the %age match is less than this, return #N/A
Rank := Return the 'n'th highest match. Default 1
Algorithm := Must be 1, 2 or 3. 1 = 'Forward scan only', '2' = Match on pairs then triplets then quads etc, '3' = match on both

For your purposes, suggest a NFPercent of about 50% and Algorithm =1, but play!

HTH

Alan
al_b_cnu is offline   Reply With Quote
Old Dec 16th, 2003, 03:09 PM   #5
SIXTH SENSE
 
SIXTH SENSE's Avatar
 
Join Date: Oct 2003
Posts: 1,818
Default

hi!
I come up here with something like character checking. but
at the end i come up with another question on how sensitve
you want it to be?
Please see below

******** ******************** ************************************************************************>
Microsoft Excel - Book3___Running: xl2000 : OS = Windows XP
(F)ile (E)dit (V)iew (I)nsert (O)ptions (T)ools (D)ata (W)indow (H)elp (A)bout
=

A
B
C
D
E
1
coming*data%*match*patternABBOTT*LABS
2
Abbott*LabS100.00%***
3
ABBOTT54.55%***
4
Abbott*Laboratories*A/S90.91%***
5
aabbott*labs100.00%***
6
aaaabbot45.45%***
Sheet1*

[HtmlMaker 2.42] To see the formula in the cells just click on the cells hyperlink or click the Name box
PLEASE DO NOT QUOTE THIS TABLE IMAGE ON SAME PAGE! OTHEWISE, ERROR OF JavaScript OCCUR.


the code:


Private Function PatternMatching(s1, s2 As String) As Integer
Dim i2 As Integer

****Max = 0
****Count = 0
start:
****Count = 0
****For i = 1 To Len(s1)
********i2 = i
********For j = 1 To Len(s2)
************If UCase(Mid(s2, j, 1)) = UCase(Mid(s1, i2, 1)) Then
****************i2 = i2 + 1
****************Count = Count + 1
****************If Max < Count Then
********************Max = Count
****************End If
************Else
****************Count = 0
****************Exit For
************End If
********Next j
****Next i
If Max < Len(s2) Then
****s2 = Mid(s2, 2, Len(s2))
GoTo start
End If
****PatternMatching = Max
End Function

Private Sub CommandButton1_Click()
For i = 2 To Range("A65536").End(xlUp).Row
****Range("b" & i).Value = PatternMatching(Range("e1"), Range("a" & i)) / Len(Range("e1"))
Next i
End Sub
__________________
There is always a better way!!
SIXTH SENSE is offline   Reply With Quote
Old Dec 16th, 2003, 03:12 PM   #6
Jon Jagd
 
Join Date: Aug 2002
Posts: 69
Default Re: Closest match

I just tried out Alan's impressive function but got a compile error in the following line:

Code:
Dim udRankData() As RankInfo
It sais: User-defined type not defined.

Am I missing some lines?

Jon
Jon Jagd is offline   Reply With Quote
Old Dec 16th, 2003, 03:17 PM   #7
Andrew Poulsom
MrExcel MVP
 
Andrew Poulsom's Avatar
 
Join Date: Jul 2002
Posts: 32,030
Default Re: Closest match

Quote:
Originally Posted by Jon Jagd
I just tried out Alan's impressive function but got a compile error in the following line:

Code:
Dim udRankData() As RankInfo
It sais: User-defined type not defined.

Am I missing some lines?

Jon
Yes, the first 5 lines of the code Alan posted.
Andrew Poulsom is offline   Reply With Quote
Old Dec 16th, 2003, 03:27 PM   #8
just_jon
MrExcel MVP
 
just_jon's Avatar
 
Join Date: Sep 2002
Location: Alabama/State of Disarray
Posts: 10,473
Default Re: Closest match

Did you copy over these lines as well as the functions --
Code:
Option Explicit 
Type RankInfo 
    Offset As Integer 
    Percentage As Single 
End Type
__________________
just_jon
Book of the Month: I'm Not Really an MVP, I Just Play One on TV [j. jon, 2004]
just_jon is offline   Reply With Quote
Old Dec 16th, 2003, 03:37 PM   #9
Jon Jagd
 
Join Date: Aug 2002
Posts: 69
Default Re: Closest match

I did actually. It was lying together with other functions in the same module. It worked when I placed all of the code in a seperate module. The strange thing is, that I only get acces to the FuzzyHLookup and not the FuzzyVLookup in the Functions library even though it is there in the code.

To answer the question of SIXTH SENSE on how sensitive I want it to be I think it would be great to be able to change the sensitivity according to the different needs. More or less like in Alans function.

Jon
Jon Jagd is offline   Reply With Quote
Old Dec 16th, 2003, 03:39 PM   #10
al_b_cnu
 
Join Date: Jul 2003
Location: Manchester (UK)
Posts: 3,497
Default Re: Closest match

Note that you need FuzzyPercent too - it's called buy FuzzyVLookup.


Regarding SIXTH SENSE's comment 'how sensitive do you want it to be' - this is controlled by the NFPercent parameter.

HTH


Alan
al_b_cnu is offline   Reply With Quote
Reply

Bookmarks

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On

Forum Jump


All times are GMT +1. The time now is 04:16 AM.


Powered by vBulletin® Version 3.8.4
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
All contents Copyright 1998-2009 by MrExcel Consulting.