Fuzzy Matching - new version plus explanation

al_b_cnu

Well-known Member
Joined
Jul 18, 2003
Messages
4,545
It has been a while since I originally posted my Fuzzy matching UDF’s on the board, and several variants have appeared subsequently.

I thought it time to ‘put the record straight’ & post a definitive version which contains slightly more efficient code, and better matching algorithms, so here it is.

Firstly, I must state that the Fuzzy matching algorithms are very CPU hungry, and should be used sparingly. If for instance you require to lookup a match for a string which starts with, contains or ends with a specified value, this can be performed far more efficiently using the MATCH function:
Fuzzy Examples.xls
ABCDE
1Starts WithEndsContains
2BilljelenBill
3Mr Bill Jelen433
4Bill Jelen
5Joe Bloggs
6Fred Smith
MATCH Example


... Continued ...
 
Hi, welcome to the board :)

Would you like to email a sample of your s/sheet & I'll have a look.

Best wishes

Alan
 
Upvote 0

Excel Facts

Can Excel fill bagel flavors?
You can teach Excel a new custom list. Type the list in cells, File, Options, Advanced, Edit Custom Lists, Import, OK
WOW!!!! This is very impressive. I tried it and it's just amazing. You have no idea how much work this is going to save me, perhaps a few days in just a few hours!

Are you the author of this masterpiece?

Thanks again.:biggrin:

The code is not Normal Visual Basic!

It is Excel Visual Basic for Applications (Excel VBA)

It must be stored in a Standard code module, like: Module1, as is!

Alt + F11
Toolbar: Insert - Module
Paste code in the Editor window for that module!

Click the top-most-Right Close "X" to return to the sheet.
 
Upvote 0
Hi,

Does anyone know how to write vba for comparing exact and similar data in excel?

I have got 3 cols - A which has 1000 cells, B which has 400 cells and C

- if there are "123456ABC" in both col A and B, how can I make a program to match them exactly and put it in col C?

-if there is something like "123456ABC" in col A and something like "123456AB" in col B, how can I make the program to do near match? and put it in col C?

-also if there is something like "123456PLP" in col A and something like "123456L" in col B, how can I make the program to do near match? and put it in col C?

There are some functions in excel can do exact match but they cannot do near or similar match like these cases. Could anyone help me?
Thanks so much...
 
Upvote 0
Hi,

Your answer is in this thread:

http://www.mrexcel.com/forum/showthread.php?t=195635&highlight=fuzzyhttp://www.mrexcel.com/forum/showthread.php?t=195635&highlight=fuzzy&page=5

Copy and paste the code into a module on your sheet (Alt-F11, new module...)

For example, in my project, I used this formula:

FuzzyVLookup(B3,LIST!$P$2:$P$1680,1,40%,1,2,0)

which translates by: "search (B3) in range (LIST!$P$2:$P$1680), return 1st column, use a 40% percentage confidence index, use default Rank(1), use algorithm 2, no additional column"

Your CPU can run at 100% for a few minutes, just enable the system monitor (ctrl-shift-ESC).

Good luck.
 
Upvote 0
Hi,

Your answer is in this thread:

http://www.mrexcel.com/forum/showthread.php?t=195635&highlight=fuzzyhttp://www.mrexcel.com/forum/showthread.php?t=195635&highlight=fuzzy&page=5

Copy and paste the code into a module on your sheet (Alt-F11, new module...)

For example, in my project, I used this formula:

FuzzyVLookup(B3,LIST!$P$2:$P$1680,1,40%,1,2,0)

which translates by: "search (B3) in range (LIST!$P$2:$P$1680), return 1st column, use a 40% percentage confidence index, use default Rank(1), use algorithm 2, no additional column"

Your CPU can run at 100% for a few minutes, just enable the system monitor (ctrl-shift-ESC).

Good luck.
Hi,
sorry...which code should I use? there are many codes in the thread. which page should I use? thanks
 
Upvote 0
But functions are so much more flexible. Can you post an example of what are you trying to do?

because i got like 50000 data in colA and 1000 data in colB...if i used this function, it took me more than an hour...if its "sub", it should take that long.

say I'm trying to compare 1000 products, which from my customer in my 50000 products, which in my database.
Now I want to put the 50000 data in colA and the 1000 data in colB and compare which I have in my 50000 products.

<table style="border-collapse: collapse; width: 427px; height: 178px;" border="0" cellpadding="0" cellspacing="0"><col style="width: 55pt;" width="73"> <col style="width: 78pt;" width="104"> <col style="width: 67pt;" width="89"> <tbody><tr style="height: 15.75pt;" height="21"> <td style="height: 15.75pt; width: 55pt;" height="21" width="73">My Data</td> <td style="width: 78pt;" width="104">Customer Data</td> <td class="xl65" style="width: 67pt;" width="89">Result I want</td> </tr> <tr style="height: 15.75pt;" height="21"> <td style="height: 15.75pt;" height="21">ColA </td> <td>ColB</td> <td>ColC</td> </tr> <tr style="height: 15.75pt;" height="21"> <td style="height: 15.75pt;" height="21">1231-1 </td> <td>1231-1</td> <td>1231-1</td> </tr> <tr style="height: 15.75pt;" height="21"> <td style="height: 15.75pt;" height="21">542-series </td> <td>542-10 </td> <td>542-series</td> </tr> <tr style="height: 15.75pt;" height="21"> <td style="height: 15.75pt;" height="21">83736-1 </td> <td>83744-2</td> <td>empty cell ""</td> </tr> <tr style="height: 15.75pt;" height="21"> <td style="height: 15.75pt;" height="21">83736-1 </td> <td>83736-3 </td> <td>empty cell ""</td> </tr> </tbody></table>
I think I want a sub can do the exact match, detect when the data encounter "series", it can just re-match the cell before the "series" and also if it very different, give me empty cell.

I know it sounds troublesome but really hope someone could help me...
Thanks so much.
 
Last edited:
Upvote 0
hi,

This sub assumes that the input sheet is 'Sheet1', amend constants as appropriate.
Code:
Option Explicit

Sub MatchData()
Const sInputSheet As String = "Sheet1"
Const sMyDataCol As String = "A"
Const sCustCol As String = "B"
Const sResultCol As String = "C"

Dim iPtr As Integer
Dim lRow As Long, lRowEnd As Long, lResultRow As Long
Dim objCustDictionary As Object
Dim rCur As Range
Dim sCur As String, sCurSeries As String, sCurSplit() As String
Dim sRange As String
Dim wsInput As Worksheet

Set wsInput = Sheets(sInputSheet)

wsInput.Range(sResultCol & "2:" & sResultCol & wsInput.UsedRange.Rows.Count).ClearContents

Set objCustDictionary = Nothing
Set objCustDictionary = CreateObject("Scripting.Dictionary")
lRowEnd = wsInput.Cells(Rows.Count, sCustCol).End(xlUp).Row
For Each rCur In wsInput.Range(sCustCol & "2:" & Cells(lRowEnd, sCustCol).Address)
    sCur = CStr(rCur.Value)
    iPtr = InStr(sCur & "-", "-")
    If iPtr > 1 Then
        sCurSeries = Left$(sCur, iPtr - 1)
        On Error Resume Next
        objCustDictionary.Add key:=sCurSeries, Item:="XXX"
        On Error GoTo 0
    End If
Next rCur

lRowEnd = wsInput.Cells(Rows.Count, sMyDataCol).End(xlUp).Row
For Each rCur In wsInput.Range(sMyDataCol & "2:" & Cells(lRowEnd, sMyDataCol).Address)
    sCur = CStr(rCur.Value)
    lRow = 0
    On Error Resume Next
    lRow = WorksheetFunction.Match(sCur, wsInput.Columns(sCustCol), 0)
    On Error GoTo 0
    If lRow = 0 Then
        sCurSplit = Split(sCur, "-")
        If UBound(sCurSplit) = 1 _
        And LCase$(sCurSplit(1)) = "series" Then
            If objCustDictionary.exists(sCurSplit(0)) Then wsInput.Range(sResultCol & rCur.Row).Value = sCur
        End If
    Else
        wsInput.Range(sResultCol & rCur.Row).Value = sCur
    End If
Next rCur
End Sub
 
Upvote 0
hi,

This sub assumes that the input sheet is 'Sheet1', amend constants as appropriate.
Code:
Option Explicit

Sub MatchData()
Const sInputSheet As String = "Sheet1"
Const sMyDataCol As String = "A"
Const sCustCol As String = "B"
Const sResultCol As String = "C"

Dim iPtr As Integer
Dim lRow As Long, lRowEnd As Long, lResultRow As Long
Dim objCustDictionary As Object
Dim rCur As Range
Dim sCur As String, sCurSeries As String, sCurSplit() As String
Dim sRange As String
Dim wsInput As Worksheet

Set wsInput = Sheets(sInputSheet)

wsInput.Range(sResultCol & "2:" & sResultCol & wsInput.UsedRange.Rows.Count).ClearContents

Set objCustDictionary = Nothing
Set objCustDictionary = CreateObject("Scripting.Dictionary")
lRowEnd = wsInput.Cells(Rows.Count, sCustCol).End(xlUp).Row
For Each rCur In wsInput.Range(sCustCol & "2:" & Cells(lRowEnd, sCustCol).Address)
    sCur = CStr(rCur.Value)
    iPtr = InStr(sCur & "-", "-")
    If iPtr > 1 Then
        sCurSeries = Left$(sCur, iPtr - 1)
        On Error Resume Next
        objCustDictionary.Add key:=sCurSeries, Item:="XXX"
        On Error GoTo 0
    End If
Next rCur

lRowEnd = wsInput.Cells(Rows.Count, sMyDataCol).End(xlUp).Row
For Each rCur In wsInput.Range(sMyDataCol & "2:" & Cells(lRowEnd, sMyDataCol).Address)
    sCur = CStr(rCur.Value)
    lRow = 0
    On Error Resume Next
    lRow = WorksheetFunction.Match(sCur, wsInput.Columns(sCustCol), 0)
    On Error GoTo 0
    If lRow = 0 Then
        sCurSplit = Split(sCur, "-")
        If UBound(sCurSplit) = 1 _
        And LCase$(sCurSplit(1)) = "series" Then
            If objCustDictionary.exists(sCurSplit(0)) Then wsInput.Range(sResultCol & rCur.Row).Value = sCur
        End If
    Else
        wsInput.Range(sResultCol & rCur.Row).Value = sCur
    End If
Next rCur
End Sub

Thanks for your help
but it says "Subscript out of range"
By the way...the customer's data are randomly show in colB and the data which have series can be longer than the example I gave above.
e.g. 020-238-SERIES, 034100AA000PC0A-SERIES

Thank you
 
Upvote 0

Forum statistics

Threads
1,216,207
Messages
6,129,510
Members
449,514
Latest member
swillow

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top