VBA - Compare string of text to another cell with text and give matching percentage

Maarten321

New Member
Joined
Jul 7, 2016
Messages
16
Hi Guys,

I have an issue that i would like to solve. Have been looking on the internet but haven't been able to find anything that does the trick exactly.

To form the picture of the problem: I have a program that runs a query search for me, it gives me back the description and url of a page. I need to somehow get a automated validation method to see what results i want to use and which to reject.

Example:

Column A is the search query input used by another program. (It is B,C,D put together so the other program can run it.)
- Column B is the Brand (from my own database)
- Column C is the Productname (from my own database)
- Column D is the ID (from my own database)

Column E is the Description output from my other program
Column F is the URL output from my other program

What i need the vba to do is to look and see how appropriate the match is between either:

Option1
- Compare A to E and give percentage1
- Compare A to F and give percentage2

--or--

Option2
- Compare B,C,D separately to E and give percentage1
- Compare B,C,D separately to F and give percentage2

I don't mind option 1 or 2 it is more that I want the most accurate results.

Theoretical example:
Disney Pluto Dog Toy comparison to Disney Pluto Toy should give me a 75% matching percentage.


Real life example:
ABCDEFGH
QueryBrandProductnameIDDescriptionUrlPercentage1Percentage2
Ztahl - Dijkos Salerno 3100

<tbody>
</tbody>
Ztahl - Dijkos

<tbody>
</tbody>
Salerno

<tbody>
</tbody>
3100

<tbody>
</tbody>
hanglamp-salerno-rvs-ztahl-3100 ... Ztahl 5-L Hanglamp Salerno RVS. E-mail naar een vriend. Schrijf de eerste beoordeling van dit product. € 287,00. Aantal: ...

<tbody>
</tbody>
http://www.xxxxxxxxxx.nl/hanglamp-salerno-rvs-ztahl-3100.html

<tbody>
</tbody>
Ztahl - Dijkos Salerno 3100

<tbody>
</tbody>
Ztahl - Dijkos

<tbody>
</tbody>
Salerno

<tbody>
</tbody>
3100

<tbody>
</tbody>
Bekijk de meubels, meubelen, bankstellen en fauteuils in de showroom en laat u verrassen door de scherpe prijzen!

<tbody>
</tbody>
https://www.yyyyyyyyy.nl/hanglamp-salerno-3100-dijkos-ztahl.html

<tbody>
</tbody>

<tbody>
</tbody>


Sheetname = "Clean"
Tablename = "Clean"

The <acronym title="visual basic for applications" style="border-width: 0px 0px 1px; border-bottom-style: dotted; border-bottom-color: rgb(0, 0, 0); cursor: help; color: rgb(51, 51, 51); background-color: rgb(250, 250, 250);">vba</acronym> needs to be configured on the table data instead of ranges because they can change from time to time.
The header names in this example are also the real headers of the table.

Help me how i can put this in a <acronym title="visual basic for applications" style="border-width: 0px 0px 1px; border-bottom-style: dotted; border-bottom-color: rgb(0, 0, 0); cursor: help; color: rgb(51, 51, 51); background-color: rgb(250, 250, 250);">vba</acronym>. Your thoughts are appreciated.

Greetz Maarten
 

Excel Facts

Easy bullets in Excel
If you have a numeric keypad, press Alt+7 on numeric keypad to type a bullet in Excel.
only a suggestion:
First idea was to use Levenshtein formula which gives for Pluto match 80%. But now I guess your goal is to count how many words in b:d (regardless delimiter) are present in E or F. In column B you have "Ztahl - Dijkos" but is better to have 2 separate words "Ztahl" and "Dijkos". Using instr function will tell you if a word is present in description or url columns without having to care about delimiters used in url.
 
Upvote 0
Thank you for you suggestion,

I saw the Levenshtein formula and if remember correctly it also takes into account the combination of letters so it would be a close match as in word matching. At this moment i separate all words and use a string search to give me if he finds it or not. Than i have a ****load of filters (also some conditions to exclude the output) to finally come to a concrete result. (and slows my excel sheet down.)

Example:
Masterlight Industria 2007 GunMetal | White 2007-30-B-130-3

<colgroup><col width="355"></colgroup> <tbody>
</tbody>

Would be 6 columns of text to compare (and 6 filters).

The Levenshtein function would give me already more concrete results than i have right now.
 
Upvote 0
I also found the Fuzzy Matching on this forum would that do the trick maybe?: http://www.mrexcel.com/forum/excel-questions/195635-fuzzy-matching-new-version-plus-explanation.html

If i run the code from FuzzyPercent:

Code:
Option ExplicitType RankInfo
    Offset As Integer
    Percentage As Single
End Type


Function FuzzyPercent(ByVal String1 As String, _
                      ByVal String2 As String, _
                      Optional Algorithm As Integer = 3, _
                      Optional Normalised As Boolean = False) As Single
'*************************************
'** Return a % match on two strings **
'*************************************
Dim intLen1 As Integer, intLen2 As Integer
Dim intCurLen As Integer
Dim intTo As Integer
Dim intPos As Integer
Dim intPtr As Integer
Dim intScore As Integer
Dim intTotScore As Integer
Dim intStartPos As Integer
Dim strWork As String


'-------------------------------------------------------
'-- If strings havent been normalised, normalise them --
'-------------------------------------------------------
If Normalised = False Then
    String1 = LCase$(Application.Trim(String1))
    String2 = LCase$(Application.Trim(String2))
End If


'----------------------------------------------
'-- Give 100% match if strings exactly equal --
'----------------------------------------------
If String1 = String2 Then
    FuzzyPercent = 1
    Exit Function
End If


intLen1 = Len(String1)
intLen2 = Len(String2)


'----------------------------------------
'-- Give 0% match if string length < 2 --
'----------------------------------------
If intLen1 < 2 Then
    FuzzyPercent = 0
    Exit Function
End If


intTotScore = 0                   'initialise total possible score
intScore = 0                      'initialise current score


'--------------------------------------------------------
'-- If Algorithm = 1 or 3, Search for single characters --
'--------------------------------------------------------
If (Algorithm And 1) <> 0 Then
    FuzzyAlg1 String1, String2, intScore, intTotScore
    If intLen1 < intLen2 Then FuzzyAlg1 String2, String1, intScore, intTotScore
End If


'-----------------------------------------------------------
'-- If Algorithm = 2 or 3, Search for pairs, triplets etc. --
'-----------------------------------------------------------
If (Algorithm And 2) <> 0 Then
    FuzzyAlg2 String1, String2, intScore, intTotScore
    If intLen1 < intLen2 Then FuzzyAlg2 String2, String1, intScore, intTotScore
End If


FuzzyPercent = intScore / intTotScore


End Function
Private Sub FuzzyAlg1(ByVal String1 As String, _
                      ByVal String2 As String, _
                      ByRef Score As Integer, _
                      ByRef TotScore As Integer)
Dim intLen1 As Integer, intPos As Integer, intPtr As Integer, intStartPos As Integer


intLen1 = Len(String1)
TotScore = TotScore + intLen1              'update total possible score
intPos = 0
For intPtr = 1 To intLen1
    intStartPos = intPos + 1
    intPos = InStr(intStartPos, String2, Mid$(String1, intPtr, 1))
    If intPos > 0 Then
        If intPos > intStartPos + 3 Then     'No match if char is > 3 bytes away
            intPos = intStartPos
        Else
            Score = Score + 1          'Update current score
        End If
    Else
        intPos = intStartPos
    End If
Next intPtr
End Sub
Private Sub FuzzyAlg2(ByVal String1 As String, _
                        ByVal String2 As String, _
                        ByRef Score As Integer, _
                        ByRef TotScore As Integer)
Dim intCurLen As Integer, intLen1 As Integer, intTo As Integer, intPtr As Integer, intPos As Integer
Dim strWork As String


intLen1 = Len(String1)
For intCurLen = 2 To intLen1
    strWork = String2                          'Get a copy of String2
    intTo = intLen1 - intCurLen + 1
    TotScore = TotScore + Int(intLen1 / intCurLen)  'Update total possible score
    For intPtr = 1 To intTo Step intCurLen
        intPos = InStr(strWork, Mid$(String1, intPtr, intCurLen))
        If intPos > 0 Then
            Mid$(strWork, intPos, intCurLen) = String$(intCurLen, &H0) 'corrupt found string
            Score = Score + 1     'Update current score
        End If
    Next intPtr
Next intCurLen


End Sub

Than this is the result: The percentages are run with algorithm number 2
Example:ABCDsomHighest
Ztahl - Dijkos Salerno 3100Ztahl - DijkosSalerno
3100http://www.xxxxxxxxxx.nl/hanglamp-salerno-rvs-ztahl-3100.html9,63%3,49%7,11%3,40%23,63%23,63%
hanglamp-salerno-rvs-ztahl-3100 ... Ztahl 5-L Hanglamp Salerno RVS. E-mail naar een vriend. Schrijf de eerste beoordeling van dit product. € 287,00. Aantal: ...5,33%2,26%2,32%0,87%10,78%

<colgroup><col><col><col><col span="9"></colgroup><tbody>
</tbody>

So basically it doesn't give me high enough results, however with some tweaking it might work.

Anyone who can help me with their vba knowledge to tweak this function?
 
Upvote 0
Hi Marteen321, 2 solutions: 1st is a macro it is coded for a sheet and not a table (I hate tables) but code is easy and you can make amendments. 2nd is a function where ToFind is your query cell (may be a2) and DescUrl is what you look in, description or url.
Hope those help

Code:
Sub test()
firstrow = Sheets("Foglio1").Range("A2").Row
lastrow = Sheets("Foglio1").Range("A1").End(xlDown).Row

For x = firstrow To lastrow
'cleans query cell from strange char
MyText = Trim(Range("A" & x).Text)
    For a = 1 To Len(MyText)
        For c = 0 To 31
            MyText = Replace(MyText, Chr(c), "", 1, -1, vbTextCompare)
        Next c
        For c = 33 To 47
            MyText = Replace(MyText, Chr(c), "", 1, -1, vbTextCompare)
        Next c
        For c = 58 To 64
            MyText = Replace(MyText, Chr(c), "", 1, -1, vbTextCompare)
        Next c
        For c = 123 To 127
            MyText = Replace(MyText, Chr(c), "", 1, -1, vbTextCompare)
        Next c
        MyText = Replace(MyText, "  ", " ", 1, -1, vbTextCompare)
    Next a
'end clean
    sstrings = Split(MyText)
    For y = 0 To UBound(sstrings)
        DesMatch = DesMatch + IIf(InStr(1, Cells(x, 5).Text, sstrings(y), vbTextCompare) > 0, 1, 0)
        UrlMatch = UrlMatch + IIf(InStr(1, Cells(x, 6).Text, sstrings(y), vbTextCompare) > 0, 1, 0)
    Next y

Cells(x, 7).Value = DesMatch / (1 + UBound(sstrings))
Cells(x, 8).Value = UrlMatch / (1 + UBound(sstrings))
DesMatch = 0
UrlMatch = 0
Next x
End Sub

Code:
Public Function CountMatch(ToFind As String, DescUrl As String) As Double

'cleans query cell from strange char
MyText = Trim(ToFind)
    For a = 1 To Len(MyText)
        For c = 0 To 31
            MyText = Replace(MyText, Chr(c), "", 1, -1, vbTextCompare)
        Next c
        For c = 33 To 47
            MyText = Replace(MyText, Chr(c), "", 1, -1, vbTextCompare)
        Next c
        For c = 58 To 64
            MyText = Replace(MyText, Chr(c), "", 1, -1, vbTextCompare)
        Next c
        For c = 123 To 127
            MyText = Replace(MyText, Chr(c), "", 1, -1, vbTextCompare)
        Next c
        MyText = Replace(MyText, "  ", " ", 1, -1, vbTextCompare)
    Next a
'end clean
    sstrings = Split(MyText)
    For y = 0 To UBound(sstrings)
        CountMatch = CountMatch + IIf(InStr(1, DescUrl, sstrings(y), vbTextCompare) > 0, 1, 0)
    Next y

CountMatch = CountMatch / (1 + UBound(sstrings))
End Function
 
Upvote 0

Forum statistics

Threads
1,214,516
Messages
6,119,980
Members
448,934
Latest member
audette89

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top