Splitting a Text String into Identifiable Words and Comparing them to other Cells

chrisbwee

New Member
Joined
Sep 5, 2019
Messages
1
Hi! Newbie here, so please be gentle...

I have two columns of data where all the cells are text strings. I want to be able to compare each cell in the Column 2 with all of the cells in Column 1 to find which cell from column 1 has the most words that match the cell in column 2. None of them will exactly match, but some will match more than others (will have more words that match). I want to find the cell that best matches (has the most matching words) with the selected cell in column 2 and then pull off data relevant to the best matching cell, much like a vlookup would if it was a perfect match between the cells. I have provided a table below to try and better explain:

Column 1Info 1Column 2Info 2
The dog chased the cat down the road1The man loved his dog
The dog ate the man's shoe

<tbody>
</tbody>
2The dog and the cat shared the shoe
The cat and the man loved the shoe

<tbody>
</tbody>
3
A cat is a man's best friend

<tbody>
</tbody>
A dog is a man's best friend

<tbody>
</tbody>
4
The dog chased the man down the road

<tbody>
</tbody>
1

<tbody>
</tbody>

The fourth entry under column 2 best matches (has the most words that match) with the first entry under column 1, therefore it has pulled "1" from Info 1 into Info 2.

I understand that the VBA Split function would at least turn the text string into identifiable words, but where to go from there I do not know...

Your help would be greatly appreciated!
 

MickG

MrExcel MVP
Joined
Jan 9, 2008
Messages
14,841
Try this as a possibility :-
NB:- I see from your data that Your "column 2" is actually column 3.
Results in column 4.
Code:
[COLOR="Navy"]Sub[/COLOR] MG05Sep18
[COLOR="Navy"]Dim[/COLOR] RngA [COLOR="Navy"]As[/COLOR] Range, DnA [COLOR="Navy"]As[/COLOR] Range, DnC [COLOR="Navy"]As[/COLOR] Range, RngC [COLOR="Navy"]As[/COLOR] Range, Sp [COLOR="Navy"]As[/COLOR] Variant, Num [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long[/COLOR]
[COLOR="Navy"]Dim[/COLOR] n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] c [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] p [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] oMax [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long[/COLOR]

[COLOR="Navy"]Set[/COLOR] RngA = Range(Range("A2"), Range("A" & Rows.Count).End(xlUp))
[COLOR="Navy"]Set[/COLOR] RngC = Range(Range("C2"), Range("c" & Rows.Count).End(xlUp))
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] DnA [COLOR="Navy"]In[/COLOR] RngA
   ReDim ray(1 To RngA.Count, 1 To 2)
    c = 0
   [COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] DnC [COLOR="Navy"]In[/COLOR] RngC
    Sp = Split(DnC, " ")
       [COLOR="Navy"]For[/COLOR] n = 0 To UBound(Sp)
        [COLOR="Navy"]If[/COLOR] InStr(1, DnA, Sp(n), vbTextCompare) > 0 [COLOR="Navy"]Then[/COLOR]
            Num = Num + 1
        [COLOR="Navy"]End[/COLOR] If
      [COLOR="Navy"]Next[/COLOR] n
        c = c + 1
        ray(c, 1) = Num: ray(c, 2) = DnC.Row - 1
        Num = 0
   [COLOR="Navy"]Next[/COLOR] DnC
       [COLOR="Navy"]For[/COLOR] n = 1 To UBound(ray)
         [COLOR="Navy"]If[/COLOR] ray(n, 1) > oMax [COLOR="Navy"]Then[/COLOR]
           oMax = ray(n, 1)
           p = ray(n, 2)
        [COLOR="Navy"]End[/COLOR] If
      [COLOR="Navy"]Next[/COLOR] n
     RngC(p).Offset(, 1) = RngC(p).Offset(, 1) & IIf(RngC(p).Offset(, 1) = "", DnA.Offset(, 1), "," & DnA.Offset(, 1))
    oMax = 0
[COLOR="Navy"]Next[/COLOR] DnA
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 

Yongle

Well-known Member
Joined
Mar 11, 2015
Messages
4,672
Office Version
365
Platform
Windows
1. Not sure what "best match" means when there are multiple incidences of the same word
2. Do you want to exclude cetain words (like "the","is")

example
Which is the "best" match for The dog is happy below ? Is it cell1 or cell2

cell1 ( 13 matches , 8 matches if "the" excluded, 5 matches if "is" also excluded)
The head of a dog is in front of the tail of a dog and the dog is grateful that the tail of a dog is not in front of the head of a dog

cel 2 ( 4 matches )
The happy dog is asleep


3. I do not understand why there is nothing against 3 of the cells in column 2
- they all have some kind of match
- why does only one cell return a value ?


4 The dog is happy is being matched with The dog is very happy and The sleeping dog is very happy
- are they ranked equally ?

EDIT

I notice that @MickG has posted something whilst I was dozing!
- so I will stand back for the time being
 
Last edited:

mohadin

Active Member
Joined
Mar 22, 2015
Messages
361
Office Version
2013
Platform
Windows
How about
With the same result as MicKG
Could redefine if the result acceptable
Code:
Sub teste()
    gmth = 0
    For i = 2 To 5
        myt = "(" & Join(Split(Join(Split(Cells(i, 1), " "), " (")), ")|") & ")"
        With CreateObject("VBScript.RegExp")
            .Global = True
            .Pattern = myt
            For j = 2 To 5
                Set mtch = .Execute(Cells(j, 3))
                If mtch.Count > gmth Then
                    gmth = mtch.Count
                    l = j
                End If
            Next
            Cells(l, 4) = Cells(l, 4).Value & " " & i - 1
            gmth = 0
        End With
    Next
End Sub
 

Forum statistics

Threads
1,082,135
Messages
5,363,344
Members
400,729
Latest member
Lisa McConachy

Some videos you may like

This Week's Hot Topics

  • populate from drop list with multiple tables
    Hi All, i have a drop list that displays data, what i want is when i select one of those from the list to populate text from different tables on...
  • Find list of words from sheet2 in sheet1 before a comma and extract text vba
    Hi Friends, Trying to find the solution on my task. But did not find suitable one to the need. Here is my query and sample file with details...
  • Dynamic Formula entry - VBA code sought
    Hello, really hope one of you experts can help with this - i've spent hours on this and getting no-where. .I have a set of data (more rows than...
  • Listbox Header
    Have a named range called "AccidentsHeader" Within my code I have: [CODE]Private Sub CommandButton1_Click() ListBox1.RowSource =...
  • Complex Heat Map using conditional formatting
    Good day excel world. I have a concern. Below link have a list of countries that carries each country unique data. [URL...
  • Conditional formatting
    Hi good morning, hope you can help me please, I have cells P4:P54 and if this cell is equal to 1 then i want row O to say "Fully Utilised" and to...
Top