Results 1 to 4 of 4

Thread: Splitting a Text String into Identifiable Words and Comparing them to other Cells
Thanks Thanks: 0 Likes Likes: 0

  1. #1
    New Member
    Join Date
    Sep 2019
    Posts
    1
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Splitting a Text String into Identifiable Words and Comparing them to other Cells

    Hi! Newbie here, so please be gentle...

    I have two columns of data where all the cells are text strings. I want to be able to compare each cell in the Column 2 with all of the cells in Column 1 to find which cell from column 1 has the most words that match the cell in column 2. None of them will exactly match, but some will match more than others (will have more words that match). I want to find the cell that best matches (has the most matching words) with the selected cell in column 2 and then pull off data relevant to the best matching cell, much like a vlookup would if it was a perfect match between the cells. I have provided a table below to try and better explain:

    Column 1 Info 1 Column 2 Info 2
    The dog chased the cat down the road 1 The man loved his dog
    The dog ate the man's shoe
    2 The dog and the cat shared the shoe
    The cat and the man loved the shoe
    3
    A cat is a man's best friend
    A dog is a man's best friend
    4
    The dog chased the man down the road
    1

    The fourth entry under column 2 best matches (has the most words that match) with the first entry under column 1, therefore it has pulled "1" from Info 1 into Info 2.

    I understand that the VBA Split function would at least turn the text string into identifiable words, but where to go from there I do not know...

    Your help would be greatly appreciated!

  2. #2
    MrExcel MVP
    Join Date
    Jan 2008
    Posts
    14,837
    Post Thanks / Like
    Mentioned
    26 Post(s)
    Tagged
    12 Thread(s)

    Default Re: Splitting a Text String into Identifiable Words and Comparing them to other Cells

    Try this as a possibility :-
    NB:- I see from your data that Your "column 2" is actually column 3.
    Results in column 4.
    Code:
    Sub MG05Sep18
    Dim RngA As Range, DnA As Range, DnC As Range, RngC As Range, Sp As Variant, Num As Long
    Dim n As Long, c As Long, p As Long, oMax As Long
    
    Set RngA = Range(Range("A2"), Range("A" & Rows.Count).End(xlUp))
    Set RngC = Range(Range("C2"), Range("c" & Rows.Count).End(xlUp))
    For Each DnA In RngA
       ReDim ray(1 To RngA.Count, 1 To 2)
        c = 0
       For Each DnC In RngC
        Sp = Split(DnC, " ")
           For n = 0 To UBound(Sp)
            If InStr(1, DnA, Sp(n), vbTextCompare) > 0 Then
                Num = Num + 1
            End If
          Next n
            c = c + 1
            ray(c, 1) = Num: ray(c, 2) = DnC.Row - 1
            Num = 0
       Next DnC
           For n = 1 To UBound(ray)
             If ray(n, 1) > oMax Then
               oMax = ray(n, 1)
               p = ray(n, 2)
            End If
          Next n
         RngC(p).Offset(, 1) = RngC(p).Offset(, 1) & IIf(RngC(p).Offset(, 1) = "", DnA.Offset(, 1), "," & DnA.Offset(, 1))
        oMax = 0
    Next DnA
    End Sub
    Regards Mick

  3. #3
    Board Regular
    Join Date
    Mar 2015
    Posts
    3,901
    Post Thanks / Like
    Mentioned
    72 Post(s)
    Tagged
    7 Thread(s)

    Default Re: Splitting a Text String into Identifiable Words and Comparing them to other Cells

    1. Not sure what "best match" means when there are multiple incidences of the same word
    2. Do you want to exclude cetain words (like "the","is")

    example
    Which is the "best" match for The dog is happy below ? Is it cell1 or cell2

    cell1 ( 13 matches , 8 matches if "the" excluded, 5 matches if "is" also excluded)
    The head of a dog is in front of the tail of a dog and the dog is grateful that the tail of a dog is not in front of the head of a dog

    cel 2 ( 4 matches )
    The happy dog is asleep


    3. I do not understand why there is nothing against 3 of the cells in column 2
    - they all have some kind of match
    - why does only one cell return a value ?


    4 The dog is happy is being matched with The dog is very happy and The sleeping dog is very happy
    - are they ranked equally ?

    EDIT

    I notice that @MickG has posted something whilst I was dozing!
    - so I will stand back for the time being
    Last edited by Yongle; Sep 5th, 2019 at 09:49 AM.

  4. #4
    Board Regular
    Join Date
    Mar 2015
    Location
    Syria
    Posts
    256
    Post Thanks / Like
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Splitting a Text String into Identifiable Words and Comparing them to other Cells

    How about
    With the same result as MicKG
    Could redefine if the result acceptable
    Code:
    Sub teste()
        gmth = 0
        For i = 2 To 5
            myt = "(" & Join(Split(Join(Split(Cells(i, 1), " "), " (")), ")|") & ")"
            With CreateObject("VBScript.RegExp")
                .Global = True
                .Pattern = myt
                For j = 2 To 5
                    Set mtch = .Execute(Cells(j, 3))
                    If mtch.Count > gmth Then
                        gmth = mtch.Count
                        l = j
                    End If
                Next
                Cells(l, 4) = Cells(l, 4).Value & " " & i - 1
                gmth = 0
            End With
        Next
    End Sub

Some videos you may like

User Tag List

Tags for this Thread

Like this thread? Share it with others

Like this thread? Share it with others

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •