Extract Keyword from text string

LambertyE

New Member
Joined
Nov 19, 2020
Messages
9
Office Version
  1. 2016
Platform
  1. Windows
Hi,

I am looking on how to simplify this formula since the list of keywords keep growing and the formula longer.

What I am trying to do is extract a keyword from a free type text.

1. Create a list of keywords in A1:A36.
2. Enter unformatted text in B1, B2, to B100 (just as an example)
3. Return results of match of keywords in A1:A20 in C1, C2, etc to C100.

So, spreading this out:

1. Check the text string in B1 for the ONE text string in A1, and return this in C1.
2. Check the text string in B1 for the ONE text string in A2, and return this in C2, and continue so that C3, C4, C5 etc are the matches against A1:A20 in B1.
3. Repeat for entire range A1:A20 against B2 to B100.

So, I applied the following formula but as I mentioned above the keyword list keeps growing and concern about the accuracy of the formula if I keep adding to the function of it.

=TRIM(IF(ISNUMBER(SEARCH($Z$2,AA2)),$Z$2,"")&" "&IF(ISNUMBER(SEARCH($Z$3,AA2)),$Z$3,"")&" "&IF(ISNUMBER(SEARCH($Z$4,AA2)),$Z$4,"")&" "&IF(ISNUMBER(SEARCH($Z$5,AA2)),$Z$5,"")&" "&IF(ISNUMBER(SEARCH($Z$6,AA2)),$Z$6,"")&" "&IF(ISNUMBER(SEARCH($Z$7,AA2)),$Z$7,"")&" "&IF(ISNUMBER(SEARCH($Z$8,AA2)),$Z$8,"")&" "&IF(ISNUMBER(SEARCH($Z$9,AA2)),$Z$9,"")&" "&IF(ISNUMBER(SEARCH($Z$10,AA2)),$Z$10,"")&" "&IF(ISNUMBER(SEARCH($Z$10,AA2)),$Z$10,"")&" "&IF(ISNUMBER(SEARCH($Z$11,AA2)),$Z$11,"")&" "&IF(ISNUMBER(SEARCH($Z$12,AA2)),$Z$12,"")&" "&IF(ISNUMBER(SEARCH($Z$13,AA2)),$Z$13,"")&" "&IF(ISNUMBER(SEARCH($Z$14,AA2)),$Z$14,"")&" "&IF(ISNUMBER(SEARCH($Z$15,AA2)),$Z$15,"")&" "&IF(ISNUMBER(SEARCH($Z$16,AA2)),$Z$16,"")&" "&IF(ISNUMBER(SEARCH($Z$17,AA2)),$Z$17,"")&" "&IF(ISNUMBER(SEARCH($Z$18,AA2)),$Z$18,"")&" "&IF(ISNUMBER(SEARCH($Z$19,AA2)),$Z$19,"")&" "&IF(ISNUMBER(SEARCH($Z$20,AA2)),$Z$20,"")&" "&IF(ISNUMBER(SEARCH($Z$21,AA2)),$Z$21,"")&" "&IF(ISNUMBER(SEARCH($Z$22,AA2)),$Z$22,"")&" "&IF(ISNUMBER(SEARCH($Z$23,AA2)),$Z$23,"")&" "&IF(ISNUMBER(SEARCH($Z$24,AA2)),$Z$24,"")&" "&IF(ISNUMBER(SEARCH($Z$25,AA2)),$Z$25,"")&" "&IF(ISNUMBER(SEARCH($Z$26,AA2)),$Z$26,"")&" "&IF(ISNUMBER(SEARCH($Z$27,AA2)),$Z$27,""))&" "&IF(ISNUMBER(SEARCH($Z$28,AA2)),$Z$28,"")&" "&IF(ISNUMBER(SEARCH($Z$29,AA2)),$Z$29,"")&" "&IF(ISNUMBER(SEARCH($Z$30,AA2)),$Z$30,"") &" "&IF(ISNUMBER(SEARCH($Z$31,AA2)),$Z$31,"")&" "&IF(ISNUMBER(SEARCH($Z$32,AA2)),$Z$32,"")&" "&IF(ISNUMBER(SEARCH($Z$33,AA2)),$Z$33,"")&" "&IF(ISNUMBER(SEARCH($Z$34,AA2)),$Z$34,"")&" "&IF(ISNUMBER(SEARCH($Z$35,AA2)),$Z$35,"")&" "&IF(ISNUMBER(SEARCH($Z$36,AA2)),$Z$36,"")

Any way this can be simplified? Help please!

Thanks in advance!

Evelyn L
 

Excel Facts

Which lookup functions find a value equal or greater than the lookup value?
MATCH uses -1 to find larger value (lookup table must be sorted ZA). XLOOKUP uses 1 to find values greater and does not need to be sorted.
Here is a way using Power Query.

I created a table for Column A (tbl_Key). Created a table for Column B (tbl_Text).

No steps in PQ for tbl_Key.

Steps for tbl_Text:
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1)
in
    #"Added Index"

Then in PQ you can add a blank query, enter the Advanced Editor, paste the following code, then load the table to cell C1.
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"Text", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Text"),
    #"Replaced Value" = Table.TransformColumns(#"Split Column by Delimiter",{{"Text", each Text.Select(_, {"A".."Z","a".."z"})}}),
    #"Merged Queries" = Table.NestedJoin(#"Replaced Value",{"Text"},tbl_Key,{"Keywords"},"tbl_Key",JoinKind.LeftOuter),
    #"Expanded tbl_Key" = Table.ExpandTableColumn(#"Merged Queries", "tbl_Key", {"Keywords"}, {"Keywords"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded tbl_Key", each ([Keywords] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Index"}, {{"Count", each _, type table}}),
    #"Distinct Rows" = Table.TransformColumns(#"Grouped Rows",{{"Count", each Table.Distinct(_)}}),
    #"Added Custom" = Table.AddColumn(#"Distinct Rows", "Match", each Text.Combine(Table.Column([Count],"Keywords"),", ")),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Match"})
in
    #"Removed Other Columns"

Book1
ABC
1KeywordsTextMatch
2catThe dog chased the cat.dog, cat
3dogThe cat ate the bird.cat, bird
4birdThe cat is a cat.cat
5The dog, bird, and cat.dog, bird, cat
Sheet1
 
Upvote 0
Here is another way using VBA.

Book1
ABCDE
1KeywordsTextMatchVBA
2catThe dog chased the cat.dog, catdog, cat
3dogThe cat ate the bird.cat, birdcat, bird
4birdThe cat is a cat.catcat, cat
5The dog, bird, and cat.dog, bird, catdog, bird, cat
6Dog's are big.Dog
7I am the lizard king.None Found
Sheet1
Cell Formulas
RangeFormula
E2:E7E2=getKeywords($A$2:$A$4,B2)


VBA Code:
Function getKeywords(keys As Range, s As String) As String
Dim AL As Object:   Set AL = CreateObject("System.Collections.ArrayList")
Dim RES As Object:  Set RES = CreateObject("System.Collections.ArrayList")
Dim SP() As String

For Each Key In keys
    AL.Add LCase(Key)
Next Key

With CreateObject("VBScript.RegExp")
    .Global = True
    .ignorecase = True
    .Pattern = "[A-Za-z]+"
    Set matches = .Execute(s)
    For i = 0 To matches.Count - 1
        SP = Split(matches(i), "'")
        For j = LBound(SP) To UBound(SP)
            If AL.contains(LCase(SP(j))) Then RES.Add SP(j)
        Next j
    Next i
End With

If RES.Count > 0 Then
    getKeywords = Join(RES.toArray(), ", ")
Else
    getKeywords = "None Found"
End If
End Function
 
Upvote 0
Welcome to the MrExcel board!

Could you give us a small set of sample data and expected results, preferably with XL2BB so that we can easily copy for testing?
The sample only needs, say, 4 - 6 key words and 8 - 10 sample texts.
However, things that the samples (or you directly) should make clear are ...
  • If a key word is "Cat" and the text is "The cat sat on the mat", do you want 'cat" (or "Cat") returned even though the match is not exact for upper/lower case?
  • If a key word is "cat" and the text is "The birds scattered", do you want "cat" returned even though it is not a word in the text?
  • If the key word list includes both "cat" and "mat" and the text is "The cat sat on the mat", what do you want returned?
  • Whether your texts include any punctuation and, if so, what sort of variety of punctuation is there?
 
Upvote 0
Here is another way using VBA.

Book1
ABCDE
1KeywordsTextMatchVBA
2catThe dog chased the cat.dog, catdog, cat
3dogThe cat ate the bird.cat, birdcat, bird
4birdThe cat is a cat.catcat, cat
5The dog, bird, and cat.dog, bird, catdog, bird, cat
6Dog's are big.Dog
7I am the lizard king.None Found
Sheet1
Cell Formulas
RangeFormula
E2:E7E2=getKeywords($A$2:$A$4,B2)


VBA Code:
Function getKeywords(keys As Range, s As String) As String
Dim AL As Object:   Set AL = CreateObject("System.Collections.ArrayList")
Dim RES As Object:  Set RES = CreateObject("System.Collections.ArrayList")
Dim SP() As String

For Each Key In keys
    AL.Add LCase(Key)
Next Key

With CreateObject("VBScript.RegExp")
    .Global = True
    .ignorecase = True
    .Pattern = "[A-Za-z]+"
    Set matches = .Execute(s)
    For i = 0 To matches.Count - 1
        SP = Split(matches(i), "'")
        For j = LBound(SP) To UBound(SP)
            If AL.contains(LCase(SP(j))) Then RES.Add SP(j)
        Next j
    Next i
End With

If RES.Count > 0 Then
    getKeywords = Join(RES.toArray(), ", ")
Else
    getKeywords = "None Found"
End If
End Function
This is brilliant! After searching on the web for a couple of weeks and trying all sorts of formulas, this is exactly what I need!

I have one question though and is it possible to limit the response to a single instance of the keyword? Example, if the text string has 6 instances of the word 'Cat', the function currently will return 'Cat, Cat, Cat, Cat, Cat, Cat'. Is it possible that it only shows the word once? I'm using it to quickly scan entries in a log for keywords (horizontal themes basically) that we can then filter quickly to identify entries with these themes.

Grateful for your help!
 
Last edited by a moderator:
Upvote 0
This should do the trick.

VBA Code:
Function getKeywords(keys As Range, s As String) As String
Dim AL As Object:   Set AL = CreateObject("System.Collections.ArrayList")
Dim RES As Object:  Set RES = CreateObject("System.Collections.ArrayList")
Dim SP() As String

For Each Key In keys
    AL.Add LCase(Key)
Next Key

With CreateObject("VBScript.RegExp")
    .Global = True
    .ignorecase = True
    .Pattern = "[A-Za-z]+"
    Set matches = .Execute(s)
    For i = 0 To matches.Count - 1
        SP = Split(matches(i), "'")
        For j = LBound(SP) To UBound(SP)
            If AL.contains(LCase(SP(j))) Then If Not RES.contains(SP(j)) Then RES.Add SP(j)
        Next j
    Next i
End With

If RES.Count > 0 Then
    getKeywords = Join(RES.toArray(), ", ")
Else
    getKeywords = "None Found"
End If
End Function
 
Upvote 0
just for fun
KeywordsTextTextList
catThe dog chased the cat.The dog chased the cat.cat, dog
dogThe cat ate the bird.The cat ate the bird.cat, bird
birdThe cat is a cat.The cat is a cat.cat
The dog, bird, and cat.The dog, bird, and cat.cat, dog, bird

Power Query:
let
    KW = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    Join = Table.AddColumn(Source, "Custom", each KW),
    Expand = Table.ExpandTableColumn(Join, "Custom", {"Keywords"}, {"Keywords"}),
    IF = Table.AddColumn(Expand, "Custom", each if Text.Contains([Text], [Keywords]) then [Keywords] else null),
    Group = Table.Group(IF, {"Text"}, {{"Count", each _, type table}}),
    List = Table.AddColumn(Group, "List", each List.Distinct([Count][Custom])),
    Extract = Table.TransformColumns(List, {"List", each Text.Combine(List.Transform(_, Text.From), ", "), type text})
in
    Extract
 
Upvote 0
This would be my user-defined function.

VBA Code:
Function GetWords(KeyWds As Range, s As String) As String
  Dim RX As Object, d As Object, M As Object
  
  Set d = CreateObject("Scripting.Dictionary")
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = "\b(" & LCase(Join(Application.Transpose(KeyWds), "|")) & ")(?=\b|')"
  For Each M In RX.Execute(LCase(s))
    d(CStr(M)) = 1
  Next M
  GetWords = Join(d.keys, ", ")
End Function

ACUK.xlsm
ABC
1KeywordsTextResult
2catThe dog chased the catdog, cat
3dogThe cat ate the bird.cat, bird
4birdThe cat is a cat.cat
5The dog, bird, and cat.dog, bird, cat
6Dog's are big.dog
7I am the lizard king. 
8Big Dog small dog small cat medium dogdog, cat
Keywords
Cell Formulas
RangeFormula
C2:C8C2=GetWords(A$2:A$4,B2)
 
Upvote 0
Nice Pete. Looks the use of the negative lookahead.
 
Upvote 0
Nice Pete. Looks the use of the negative lookahead.
Thanks. It is 'lookahead' though, not 'negative lookahead'. ;)

.. and now that I think about it, lookahead is not required since an apostrophe triggers a word boundary anyway, so below code should do just as well.

BTW, your udf returns both 'dog' and 'Dog' for the sample data in my row 8. :)

Rich (BB code):
Function GetWords(KeyWds As Range, s As String) As String
  Dim RX As Object, d As Object, M As Object
 
  Set d = CreateObject("Scripting.Dictionary")
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = "\b(" & LCase(Join(Application.Transpose(KeyWds), "|")) & ")(?=\b|')"
  RX.Pattern = "\b(" & LCase(Join(Application.Transpose(KeyWds), "|")) & ")\b"
  For Each M In RX.Execute(LCase(s))
    d(CStr(M)) = 1
  Next M
  GetWords = Join(d.keys, ", ")
End Function
 
Upvote 0

Forum statistics

Threads
1,213,490
Messages
6,113,957
Members
448,535
Latest member
alrossman

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top