Extract Keyword from text string

LambertyE

New Member
Joined
Nov 19, 2020
Messages
9
Office Version
  1. 2016
Platform
  1. Windows
Hi,

I am looking on how to simplify this formula since the list of keywords keep growing and the formula longer.

What I am trying to do is extract a keyword from a free type text.

1. Create a list of keywords in A1:A36.
2. Enter unformatted text in B1, B2, to B100 (just as an example)
3. Return results of match of keywords in A1:A20 in C1, C2, etc to C100.

So, spreading this out:

1. Check the text string in B1 for the ONE text string in A1, and return this in C1.
2. Check the text string in B1 for the ONE text string in A2, and return this in C2, and continue so that C3, C4, C5 etc are the matches against A1:A20 in B1.
3. Repeat for entire range A1:A20 against B2 to B100.

So, I applied the following formula but as I mentioned above the keyword list keeps growing and concern about the accuracy of the formula if I keep adding to the function of it.

=TRIM(IF(ISNUMBER(SEARCH($Z$2,AA2)),$Z$2,"")&" "&IF(ISNUMBER(SEARCH($Z$3,AA2)),$Z$3,"")&" "&IF(ISNUMBER(SEARCH($Z$4,AA2)),$Z$4,"")&" "&IF(ISNUMBER(SEARCH($Z$5,AA2)),$Z$5,"")&" "&IF(ISNUMBER(SEARCH($Z$6,AA2)),$Z$6,"")&" "&IF(ISNUMBER(SEARCH($Z$7,AA2)),$Z$7,"")&" "&IF(ISNUMBER(SEARCH($Z$8,AA2)),$Z$8,"")&" "&IF(ISNUMBER(SEARCH($Z$9,AA2)),$Z$9,"")&" "&IF(ISNUMBER(SEARCH($Z$10,AA2)),$Z$10,"")&" "&IF(ISNUMBER(SEARCH($Z$10,AA2)),$Z$10,"")&" "&IF(ISNUMBER(SEARCH($Z$11,AA2)),$Z$11,"")&" "&IF(ISNUMBER(SEARCH($Z$12,AA2)),$Z$12,"")&" "&IF(ISNUMBER(SEARCH($Z$13,AA2)),$Z$13,"")&" "&IF(ISNUMBER(SEARCH($Z$14,AA2)),$Z$14,"")&" "&IF(ISNUMBER(SEARCH($Z$15,AA2)),$Z$15,"")&" "&IF(ISNUMBER(SEARCH($Z$16,AA2)),$Z$16,"")&" "&IF(ISNUMBER(SEARCH($Z$17,AA2)),$Z$17,"")&" "&IF(ISNUMBER(SEARCH($Z$18,AA2)),$Z$18,"")&" "&IF(ISNUMBER(SEARCH($Z$19,AA2)),$Z$19,"")&" "&IF(ISNUMBER(SEARCH($Z$20,AA2)),$Z$20,"")&" "&IF(ISNUMBER(SEARCH($Z$21,AA2)),$Z$21,"")&" "&IF(ISNUMBER(SEARCH($Z$22,AA2)),$Z$22,"")&" "&IF(ISNUMBER(SEARCH($Z$23,AA2)),$Z$23,"")&" "&IF(ISNUMBER(SEARCH($Z$24,AA2)),$Z$24,"")&" "&IF(ISNUMBER(SEARCH($Z$25,AA2)),$Z$25,"")&" "&IF(ISNUMBER(SEARCH($Z$26,AA2)),$Z$26,"")&" "&IF(ISNUMBER(SEARCH($Z$27,AA2)),$Z$27,""))&" "&IF(ISNUMBER(SEARCH($Z$28,AA2)),$Z$28,"")&" "&IF(ISNUMBER(SEARCH($Z$29,AA2)),$Z$29,"")&" "&IF(ISNUMBER(SEARCH($Z$30,AA2)),$Z$30,"") &" "&IF(ISNUMBER(SEARCH($Z$31,AA2)),$Z$31,"")&" "&IF(ISNUMBER(SEARCH($Z$32,AA2)),$Z$32,"")&" "&IF(ISNUMBER(SEARCH($Z$33,AA2)),$Z$33,"")&" "&IF(ISNUMBER(SEARCH($Z$34,AA2)),$Z$34,"")&" "&IF(ISNUMBER(SEARCH($Z$35,AA2)),$Z$35,"")&" "&IF(ISNUMBER(SEARCH($Z$36,AA2)),$Z$36,"")

Any way this can be simplified? Help please!

Thanks in advance!

Evelyn L
 

lrobbo314

Well-known Member
Joined
Jul 14, 2008
Messages
3,028
Office Version
  1. 365
Platform
  1. Windows
Yeah, didn't know if they really cared about case sensitivity or not. Thanks for the regex info. Advanced regex topics are a bit beyond me. But I totally dig it.
 

Some videos you may like

Excel Facts

Wildcard in VLOOKUP
Use =VLOOKUP("Apple*" to find apple, Apple, or applesauce

Peter_SSs

MrExcel MVP, Moderator
Joined
May 28, 2005
Messages
47,926
Office Version
  1. 365
Platform
  1. Windows
didn't know if they really cared about case sensitivity or not.
Going back to the original poster from post 1, they were not interested in case-sensitivity since they had a successful formula using SEARCH, not FIND.

As for the new poster (ACUK) we don't really know, but my guess would be that if they are interested in the case match then perhaps "Dog" should not be returned anywhere (row 8 or row 6) since "Dog" is not in the list of Keywords. We need to hear back from ACUK to be sure. :)
 

KP117

New Member
Joined
Oct 22, 2020
Messages
24
Office Version
  1. 365
  2. 2019
Platform
  1. Windows
At C2, try this
=SUBSTITUTE(TRIM(CONCAT(IFERROR(INDEX($A:$A,--RIGHT(AGGREGATE(15,6,FIND(($A$2:$A$4),B3)+ROW($A$2:$A$4)/100,ROW($A$2:$A$4)-1),2)),"")&" "))," ",", ")
 

Peter_SSs

MrExcel MVP, Moderator
Joined
May 28, 2005
Messages
47,926
Office Version
  1. 365
Platform
  1. Windows
At C2, try this
=SUBSTITUTE(TRIM(CONCAT(IFERROR(INDEX($A:$A,--RIGHT(AGGREGATE(15,6,FIND(($A$2:$A$4),B3)+ROW($A$2:$A$4)/100,ROW($A$2:$A$4)-1),2)),"")&" "))," ",", ")
Some comments on that formula:
  • It isn't entirely clear what either questioner wants but if looking for words in the text, i note that your formula returns both 'dog' and 'cat' for the text "The underdogs scattered" when neither 'dog' nor 'cat' is a word in that string.
  • After entering your formula, if a user inserts any new rows at the top of the sheet, it begins to lose results.

Given that ACUK has Excel 365, here is another formula using worksheet functions that could be considered. Punctuation is usually a problem when looking for whole words & I have allowed for ".", "'" and "," only. More could be added if required.

ACUK.xlsm
ABC
1KeywordsTextResult
2catThe dog chased the catcat, dog
3dogThe cat ate the bird.cat, bird
4birdThe cat is a cat.cat
5The dog, bird, and cat.cat, dog, bird
6Dog's are big.dog
7I am the lizard king. 
8Big Dog small dog small cat medium dogcat, dog
9The underdogs scattered 
10 
Keywords
Cell Formulas
RangeFormula
C2:C10C2=TEXTJOIN(", ",1,FILTER($A$2:$A$4,ISNUMBER(SEARCH(" "&A$2:A$4&" "," "&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"."," "),","," "),"'"," ")&" ")),""))
 

ACUK

New Member
Joined
Dec 23, 2020
Messages
3
Office Version
  1. 365
Platform
  1. Windows

ADVERTISEMENT

This should do the trick.

VBA Code:
Function getKeywords(keys As Range, s As String) As String
Dim AL As Object:   Set AL = CreateObject("System.Collections.ArrayList")
Dim RES As Object:  Set RES = CreateObject("System.Collections.ArrayList")
Dim SP() As String

For Each Key In keys
    AL.Add LCase(Key)
Next Key

With CreateObject("VBScript.RegExp")
    .Global = True
    .ignorecase = True
    .Pattern = "[A-Za-z]+"
    Set matches = .Execute(s)
    For i = 0 To matches.Count - 1
        SP = Split(matches(i), "'")
        For j = LBound(SP) To UBound(SP)
            If AL.contains(LCase(SP(j))) Then If Not RES.contains(SP(j)) Then RES.Add SP(j)
        Next j
    Next i
End With

If RES.Count > 0 Then
    getKeywords = Join(RES.toArray(), ", ")
Else
    getKeywords = "None Found"
End If
End Function
Thanks a million!
 

Peter_SSs

MrExcel MVP, Moderator
Joined
May 28, 2005
Messages
47,926
Office Version
  1. 365
Platform
  1. Windows
if the text string has 6 instances of the word 'Cat', the function currently will return 'Cat, Cat, Cat, Cat, Cat, Cat'. Is it possible that it only shows the word once?

Thanks a million!
Just checking then that if a keyword is "dog" and the text is "Big DOG small dog Medium Dog" that you want all three versions of the word returned by the function?
 

ACUK

New Member
Joined
Dec 23, 2020
Messages
3
Office Version
  1. 365
Platform
  1. Windows
Just checking then that if a keyword is "dog" and the text is "Big DOG small dog Medium Dog" that you want all three versions of the word returned by the function?
Ideally, it'll return just one instance of the word, regardless of the case used.

So for example, if the string of text has 'COVID' and 'Covid', the code will return just the single 'Covid' (based on how it was spelt under the keywords column).

Many thanks again.
 

Peter_SSs

MrExcel MVP, Moderator
Joined
May 28, 2005
Messages
47,926
Office Version
  1. 365
Platform
  1. Windows
Ideally, it'll return just one instance of the word, regardless of the case used.

So for example, if the string of text has 'COVID' and 'Covid', the code will return just the single 'Covid' (based on how it was spelt under the keywords column).
Might be worth trying some of the other suggestions made in the thread then since more than one of them seem to do exactly that? :)
 

Watch MrExcel Video

Forum statistics

Threads
1,122,777
Messages
5,598,020
Members
414,204
Latest member
Excelmee

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top