Extract multiple keywords from text string

HughT

Board Regular
I have searched the forum but I can't find anything that quite matches this.

This is for a healthy eating project. Students record their meals in a daily diary. This is random unformatted text in a single cell (B1, B2 etc). There is a keyword list of healthy foods, each item being in a separate cell. What I would like to do is search the random text for occurrences of the keywords and return the keywords in another cell adjacent to the text cell. I would then like to be able to search the returned cells by the keyword list.

So:

Keywords (each in a separate cell, but doesn't have to be in Column A):

A1 Apple
A2 Fries
A3 Salad
A4 Burger
etc

Text (in B1)
Today I ate a burger with fries, and had an apple afterwards.

Result (in C1)
Apple Fries Burger [order is not important]

C1 to C20 (etc) will be the searchable data. I want to be able to search this by each keyword in the range A1:A4, ie 'Apple', 'Fries', 'Salad' etc so I can see who has been eating Apples, Fries, etc. Using column filters will display the contents of every cell, so if some comedian enters the whole range A1:A4 (which will actually be much larger) the filter will also return the whole range, so I need an alternative method.

I possible I would like to do this by a formula rather than VBA as I have to hand this over to someone who will not understand VBA, and can add to or alter the contents of the lookup range (A1:A4) simply by adding to it or overtyping the existing contents.

Thank you for your help.
 

bosco_yip

Well-known Member
Try,

In C1 enter:

=TRIM(IF(ISNUMBER(SEARCH(A1,B1)),A1,"")&" "&IF(ISNUMBER(SEARCH(A2,B1)),A2,"")&" "&IF(ISNUMBER(SEARCH(A3,B1)),A3,"")&" "&IF(ISNUMBER(SEARCH(A4,B1)),A4,""))

Regards
 

HughT

Board Regular
Thank you very much for the quick reply!

I tried copying this down column C (ie to C2, C3, C4 etc) and it didn't work, so I fixed the references after the double brackets with $ signs eg (A1,B1)),$A$1,"") etc, and this didn't work either.

The keyword column could have 20 or more entries, how do I alter the formula to accommodate the range? When I tried to do this myself, I was wondering if a range statement eg '$A$1:$A$20' should be in there somewhere to capture the keyword criteria.
 

bosco_yip

Well-known Member
Try this to copy down,

=TRIM(IF(ISNUMBER(SEARCH($A$1,B1)),$A$1,"")&" "&IF(ISNUMBER(SEARCH($A$2,B1)),$A$2,"")&" "&IF(ISNUMBER(SEARCH($A$3,B1)),$A$3,"")&" "&IF(ISNUMBER(SEARCH($A$4,B1)),$A$4,""))

You can nest the formula by 20s in the way of : .....&" "&IF(ISNUMBER(SEARCH($A$5,B1)),$A$5,"").....until 20.....&" "&IF(ISNUMBER(SEARCH($A$20,B1)),$A$20,""))

Regards
 

Aladin Akyurek

MrExcel MVP
Row\Col
A​
B​
C​
1​
AppleToday I ate a burger with fries, and had an apple afterwards.Apple, Fries, Burger
2​
Fries
3​
Salad
4​
Burger
5​

In C1 control+shift+enter, not just enter:

=REPLACE(aconcat(IF(ISNUMBER(SEARCH($A$1:$A$4,B1)),", "&$A$1:$A$4,""),""),1,2,"")<strike></strike>

<strike></strike>For this formula to work, you need to add the <acronym title="visual basic for applications">VBA</acronym> code for ACONCAT to your workbook as a module, using Alt+F11...

Function aconcat(a As Variant, Optional sep As String = "") As String
' Harlan Grove, Mar 2002
Dim y As Variant
If TypeOf a Is Range Then
For Each y In a.Cells
aconcat = aconcat & y.Value & sep
Next y
ElseIf IsArray(a) Then
For Each y In a
aconcat = aconcat & y & sep
Next y
Else
aconcat = aconcat & a & sep
End If
aconcat = Left(aconcat, Len(aconcat) - Len(sep))
End Function
 

HughT

Board Regular
Thank you both, but I think I may have overcomplicated the question. The basic criteria is that this must be capable of being maintained by someone who has little or not Excel knowledge, other than entering text in a cell (incredibly, such people do exist!)

So thinking around the problem, what I want to do is:

1. Create a list of keywords in A1:A20.
2. Enter unformatted text in B1, B2, to B100 (for the sake of this example)
3. Return results of match of keywords in A1:A20 in C1, C2, etc to C100.

So, spreading this out:

1. Check the text string in B1 for the ONE text string in A1, and return this in D1.
2. Check the text string in B1 for the ONE text string in A2, and return this in E1, and continue so that D1, E1, F1 etc are the matches against A1:A20 in B1 (ie column D is headed with contents of A1, E with contents of A2, F = A3 etc).
3. Repeat for entire range A1:A20 against B2 to B100.
4. Concatenate results of D1, E1, F1, etc in C1 (ie next to the unformatted text) to show all the matches of A1:A20 in B1. This solves the 'how do I extract the keywords in A1:A20 into C1' qustion. The next bit is harder:
5. Enable search / match / filter function of some sort across range D1:W100 (ie 20 columns corresponding to A1:A20) by the number of rows of data entered (B1 to B100) so that if the word 'Apple' is entered as search criteria, it filters the rows to show all the instances of Apple in B1 to B100. (This is the tricky bit).


Sorry if this confuses things, but I have got lots of empty columns to play with!

Many thanks

Hugh
 

Peter_SSs

MrExcel MVP, Moderator
Edit: I hadn't seen your most recent post before submitting mine. I'll review mine once I've had a chance to study your last post.

I think you would be starting to see from bosco_yip's suggestion that dealing with, say, 20 keywords a standard formula would also be pretty horrendous to leave to somebody else later. In fact that formula would have to get even longer to work properly. With the keyword sample list given & text "Today I ate a burger with fries, and had a pineapple afterwards." the formula returns the keyword "Apple", even though none was eaten.

In fact that issue is the same with Aladin's formula at the moment so it also needs a tweak.

My suggestion below, which also uses vba, has something of the opposite problem - refer row 5 of my screen shot where mine did not return Apple because that 'word' was not found in the text.

There will also be the problem of other plurals where the initial ending changes. for example you may have "Cherry" in your list, so you would probably also need "Cherries"

All a minefield really. :eek:

Anyway, this is my attempt, also using a user-defined function. To implement ..

1. Right click the sheet name tab and choose "View Code".

2. In the Visual Basic window use the menu to Insert|Module

3. Copy and Paste the code below into the main right hand pane that opens at step 2.

4. Close the Visual Basic window.

5. Enter the formula as shown in cell C1 in the screen shot below and copy down.

6. If using Excel 2007 or later your workbook will need to be saved as a macro-enabled workbook (*.xlsm)

Rich (BB code):
Function Keywords(s As String, rWords As Range) As String
  Dim kw As Object
  Dim i As Long
  
  With CreateObject("VBScript.RegExp")
    .IgnoreCase = True
    .Global = True
    .Pattern = "\b(" & Join(Application.Transpose(rWords), "|") & ")\b"
    If .test(s) Then
      Set kw = .Execute(s)
      For i = 1 To kw.Count
        Keywords = Keywords & ", " & kw(i - 1)
      Next i
    End If
  End With
  Keywords = Mid(Keywords, 3)
End Function
For comparison of where all the suggestions may have issues, I've included Aladin's and bosco's suggestions in columns D & E respectively.

Excel Workbook
ABCDE
1AppleToday I ate a burger with fries, and had an apple afterwards.
2FriesToday I had eggs
3SaladapplesAppleApple
4BurgerSalad, pineapple & hamSaladApple, SaladApple Salad
5I ate 4 applesAppleApple
6
Sheet1
 

bosco_yip

Well-known Member
Thank you both, but I think I may have overcomplicated the question. The basic criteria is that this must be capable of being maintained by someone who has little or not Excel knowledge, other than entering text in a cell (incredibly, such people do exist!)

So thinking around the problem, what I want to do is:

1. Create a list of keywords in A1:A20.
2. Enter unformatted text in B1, B2, to B100 (for the sake of this example)
3. Return results of match of keywords in A1:A20 in C1, C2, etc to C100.

So, spreading this out:

1. Check the text string in B1 for the ONE text string in A1, and return this in D1.
2. Check the text string in B1 for the ONE text string in A2, and return this in E1, and continue so that D1, E1, F1 etc are the matches against A1:A20 in B1 (ie column D is headed with contents of A1, E with contents of A2, F = A3 etc).
3. Repeat for entire range A1:A20 against B2 to B100.
4. Concatenate results of D1, E1, F1, etc in C1 (ie next to the unformatted text) to show all the matches of A1:A20 in B1. This solves the 'how do I extract the keywords in A1:A20 into C1' qustion. The next bit is harder:
5. Enable search / match / filter function of some sort across range D1:W100 (ie 20 columns corresponding to A1:A20) by the number of rows of data entered (B1 to B100) so that if the word 'Apple' is entered as search criteria, it filters the rows to show all the instances of Apple in B1 to B100. (This is the tricky bit).


Sorry if this confuses things, but I have got lots of empty columns to play with!

Many thanks

Hugh
In D1, enter formula, copy across to W1 and all copy down to 100 rows :

=IF(ISNUMBER(SEARCH(INDEX($A$1:$A$20,COLUMNS($A1:A1)),$B1)),INDEX($A$1:$A$20,COLUMNS($A1:A1)),"")

In C1, enter formula and copy down :

=TRIM(CONCATENATE(D1," ",E1," ",F1," ",G1," ",H1," ",I1," ",J1," ",K1," ",L1," ",M1," ",N1," ",O1," ",P1," ",Q1," ",R1," ",S1," ",T1," ",U1," ",V1," ",W1))

Regards
 

HughT

Board Regular
Brilliant!

Thank you very much. It all works beautifully. I now have a 20 column x 100 row table each column headed (for convenience) with the keyword from the index range, and beneath that a cell containing the same word wherever that word is also contained in the freetext box, all concatenated into one cell. But how do I perform a search across this data, for example to find all occurrences of the word 'Burger' when a burger always has another item concatenated with it such as fries, salad etc?

Very many thanks.
 

Peter_SSs

MrExcel MVP, Moderator
I now have a 20 column x 100 row table each column headed (for convenience) with the keyword from the index range, ...
In that case, you can use them (headers) for a much simpler formula in your columns.

Note that you still have the issue of the apple showing up when pineapple is in the text & apple is not, but you seem happy to accept that. :confused: (same issue could occur with "ham" and "hamburger" etc)

Formula in D2 is copied across and down.

Excel Workbook
BCDEFG
1AppleFriesSaladBurger
2Today I ate a burger with fries, and had an apple afterwards. FriesBurger
3Today I had eggs
4applesApple
5Salad, pineapple & hamAppleSalad
6I ate 4 applesApple
Keywords
 

Some videos you may like

This Week's Hot Topics

  • Importing multiple excel files into one spreadsheet
    Hi, I'm trying to import multiple excel files (with the same format into a single spreadsheet) so that each day's file is listed underneath the...
  • find many based on a certain criteria
    good evening, I hope someone can help me? I have a workbook sheet 2 contains lots of data.... I would like to be able to find anything on sheet...
  • How to copy multiple rows using If
    Hi all, I'm very new to VBA and have written this simple code to copy certain cells if a certain cell within that row contains any data. I need...
  • VBA If statement
    Dear All, I have two dates, where I'd like a message box to pop, if the dates are between this criteria. [CODE] sDate1 = #10/1/2019#...
  • Text Format
    I have a sheet for user to keyin the data. The format of the data can be 451 / 1903, 0012 / 9908 or 00287 / 0099. The number after the "/" is...
  • Syntax errors
    Good Morning, Trying to compile a workbook, I keep getting a few errors. Here are the first two: [code=rich]Syntax Error: Function...
Top