Extract multiple keywords from text string

HughT

Board Regular
Joined
Jan 6, 2012
Messages
113
Office Version
  1. 365
Platform
  1. Windows
I have searched the forum but I can't find anything that quite matches this.

This is for a healthy eating project. Students record their meals in a daily diary. This is random unformatted text in a single cell (B1, B2 etc). There is a keyword list of healthy foods, each item being in a separate cell. What I would like to do is search the random text for occurrences of the keywords and return the keywords in another cell adjacent to the text cell. I would then like to be able to search the returned cells by the keyword list.

So:

Keywords (each in a separate cell, but doesn't have to be in Column A):

A1 Apple
A2 Fries
A3 Salad
A4 Burger
etc

Text (in B1)
Today I ate a burger with fries, and had an apple afterwards.

Result (in C1)
Apple Fries Burger [order is not important]

C1 to C20 (etc) will be the searchable data. I want to be able to search this by each keyword in the range A1:A4, ie 'Apple', 'Fries', 'Salad' etc so I can see who has been eating Apples, Fries, etc. Using column filters will display the contents of every cell, so if some comedian enters the whole range A1:A4 (which will actually be much larger) the filter will also return the whole range, so I need an alternative method.

I possible I would like to do this by a formula rather than VBA as I have to hand this over to someone who will not understand VBA, and can add to or alter the contents of the lookup range (A1:A4) simply by adding to it or overtyping the existing contents.

Thank you for your help.
 
Edit: I hadn't seen your most recent post before submitting mine. I'll review mine once I've had a chance to study your last post.

I think you would be starting to see from bosco_yip's suggestion that dealing with, say, 20 keywords a standard formula would also be pretty horrendous to leave to somebody else later. In fact that formula would have to get even longer to work properly. With the keyword sample list given & text "Today I ate a burger with fries, and had a pineapple afterwards." the formula returns the keyword "Apple", even though none was eaten.

In fact that issue is the same with Aladin's formula at the moment so it also needs a tweak.

My suggestion below, which also uses vba, has something of the opposite problem - refer row 5 of my screen shot where mine did not return Apple because that 'word' was not found in the text.

There will also be the problem of other plurals where the initial ending changes. for example you may have "Cherry" in your list, so you would probably also need "Cherries"

All a minefield really. :eek:

Anyway, this is my attempt, also using a user-defined function. To implement ..

1. Right click the sheet name tab and choose "View Code".

2. In the Visual Basic window use the menu to Insert|Module

3. Copy and Paste the code below into the main right hand pane that opens at step 2.

4. Close the Visual Basic window.

5. Enter the formula as shown in cell C1 in the screen shot below and copy down.

6. If using Excel 2007 or later your workbook will need to be saved as a macro-enabled workbook (*.xlsm)

Rich (BB code):
Function Keywords(s As String, rWords As Range) As String
  Dim kw As Object
  Dim i As Long
  
  With CreateObject("VBScript.RegExp")
    .IgnoreCase = True
    .Global = True
    .Pattern = "\b(" & Join(Application.Transpose(rWords), "|") & ")\b"
    If .test(s) Then
      Set kw = .Execute(s)
      For i = 1 To kw.Count
        Keywords = Keywords & ", " & kw(i - 1)
      Next i
    End If
  End With
  Keywords = Mid(Keywords, 3)
End Function

For comparison of where all the suggestions may have issues, I've included Aladin's and bosco's suggestions in columns D & E respectively.

Sheet1

ABCDE
1AppleToday I ate a burger with fries, and had an apple afterwards.burger, fries, appleApple, Fries, BurgerApple Fries Burger
2FriesToday I had eggs
3Saladapples AppleApple
4BurgerSalad, pineapple & hamSaladApple, SaladApple Salad
5 I ate 4 apples AppleApple
6

<colgroup><col style="font-weight:bold; width:30px; "><col style="width:56px;"><col style="width:399px;"><col style="width:142px;"><col style="width:146px;"><col style="width:141px;"></colgroup><tbody>
</tbody>

Spreadsheet Formulas
CellFormula
C1=Keywords(B1,$A$1:$A$4)
D1{=REPLACE(aconcat(IF(ISNUMBER(SEARCH($A$1:$A$4,B1)),", "&$A$1:$A$4,""),""),1,2,"")}
E1=TRIM(IF(ISNUMBER(SEARCH($A$1,B1)),$A$1,"")&" "&IF(ISNUMBER(SEARCH($A$2,B1)),$A$2,"")&" "&IF(ISNUMBER(SEARCH($A$3,B1)),$A$3,"")&" "&IF(ISNUMBER(SEARCH($A$4,B1)),$A$4,""))

<tbody>
</tbody>
Formula Array:
Produce enclosing
{ } by entering
formula with CTRL+SHIFT+ENTER!

<tbody>
</tbody>


Excel tables to the web >> Excel Jeanie HTML 4


Hello Peter,
I was looking for a solution to a requirement I have and I came across this thread.

I tried the suggestions given by you and other posters but it did not work for me. I wanted to request your help on this matter.

Kindly provide your inputs.

My Requirement:


1. Column A has Keywords. Example:
Cell A1: FIELD-NAME-1
Cell A2: FIELD-NAME-2
so on...
Cell A100: FIELD-NAME-100


2. Column W has free form text, that may or may not have keywords from Column A.

Cell W1: This is the free form text for FIELD-NAME-2
Cell W2: This is the free form text for FIELD-NAME-1 and FIELD-NAME-100.
Cell W3: This is the free form text for no fields.
So on..
Cell W20: This is the free form text for FIELD-NAME-1, FIELD-NAME-3, FIELD-NAME-10 and FIELD-NAME-50.


3. I want the below in Column X (Basically check the Keywords in Column A in each of the cells in Column W and give the list of keywords.

Cell X1: FIELD-NAME-2
Cell X2: FIELD-NAME-1, FIELD-NAME-100
Cell X3:
So on..
Cell X20: FIELD-NAME-1, FIELD-NAME-3, FIELD-NAME-10, FIELD-NAME-50.
Row NbrAWX
1FIELD-NAME-1This is the free form text for FIELD-NAME-2FIELD-NAME-2
2FIELD-NAME-2This is the free form text for FIELD-NAME-1 and FIELD-NAME-100FIELD-NAME-1, FIELD-NAME-100
3FIELD-NAME-3This is the free form text for no fields
So on..so on..so on..So on..

<tbody>
</tbody>


Solution (Did not work):

Followed your solution.

a. Save excel sheet in .xlsm type. (I have excel 2010).

b. Inserted the below module in vba

Function Keywords(s As String, rWords As Range) As String
Dim kw As Object
Dim i As Long

With CreateObject("VBScript.RegExp")
.IgnoreCase = True
.Global = True
.Pattern = "\b(" & Join(Application.Transpose(rWords), "|") & ")\b"
If .test(s) Then
Set kw = .Execute(s)
For i = 1 To kw.Count
Keywords = Keywords & ", " & kw(i - 1)
Next i
End If
End With
Keywords = Mid(Keywords, 3)
End Function

c. In X1 Cell I put the below formula
=Keywords(W1,$A$1:$A$100)
X2 Cell I put the below formula
=Keywords(W2,$A$1:$A$100)
So on...
W20 Cell I put the below forumula
=Keywords(W20,$A$1:$A$100)

d. However the Cells in Column X show blank and not the expected result.

Not sure what I am doing wrong. Kindly help.




 
Upvote 0

Excel Facts

When they said...
When they said you are going to "Excel at life", they meant you "will be doing Excel your whole life".
Dear Mr. Peter_SSs,

I have more or less same problem, want to extract the exact words from B1 by searching from A1:A10 list and display in column C. Please help me Sir.

It content ;

COLUMN A (LIST )

MDB
SMDB
ESMDB
FESMDB
/f

<colgroup><col width="64" style="width:48pt"> </colgroup><tbody>
</tbody>
SWITCH
SWITCHGEAR

COLUMN B

testing for MDB, DB
I am testing SMDB, MDB
Looking for ESMDB
Checking for FESMDB
This is for cable /f

<colgroup><col></colgroup><tbody>
</tbody>

COLUMN C (Result)


 
Upvote 0
I have more or less same problem, want to extract the exact words from B1 by searching from A1:A10 list and display in column C. Please help me Sir.
Your sample data is not entirely clear and you have not shown the expected results and their layout. Can you clarify both the sample data and the expected results?
 
Upvote 0
Your sample data is not entirely clear and you have not shown the expected results and their layout. Can you clarify both the sample data and the expected results?


Dear Sir,

Many thanks for your reply, Sir, actually I want to extract exact names from Description column and put in a Result Column (The Exacts Names are mentioned in the LIST below). Need command to read Item from Name List, search in Description and display in Result Column). Count column is only counting how many item there in Result i.e. Cable Ladder = 1 and MDB,Switch = 2 and so on......

Sir, Want to extract exact item name from the name list with comma in Result column. Many thanks in advance Sir.

Main Data sample With Result and Count of Result Item

DescriptionResultCount
HIGH VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Containment including all fittings, couplers, supports, fixings and ancillary work as necessary; HV cable LadderCable Ladder1
LOW VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Switch gear and equipment, including supports, final point-wiring, containment and ancillary work as necessary; MDB-P-GF-02(1887.7kW) 11 HV Switchgear11 HV Switchgear, MDB, Switch3
LOW VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Switch gear and equipment, including supports, final point-wiring, containment and ancillary work as necessary; MDBMDB, Switch 2

<tbody>
</tbody>

Name List
11 HV Switchgear
MDB
Switch
Cable Ladder
Access control
ATS
BGM
Central battery
......... many more names are there

<tbody>
</tbody>
 
Upvote 0
Dear Sir,

Many thanks for your reply, Sir, actually I want to extract exact names from Description column and put in a Result Column (The Exacts Names are mentioned in the LIST below). Need command to read Item from Name List, search in Description and display in Result Column). Count column is only counting how many item there in Result i.e. Cable Ladder = 1 and MDB,Switch = 2 and so on......

Sir, Want to extract exact item name from the name list with comma in Result column. Many thanks in advance Sir.
Did you try the user-defined function from post #7 ?

Excel Workbook
ABC
1Name ListDescriptionResult
211 HV SwitchgearHIGH VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Containment including all fittings, couplers, supports, fixings and ancillary work as necessary; HV cable Laddercable Ladder
3MDBLOW VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Switch gear and equipment, including supports, final point-wiring, containment and ancillary work as necessary; MDB-P-GF-02(1887.7kW) 11 HV SwitchgearSwitch, MDB, 11 HV Switchgear
4SwitchLOW VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Switch gear and equipment, including supports, final point-wiring, containment and ancillary work as necessary; MDBSwitch, MDB
5Cable Ladder
6Access control
7ATS
8BGM
9Central battery
10
Sheet1
 
Upvote 0
Fantastic Sir, but the only problem is that words are repeating(Please see below in the table) i.e. Security,Access control and Earthing is showing more then ONE TIME, I need them to appear only one time in the Result column.



Did you try the user-defined function from post #7 ?

Sheet1

ABC
1Name ListDescriptionResult
211 HV Switchgear SECURITY SYSTEM INSTALLATION; EQUIPMENT; CCTV - Switchgear and equipment, including supports and ancillary work as necessary; Hotel security control room - (including video wall monitors, anpr workstation, operator/supervisor workstation, badge
SECURITY, CCTV, security

<tbody>
</tbody>
3MDB EARTHING AND BONDING SYSTEM INSTALLATION; EQUIPMENT; Earthing and bonding system, including supports, final point-wiring, containment, test points, air termination network, strike pads, cables and accessories; Earthing for power
EARTHING, Earthing, Earthing

<tbody>
</tbody>
4Switch ACCESS CONTROL SYSTEM INSTALLATION; EQUIPMENT; Access control Panel; Access control Panel
ACCESS CONTROL, Access control, Access control

<tbody>
</tbody>
5Cable Ladder
6Access control
7ATS
8BGM
9Access control
10Earthing

<colgroup><col style="font-weight:bold; width:30px; "><col style="width:125px;"><col style="width:421px;"><col style="width:212px;"></colgroup><tbody>
</tbody>

Spreadsheet Formulas
CellFormula
C2=Keywords(B2,A$2:A$8)

<tbody>
</tbody>

<tbody>
</tbody>


Excel tables to the web >> Excel Jeanie HTML 4
 
Upvote 0
Fantastic Sir, but the only problem is that words are repeating(Please see below in the table) i.e. Security,Access control and Earthing is showing more then ONE TIME, I need them to appear only one time in the Result column.
Pity that wasn't stated, or an example like that given, earlier. ;)

Try changing the line in blue.
Rich (BB code):
For i = 1 To kw.Count
  If InStr(1, ", " & Keywords & ", ", ", " & kw(i - 1) & ", ", 1) = 0 Then Keywords = Keywords & ", " & kw(i - 1)
Next i

Also, I forgot to state earlier, that the column with the list of keywords must not contain any blank cells. If that could happen, then the code would need some further modifications.


BTW, best not to quote whole long posts as it makes the thread harder to read/navigate. If you want/need to quote, quote small, relevant parts only.
 
Last edited:
Upvote 0
Dear Sir,

I have data with description in column A, column B is Component name, Result will be in C column, need to search TEXT in Description (B Column) from given D,E,F,G,H and on..... If found any one of them in Description, then display Component name which is there in column B. Example below;

ABCDEFGH
DescriptionComponent Result
HIGH VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Containment including all fittings, couplers, supports, fixings and ancillary work as necessary; HV cable Ladder, MV SwitchgearHV SwitchgearHV SwitchgearMV PanelsMV Switchgear11 KV SwitchgearRMUMVSAND ON...............
SECURITY SYSTEM INSTALLATION; EQUIPMENT; CCTV - Switchgear and equipment, Card Reader, including supports and ancillary work as necessary; Hotel security control room - (including video wall monitors, anpr workstation, operator/supervisor workstation, badge Access controlAccess controlAccess control panelCard ReaderPush button for exitMortise LockInput ModuleAND ON...............

<tbody>
</tbody>


Thank you Sir, in advance.
 
Upvote 0

Forum statistics

Threads
1,214,522
Messages
6,120,022
Members
448,939
Latest member
Leon Leenders

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top