search within a column for specific words and display the found words in the adjacent cells

sammy1981 · Jun 28, 2018

See the attached file. I'm trying to find out whether any of the ingredients in column D exist in column A and, if so, display whichever ingredients are found in column B. Is this possible? I need to go through lots of ingredients and find anything that needs special instructions.
See the example below (column A on the left, column D on the right).

butylhydroxyanisole, lavender oil, methylparaben, purified landolin, purfied water		A-alpha-C (2-Amino-9H-pyrido[2,3-b]indole)
Corn Starch, d&c Red #27 Aluminum Lake, d & c Red #30 Aluminum Lake, Flavors, Saccharin Sodium A-alpha-C (2-Amino-9H-pyrido[2,3-b]indole		Abiraterone acetate
CarboxyMethylcellulose sodium, Microcrystalline Cellulose,flavor, Acetylaminofluorene, Purified Water, Red 22, Red 28, Salicylic Acid,		Acetaldehyde
FD&C red #40 , propylene glycol, flavoring, sucrose, and water, soybean, oil and corn starch used as processing aids.		Acetamide
Famotidine, USP 20mg...... Acid Reducer		Acetazolamide
Purified Water, Citric Acid, Sodium Benzoate Octoxynol-9.		Acetochlor
Adipic Acid, FD&C Blue 1, FD&C Red 27 , FD&C Yellow 6, Acetaldehyde FD&C Yellow 10		Acetohydroxamic acid
Polyehtylene, Sodium Sarcoisnate, EDTA, Quaternium-15, Carbomer, acetate		2-Acetylaminofluorene


sammy1981 · Jun 29, 2018

Fluff said:
Yes it does.

OK, I see now why it wasn't working: it's case-sensitive. Can you make it not case-sensitive?

Fluff · Jun 29, 2018

Add the Dic.CompareMode line (the second line below) as shown

Code:

   Set Dic = CreateObject("scripting.dictionary")
   Dic.CompareMode = vbTextCompare
   For Each cl In Range("D2", Range("D" & Rows.Count).End(xlUp))
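
As a small illustration of what that setting changes, here is a minimal sketch. It is not the macro from this thread: the Sub name and the sample key are made up for the demo. It only shows that a Scripting.Dictionary with CompareMode set to vbTextCompare ignores case in its keys, and that the property has to be set while the dictionary is still empty.

```vba
Sub CompareMode_Demo()
    ' Hypothetical demo, not part of the macro discussed in the thread.
    Dim Dic As Object
    Set Dic = CreateObject("Scripting.Dictionary")

    ' Must be set before any key is added; changing it on a
    ' dictionary that already holds data raises a run-time error.
    Dic.CompareMode = vbTextCompare

    Dic("Acetaldehyde") = True

    Debug.Print Dic.Exists("ACETALDEHYDE")    ' True: case is ignored
    Debug.Print Dic.Exists("Acetaldehyde ")   ' False: a trailing space still counts
End Sub
```

Note that text comparison only deals with case. Stray spaces in the list still prevent a match, so trimming the keys before adding them is probably worth doing as well.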

sammy1981 · Jun 29, 2018

thank you

Fluff · Jun 29, 2018

You're welcome

Peter_SSs · Jun 30, 2018

Some further comments about your original data sample.

a) Column D contains about 150 rows that are duplicates (e.g. "Acrylamide" in D11 and D12), and even triplicates ("Ethylene oxide" in D405:D407). Is there any reason for that? Can you avoid that with your data? Any simplification should help, as the list in column D is very long.

b) You are likely to get unexpected results. In post #4 mumps raised the issue with "benzene" being part of a longer word "Trihydroxybenzene". I think I can deal with that instance (in a different way to what has been done in the thread so far) but there is a similar issue that is more of a problem as I see it. You have, for example, "Styrene" and "Styrene oxide" in column D. If column A contained the ingredient "Styrene oxide" any solution might return only "Styrene" in column B because that would be found first. That could be very hard to overcome.

c) Some of the items in column D contain one or more trailing spaces (e.g. D6 = "Acetazolamide   ", that is, 3 spaces at the end). As that also makes the checking more complex than it needs to be, can that get cleaned up in your data?

Nevertheless, even with your original data format, I think this code gives fairly good results. Only fairly good because of point b) above. Note the "Styrene" issue in row 3 below.

Code:

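Sub Find_Ingredients()
  Dim RX As Object, M As Object
  Dim a As Variant, b As Variant, itm As Variant
  Dim i As Long
  Dim Ingredients As String, tmp As String
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True        ' return every match in a cell, not just the first
  RX.IgnoreCase = True    ' case-insensitive matching
  ' Join column D into one "|"-separated list, dropping extra and trailing spaces
  tmp = Replace(Application.Trim(Join(Application.Transpose(Range("D2", Range("D" & Rows.Count).End(xlUp))), "|")), " |", "|")
  ' Escape square brackets and parentheses so they are matched literally
  RX.Pattern = "([\[\]\(\)])"
  tmp = RX.Replace(tmp, "\$1")
  ' Any listed item that starts at a word boundary or a space, with a
  ' look-ahead for a word boundary, a non-word character or the end of the text
  RX.Pattern = "(\b| )(" & tmp & ")(?=\b|\W|$)"
  ' Read column A into an array and size a one-column results array to match
  a = Range("A2", Range("A" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    Ingredients = ""
    Set M = RX.Execute(Application.Trim(a(i, 1)))
    ' Collect all matches for this row, separated by "; "
    For Each itm In M
      Ingredients = Ingredients & "; " & Trim(itm)
    Next itm
    b(i, 1) = Mid(Ingredients, 3)   ' drop the leading "; "
  Next i
  ' Write all results to column B in one go
  Range("B2").Resize(UBound(b)).Value = b
End Sub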

My sample data has column D exactly as in your sample file from post #3 but in column A I have made up some data that contains a few matches.
Results of the above code in column B:

Book1, sheet "Data"

| Row | A | B |
|---|---|---|
| 1 | Ingredient: | |
| 2 | ACTIVE INGREDIENTS: butylhydroxyanisole, lavender oil, methylparaben, purified landolin, purfied water | |
| 3 | Zileuton, 1,4-Butanediol dimethanesulfonate (Busulfan) Glycerol N,N-Bis(2-chloroethyl)-2-naphthylamine (Chlornapazine) Styrene oxide | Zileuton; 1,4-Butanediol dimethanesulfonate (Busulfan); N,N-Bis(2-chloroethyl)-2-naphthylamine (Chlornapazine); Styrene |
| 4 | Active Ingredient:CarboxyMethylcellulose sodium, Microcrystalline Cellulose,flavor, Purified Water, Red 22, Red 28, Salicylic Acid, | |
| 5 | AF-2;[2-(2-furyl)-3-(5-nitro-2-furyl)]acrylamide | AF-2;[2-(2-furyl)-3-(5-nitro-2-furyl)]acrylamide |
| 6 | Antimony oxide (Antimony trioxide), Alcohol, Acetochlor | Antimony oxide (Antimony trioxide); Acetochlor |
| 7 | 1-Amino-2,4-dibromoanthraquinone | 1-Amino-2,4-dibromoanthraquinone |
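
On point b), one possible way to reduce the "Styrene" / "Styrene oxide" problem is to list longer items before shorter ones when building the alternation, since VBScript.RegExp tries the alternatives from left to right and stops at the first one that fits. The following is only a sketch and has not been run against the full column D list. `LongestFirstPattern` and `LongestFirst_Demo` are hypothetical names, and the helper assumes it is handed a one-dimensional array of terms that have already been trimmed and escaped.

```vba
Function LongestFirstPattern(terms As Variant) As String
    ' Hypothetical helper: sorts the terms by descending length (in place)
    ' and wraps them in the same pattern shape used in Find_Ingredients.
    Dim i As Long, j As Long, tmp As String

    ' Simple insertion sort, longest term first
    For i = LBound(terms) + 1 To UBound(terms)
        tmp = terms(i)
        j = i - 1
        Do While j >= LBound(terms)
            If Len(terms(j)) >= Len(tmp) Then Exit Do
            terms(j + 1) = terms(j)
            j = j - 1
        Loop
        terms(j + 1) = tmp
    Next i

    LongestFirstPattern = "(\b| )(" & Join(terms, "|") & ")(?=\b|\W|$)"
End Function

Sub LongestFirst_Demo()
    Dim RX As Object, M As Object
    Set RX = CreateObject("VBScript.RegExp")
    RX.Global = True
    RX.IgnoreCase = True
    RX.Pattern = LongestFirstPattern(Array("Styrene", "Styrene oxide"))
    Set M = RX.Execute("Glycerol, Styrene oxide")
    Debug.Print Trim(M(0).Value)   ' Styrene oxide
End Sub
```

Sorting only helps where one item is the beginning of another. It does nothing for other kinds of overlap, so the results would still need checking by eye.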

sammy1981 · Jul 2, 2018

Thanks for your input.

I will take out all the duplicates and trailing spaces from column D. It was the raw data I received, and it needs some clean-up.

This macro seems to work fairly well, and it only picks up complete words, not text that merely appears inside a longer word.

I will do some more testing and advise.

Is there any particular format column A needs to be in?

Peter_SSs · Jul 2, 2018

sammy1981 said:
is there any particular format column A needs to be?

No.

BTW, best not to fully quote long posts, as it makes the thread harder to read/navigate. If you want to quote, just quote small, relevant parts only, as I have done here.

sammy1981 · Jul 3, 2018

Thanks.
Sorry, I'm new to this forum, but I see what you mean.
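
For the column D clean-up mentioned above (duplicates and trailing spaces), something along these lines might do as a starting point. It is a rough sketch, not code posted in the thread. `Clean_ColumnD` is a made-up name, and it assumes a header in D1, at least one item from D2 down, and nothing else stored in column D. It overwrites the list, so try it on a copy of the workbook first.

```vba
Sub Clean_ColumnD()
    ' Hypothetical helper: trims each item in column D and keeps only the
    ' first occurrence of each (ignoring case), in the original order.
    Dim Dic As Object, rng As Range, cl As Range
    Dim s As String, k As Variant, out() As Variant, i As Long

    Set Dic = CreateObject("Scripting.Dictionary")
    Dic.CompareMode = vbTextCompare          ' "acetamide" counts as "Acetamide"
    Set rng = Range("D2", Range("D" & Rows.Count).End(xlUp))

    For Each cl In rng
        s = Application.Trim(cl.Value)       ' trims the ends, collapses runs of spaces
        If Len(s) > 0 Then
            If Not Dic.Exists(s) Then Dic.Add s, Empty
        End If
    Next cl
    If Dic.Count = 0 Then Exit Sub

    ' Copy the unique keys into a one-column array
    k = Dic.Keys
    ReDim out(1 To Dic.Count, 1 To 1)
    For i = 0 To Dic.Count - 1
        out(i + 1, 1) = k(i)
    Next i

    ' Replace the old list with the cleaned one
    rng.ClearContents
    Range("D2").Resize(Dic.Count, 1).Value = out
End Sub
```

Application.Trim also reduces any run of spaces inside an item to a single space, which matches what Find_Ingredients does to the list before building its pattern.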
