search within a column for specific words and display the found words in the adjacent cells

sammy1981

New Member
Joined
Jun 28, 2018
Messages
20
[FONT=Helvetica Neue, Helvetica, Arial, sans-serif]see attached file, im trying to find out if any of the ingredients from column D exists in column A and display whatever ingredient is found in column B, is this possible? need to go through lots of ingredients and find anything that needs special instructions [/FONT]
[FONT=Helvetica Neue, Helvetica, Arial, sans-serif]see below example[/FONT]
butylhydroxyanisole, lavender oil, methylparaben, purified landolin, purfied water A-alpha-C (2-Amino-9H-pyrido[2,3-b]indole)
Corn Starch, d&c Red #27 Aluminum Lake, d & c Red #30 Aluminum Lake, Flavors, Saccharin Sodium A-alpha-C (2-Amino-9H-pyrido[2,3-b]indoleAbiraterone acetate
CarboxyMethylcellulose sodium, Microcrystalline Cellulose,flavor, Acetylaminofluorene, Purified Water, Red 22, Red 28, Salicylic Acid, Acetaldehyde
FD&C red #40 , propylene glycol, flavoring, sucrose, and water, soybean, oil and corn starch used as processing aids.Acetamide
Famotidine, USP 20mg...... Acid ReducerAcetazolamide
Purified Water, Citric Acid, Sodium Benzoate Octoxynol-9.Acetochlor
Adipic Acid, FD&C Blue 1, FD&C Red 27 , FD&C Yellow 6, Acetaldehyde FD&C Yellow 10Acetohydroxamic acid
Polyehtylene, Sodium Sarcoisnate, EDTA, Quaternium-15, Carbomer, acetate2-Acetylaminofluorene

<tbody>
</tbody>
[FONT=&quot]
[/FONT]
 

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
Add the line in blue as shown
Code:
   Set Dic = CreateObject("scripting.dictionary")
   [COLOR=#0000ff]Dic.comparemode = vbTextCompare[/COLOR]
   For Each cl In Range("D2", Range("D" & Rows.Count).End(xlUp))
 
Upvote 0
Some further comments about your original data sample.

a) Column D contains about 150 rows that have duplicates (eg "Acrylamide" in D11 and D12), and even triplicates ("Ethylene oxide" in D405:407). Is there any reason for that? Can you avoid that with your data? Any simplification should help as that list in column D is very long.

b) You are likely to get unexpected results. In post #4 mumps raised the issue with "benzene" being part of a longer word "Trihydroxybenzene". I think I can deal with that instance (in a different way to what has been done in the thread so far) but there is a similar issue that is more of a problem as I see it. You have, for example, "Styrene" and "Styrene oxide" in column D. If column A contained the ingredient "Styrene oxide" any solution might return only "Styrene" in column B because that would be found first. That could be very hard to overcome.

c) Some of the items in column D contain one or more trailing spaces (eg D6 = "Acetazolamide   " That is, 3 spaces at the end). As it also is making the checking more complex than it could be, can that get cleaned up in your data ?


Never-the-less, even with your original data format, I think this code gives fairly good results. Only fairly good because of point b) above. Note the "Styrene" issue in row 3 below.

Code:
Sub Find_Ingredients()
  Dim RX As Object, M As Object
  Dim a As Variant, b As Variant, itm As Variant
  Dim i As Long
  Dim Ingredients As String, tmp As String
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.IgnoreCase = True
  tmp = Replace(Application.Trim(Join(Application.Transpose(Range("D2", Range("D" & Rows.Count).End(xlUp))), "|")), " |", "|")
  RX.Pattern = "([\[\]\(\)])"
  tmp = RX.Replace(tmp, "\$1")
  RX.Pattern = "(\b| )(" & tmp & ")(?=\b|\W|$)"
  a = Range("A2", Range("A" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    Ingredients = ""
    Set M = RX.Execute(Application.Trim(a(i, 1)))
    For Each itm In M
      Ingredients = Ingredients & "; " & Trim(itm)
    Next itm
    b(i, 1) = Mid(Ingredients, 3)
  Next i
  Range("B2").Resize(UBound(b)).Value = b
End Sub


My sample data has column D exactly as in your sample file from post #3 but in column A I have made up some data that contains a few matches.
Results of the above code in column B:


Book1
AB
1Ingredient:
2ACTIVE INGREDIENTS: butylhydroxyanisole, lavender oil, methylparaben, purified landolin, purfied water
3Zileuton, 1,4-Butanediol dimethanesulfonate (Busulfan) Glycerol N,N-Bis(2-chloroethyl)-2-naphthylamine (Chlornapazine) Styrene oxideZileuton; 1,4-Butanediol dimethanesulfonate (Busulfan); N,N-Bis(2-chloroethyl)-2-naphthylamine (Chlornapazine); Styrene
4Active Ingredient:CarboxyMethylcellulose sodium, Microcrystalline Cellulose,flavor, Purified Water, Red 22, Red 28, Salicylic Acid,
5AF-2;[2-(2-furyl)-3-(5-nitro-2-furyl)]acrylamideAF-2;[2-(2-furyl)-3-(5-nitro-2-furyl)]acrylamide
6Antimony oxide (Antimony trioxide), Alcohol, AcetochlorAntimony oxide (Antimony trioxide); Acetochlor
71-Amino-2,4-dibromoanthraquinone1-Amino-2,4-dibromoanthraquinone
Data
 
Upvote 0
Some further comments about your original data sample.

a) Column D contains about 150 rows that have duplicates (eg "Acrylamide" in D11 and D12), and even triplicates ("Ethylene oxide" in D405:407). Is there any reason for that? Can you avoid that with your data? Any simplification should help as that list in column D is very long.

b) You are likely to get unexpected results. In post #4 mumps raised the issue with "benzene" being part of a longer word "Trihydroxybenzene". I think I can deal with that instance (in a different way to what has been done in the thread so far) but there is a similar issue that is more of a problem as I see it. You have, for example, "Styrene" and "Styrene oxide" in column D. If column A contained the ingredient "Styrene oxide" any solution might return only "Styrene" in column B because that would be found first. That could be very hard to overcome.

c) Some of the items in column D contain one or more trailing spaces (eg D6 = "Acetazolamide " That is, 3 spaces at the end). As it also is making the checking more complex than it could be, can that get cleaned up in your data ?


Never-the-less, even with your original data format, I think this code gives fairly good results. Only fairly good because of point b) above. Note the "Styrene" issue in row 3 below.

Code:
Sub Find_Ingredients()
  Dim RX As Object, M As Object
  Dim a As Variant, b As Variant, itm As Variant
  Dim i As Long
  Dim Ingredients As String, tmp As String
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.IgnoreCase = True
  tmp = Replace(Application.Trim(Join(Application.Transpose(Range("D2", Range("D" & Rows.Count).End(xlUp))), "|")), " |", "|")
  RX.Pattern = "([\[\]\(\)])"
  tmp = RX.Replace(tmp, "\$1")
  RX.Pattern = "(\b| )(" & tmp & ")(?=\b|\W|$)"
  a = Range("A2", Range("A" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    Ingredients = ""
    Set M = RX.Execute(Application.Trim(a(i, 1)))
    For Each itm In M
      Ingredients = Ingredients & "; " & Trim(itm)
    Next itm
    b(i, 1) = Mid(Ingredients, 3)
  Next i
  Range("B2").Resize(UBound(b)).Value = b
End Sub


My sample data has column D exactly as in your sample file from post #3 but in column A I have made up some data that contains a few matches.
Results of the above code in column B:

AB
1Ingredient:
2ACTIVE INGREDIENTS: butylhydroxyanisole, lavender oil, methylparaben, purified landolin, purfied water
3Zileuton, 1,4-Butanediol dimethanesulfonate (Busulfan) Glycerol N,N-Bis(2-chloroethyl)-2-naphthylamine (Chlornapazine) Styrene oxideZileuton; 1,4-Butanediol dimethanesulfonate (Busulfan); N,N-Bis(2-chloroethyl)-2-naphthylamine (Chlornapazine); Styrene
4Active Ingredient:CarboxyMethylcellulose sodium, Microcrystalline Cellulose,flavor, Purified Water, Red 22, Red 28, Salicylic Acid,
5AF-2;[2-(2-furyl)-3-(5-nitro-2-furyl)]acrylamideAF-2;[2-(2-furyl)-3-(5-nitro-2-furyl)]acrylamide
6Antimony oxide (Antimony trioxide), Alcohol, AcetochlorAntimony oxide (Antimony trioxide); Acetochlor
71-Amino-2,4-dibromoanthraquinone1-Amino-2,4-dibromoanthraquinone

<colgroup><col style="width: 25pxpx"><col><col></colgroup><thead>
</thead><tbody>
</tbody>
Data

thanks for your input

i will take out all the duplicates and spaces from column D, it was the raw data i received but needs some clean up

this macro seems to work fairly well and it only picks up complete words and not just if its found within a word

will do some more testing and advise

is there any particular format column A needs to be?
 
Upvote 0
is there any particular format column A needs to be?
No.

BTW, best not to fully quote long posts as it makes the thread harder to read/navigate. If you want to quote, just quote small, relevant parts only, as i have done here.
 
Upvote 0

Forum statistics

Threads
1,214,923
Messages
6,122,286
Members
449,076
Latest member
kenyanscott

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top