Text search


Posted by Eric B on December 14, 2001 10:59 AM

Sheet1, A1:A8000 contain text strings.
On sheet2 a1:a50 is a list of keywords. On the corresponding row in column b on sheet 2 I would like a formula that counted the number of rows from sheet1, col A that contained that keyword (ignore caps).

For Instance, if I had in sheet 1:
M16762 Mouse interleukin 2 (IL-2) gene, exon 4
M37897 Mouse interleukin 10 mRNA, complete cds
M25892 Mus musculus interleukin 4 (Il-4) mRNA, complete cds
M83649 Mus musculus Fas antigen mRNA, complete cds
J04423 E coli bioB gene biotin synthetase (-5, -M, -3 represent transcript regions 5 prime, Middle, and 3 prime respectively)
J04423 E coli bioB gene biotin synthetase (-5, -M, -3 represent transcript regions 5 prime, Middle, and 3 prime respectively)
J04423 E coli bioB gene biotin synthetase (-5, -M, -3 represent transcript regions 5 prime, Middle, and 3 prime respectively)
J04423 E coli bioC protein (-5 and -3 represent transcript regions 5 prime and 3 prime respectively)
J04423 E coli bioC protein (-5 and -3 represent transcript regions 5 prime and 3 prime respectively)
J04423 E coli bioD gene dethiobiotin synthetase (-5 and -3 represent transcript regions 5 prime and 3 prime respectively)

And in sheet 2,
Col(A):Col(B)
mouse: 2
E coli: 6

Thanks for any help

Posted by Rick E on December 14, 2001 11:52 AM

Macro for your key word count..

Here is the macro to do what you asked for. There are a few assumptions. First your sheet names are as named, Sheet1 and Sheet2, and you want to start in A1 for both sheets. The macro will look in all column A cells until it finds a "empty" cell. So you can have more than 8000 to search and more than 50 to compare. One last thing it will sopt looking in a string after finding the first match, if you want to keep counting, delete the line that I put a comment on and it will keep looking.

Sub findEm()
Dim findIt As String, lookAt As String
Dim cout As Integer, kout As Integer
Dim match As Integer
Sheets("Sheet2").Select
Range("A1").Select
Do Until ActiveCell.Value = ""
match = 0
findIt = LCase(ActiveCell.Value)
kout = Len(ActiveCell.Value)
Sheets("Sheet1").Select
Range("A1").Select
Do Until ActiveCell.Value = ""
cout = Len(ActiveCell.Value)
If kout > cout Then Exit Do
For i = 1 To cout - kout
lookAt = LCase(Mid(ActiveCell.Value, i, kout))
If lookAt = findIt Then ' match - stop looking at this cell
match = match + 1
Exit For ' Would take this line out if want to count more
' in the same line.
End If
Next i
ActiveCell.Offset(1, 0).Select
Loop
Sheets("Sheet2").Select
ActiveCell.Offset(0, 1).Value = match
ActiveCell.Offset(1, 0).Select
Loop

End Sub

Copy/paste this code into a module and run the macro, it will put the match counts in column B next to the search text.

Good luck and please let me know that you got this and that it worked. Rick E.

Posted by Rick E on December 14, 2001 12:03 PM

And another thing...

P.S. It was a big help that you provided example data so that I could test out the code without having to make stuff up.

Rick E

Posted by Eric B on December 14, 2001 12:28 PM

Say! That's close.

I pasted the code in and ran it with 8000 rows and 42 keywords. Interestingly it gave an accurate report of the count for the 13th and 39th search terms, but returned a zero for all of the other search terms. Any tips?

Posted by Rick E on December 14, 2001 12:35 PM

Need More Data

Can you post a data item or two and search item that was WRONG. That way I can see what might be going on in my workbook.

Rick E

Posted by Eric B on December 14, 2001 12:36 PM

Sorry, make that correct reports on the 14th and 41st search terms (NT)

.

Posted by Eric B on December 14, 2001 12:41 PM

Here's some

Keywords
extracell
proteoglycan

Text to search
M64300 Rat extracellular signal-related kinase (ERK2) mRNA, complete cds /cds=(171,1247) /gb=M64300 /gi=204055 /ug=Rn.13651 /len=1467
L19112 Rat (clone R2(B3C)) heparin-binding fibroblast growth factor receptor 2 (extracellular domain) mRNA, partial cds /cds=(0,1061) /gb=L19112 /gi=310150 /ug=Rn.12732 /len=1062
S61868 ryudocan=heparan sulfate proteoglycan core protein [rats, microvascular endothelial cells, mRNA, 2462 nt]
L02896 Rattus norvegicus major heparan sulfate proteoglycan (glypican) mRNA, complete cds /cds=(221,1897) /gb=L02896 /gi=204424 /ug=Rn.7044 /len=3497

It should find two of each of the keyword terms

Posted by Rick E. on December 14, 2001 12:47 PM

Found 2 each...?

I copied into my cell A's and ran the macro and it reported back 2 for eack key word.

Is there anything different about the words in Sheet1 and Sheet2 that may not of copied to this web site?

Rick E.

Posted by Eric B on December 14, 2001 12:57 PM

Well that leaves a couple of alternatives

Either gross incompetence on my part (which is more likely than I may be willing to admit!), or some unexpected effect of my spreadsheet layout. Would you be averse to my sending you an email attachment of the spreadsheet? If you don't want to post your email on the board you can just email me through my link (affypharm10@hotmail.com), and I'll reply with the attachment.

Thanks very much for your help on this.

Posted by Rick E on December 14, 2001 1:05 PM

Zeros for match

Just to check, I copied out my macro from this site and ran it against the data you provided and got the same matches, so the macro copied OK.

Do you know how to add a break point in a macro and check variable content as it is executing?

Here is my Sheet2

Mouse 2
E coli 6
RNA 7
(-5 6
extracell 2
proteoglycan 2

it works for me. I don't know what is wrong with yours, I am using Excel 2000 but that should not make a difference since the VBA code is just standard VBA without using anything special.

Rick E.

Posted by Rick E on December 14, 2001 1:11 PM

Please check your email

Go ahead and send me your file, I will be leaving here in an hour, or less but will email it back to you on Monday if it does not get here soon today. It should not take very long to debug.

Rick E.

Posted by Eric B on December 14, 2001 1:20 PM

Aladin's array solution

I found this post from Aladin that uses a CSE formula to do the same thing
18764.html

Posted by Rick E on December 14, 2001 1:53 PM

DeBugged Macro

The word All on the first line got me... but it should not have, I've fixed the macro, try this one - plus I have moved to A2 for both sheets, hope that's OK

Sub findEm()
Dim findIt As String, lookAt As String
Dim cout As Integer, kout As Integer
Dim match As Integer
Sheets("Sheet2").Select
Range("A2").Select
k = 0
Do Until ActiveCell.Value = ""
match = 0
findIt = LCase(ActiveCell.Value)
kout = Len(ActiveCell.Value)
Sheets("Sheet1").Select
Range("A2").Select
Do Until ActiveCell.Value = ""
cout = Len(ActiveCell.Value)
If kout > cout Then
ActiveCell.Offset(1, 0).Select
Exit Do
End If
For i = 1 To cout - kout
lookAt = LCase(Mid(ActiveCell.Value, i, kout))
If lookAt = findIt Then ' match - stop looking at this cell
match = match + 1
Exit For ' Would take this line out if want to count more
' in the same line.
End If
Next i
ActiveCell.Offset(1, 0).Select
Loop
Sheets("Sheet2").Select
ActiveCell.Offset(0, 1).Value = match
ActiveCell.Offset(1, 0).Select
k = k + 1
If k = 10 Then Exit Sub
Loop
End Sub

Enjoy...... Rick E.

Posted by Eric B on December 14, 2001 2:16 PM

Hmm, that takes me through the first 10 keywords and then stops (NT)

Posted by Rick E on December 15, 2001 6:06 AM

Here's the fix for that....

Sorry I put this in to test and stop after 10 keywords:

Delete the above line from the macro.

That will do it. Rick E.

Posted by Aladin Akyurek on December 15, 2001 11:24 AM

Why don't you just use it? [NT]

Posted by Eric B on December 17, 2001 5:46 AM

"Just use it", Nike ad campaign adapted for CSE formulae?

I appreciate very much all of the effort you guys have put into solutions on this board, and was thrilled to see both macro and formula based solutions to my keyword search problem. Right now I have the luxury of deciding which of the two working solutions will best fit my problem.

Both take a considerable amount of time to run. Since the keyword search is labile from use to use, converting the CSE results to values is not expedient, as I will just have to re-enter the formulas at a later date. If I pursue the CSE solution I will probably save a generic, formula-based workbook for initial searches and I will cut and paste the results into a "report" workbook.

The macro solution is nice because it is "no load" on the fpu if you don't initiate the macro. Also the extra workbook step is not necessary. However, I am more familiar with formulas, so it will be difficult for me to to tailor macro-based solutions to specific problems.
18764.html : On sheet2 a1:a50 is a list of keywords. On the corresponding row in column b on sheet 2 I would like a formula that counted the number of rows from sheet1, col A that contained that keyword (ignore caps). : M16762 Mouse interleukin 2 (IL-2) gene, exon 4 : M37897 Mouse interleukin 10 mRNA, complete cds : M25892 Mus musculus interleukin 4 (Il-4) mRNA, complete cds : M83649 Mus musculus Fas antigen mRNA, complete cds : J04423 E coli bioB gene biotin synthetase (-5, -M, -3 represent transcript regions 5 prime, Middle, and 3 prime respectively) : J04423 E coli bioB gene biotin synthetase (-5, -M, -3 represent transcript regions 5 prime, Middle, and 3 prime respectively) : J04423 E coli bioB gene biotin synthetase (-5, -M, -3 represent transcript regions 5 prime, Middle, and 3 prime respectively) : J04423 E coli bioC protein (-5 and -3 represent transcript regions 5 prime and 3 prime respectively) : J04423 E coli bioC protein (-5 and -3 represent transcript regions 5 prime and 3 prime respectively) : J04423 E coli bioD gene dethiobiotin synthetase (-5 and -3 represent transcript regions 5 prime and 3 prime respectively) : Col(A):Col(B) : mouse: 2 : E coli: 6

Posted by Eric B on December 17, 2001 5:47 AM

Thanks Rick E, that did the trick! [NT]

Posted by Aladin Akyurek on December 17, 2001 8:18 AM

Re: "Just use it", Nike ad campaign adapted for CSE formulae?

That's fine having more than one solution. The array formula doesn't take much time to finish (I checked this in a workbook otherwise empty). It's the recalcs that can be infuriating if you have lots of them. I had in mind computing once than doing a copy followed by Edit|Paste Special -> Values in place except for the first cell.

Lets say that you have the following formula in B2 in Sheet2:

=SUMPRODUCT(ISNUMBER(SEARCH(A1,Sheet1!$A$1:$A$4000))+0)

[ This is the non-CSE version of the array-formula you've discovered, which is of course equally costly. ]

Give a double click on the little black square of B2.
Copy B3 to the last cell of column B and Edit|Paste Special ->Values in place from B3.

So you keep the formula in just one cell. When you need it again, double click on the black square.

I thought this would be comparable to running a macro just when needed.

Nike ad campaign? Hardly. :)

Aladin

======= :



Posted by Eric B on December 17, 2001 9:54 AM

Thanks Aladin- that's great advice for managing array formulas

My apologies if the Nike bon mot offended you, I was poking fun at my own tentativeness :D