Nashpotatoes
New Member
- Joined
- Jun 21, 2022
- Messages
- 7
- Office Version
- 365
- Platform
- Windows
I am using excel to search multiple text reports (histology reports) that are normally 8-10 sentences long. I want to search for specific phrases such as "chronic scarring", or "calcification", include indeterminate findings as a separate categories, and exclude instances these phrases are preceded by things like "there is no",
So far, what I have is summarised below, but I feel there could be an easier way to:
1) Simplify my formula searching for multiple variations on "there is no sign of", so that if I come across a new phrase that I can add it easily
2) Or simplify the process so that I do not need to extract sentences from the full report into a separate column
I would be very grateful for any help.
My current formulas looks like this:
In B, to extract sentences I have:
=IFERROR(TEXTJOIN(". ",TRUE,FILTERXML("<t><s>"&SUBSTITUTE($A2,".","</s><s>")&"</s></t>","//s[contains(translate(.,'C','c'), 'calcification')]")),"")
In D column I have:
=IF(ISBLANK(B2),"no",IF(OR(ISNUMBER(SEARCH("there is no",B2)),ISNUMBER(SEARCH("no evidence",B2)),ISNUMBER(SEARCH("no features of",B2)),ISNUMBER(SEARCH("no sign of",B2))),"no",(IF(OR(ISNUMBER(SEARCH("borderline",B2)),ISNUMBER(SEARCH("some evidence of",B2))),"Borderline",(IF(OR(ISNUMBER(SEARCH("possibly",B2)),ISNUMBER(SEARCH("may be",B2)),ISNUMBER(SEARCH("suspicious",B2)), ISNUMBER(SEARCH("cannot be exluded",B2)), ISNUMBER(SEARCH("although cannot exclude",B2))),"Possibly","yes")))))
Nb. Sometimes the reports no not mention the phrases "calcification" or "chronic scarring" at all
So far, what I have is summarised below, but I feel there could be an easier way to:
1) Simplify my formula searching for multiple variations on "there is no sign of", so that if I come across a new phrase that I can add it easily
2) Or simplify the process so that I do not need to extract sentences from the full report into a separate column
I would be very grateful for any help.
- | A | B | C | D | E | F | |||
1 | Full Report | Extracted sentence: Calcification | Extracted sentence: Chronic scarring | Calcification? | Chronic scarring? | ||||
2 |
| There is calcification | There is no chronic scarring | Yes | No | ||||
3 |
| No evidence of calcification | There is borderline chronic scaring | No | Borderline | ||||
4 | Signs of possible calcification. Evidence of chronic scarring | Signs of possible calcification | Evidence of chronic scarring | Possibly | Yes |
My current formulas looks like this:
In B, to extract sentences I have:
=IFERROR(TEXTJOIN(". ",TRUE,FILTERXML("<t><s>"&SUBSTITUTE($A2,".","</s><s>")&"</s></t>","//s[contains(translate(.,'C','c'), 'calcification')]")),"")
In D column I have:
=IF(ISBLANK(B2),"no",IF(OR(ISNUMBER(SEARCH("there is no",B2)),ISNUMBER(SEARCH("no evidence",B2)),ISNUMBER(SEARCH("no features of",B2)),ISNUMBER(SEARCH("no sign of",B2))),"no",(IF(OR(ISNUMBER(SEARCH("borderline",B2)),ISNUMBER(SEARCH("some evidence of",B2))),"Borderline",(IF(OR(ISNUMBER(SEARCH("possibly",B2)),ISNUMBER(SEARCH("may be",B2)),ISNUMBER(SEARCH("suspicious",B2)), ISNUMBER(SEARCH("cannot be exluded",B2)), ISNUMBER(SEARCH("although cannot exclude",B2))),"Possibly","yes")))))
Nb. Sometimes the reports no not mention the phrases "calcification" or "chronic scarring" at all