String extraction with a number of conditions?

Rnkhch

Well-known Member
Joined
Apr 28, 2018
Messages
528
Office Version
  1. 365
Platform
  1. Windows
Hello,

I have a column of data with patient visit day information which looks like below:

Day x(text) - y min (text)
Day x(text) - y-z min (text)
Day x(text) - y hr (text)
Day x(text) - y-z hr (text)
Day x(text) - pre (text)
Day x(text) - baseline (text)

where (text) implies that some text may or may not be at those positions.

My goal is to come up with a formula to get the "min" and "hr" numbers, i.e. the numbers immediately preceding "min" and "hr". Also if "pre" or "baseline" are mentioned, I want those words as the output. And if "min", "hr", "pre" or "baseline" are not mentioned, I want an "N/A" output. One issue is that the data is entered sloppily and not all the spaces are necessarily there 😭 Also it would be really nice to get rid of the spaces in between the numbers and the hyphen for hour and min info. Here is an XL2BB sample of inputs and outputs that I need:

Book1
ABC
1InputDesired output
2Day 17N/A
3Day 1 - 0-60 min Post0-60 Min
4Day 1 - prePre
5Day 1 - 7 Hr Post1-7 Hr
6Day 1 - pre(visit1)Pre
7DAY 1(VISIT 1) - 2-4 hr Post2-4 Hr
8DAY 1(VISIT 2) - 0-1 Hr Post(only visit 2)0-1 Hr
9Baseline -Baseline
10DAY 33(VISIT 9) -N/A
11Day12(baseline)Baseline
12Day 11 - 7Hr Post7 Hr
13DAY 23 - 6 - 8 hr Post6-8 Hr
14
Sheet1



Thanks for any input! 🤗
 

Excel Facts

Which lookup functions find a value equal or greater than the lookup value?
MATCH uses -1 to find larger value (lookup table must be sorted ZA). XLOOKUP uses 1 to find values greater and does not need to be sorted.
Try in B2 the following formula:
Excel Formula:
=LET(sStr,A2,aStr,UPPER(TEXTAFTER(sStr,"-")),kW,TEXTSPLIT( UPPER("min,hr,pre,baseline"),,","),LFor,TEXTJOIN("",TRUE, IFERROR(IF(FIND(kW,aStr)>0,kW,""),"")),pOut,IFERROR(IF(FIND("BASELINE",UPPER(sStr))>0,"Baseline",""),TRIM(TEXTBEFORE(aStr,LFor)&" "&PROPER(LFor))),IF(LEN(pOut)>2,pOut,NA()))

BR1 Sales_Ledger RECON(9.2.3) ott-2022
ABCD
1InputDesired outputFormula Out
2Day 17N/A#N/D
3Day 1 - 0-60 min Post0-60 Min0-60 Min
4Day 1 - prePrePre
5Day 1 - 7 Hr Post1-7 Hr7 Hr
6Day 1 - pre(visit1)PrePre
7DAY 1(VISIT 1) - 2-4 hr Post2-4 Hr2-4 Hr
8DAY 1(VISIT 2) - 0-1 Hr Post(only visit 2)0-1 Hr0-1 Hr
9Baseline -BaselineBaseline
10DAY 33(VISIT 9) -N/A#N/D
11Day12(baseline)BaselineBaseline
12Day 11 - 7Hr Post7 Hr7 Hr
13DAY 23 - 6 - 8 hr Post6-8 Hr6 - 8 Hr
Foglio12
Cell Formulas
RangeFormula
D2:D13D2=LET(sStr,A2,aStr,UPPER(TEXTAFTER(sStr,"-")),kW,TEXTSPLIT( UPPER("min,hr,pre,baseline"),,","),LFor,TEXTJOIN("",TRUE, IFERROR(IF(FIND(kW,aStr)>0,kW,""),"")),pOut,IFERROR(IF(FIND("BASELINE",UPPER(sStr))>0,"Baseline",""),TRIM(TEXTBEFORE(aStr,LFor)&" "&PROPER(LFor))),IF(LEN(pOut)>2,pOut,NA()))
 
Upvote 0
Hi Anthony, thanks much! This is fabulous! I noticed one little thing that is puzzling me. I would need to use the FILTER function to get the input for your function. When the input of your function is changed to a FILTER or a range that corresponds to a FILTER output (rather than cell by cell dragging), then the function doesn't work. Any chance you can fix that? Here is an example XL2BB in which I duplicated the top range of cells at the bottom through a FILTER. In D17, I specified the input as A17#:

Book1
ABCDE
1InputDesired outputFormula Out
2Day 17N/A#N/A
3Day 1 - 0-60 min Post0-60 Min0-60 Min
4Day 1 - prePrePre
5Day 1 - 7 Hr Post1-7 Hr7 Hr
6Day 1 - pre(visit1)PrePre
7DAY 1(VISIT 1) - 2-4 hr Post2-4 Hr2-4 Hr
8DAY 1(VISIT 2) - 0-1 Hr Post(only visit 2)0-1 Hr0-1 Hr
9Baseline -BaselineBaseline
10DAY 33(VISIT 9) -N/A#N/A
11Day12(baseline)BaselineBaseline
12Day 11 - 7Hr Post7 Hr7 Hr
13DAY 23 - 6 - 8 hr Post6-8 Hr6 - 8 Hr
14
15Day 1 - (0-60 min Post)(0-60 Min
16
17Day 17#N/A
18Day 1 - 0-60 min Post#N/A
19Day 1 - prePre
20Day 1 - 7 Hr Post#N/A
21Day 1 - pre(visit1)Pre
22DAY 1(VISIT 1) - 2-4 hr Post#N/A
23DAY 1(VISIT 2) - 0-1 Hr Post(only visit 2)#N/A
24Baseline -Baseline
25DAY 33(VISIT 9) -#N/A
26Day12(baseline)Baseline
27Day 11 - 7Hr Post#N/A
28DAY 23 - 6 - 8 hr Post#N/A
29
Sheet1
Cell Formulas
RangeFormula
D2:D13,D15D2=LET(sStr,A2,aStr,UPPER(TEXTAFTER(sStr,"-")),kW,TEXTSPLIT( UPPER("min,hr,pre,baseline"),,","),LFor,TEXTJOIN("",TRUE, IFERROR(IF(FIND(kW,aStr)>0,kW,""),"")),pOut,IFERROR(IF(FIND("BASELINE",UPPER(sStr))>0,"Baseline",""),TRIM(TEXTBEFORE(aStr,LFor)&" "&PROPER(LFor))),IF(LEN(pOut)>2,pOut,NA()))
A17:A28A17=FILTER(A2:A13,1)
D17:D28D17=LET(sStr,A17#,aStr,UPPER(TEXTAFTER(sStr,"-")),kW,TEXTSPLIT( UPPER("min,hr,pre,baseline"),,","),LFor,TEXTJOIN("",TRUE, IFERROR(IF(FIND(kW,aStr)>0,kW,""),"")),pOut,IFERROR(IF(FIND("BASELINE",UPPER(sStr))>0,"Baseline",""),TRIM(TEXTBEFORE(aStr,LFor)&" "&PROPER(LFor))),IF(LEN(pOut)>2,pOut,NA()))
Dynamic array formulas.



In this scenario, your function outputs the pres and baselines, but misses the hours and minutes.

Also in the middle section, there is an input with parentheses and the open parenthesis appears in the output.

Thank you so much!
 
Upvote 0
=LET(sStr,A17#,aStr,etc etc
Why you used "A17#", that stands for "the expanded output that starts in A17", in the formula you set in D17?
It has to be simply sStr,A17:
Excel Formula:
=LET(sStr,A17,aStr,etc etc

If you wish to remove the opening parenthesis (example in D15) use SUBSTITUTE to define the value of sStr:
Excel Formula:
=LET(sStr,SUBSTITUTE(A17,"(",""),aStr,etc etc

Note: this is not an array formula; you set the formula in the first row (setting wich is the value to be decoded, ie "sStr,ItsAddress") and then copy the formula down for the remaining rows.
 
Last edited:
Upvote 0
@Rnkhch
Are you sure that your 'Desired output' column is correct?
Wondering why the day number appears in the expected output in B5 but the day number does not appear in the expected output in B12 when the column A data is virtually identical?

BTW, would it ever be possible that the text in column A could inadvertently include some of the short texts that you are searching for? Examples of the sorts of things I am wondering about would be
DAY 1 0-1 Hr Post(only 2 present) As well as the obvious "hr", this also include "pre"
Day 1 - pre(visit three) This also contains both "hr" and "pre"
If such things are possible, how do you envisage handling those?
 
Upvote 0
@Peter_SSs
Are you sure that your 'Desired output' column is correct?
Thanks for catching my typo for B5.

BTW, would it ever be possible that the text in column A could inadvertently include some of the short texts that you are searching for?
I haven't seen that happening yet luckily, but good points

@Anthony47
Why you used "A17#", that stands for "the expanded output that starts in A17", in the formula you set in D17?
I wanted to mimic a FILTER situation. Here is how my actual FILTER looks like, and your function doesn't catch the hours and minutes when I use this FILTER for the input:

Excel Formula:
=UNIQUE(LET(sStr,FILTER(FMBMcReqDay,(FMBMcGeqSID=IQ2)*(ISNUMBER(SEARCH("sample titer",FMBMcL)))*(ISNUMBER(SEARCH("sputum",FMBMcPeqSpec)))),aStr,UPPER(TEXTAFTER(sStr,"-")),kW,TEXTSPLIT( UPPER("min,hr,pre,baseline"),,","),LFor,TEXTJOIN("",TRUE, IFERROR(IF(FIND(kW,aStr)>0,kW,""),"")),pOut,IFERROR(IF(FIND("BASELINE",UPPER(sStr))>0,"Baseline",""),TRIM(TEXTBEFORE(aStr,LFor)&" "&PROPER(LFor))),IF(LEN(pOut)>2,pOut,NA())))

where the input is this FILTER: FILTER(FMBMcReqDay,(FMBMcGeqSID=IQ2)*(ISNUMBER(SEARCH("sample titer",FMBMcL)))*(ISNUMBER(SEARCH("sputum",FMBMcPeqSpec))))

Basically this filter looks at columnR (about 20,000 rows long) and filters based on columnG=IQ2 and if it finds the term "sample titer" in columnL and the term "sputum" in column P. At the end of the filtering process, it generates an array which roughly in the range of hundreds of elements (big decrease from 20,000), and then I want this array to go into your function to get the hour and minute numbers.

Please let me know if you guys need more information. Thanks 🤗
 
Upvote 0
I don't know how the formula will perform on 20k lines (and I had known that the data set was so long I should have proposed a macro or a user function); at the moment my suggestion is to embed the current formula into an additional IF and copy it down for 20k lines.
So, for line 2
Excel Formula:
=IF(A2<>"",LET(sStr,A2,aStr,UPPER(TEXTAFTER(sStr,"-")),kW,TEXTSPLIT(UPPER("min,hr,pre,baseline"),,","),LFor,TEXTJOIN("",TRUE, IFERROR(IF(FIND(kW,aStr)>0,kW,""),"")),pOut,IFERROR(IF(FIND("BASELINE",UPPER(sStr))>0,"Baseline",""),TRIM(TEXTBEFORE(aStr,LFor)&" "&PROPER(LFor))),IF(LEN(pOut)>2,pOut,NA())),"")

Modify according your starting line (in 2 positions)
 
Upvote 0
Solution
Thank you! 🤗 Today was jury day, so I'm gonna test tomorrow at work and see how it works.
 
Upvote 0
Power Query is built for cleaning up sloppy data. A series of conditional columns and extractions should be able to solve it.
 
Upvote 0
@Anthony47
It worked very well with my current dataset; thanks much! 🤗 I will let you know in case I need some adjustments if I can't figure out on my own (depending on what I will encounter in other data sets).

@Automatrix
Thank you! I haven't used Power Query yet, but I heard about it a few weeks ago, and I'm excited to try and learn it. If you know a website with a good intro/pointers, please share the link.
 
Upvote 0

Forum statistics

Threads
1,215,400
Messages
6,124,702
Members
449,180
Latest member
craigus51286

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top