Help Extracting Text from within a large, complicated String

chonchos

New Member
Joined
Nov 4, 2016
Messages
16
In it's most simplest form my need is simple... I have a cell that contains a "note field". I would like to search inside this note field and pull out a date that is before a specific sub-string. It gets tricky when I start to break it down and try to actually make it happen, however.

The data I need to extract is in this format at random places within the string: "4/21/14 EMAIL QT RECD". Sometimes it starts at the very first character, sometimes its 100 characters in.

A couple examples of note fields:

Example 1:

4/17/14 EMAIL QT RECD, NON STD OVHL, QT NOT APPROVED, COMPANY WILL PROVIDE PIECE PARTS. DETAILS: RECD MISSING HYDRAULIC INLET, 3 TIE BOLTS, 2 TIE BOLT NUTS, 1 BLEEDER SCREW, 4 TORQUE LINER SCREWS, 3 ADJUSTER PIN NUTS FAILED INSPECTION, PISTON HOUSING REQUIRES CORROSION REMOVAL, QT FOR ROUTINE PARTS & LABOR USING OEM PARTS $1234.56 + NON ROUTINE PARTS $123.45, BB 123456 $1234.56, QT STILL NOT APPROVED (AB).

Example 2:


** PLEASE EXPEDITE THE EVALUATION, INSPECT THOROUGHLY FOR CRACKS & DEFECTS (EARLY SERIAL NUMBER WHEEL) ** QUOTE OVERHAUL INCLUDING NEW TIRE (COMPANY PN: 12345-6). IN THIS SHIPMENT (ALL USED): WHEEL, HARDWARE, TIRE, SLEEVE. ** MISSING / NOT INCLUDED: BEARING CONES, SEALS, RETAINING CLIPS ** 12/23/15 EMAIL QT RECD, NON STD OVHL, REQURIES 3 EA PACKING $1.00 + DECAL $2.00 + TIRE $3.00 + WHEEL HALF $4.00 + 2 RING $5.00 + 2 BEARING $6.00 + 2 SEALS $7.00 + PACKING $8.00, QT NOT APPROVED (AB). ** SEE GLOBAL S.O. 123456 NOTES FOR FULL CUSTOMER DETAILS** 12/29/15 PER CD, COMPANY CAN PROVIDE IB WHEEL HALF PN 1234-567-8 AND WE CAN REDUCE BB TO $1.00 (WHEEL HALF) + $2.00 (MISSING SEALS) = $3.00 (AB). 1/22/16 CUSTOMER USED UP TIME ALLOWED TO PROVIDE REPLACEMENT CORE OR IB WHEEL HALF, FINALIZING BB TO 123456 AT $3.00 (AB). 1/22/16 QUOTE NOT YET APPROVED, CHECKING STATUS OF COMPANY PROVIDED IB WHEEL HALF WITH CD (AB). UPDATE: 4/25/16 EMAILED CUSTOMER REMINDER THAT WE NEED GOOD HALF OVERHAULED AND RETURNED, ALL REMAINING PARTS TO BE SEPARATED GOOD/BAD (AB). 4/27/16 APPROVED QT FOR $4.00 ONLY AND REQUESTED OTB WHEEL HALF BE RETURNED OVERHAULED WITH 8130-3, AND ALL REMAINING PARTS MARKED EITHER GOOD OR BAD (AB).




I got pretty close to what I wanted by scouring forums and eventually slightly modifying the following formula:

=MID(N1,SEARCH("EMAIL QT RECD",N1)-8,SEARCH(" ",N1,SEARCH("EMAIL QT RECD",N1)-8)-(SEARCH("EMAIL QT RECD",N1)-8))

A few problems with this solution; it cuts off the first digit of dates formatted as MM/DD/YY instead of M/DD/YY. Also, if the date is formatted as M/D/YY and is the first character in the string it returns #VALUE !.

I've tried playing around with it to make it work but I just can't get it there.

I'm using Excel 2010 on Windows 7 but this solution will eventually be used by coworkers running newer versions of Excel as well.


If I can get this part worked out I would also like to be able to extract other dates as well from within the same note field by searching for other sub-strings. For example, "APPROVED QT, QUOTE APPROVED, QUOTE NOT APPROVED, ETA SENT etc. I would like the date of whichever string is found last in the search to show in the result cell.

An example of this would be:

1/30/15 EMAIL QT RECD, PART WILL COST $X TO REPAIR
2/7/15 QT APPROVED, EMAILED SHOP FOR STANDARD OVERHAUL
2/25/15 ETA SENT, SHOP WILL RETURN PART ON 3/15/15

The return result would be 2/25/15 even though all three strings are found in the search.

I know this is a lot and it's complicated but I've banged my head on just the first part for the past week and half. So I'm hoping someone with more experience and know-how can help me out on this one. At least push me off in the right direction maybe! Thanks!
 

Excel Facts

Remove leading & trailing spaces
Save as CSV to remove all leading and trailing spaces. It is faster than using TRIM().
Hello... Try this out...

=IFERROR(MID(N1,SEARCH("EMAIL QT RECD",N1)-9,SEARCH(" ",N1,SEARCH("EMAIL QT RECD",N1)-9)-(SEARCH("EMAIL QT RECD",N1)-9)),IFERROR(MID(N1,SEARCH("EMAIL QT RECD",N1)-8,SEARCH(" ",N1,SEARCH("EMAIL QT RECD",N1)-8)-(SEARCH("EMAIL QT RECD",N1)-8)),MID(N1,SEARCH("EMAIL QT RECD",N1)-7,SEARCH(" ",N1,SEARCH("EMAIL QT RECD",N1)-7)-(SEARCH("EMAIL QT RECD",N1)-7))))

I see that it will not cut it off if you go up to 9 on your minus value

so if it errors out it will return to your previous value of 8

As for the second part im not sure how to go about it but i will try my best to find something

EDIT: added also the value 7 in order to compensate if the date has a 1 digit day as well like 2/2/17
 
Last edited:
Upvote 0
That was a quick response! Thank you!

I tried it out and got some strange results along with some good ones. It seemed to like the MM/DD/YY style dates just fine. But didn't do so well on some others.

For this note field:

PLEASE OVERHAUL & RETURN AS PN: 1234567-8. 5/6/15 EMAIL QT RECD, NON STD OVHL, REQUIRES SPRING PN 1234567-8 (MISSING) $12.34 + FITTING PN 1234567-8 (MISSING) $123.00, QT NOT APPROVED (AB).


The result returned just a period. Just "."

For this note field:

1/6/17 EMAIL QT RECD, NON-STD OVHL, REQUIRES DRIVE COUPLING $123.45, ROTOR (REWORKED) $1,234.56, LABOR $123.00, TOTAL OF $1,234.56, 1/9/17 QT NOT APPROVED, GETTING W/ CD REGARDING BB (EF). 1/11/17 PER CD, COMPANY SUPPLYING PMA ROTOR FROM PO# 123456, QT APPRD FOR $123.45 (EF).

The result returned #VALUE !.
 
Upvote 0
Welcome to the forum.

Maybe something like this:

ABCDEF
1EMAIL QT RECDEMAILED CUSTOMERAPPROVED QTETA SENT
24/17/14 EMAIL QT RECD, NON STD OVHL, QT NOT APPROVED, COMPANY WILL PROVIDE PIECE PARTS. DETAILS: RECD MISSING HYDRAULIC INLET, 3 TIE BOLTS, 2 TIE BOLT NUTS, 1 BLEEDER SCREW, 4 TORQUE LINER SCREWS, 3 ADJUSTER PIN NUTS FAILED INSPECTION, PISTON HOUSING REQUIRES CORROSION REMOVAL, QT FOR ROUTINE PARTS & LABOR USING OEM PARTS $1234.56 + NON ROUTINE PARTS $123.45, BB 123456 $1234.56, QT STILL NOT APPROVED (AB).4/17/14
3** PLEASE EXPEDITE THE EVALUATION, INSPECT THOROUGHLY FOR CRACKS & DEFECTS (EARLY SERIAL NUMBER WHEEL) ** QUOTE OVERHAUL INCLUDING NEW TIRE (COMPANY PN: 12345-6). IN THIS SHIPMENT (ALL USED): WHEEL, HARDWARE, TIRE, SLEEVE. ** MISSING / NOT INCLUDED: BEARING CONES, SEALS, RETAINING CLIPS ** 12/23/15 EMAIL QT RECD, NON STD OVHL, REQURIES 3 EA PACKING $1.00 + DECAL $2.00 + TIRE $3.00 + WHEEL HALF $4.00 + 2 RING $5.00 + 2 BEARING $6.00 + 2 SEALS $7.00 + PACKING $8.00, QT NOT APPROVED (AB). ** SEE GLOBAL S.O. 123456 NOTES FOR FULL CUSTOMER DETAILS** 12/29/15 PER CD, COMPANY CAN PROVIDE IB WHEEL HALF PN 1234-567-8 AND WE CAN REDUCE BB TO $1.00 (WHEEL HALF) + $2.00 (MISSING SEALS) = $3.00 (AB). 1/22/16 CUSTOMER USED UP TIME ALLOWED TO PROVIDE REPLACEMENT CORE OR IB WHEEL HALF, FINALIZING BB TO 123456 AT $3.00 (AB). 1/22/16 QUOTE NOT YET APPROVED, CHECKING STATUS OF COMPANY PROVIDED IB WHEEL HALF WITH CD (AB). UPDATE: 4/25/16 EMAILED CUSTOMER REMINDER THAT WE NEED GOOD HALF OVERHAULED AND RETURNED, ALL REMAINING PARTS TO BE SEPARATED GOOD/BAD (AB). 4/27/16 APPROVED QT FOR $4.00 ONLY AND REQUESTED OTB WHEEL HALF BE RETURNED OVERHAULED WITH 8130-3, AND ALL REMAINING PARTS MARKED EITHER GOOD OR BAD (AB).12/23/154/25/164/27/16
41/30/15 EMAIL QT RECD, PART WILL COST $X TO REPAIR
2/7/15 QT APPROVED, EMAILED SHOP FOR STANDARD OVERHAUL
2/25/15 ETA SENT, SHOP WILL RETURN PART ON 3/15/15
1/30/152/25/15
5

<tbody>
</tbody>
Sheet3

Worksheet Formulas
CellFormula
B2=IFERROR(TRIM(RIGHT(" "&TRIM(LEFT(SUBSTITUTE($A2,B$1,REPT(" ",LEN($A2)),(LEN($A2)-LEN(SUBSTITUTE($A2,B$1,"")))/LEN(B$1)),LEN($A2))),8)),"")

<tbody>
</tbody>

<tbody>
</tbody>


Put the phrase you're looking for in row 1. You can now copy the formula from B2 down and to the right as needed.
 
Last edited:
Upvote 0
I can't edit my posts for some reason... but I did see your edit after I posted that above. I tried it with the change (added the value 7).

It fixed the #Value problem on my second example.

The "period" issues are still there.

Also, some result fields are just showing blanks despite the "search string" being available. If that makes sense?
 
Upvote 0
Wow Eric... That is pretty awesome man.

Almost perfect actually.

It is picking up a few unneeded characters on some results though.

A few examples:

. 5/6/15
* 2/4/16

<tbody>
</tbody>

<tbody>
</tbody>
2/20/17:

<tbody>
</tbody>
R.6/8/17
! 6/5/17

<tbody>
</tbody>

<tbody>
</tbody>

Is there anyway to change the formula to remove those characters? Unfortunately I don't understand it well enough to try and alter it myself. :\
 
Upvote 0
It's always pretty iffy when you are looking at manual entry. There are typos, and different people have different styles. I had hoped that I could count on a space before the date, and a space after, but that doesn't seem to be the case. And to make things tougher, a date can be 6 characters (1/1/15) to 8 characters (12/31/15), and if someone uses a 4-digit year (12/31/2015), things get even worse.

I'd recommend sticking with my original formula if you can, and try to deal with the variations. Try to train your staff to put a space before and after any date. But if you really want to try to clean it up automatically, you can try this:

ABCDE
1EMAIL QT RECDEMAILED CUSTOMERAPPROVED QTETA SENT
24/17/14 EMAIL QT RECD, NON STD OVHL, QT NOT APPROVED, COMPANY WILL PROVIDE PIECE PARTS. DETAILS: RECD MISSING HYDRAULIC INLET, 3 TIE BOLTS, 2 TIE BOLT NUTS, 1 BLEEDER SCREW, 4 TORQUE LINER SCREWS, 3 ADJUSTER PIN NUTS FAILED INSPECTION, PISTON HOUSING REQUIRES CORROSION REMOVAL, QT FOR ROUTINE PARTS & LABOR USING OEM PARTS $1234.56 + NON ROUTINE PARTS $123.45, BB 123456 $1234.56, QT STILL NOT APPROVED (AB).7/14/2017
3** PLEASE EXPEDITE THE EVALUATION, INSPECT THOROUGHLY FOR CRACKS & DEFECTS (EARLY SERIAL NUMBER WHEEL) ** QUOTE OVERHAUL INCLUDING NEW TIRE (COMPANY PN: 12345-6). IN THIS SHIPMENT (ALL USED): WHEEL, HARDWARE, TIRE, SLEEVE. ** MISSING / NOT INCLUDED: BEARING CONES, SEALS, RETAINING CLIPS ** 12/23/15 EMAIL QT RECD, NON STD OVHL, REQURIES 3 EA PACKING $1.00 + DECAL $2.00 + TIRE $3.00 + WHEEL HALF $4.00 + 2 RING $5.00 + 2 BEARING $6.00 + 2 SEALS $7.00 + PACKING $8.00, QT NOT APPROVED (AB). ** SEE GLOBAL S.O. 123456 NOTES FOR FULL CUSTOMER DETAILS** 12/29/15 PER CD, COMPANY CAN PROVIDE IB WHEEL HALF PN 1234-567-8 AND WE CAN REDUCE BB TO $1.00 (WHEEL HALF) + $2.00 (MISSING SEALS) = $3.00 (AB). 1/22/16 CUSTOMER USED UP TIME ALLOWED TO PROVIDE REPLACEMENT CORE OR IB WHEEL HALF, FINALIZING BB TO 123456 AT $3.00 (AB). 1/22/16 QUOTE NOT YET APPROVED, CHECKING STATUS OF COMPANY PROVIDED IB WHEEL HALF WITH CD (AB). UPDATE: 4/25/16 EMAILED CUSTOMER REMINDER THAT WE NEED GOOD HALF OVERHAULED AND RETURNED, ALL REMAINING PARTS TO BE SEPARATED GOOD/BAD (AB). 4/27/16 APPROVED QT FOR $4.00 ONLY AND REQUESTED OTB WHEEL HALF BE RETURNED OVERHAULED WITH 8130-3, AND ALL REMAINING PARTS MARKED EITHER GOOD OR BAD (AB).12/23/20154/25/20164/27/2016
41/30/15 EMAIL QT RECD, PART WILL COST $X TO REPAIR
2/7/15 QT APPROVED, EMAILED SHOP FOR STANDARD OVERHAUL
2/25/15 ETA SENT, SHOP WILL RETURN PART ON 3/15/15
1/30/20152/25/2015
5PLEASE OVERHAUL & RETURN AS PN: 1234567-8. 5/6/15 EMAIL QT RECD, NON STD OVHL, REQUIRES SPRING PN 1234567-8 (MISSING) $12.34 + FITTING PN 1234567-8 (MISSING) $123.00, QT NOT APPROVED (AB).5/6/2015
61/6/17 EMAIL QT RECD, NON-STD OVHL, REQUIRES DRIVE COUPLING $123.45, ROTOR (REWORKED) $1,234.56, LABOR $123.00, TOTAL OF $1,234.56, 1/9/17 QT NOT APPROVED, GETTING W/ CD REGARDING BB (EF). 1/11/17 PER CD, COMPANY SUPPLYING PMA ROTOR FROM PO# 123456, QT APPRD FOR $123.45 (EF).6/17/2017
7

<tbody>
</tbody>
Sheet3

Array Formulas
CellFormula
B2{=IFERROR(LARGE(IFERROR(MID(IFERROR(TRIM(RIGHT(TRIM(LEFT(SUBSTITUTE($A2,B$1,REPT(" ",LEN($A2)),(LEN($A2)-LEN(SUBSTITUTE($A2,B$1,"")))/LEN(B$1)),LEN($A2))),9)),""),{1,2,3,4},{6;7;8})+0,""),1),"")}

<tbody>
</tbody>
Entered with Ctrl+Shift+Enter. If entered correctly, Excel will surround with curly braces {}.
Note: Do not try and enter the {} manually yourself

<tbody>
</tbody>



The look of the formula is not as clean :) , and it ramps the complexity up quite a bit. I can also guarantee that it still won't fix everything, but it should catch some extraneous characters.

This is an array formula, so you need to confirm it with Control+Shift+Enter. Also, this converts the character date into a number, which is how Excel internally stores dates. So to see it as a date, make sure to format the cells with the formula as a date. You can pick a MM/DD/YY format instead of the MM/DD/YYYY if you want.

Let me know what you do.
 
Last edited:
Upvote 0
Well Eric... this is awesome man. I ended up using the array formula and it worked out pretty well. It's not perfect, like you said, but it works as well as it can given the data it's working with! I did a pretty thorough spot check using both formulas and I like the results from the array versus the first formula given. It just returns better results for what I have. Both work, but for me, I'll take the array! :)

This is a report/workload that we're going to be working on changing. So moving forward I'll definitely help the staff with changing how they enter data in the notefield so we can get proper results back. Since they will be the ones using the report, I'm sure implementing the data changes won't be too difficult. They want this to work more efficiently as much as anyone.

I can't thank you enough for your help with this. I never would have whipped this one on my own. You have no idea how many man-hours this is going to save us, how much this is going to help this department. Thank you thank you thank you, a million times thank you!!
 
Upvote 0
I'm glad it works out for you! Good job doing a good test/comparison too.

And thanks for the feedback - a nice response like yours makes this all worthwhile! :cool:
 
Upvote 0

Forum statistics

Threads
1,214,883
Messages
6,122,077
Members
449,064
Latest member
MattDRT

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top