Help Extracting Text from within a large, complicated String

chonchos

New Member
Joined
Nov 4, 2016
Messages
16
In it's most simplest form my need is simple... I have a cell that contains a "note field". I would like to search inside this note field and pull out a date that is before a specific sub-string. It gets tricky when I start to break it down and try to actually make it happen, however.

The data I need to extract is in this format at random places within the string: "4/21/14 EMAIL QT RECD". Sometimes it starts at the very first character, sometimes its 100 characters in.

A couple examples of note fields:

Example 1:

4/17/14 EMAIL QT RECD, NON STD OVHL, QT NOT APPROVED, COMPANY WILL PROVIDE PIECE PARTS. DETAILS: RECD MISSING HYDRAULIC INLET, 3 TIE BOLTS, 2 TIE BOLT NUTS, 1 BLEEDER SCREW, 4 TORQUE LINER SCREWS, 3 ADJUSTER PIN NUTS FAILED INSPECTION, PISTON HOUSING REQUIRES CORROSION REMOVAL, QT FOR ROUTINE PARTS & LABOR USING OEM PARTS $1234.56 + NON ROUTINE PARTS $123.45, BB 123456 $1234.56, QT STILL NOT APPROVED (AB).

Example 2:


** PLEASE EXPEDITE THE EVALUATION, INSPECT THOROUGHLY FOR CRACKS & DEFECTS (EARLY SERIAL NUMBER WHEEL) ** QUOTE OVERHAUL INCLUDING NEW TIRE (COMPANY PN: 12345-6). IN THIS SHIPMENT (ALL USED): WHEEL, HARDWARE, TIRE, SLEEVE. ** MISSING / NOT INCLUDED: BEARING CONES, SEALS, RETAINING CLIPS ** 12/23/15 EMAIL QT RECD, NON STD OVHL, REQURIES 3 EA PACKING $1.00 + DECAL $2.00 + TIRE $3.00 + WHEEL HALF $4.00 + 2 RING $5.00 + 2 BEARING $6.00 + 2 SEALS $7.00 + PACKING $8.00, QT NOT APPROVED (AB). ** SEE GLOBAL S.O. 123456 NOTES FOR FULL CUSTOMER DETAILS** 12/29/15 PER CD, COMPANY CAN PROVIDE IB WHEEL HALF PN 1234-567-8 AND WE CAN REDUCE BB TO $1.00 (WHEEL HALF) + $2.00 (MISSING SEALS) = $3.00 (AB). 1/22/16 CUSTOMER USED UP TIME ALLOWED TO PROVIDE REPLACEMENT CORE OR IB WHEEL HALF, FINALIZING BB TO 123456 AT $3.00 (AB). 1/22/16 QUOTE NOT YET APPROVED, CHECKING STATUS OF COMPANY PROVIDED IB WHEEL HALF WITH CD (AB). UPDATE: 4/25/16 EMAILED CUSTOMER REMINDER THAT WE NEED GOOD HALF OVERHAULED AND RETURNED, ALL REMAINING PARTS TO BE SEPARATED GOOD/BAD (AB). 4/27/16 APPROVED QT FOR $4.00 ONLY AND REQUESTED OTB WHEEL HALF BE RETURNED OVERHAULED WITH 8130-3, AND ALL REMAINING PARTS MARKED EITHER GOOD OR BAD (AB).




I got pretty close to what I wanted by scouring forums and eventually slightly modifying the following formula:

=MID(N1,SEARCH("EMAIL QT RECD",N1)-8,SEARCH(" ",N1,SEARCH("EMAIL QT RECD",N1)-8)-(SEARCH("EMAIL QT RECD",N1)-8))

A few problems with this solution; it cuts off the first digit of dates formatted as MM/DD/YY instead of M/DD/YY. Also, if the date is formatted as M/D/YY and is the first character in the string it returns #VALUE !.

I've tried playing around with it to make it work but I just can't get it there.

I'm using Excel 2010 on Windows 7 but this solution will eventually be used by coworkers running newer versions of Excel as well.


If I can get this part worked out I would also like to be able to extract other dates as well from within the same note field by searching for other sub-strings. For example, "APPROVED QT, QUOTE APPROVED, QUOTE NOT APPROVED, ETA SENT etc. I would like the date of whichever string is found last in the search to show in the result cell.

An example of this would be:

1/30/15 EMAIL QT RECD, PART WILL COST $X TO REPAIR
2/7/15 QT APPROVED, EMAILED SHOP FOR STANDARD OVERHAUL
2/25/15 ETA SENT, SHOP WILL RETURN PART ON 3/15/15

The return result would be 2/25/15 even though all three strings are found in the search.

I know this is a lot and it's complicated but I've banged my head on just the first part for the past week and half. So I'm hoping someone with more experience and know-how can help me out on this one. At least push me off in the right direction maybe! Thanks!
 
Hey Eric. I'm kind of digging up the dead here a little bit but I'm hoping you might be able to help me - again.

So after a lot of hemming and hawing from one final employee (a manager :oops:) I finally got her on board with this project. We've been using a standardized date format for months though (shout-out to AutoHotKey for making that easy for everyone!).

So now I'm FINALLY back to putting some real work in on this report. I noticed an issue and no matter how much I play with the formula I just can't figure it out. Issue persists.

This solution works WONDERFULLY for almost anything I throw at it. It hiccups when I have a date like this 1/21/18 as the first set of characters in the text. It can read a date like this 1/2/18 or this 11/22/18 as the first characters fine. I already asked about changing the formatting to include a leading zero on dates but our company owner does not "like it that way". So we're kind of stuck with the date formatting the way it is.

So, to put it another way...

Formula works fine and pulls the correct date into the "QUOTE REQUESTED" result column if the text reads like this:
1/2/18 QUOTE REQUESTED BLAH BLAH BLAH
OR
11/22/18 QUOTE REQUESTED BLAH BLAH BLAH
OR
11/17/17 SOMETHING HAPPENED FIRST. RANDOM OTHER NOTES PRECEDE "SEARCH TEXT". 1/2/18 QUOTE REQUESTED BLAH BLAH BLAH

Formula kicks back an incorrect date on something like this:
1/25/18 QUOTE REQUESTED BLAH BLAH BLAH. 2/15/18 OTHER NOTES AND STUFF AFTER BLAH BLAH BLAH.

I'm completely stumped. I've given it all I've got, read every Excel help I can find trying to decipher this array formula, randomly and blindly changed things with hope in hand, and just... nothing. I'm outta steam.

Aside from using leading zeros in the initial note creation... what else can I do?
 
Upvote 0

Excel Facts

Save Often
If you start asking yourself if now is a good time to save your Excel workbook, the answer is Yes
Well, I warned you that trying to parse variable data was difficult! :eek:

I can see your problem, and I can see why it happens. In fact, now that I look at it, some of the examples in post #8 show the same problem, I didn't notice at the time. Now the problem is how to fix it. I tried several things, including adding a preceding 0 to the date internally to the formula. But it's like Whack-a-Mole. Every time I fix a specific case, another potential problem jumps up.

At this point, I think the best idea is to create a UDF (user-defined function). This uses VBA, the built-in scripting language of Excel. It will make your workbook a macro-enabled workbook though. But VBA gives you a lot more options. Try this:

Open a COPY of your workbook. Press Alt-F11 to open the VBA editor. From the menu, click Insert > Module. In the window that opens, paste this code:

Code:
Public Function GetDate(ByVal MyText As String, ByVal MyHdr As String) As String
Dim x As Long, y As Variant, z As String

    x = InStrRev(MyText, MyHdr)
    If x = 0 Then Exit Function
    
    y = Split(Trim(Left(MyText, x - 1)))
    z = y(UBound(y))
    For x = 1 To Len(z)
        If Mid(z, x, 1) Like "[0-9/]" Then GetDate = GetDate & Mid(z, x, 1)
    Next x
    
End Function
Press Alt-Q to close the editor. Now in B2 of the same layout from post #8 , put this formula:

=GetDate($A2,B$1)

Copy down and across as needed. We may still need to tweak it a bit, but there are a lot more people able to modify a UDF like this than a monster formula. Let me know how it works.
 
Upvote 0
Amazing. Beautiful. WONDERFUL!

THANK YOU!

It looks like its working a charm. And! If I want to change it I think I stand a better chance at figuring this one out once I study it for a moment. Little easier to read. I'm a long shot from a VBA expert... but I dabble. :P

I think I might make a special plaque at work for you. "Erik W from Mr. Excel Guy - Honorary Employee of the Year".
 
Upvote 0
Give this a try too: Enter as many strings in the first row as needed. Then enter the formula with Ctrl - Shift - Enter into cell B2, then drag it across and down.

Excel Workbook
ABCDE
1EMAIL QT RECDEMAILED CUSTOMERQUOTE REQUISTEDETA SENT
24/17/14 EMAIL QT RECD, NON STD OVHL, QT NOT APPROVED, COMPANY WILL PROVIDE PIECE PARTS. DETAILS: RECD MISSING HYDRAULIC INLET, 3 TIE BOLTS, 2 TIE BOLT NUTS, 1 BLEEDER SCREW, 4 TORQUE LINER SCREWS, 3 ADJUSTER PIN NUTS FAILED INSPECTION, PISTON HOUSING REQUIRES CORROSION REMOVAL, QT FOR ROUTINE PARTS & LABOR USING OEM PARTS $1234.56 + NON ROUTINE PARTS $123.45, BB 123456 $1234.56, QT STILL NOT APPROVED (AB).4/17/14000
3** PLEASE EXPEDITE THE EVALUATION, INSPECT THOROUGHLY FOR CRACKS & DEFECTS (EARLY SERIAL NUMBER WHEEL) ** QUOTE OVERHAUL INCLUDING NEW TIRE (COMPANY PN: 12345-6). IN THIS SHIPMENT (ALL USED): WHEEL, HARDWARE, TIRE, SLEEVE. ** MISSING / NOT INCLUDED: BEARING CONES, SEALS, RETAINING CLIPS ** 12/23/15 EMAIL QT RECD, NON STD OVHL, REQURIES 3 EA PACKING $1.00 + DECAL $2.00 + TIRE $3.00 + WHEEL HALF $4.00 + 2 RING $5.00 + 2 BEARING $6.00 + 2 SEALS $7.00 + PACKING $8.00, QT NOT APPROVED (AB). ** SEE GLOBAL S.O. 123456 NOTES FOR FULL CUSTOMER DETAILS** 12/29/15 PER CD, COMPANY CAN PROVIDE IB WHEEL HALF PN 1234-567-8 AND WE CAN REDUCE BB TO $1.00 (WHEEL HALF) + $2.00 (MISSING SEALS) = $3.00 (AB). 1/22/16 CUSTOMER USED UP TIME ALLOWED TO PROVIDE REPLACEMENT CORE OR IB WHEEL HALF, FINALIZING BB TO 123456 AT $3.00 (AB). 1/22/16 QUOTE NOT YET APPROVED, CHECKING STATUS OF COMPANY PROVIDED IB WHEEL HALF WITH CD (AB). UPDATE: 4/25/16 EMAILED CUSTOMER REMINDER THAT WE NEED GOOD HALF OVERHAULED AND RETURNED, ALL REMAINING PARTS TO BE SEPARATED GOOD/BAD (AB). 4/27/16 APPROVED QT FOR $4.00 ONLY AND REQUESTED OTB WHEEL HALF BE RETURNED OVERHAULED WITH 8130-3, AND ALL REMAINING PARTS MARKED EITHER GOOD OR BAD (AB).04/25/1600
41/2/18 QUOTE REQUESTED BLAH BLAH BLAH 1/12/18 EMAILED CUSTOMER 2/2/2011 EMAIL QT RECD2/2/2011000
51/2/18 QUOTE REQUESTED BLAH BLAH BLAH 1/12/18 EMAILED CUSTOMER 12/02/2011 ETA SENT00012/02/2011
Sheet2
 
Upvote 0
Using Eric's layout (Message #8 ), here is another UDF (user defined function) that you can consider. This UDF does not care if there are no spaces separating the date from the header text being searched for nor does it care if the date has spaces within its date parts nor does it care if extra text appears between the date and header text so long as that extra text has no digits in it (I am thinking of things like parentheses commas, colons, etc. next to the date). In other words, this function should work no matter what the text looks like.
Code:
[table="width: 500"]
[tr]
	[td]Public Function GetDate(ByVal MyText As String, ByVal MyHdr As String) As String
  Dim X As Long, Z As Long
  MyText = Replace(UCase(MyText), " ", "")
  MyHdr = Replace(UCase(MyHdr), " ", "")
  For X = InStr(MyText, MyHdr) To 1 Step -1
    If Mid(MyText, X, 1) Like "#" Then
      For Z = X To 1 Step -1
        If Mid("x" & MyText, Z, 1) Like "[!0-9/]" Then
          GetDate = CDate(Mid(MyText, Z, X - Z + 1))
          Exit Function
        End If
      Next
    End If
  Next
End Function[/td]
[/tr]
[/table]
 
Last edited:
Upvote 0

Forum statistics

Threads
1,214,861
Messages
6,121,969
Members
449,059
Latest member
oculus

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top