Extracting specific text from a thread

MajorTom

New Member
Joined
Feb 2, 2015
Messages
13
Office Version
  1. 365
Platform
  1. Windows
Hi, I'm really hoping someone could assist with a method to deal with an issue I'm having with some text extraction.

I have a column of data and need to extract the MOT information. This could be "Exempt" or a date in the format of "Month Year" and this is the only part of the full text string I want to return to another column.

The data can vary and I have no control over this - this is a sample of rows of 4 differing formats I have had so far:

Registration No: AN01 ABC Tax: N/A MOT: Exempt • Formerly owned by 2 people • Original signage • History is unknown • Many receipts etc •
Registration No: BR07 NIP Tax: 3 months • MOT: December 2021 • Current ownership since 2009 • Refurbished •
Registration No: BTU 218X Tax: 6 Months • MOT: July 2022 Much restored with a lot of history
Registration No: LUV 197T Tax: N/A CC: 175cc MOT: Exempt Motorcycle • Restored • Lightweight

I have had some success with the following formula: =IF(TRIM(SUBSTITUTE(MID(A2,SEARCH("MOT: ",A2)+LEN("MOT: "),9),"•"," "))="Exempt","Exempt",TRIM(SUBSTITUTE(MID(A2,SEARCH("MOT: ",A2)+LEN("MOT: "),9),"•"," "))) but the problem I'm having is that there is not always a bullet point to use in counting my number of characters for the MID formula and using a fixed number of characters misses the end of the longer months or gives me too many if the the status is exempt.

Any ideas or a different approach would be appreciated.

Thank you.
 

Excel Facts

Remove leading & trailing spaces
Save as CSV to remove all leading and trailing spaces. It is faster than using TRIM().
This seems to work.

MOT
AB
1Registration No: AN01 ABC Tax: N/A MOT: Exempt • Formerly owned by 2 people • Original signage • History is unknown • Many receipts etc •Exempt
2Registration No: BR07 NIP Tax: 3 months • MOT: December 2021 • Current ownership since 2009 • Refurbished •December 2021
3Registration No: BTU 218X Tax: 6 Months • MOT: July 2022 Much restored with a lot of historyJuly 2022 Much restored with a lot of history
4Registration No: LUV 197T Tax: N/A CC: 175cc MOT: Exempt Motorcycle • Restored • LightweightExempt Motorcycle
Sheet2
Cell Formulas
RangeFormula
B1:B4B1=mot(A1)


VBA Code:
Function MOT(s As String)
With CreateObject("VBScript.RegExp")
    .Pattern = "MOT: ((?:\w+\s)+(•|.+))"
    MOT = Replace(.Execute(s)(0).submatches(0), " •", vbNullString)
End With
End Function
 
Upvote 0
Try this formula, its the result of many small formulas pasted together.
Excel Formula:
= IF( MID( A1, SEARCH( "MOT:",A1 )+5, 6 )="Exempt", "Exempt", LEFT(RIGHT( A1, LEN(A1)-SEARCH( "MOT:",A1 )-4), SEARCH( " ", RIGHT( A1, LEN(A1)-SEARCH( "MOT:",A1 )-4) )+4) )
_______________________________________
EDIT:
What it does:
1. Search, if there is "MOT:" in the text and if that's true it checks if there is "Exempt" after "MOT:" followed by a space
2. If there's no "MOT:", it assumes that there's a MONTH followed by a SPACE Character and a YEAR (which must be 4 characters in length). So it looks for the second SPACE Character and extracts the month and the year.

Keep in mind that this formula only works with text that is strictly as you described, no exclusions allowed.

If you want it a bit more flexible, you should consider a user defined formula (VBA) that uses Regular Expressions (RegEx) or something similar.

Let me know if it works for you.
 
Last edited:
Upvote 0
Power Query Solution
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Extracted Text Between Delimiters" = Table.TransformColumns(Source, {{"Column1", each Text.BetweenDelimiters(_, "MOT:", "•"), type text}})
in
    #"Extracted Text Between Delimiters"
 
Upvote 0
Is this the result you are expecting
Book1
AB
1Original TextExtracted MOT Text
2Registration No: AN01 ABC Tax: N/A MOT: Exempt • Formerly owned by 2 people • Original signage • History is unknown • Many receipts etc •MOT: Exempt
3Registration No: BR07 NIP Tax: 3 months • MOT: December 2021 • Current ownership since 2009 • Refurbished •MOT: December 2021
4Registration No: BTU 218X Tax: 6 Months • MOT: July 2022 Much restored with a lot of historyMOT: July 2022 Much restored with a lot of history
5Registration No: LUV 197T Tax: N/A CC: 175cc MOT: Exempt Motorcycle • Restored • LightweightMOT: Exempt Motorcycle
Sheet1
Cell Formulas
RangeFormula
B2:B5B2=MID(A2,SEARCH("MOT:",A2),IFERROR(SEARCH("•",MID(A2,SEARCH("MOT:",A2),LEN(A2))),LEN(A2))-2)
 
Upvote 0
Try this formula, its the result of many small formulas pasted together.
Excel Formula:
= IF( MID( A1, SEARCH( "MOT:",A1 )+5, 6 )="Exempt", "Exempt", LEFT(RIGHT( A1, LEN(A1)-SEARCH( "MOT:",A1 )-4), SEARCH( " ", RIGHT( A1, LEN(A1)-SEARCH( "MOT:",A1 )-4) )+4) )
_______________________________________
EDIT:
What it does:
1. Search, if there is "MOT:" in the text and if that's true it checks if there is "Exempt" after "MOT:" followed by a space
2. If there's no "MOT:", it assumes that there's a MONTH followed by a SPACE Character and a YEAR (which must be 4 characters in length). So it looks for the second SPACE Character and extracts the month and the year.

Keep in mind that this formula only works with text that is strictly as you described, no exclusions allowed.

If you want it a bit more flexible, you should consider a user defined formula (VBA) that uses Regular Expressions (RegEx) or something similar.

Let me know if it works for you.
I was thinking use of a regular expression would make this very easy!!
 
Upvote 0
This seems to work.

MOT
AB
1Registration No: AN01 ABC Tax: N/A MOT: Exempt • Formerly owned by 2 people • Original signage • History is unknown • Many receipts etc •Exempt
2Registration No: BR07 NIP Tax: 3 months • MOT: December 2021 • Current ownership since 2009 • Refurbished •December 2021
3Registration No: BTU 218X Tax: 6 Months • MOT: July 2022 Much restored with a lot of historyJuly 2022 Much restored with a lot of history
4Registration No: LUV 197T Tax: N/A CC: 175cc MOT: Exempt Motorcycle • Restored • LightweightExempt Motorcycle
Sheet2
Cell Formulas
RangeFormula
B1:B4B1=mot(A1)


VBA Code:
Function MOT(s As String)
With CreateObject("VBScript.RegExp")
    .Pattern = "MOT: ((?:\w+\s)+(•|.+))"
    MOT = Replace(.Execute(s)(0).submatches(0), " •", vbNullString)
End With
End Function

Thank you. I'll try this out.
I was hoping to solve with formula but I'll definitely try this as an alternative.
 
Upvote 0
Try this formula, its the result of many small formulas pasted together.
Excel Formula:
= IF( MID( A1, SEARCH( "MOT:",A1 )+5, 6 )="Exempt", "Exempt", LEFT(RIGHT( A1, LEN(A1)-SEARCH( "MOT:",A1 )-4), SEARCH( " ", RIGHT( A1, LEN(A1)-SEARCH( "MOT:",A1 )-4) )+4) )
_______________________________________
EDIT:
What it does:
1. Search, if there is "MOT:" in the text and if that's true it checks if there is "Exempt" after "MOT:" followed by a space
2. If there's no "MOT:", it assumes that there's a MONTH followed by a SPACE Character and a YEAR (which must be 4 characters in length). So it looks for the second SPACE Character and extracts the month and the year.

Keep in mind that this formula only works with text that is strictly as you described, no exclusions allowed.

If you want it a bit more flexible, you should consider a user defined formula (VBA) that uses Regular Expressions (RegEx) or something similar.

Let me know if it works for you.

Thank you. This seems to work I think I covered all the variations in the text in my OP sample but will test it on a larger data sample to confirm.
 
Upvote 0
Power Query Solution
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Extracted Text Between Delimiters" = Table.TransformColumns(Source, {{"Column1", each Text.BetweenDelimiters(_, "MOT:", "•"), type text}})
in
    #"Extracted Text Between Delimiters"

Thank you.
I am hoping to solve the problem with formula but will look at this as an option.
 
Upvote 0
Is this the result you are expecting
Book1
AB
1Original TextExtracted MOT Text
2Registration No: AN01 ABC Tax: N/A MOT: Exempt • Formerly owned by 2 people • Original signage • History is unknown • Many receipts etc •MOT: Exempt
3Registration No: BR07 NIP Tax: 3 months • MOT: December 2021 • Current ownership since 2009 • Refurbished •MOT: December 2021
4Registration No: BTU 218X Tax: 6 Months • MOT: July 2022 Much restored with a lot of historyMOT: July 2022 Much restored with a lot of history
5Registration No: LUV 197T Tax: N/A CC: 175cc MOT: Exempt Motorcycle • Restored • LightweightMOT: Exempt Motorcycle
Sheet1
Cell Formulas
RangeFormula
B2:B5B2=MID(A2,SEARCH("MOT:",A2),IFERROR(SEARCH("•",MID(A2,SEARCH("MOT:",A2),LEN(A2))),LEN(A2))-2)

Thank you for the reply. Almost what I'm after but I would need to tidy the results in cell B4 to just "July 2022" and B5 to just "Exempt".

I have not used regular expression before - would that be using VBA?
 
Upvote 0

Forum statistics

Threads
1,215,094
Messages
6,123,071
Members
449,092
Latest member
ipruravindra

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top