Extracting the words around a word or a collection of words

pratheesh1983

Board Regular
Joined
Aug 13, 2015
Messages
55
Office Version
  1. 365
Platform
  1. Windows
I am looking for a formula that will help me extract one word before and two word after a specific word or a collection of words (Mentioned in HIGHLIGHT_TEXT column) . This is basically to check the surrounding text value of a matched text that created a wrong alert in our software. I have provided the sample text below for your reference.

HIGHLIGHT_TEXTSOFFSETEOFFSETHIGHLIGHT_TEXT_LINESurrounding Text Value (Formula Column)
ALUMINIUM
7​
15​
82
EMPTY ALUMINIUM CANS WITH 202 ENDS
(SOT) ,TIGER BLACK CANS,202 ENDS
(SOT) SILVER
EMPTY ALUMINIUM CANS
VALVE REMOTE
CONTROL
13​
33​
82
ONE SHIPSET VALVE REMOTE
CONTROL SYSTEM& TANK GAUGING
SYSTEM hs code- 90261029
SHIPSET VALVE REMOTE
CONTROL SYSTEM
NARD
103​
106​
110
CB Tilbering FORMTE 005555AOT5900200893654

wwww wwww wwww wwww wwww wwww wwww wwww wwww

xxxTPxxx
IN BARD /LB
IN BARD /LB
  • SOFFSET is the start position of the highlighted text in the complete message text.
  • EOFFSET is the end position of the highlighted text in the complete message text.

The column "surrounding the text value" is where the formula needs to be put to extract the value, and while extracting the value, one should not consider if one word preceding or two words following the highlighted text contain either "wwww" or "xxx" as this is not relevant.
So I am looking for a formula that could help me extract information from column HIGHLIGHT_TEXT_LINE," one word before and two words after a specific word or a collection of words (mentioned in the HIGHLIGHT_TEXT column) using SOFFSET or EOFFSET value and not considering 'wwww" or "xxx" as mentioned above. Would really appreciate if some one can help me in this? Thanks
 

Excel Facts

Remove leading & trailing spaces
Save as CSV to remove all leading and trailing spaces. It is faster than using TRIM().
I assume that NARD is supposed to be BARD?

one word before and two word after
Your sample, assuming the expected results are shown, shows one word before and one word after. Please clarify.

  • SOFFSET is the start position of the highlighted text in the complete message text.
  • EOFFSET is the end position of the highlighted text in the complete message text.
How are they calculated? Here are my calculations in columns C & E

pratheesh1983.xlsm
ABCDEF
1HIGHLIGHT_TEXTSOFFSETEOFFSETHIGHLIGHT_TEXT_LINE
2ALUMINIUM710151882 EMPTY ALUMINIUM CANS WITH 202 ENDS (SOT) ,TIGER BLACK CANS,202 ENDS (SOT) SILVER
3VALVE REMOTE CONTROL13#VALUE!33#VALUE!82 ONE SHIPSET VALVE REMOTE CONTROL SYSTEM& TANK GAUGING SYSTEM hs code- 90261029
4BARD103107106110110 CB Tilbering FORMTE 005555AOT5900200893654 wwww wwww wwww wwww wwww wwww wwww wwww wwww xxxTPxxx IN BARD /LB
Sheet1
Cell Formulas
RangeFormula
C2:C4C2=FIND(A2,F2)
E2:E4E2=FIND(A2,F2)+LEN(A2)-1
 
Upvote 0
I assume that NARD is supposed to be BARD?
Yes, that's correct it should be 'BARD'
Your sample, assuming the expected results are shown, shows one word before and one word after. Please clarify.
The intended formula we need to put in the column with the header "Surrounding Text Value" and the intended result for the sample I have provided as well for reference Right now, I am doing it manually, so I am looking for any option in Excel that will help pull this information.
How are they calculated? Here are my calculations in columns C & E

pratheesh1983.xlsm
ABCDEF
1HIGHLIGHT_TEXTSOFFSETEOFFSETHIGHLIGHT_TEXT_LINE
2ALUMINIUM710151882 EMPTY ALUMINIUM CANS WITH 202 ENDS (SOT) ,TIGER BLACK CANS,202 ENDS (SOT) SILVER
3VALVE REMOTE CONTROL13#VALUE!33#VALUE!82 ONE SHIPSET VALVE REMOTE CONTROL SYSTEM& TANK GAUGING SYSTEM hs code- 90261029
4BARD103107106110110 CB Tilbering FORMTE 005555AOT5900200893654 wwww wwww wwww wwww wwww wwww wwww wwww wwww xxxTPxxx IN BARD /LB
Sheet1
Cell Formulas
RangeFormula
C2:C4C2=FIND(A2,F2)
E2:E4E2=FIND(A2,F2)+LEN(A2)-1
 
Upvote 0
Try this then

pratheesh1983.xlsm
ADE
1HIGHLIGHT_TEXTHIGHLIGHT_TEXT_LINESurrounding Text Value (Formula Column)
2ALUMINIUM82 EMPTY ALUMINIUM CANS WITH 202 ENDS (SOT) ,TIGER BLACK CANS,202 ENDS (SOT) SILVEREMPTY ALUMINIUM CANS
3VALVE REMOTE CONTROL82 ONE SHIPSET VALVE REMOTE CONTROL SYSTEM& TANK GAUGING SYSTEM hs code- 90261029SHIPSET VALVE REMOTE CONTROL SYSTEM&
4BARD110 CB Tilbering FORMTE 005555AOT5900200893654 wwww wwww wwww wwww wwww wwww wwww wwww wwww xxxTPxxx IN BARD /LBIN BARD /LB
Sheet3
Cell Formulas
RangeFormula
E2:E4E2=LET(s," "&SUBSTITUTE(D2,CHAR(10)," ")&" ",TEXTJOIN(" ",,TAKE(TEXTSPLIT(TRIM(LEFT(s,FIND(" ",s,FIND(A2,s)+LEN(A2)+1)))," "),,-(ROWS(TEXTSPLIT(A2,," "))+2))))
 
Upvote 0
Try this then

pratheesh1983.xlsm
ADE
1HIGHLIGHT_TEXTHIGHLIGHT_TEXT_LINESurrounding Text Value (Formula Column)
2ALUMINIUM82 EMPTY ALUMINIUM CANS WITH 202 ENDS (SOT) ,TIGER BLACK CANS,202 ENDS (SOT) SILVEREMPTY ALUMINIUM CANS
3VALVE REMOTE CONTROL82 ONE SHIPSET VALVE REMOTE CONTROL SYSTEM& TANK GAUGING SYSTEM hs code- 90261029SHIPSET VALVE REMOTE CONTROL SYSTEM&
4BARD110 CB Tilbering FORMTE 005555AOT5900200893654 wwww wwww wwww wwww wwww wwww wwww wwww wwww xxxTPxxx IN BARD /LBIN BARD /LB
Sheet3
Cell Formulas
RangeFormula
E2:E4E2=LET(s," "&SUBSTITUTE(D2,CHAR(10)," ")&" ",TEXTJOIN(" ",,TAKE(TEXTSPLIT(TRIM(LEFT(s,FIND(" ",s,FIND(A2,s)+LEN(A2)+1)))," "),,-(ROWS(TEXTSPLIT(A2,," "))+2))))
Thanks for your response.
But when I used the formula that you shared, I noticed that if there is any line break or the word is concatenated, it throws up an error. Will you be able to help here?



HIGHLIGHT_TEXTHIGHLIGHT_TEXT_LINESURRONDING TEXT VALUE
SEAA1Nhxva Shbva Sea
Diso bornLae
#VALUE!​
ERSntral Beadquat
ers (QRS), NA, Ge
#VALUE!​
 
Upvote 0
I noticed that if there is any line break or the word is concatenated, it throws up an error.
It is not related to line breaks or concatenation, it is related to upper/lower case letters.
In your original sample everything was in upper case, now it is not.

Try this instead

pratheesh1983.xlsm
ADE
1HIGHLIGHT_TEXTHIGHLIGHT_TEXT_LINESURRONDING TEXT VALUE
2SEAA1Nhxva Shbva Sea Diso bornLaeShbva Sea Diso
3ERSntral Beadquat ers (QRS), NA, GeBeadquat ers (QRS),
Sheet2
Cell Formulas
RangeFormula
E2:E3E2=LET(s," "&SUBSTITUTE(D2,CHAR(10)," ")&" ",TEXTJOIN(" ",,TAKE(TEXTSPLIT(TRIM(LEFT(s,FIND(" ",s,SEARCH(A2,s)+LEN(A2)+1)))," "),,-(ROWS(TEXTSPLIT(A2,," "))+2))))
 
Upvote 0

Forum statistics

Threads
1,215,069
Messages
6,122,959
Members
449,096
Latest member
Anshu121

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top