How to extract a code from a sentence?

olympiac

Board Regular
Joined
Sep 26, 2010
Messages
158
How to extract a code from a sentence?
I have got a text narrative from which I need to extract a code made up by 4 letters in Capital letter.
The code can be at the beginning, in the middle or at the end of the sentence but it's always separated from the other words.
Example with the code ADBC into the sentence:
long sentence here ABCD sentence continues
long sentence here ABCD
ABCD long sentence here
I have got 50 different codes and more than 10,000 rows from which I need to extract the code and insert it into a new column.
Is there a way to find a formula that can do this work?
Thanks
 

Excel Facts

Format cells as time
Select range and press Ctrl+Shift+2 to format cells as time. (Shift 2 is the @ sign).
UDF. Fill other cells with this function.
Code:
[COLOR="Blue"]Function[/COLOR] ExtractCapitals(Str [COLOR="Blue"]As[/COLOR] [COLOR="Blue"]String[/COLOR]) [COLOR="Blue"]As[/COLOR] [COLOR="Blue"]String[/COLOR]
    [COLOR="Blue"]With[/COLOR] CreateObject("VBScript.RegExp")
        .Pattern = "[A-Z]+"
        ExtractCapitals = .Execute(Str)(0)
    [COLOR="Blue"]End[/COLOR] [COLOR="Blue"]With[/COLOR]
[COLOR="Blue"]End[/COLOR] [COLOR="Blue"]Function[/COLOR]
 
Upvote 0
Thanks for your answer.
I understand that I need to write the code in a module. What line am I suppose to add in the VBA module to fill in "column W" with the code extracted?
 
Upvote 0
After inserting this function into standard module, go to worksheet and type into a cell (supposing sentence with code is in A1 cell): =ExtractCapitals(A1)
Then you can fill it down.
Here's addition to function (for recalculation):
Code:
[COLOR="Blue"]Function[/COLOR] ExtractCapitals(Str [COLOR="Blue"]As[/COLOR] [COLOR="Blue"]String[/COLOR]) [COLOR="Blue"]As[/COLOR] [COLOR="Blue"]String[/COLOR]
    [B][COLOR="Red"]Application.Volatile[/COLOR][/B]
    [COLOR="Blue"]With[/COLOR] CreateObject("VBScript.RegExp")
        .Pattern = "[A-Z]+"
        ExtractCapitals = .Execute(Str)(0)
    [COLOR="Blue"]End[/COLOR] [COLOR="Blue"]With[/COLOR]
[COLOR="Blue"]End[/COLOR] [COLOR="Blue"]Function[/COLOR]
 
Upvote 0
Question 1:
There are some words that are in capital into the sentence and the function is restituing those words as well. Is there a way to get as a result only the existing code? I have got 50 different codes, if the code "ABCD" is found amongst the text in the sentence in cell "A1" then the result in cell "B1" will be "ABCD"; if not, the result should be "Code missing".
Is there a way to get it?
Question 2:
What about if I want to search the code only within the last 10 words of the sentence?
 
Upvote 0
I'm not very au fait with regular expressions but was playing with this and found that Sektor's answer would also return single capitals (I accidentally included
Example with the code ADBC into the sentence:
when trying it out)
This resulted in just E being returned.

Some research resulted in a possible tweak, changing:
.Pattern = "[A-Z]+"
to:
.Pattern = "[A-Z]{4,}"
which I think ensures there are at least 4 caps in a row.
 
Upvote 0
Good catch, p45scal! Yes, I didn't notice that the first letter is capital, so {4,} will do the work. :)

Code:
[COLOR="Blue"]Function[/COLOR] ExtractCapitals(Str [COLOR="Blue"]As[/COLOR] [COLOR="Blue"]String[/COLOR]) [COLOR="Blue"]As[/COLOR] [COLOR="Blue"]String[/COLOR]
    Application.Volatile
    [COLOR="Blue"]With[/COLOR] CreateObject("VBScript.RegExp")
        .Pattern = "[A-Z]{4,}"
        ExtractCapitals = .Execute(Str)(0)
    [COLOR="Blue"]End[/COLOR] [COLOR="Blue"]With[/COLOR]
[COLOR="Blue"]End[/COLOR] [COLOR="Blue"]Function[/COLOR]
 
Upvote 0
Question 1:
There are some words that are in capital into the sentence and the function is restituing those words as well. Is there a way to get as a result only the existing code? I have got 50 different codes, if the code "ABCD" is found amongst the text in the sentence in cell "A1" then the result in cell "B1" will be "ABCD"; if not, the result should be "Code missing".
Is there a way to get it?
Question 2:
What about if I want to search the code only within the last 10 words of the sentence?
Ha! I pre-empted your query!
try:
Code:
Function ExtractCapitals(Str As String) As String
On Error GoTo handler
    Application.Volatile
    With CreateObject("VBScript.RegExp")
       ' .Pattern = "[A-Z]+"
        .Pattern = "[A-Z]{4,}"
        ExtractCapitals = .Execute(Str)(0)
    End With
    Exit Function
handler:
    ExtractCapitals = "Code missing"
End Function
 
Upvote 0
I have tried it with to codes:

.Pattern = "[ASTP]{4,}"
.Pattern = "[ASTT]{4,}"

The function is returning the code SAAT that is included in the word INSAAT as well
Any clue?
If it can help, 90% of the codes are within the last wors in the sentence. Maybe we could narrow th e search at the end.
 
Upvote 0
.Pattern = "[ASTT]{4,}"
says: find me a match of any combination of the letters ASTT which is exactly 4 characters long
It would be identical to saying
.Pattern = "[AST]{4,}"

If you want to find exact matches, change it simply to
.Pattern = "ASTT"
You can also include spaces, like this:
.Pattern = " ASTT "
to avoid matching e.g. PLASTTER

From your initial question, it looked like you wanted to find all 4-letter allcaps words, not exact matches...
 
Upvote 0

Forum statistics

Threads
1,224,594
Messages
6,179,792
Members
452,942
Latest member
VijayNewtoExcel

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top