Extract mixed strings of capitals and numbers from text

Lolcc

New Member
Joined
Jan 9, 2021
Messages
5
Office Version
  1. 2010
Platform
  1. Windows
Hi, I'm trying to remove mixed strings of capitals and dates / numbers from other text. I have a lot of data showing information in the following format:

GLENKINCHIE 1986 DISTILLERS EDITION Limited edition single Lowland malt whisky. Special release G/273-7-D. 1L, 43% volume, in carton. DALWHINNIE 1988 DISTILLERS EDITION Single Highland malt whisky, bottled 2003. Special release D.SK.312. 70cl, 43% volume, in carton. DALWHINNIE 15 YEAR OLD CENTENARY EDITION Single Highland malt whisky. 70cl, 43% volume. In carton. 3 bottles

For the example shown, I would want to extract GLENKINCHIE 1986 DISTILLERS EDITION, DALWHINNIE 1988 DISTILLERS EDITION, DALWHINNIE 15 YEAR OLD CENTENARY EDITION into the same cell.

Is this at all possible to do? I had read a great forum (#684934) where Rick Rothstein provided a UDF for someone requesting to remove just capitals which worked really well but missed the numbers. Someone had asked for an example to also extract numbers as well though the amended UDF did not work when I applied this in VB.

Any ideas for what might make this work. I appreciate within the text it could be either a date (e.g. 1986) or a # YEAR (e.g. 15 YEAR). I'm new to VB though loving what it can do with a basic application.

Thanks Lolcc
 

Excel Facts

What is the shortcut key for Format Selection?
Ctrl+1 (the number one) will open the Format dialog for whatever is selected.
Here is some code that extract capitals , number, commas and spaces from a string in column A and puts the result in column B, you very easily add more characters to the "if" statement, if you need to
VBA Code:
Sub test()
Dim txt As String
'ascii 48 to 57 numbers and 65to 90 Capitals
'ascii  44 is a comma and ascii 32 is a space
lastrow = Cells(Rows.Count, "A").End(xlUp).Row
inarr = Range(Cells(1, 1), Cells(lastrow, 1))
outarr = Range(Cells(1, 2), Cells(lastrow, 2))
For i = 1 To lastrow
txt = inarr(i, 1)
tl = Len(txt)
tb = ""
For j = 1 To tl
 tc = Mid(txt, j, 1)
  ta = Asc(tc)
  If (ta >= 48 And ta <= 57) Or (ta >= 65 And ta <= 90) Or ta = 44 Or ta = 32 Then
   tb = tb & tc
  End If
Next j
outarr(i, 1) = tb
Next i
Range(Cells(1, 2), Cells(lastrow, 2)) = outarr

End Sub
 
Upvote 0
Welcome to the MrExcel board!

One example is not much to go on as it does not display any variety that may occur in your data. :)
However, you could try this UDF.

VBA Code:
Function ExtractCapNum(s As String) As String
  Dim RX As Object, M As Object
 
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = "( |^)[A-Z][A-Z \d]+\b"
  For Each M In RX.Execute(s)
    ExtractCapNum = ExtractCapNum & ", " & Trim(M)
  Next M
  ExtractCapNum = Mid(ExtractCapNum, 3)
End Function

Lolcc.xlsm
AB
1GLENKINCHIE 1986 DISTILLERS EDITION Limited edition single Lowland malt whisky. Special release G/273-7-D. 1L, 43% volume, in carton. DALWHINNIE 1988 DISTILLERS EDITION Single Highland malt whisky, bottled 2003. Special release D.SK.312. 70cl, 43% volume, in carton. DALWHINNIE 15 YEAR OLD CENTENARY EDITION Single Highland malt whisky. 70cl, 43% volume. In carton. 3 bottlesGLENKINCHIE 1986 DISTILLERS EDITION, DALWHINNIE 1988 DISTILLERS EDITION, DALWHINNIE 15 YEAR OLD CENTENARY EDITION
Sheet1
Cell Formulas
RangeFormula
B1B1=ExtractCapNum(A1)
 
Upvote 0
Thank you offthelip and Peter_SSs.

I've used both of your suggested resolutions and the UDF appears to work almost perfectly across 16000 rows with only 5 anomalies where some of the cells had joined capitals and numbers and or lower case - these were easily manually amended. e.g. I just wanted ARDBOG but it didn't pull anything.
ARDBOGActive. Port Ellen, Islay.NCF. 70cl, 52.1% volume, in carton.

When I ran the VB code it also brought back other capitals and numbers in the cell text though did extract the correct words as well as per below:

GLENKINCHIE 1986 DISTILLERS EDITION L L S G2737D 1L, 43 , DALWHINNIE 1988 DISTILLERS EDITION S H , 2003 S DSK312 70, 43 , DALWHINNIE 15 YEAR OLD CENTENARY EDITION S H 70, 43 I 3

Thank you very much to both of you for your help - I'll be sure to provide a few more examples next time as well as suggested.

Lolcc
 
Upvote 0
5 anomalies where some of the cells had joined capitals and numbers and or lower case - these were easily manually amended. e.g. I just wanted ARDBOG but it didn't pull anything.
ARDBOGActive. Port Ellen, Islay.NCF. 70cl, 52.1% volume, in carton.
I am not certain that this won't introduce other anomalies but you could try changing the Pattern line to

VBA Code:
RX.Pattern = "( |^)[A-Z][A-Z \d]+(?![^A-Z \d])"

Lolcc.xlsm
AB
1GLENKINCHIE 1986 DISTILLERS EDITION Limited edition single Lowland malt whisky. Special release G/273-7-D. 1L, 43% volume, in carton. DALWHINNIE 1988 DISTILLERS EDITION Single Highland malt whisky, bottled 2003. Special release D.SK.312. 70cl, 43% volume, in carton. DALWHINNIE 15 YEAR OLD CENTENARY EDITION Single Highland malt whisky. 70cl, 43% volume. In carton. 3 bottlesGLENKINCHIE 1986 DISTILLERS EDITION, DALWHINNIE 1988 DISTILLERS EDITION, DALWHINNIE 15 YEAR OLD CENTENARY EDITION
2ARDBOGActive. Port Ellen, Islay.NCF. 70cl, 52.1% volume, in carton.ARDBOG
Sheet1
Cell Formulas
RangeFormula
B1:B2B1=ExtractCapNum(A1)
 
Upvote 0
Perfect. Thank you Peter_SSs - it now recognises where the CAPS run into a new word separating it nicely - such a huge timesaver.
 
Upvote 0
You're welcome. Thanks for the follow-up. :)
 
Upvote 0
Hi again,

Sorry, I may have been too hasty - I've tried the UDF on a new data set and there are a few more anomalies which have been cropping up. It seems to be skipping the YEARS following a number and this data set now has apostrophes, ampersands and hyphens. Are there some recommended amendments to the UDF that could help with these?

The following are examples of some that are pulling through incorrectly or returning blanks:
Data SetWhat is being pulledDesired Pull
ALISTAIR CUNNINGHAM'S 50 YEARSBlended Scotch WhiskyA blend of 50 malt whiskies and Dumbarton grain whisky to commemorate Alistair Cunningham's 50 years career with the Hiram Walker group. 75cl, 40% volume, in carton.ALISTAIR CUNNINGHAMALISTAIR CUNNINGHAM'S 50 YEARS
CRAWFORD'S 3 STARBlended Scotch Whisky75cl, 40% volume.OLD SMUGGLER ONE LITREBlended Scotch WhiskyOne litre, 43% volume.GLENDROSTANBlended Scotch Whisky70cl, 40% volume.Three bottles.CRAWFORD, SMUGGLER ONECRAWFORD'S 3 STAR, OLD SMUGGLER ONE LITRE, GLENDROSTAN
GORDON HIGHLANDERSBlended Scotch Whisky70cl, 40% volume, in carton.GRANTS FAMILY RESERVE - ONE LITREBlended Scotch WhiskyOne litre, 43% volume.BELL'S AGED 8 YEARS MINIATUREBlended Scotch Whisky5cl, 40% volume, in tube with branded glass.Two bottles & one miniature.GORDON, FAMILY RESERVE, ONE, AGED 8 YEARSGORDON HIGHLANDERS, GRANTS FAMILY RESERVE - ONE LITRE, BELL'S AGED 8 YEARS MINIATURE
WHYTE & MACKAY 21 YEAR OLDBlended Scotch Whisky70cl, 43% volume, in box.WHYTE, MACKAY 21 YEARWHYTE & MACKAY 21 YEAR OLD
CUTTY SARK ART DECO AGED 33 YEARSBlended Scotch Whisky.700ml, 41.7% volume, in presentation box with two branded glasses and outer packaging.CUTTY SARK ART DECO AGED 33CUTTY SARK ART DECO AGED 33 YEARS
JOHNNIE WALKER KILMARNOCK 400 AGED 15 YEARSBlended Scotch Whisky.Bottled to celebrate the 400th anniversary of the granting of burgh status to Kilmarnock.JOHNNIE WALKER KILMARNOCK 400 AGED 15JOHNNIE WALKER KILMARNOCK 400 AGED 15 YEARS
FLOWER OF SCOTLAND ROYAL CELEBRATIONBlended Scotch WhiskyBottle no. 123 of 741. 70cl, 40% volume, in wooden presentation box.FLOWER OF SCOTLAND ROYALFLOWER OF SCOTLAND ROYAL CELEBRATION
WALLACE SINGLE MALT LIQUEUR 500MLSingle Malt Scotch Whisky Liquer500ml, 35% volume.WALLACE SINGLE MALT LIQUEURWALLACE SINGLE MALT LIQUEUR 500ML
JACK DANIELS ONE LITRETennessee Sour Mash Whiskey, USA.One litre, 43% volume, in carton.JACK DANIELS ONE, USAJACK DANIELS ONE LITRE, USA
CUTTY SARK 25 YEARS OLDBlended Scotch Whisky700ml, 45.7% volume, in box.CUTTY SARK 25 YEARSCUTTY SARK 25 YEARS OLD
KING'S RANSOM AGED 12 YEARSBlended Scotch Whisky26 2/3 fl.oz, 75° proof.KING, RANSOM AGED 12KING'S RANSOM AGED 12 YEARS
OLD GLASGOWBlended Scotch Whisky75cl, 40% volume, in ceramic flagon in boxOLDOLD GLASGOW
OLD MULLBlended Scotch WhiskyNo capacity or strength stated, in ceramic flagonOLDOLD MULL
MACKINLAY'S LEGACY 12 YEARS OLDBlended Scotch Whisky75cl, 40% volume.MACKINLAY, LEGACY 12 YEARSMACKINLAY'S LEGACY 12 YEARS OLD
GLAYVAScotch Whisky Liqueur.50cl, 35% volume, in tin, also includes a Drambuie Prince Charlie figure.GLAYVA
J&B ULTIMABlended Scotch WhiskyA blend of 128 Scotch whiskies, bottle no. 401. 70cl, 43% volume, in carton.J&B ULTIMA
CELTICBlended Scotch WhiskyBottled specially for the final Old Firm match of the century on 27 December 1999, bottle no. 439 of 600.. 70cl, 40% volume.CELTIC
TRIUMPHBlended Malt Scotch WhiskyA blend of 27 malt whiskies, bottle no. 792. 70cl, 43% volume.TRIUMPH
100 PIPERSBlended Scotch Whisky75cl, 40% volume.BLACK WATCHBlended Scotch Whisky70cl, 40% volume.Two bottles.100 PIPERS, BLACK WATCH
USQUAEBACHRESERVEBlended Scotch Whisky750ml, 43% volume.USQUAEBACHRESERVE

Thanks
Lolcc
 
Upvote 0
a few more anomalies which have been cropping up.
I have been expecting this post since the beginning. ;)

Using Excel to process text is not straight-forward. Punctuation is one of the significant stumbling blocks.
Another is that you are familiar with the text/words that you want to extract but Excel is not. If we want Excel to extract the same thing, we have to give it a logical set of rules as to what we are looking for.

Take this example of 3 parts:
  • In one of your samples is the text "40% volume.OLD SMUGGLER ONE LITREBlended". In this example you want to treat the "." before OLD effectively as a space so that we then have the upper case word OLD starting the text to be extracted.
  • In another sample is the text "Whiskey, USA.One litre, 43% volume". In this example you want to treat the "." after USA effectively as a space so that we then have the upper case word USA ending the text to be extracted.
  • In another sample there is "Islay.NCF. 70cl". But here you don't want to treat either the "." before or after the upper case text NCF as spaces making that 'word' to be extracted.
Or in relation to numbers followed by a space and upper case letter
  • "1986 D" is part of what should be extracted from the original sample but
  • "27 D" is not part of the extraction from the 4th last sample in post 8.

The upshot is that I have not been able to come up with anything that produces all your desired results for the samples given. This version matches the most I could manage. Ones that I can see are incorrect are highlighted.

VBA Code:
Function ExtractCapNum(s As String) As String
  Dim RX As Object, M As Object

  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = "( |^)[A-Z][A-Z '&\-\d]+(?![^A-Z '&\-\d])"
  For Each M In RX.Execute(Replace(s, ".", " . "))
    ExtractCapNum = ExtractCapNum & ", " & Trim(M)
  Next M
  ExtractCapNum = Mid(ExtractCapNum, 3)
End Function

Lolcc.xlsm
AB
1GLENKINCHIE 1986 DISTILLERS EDITION Limited edition single Lowland malt whisky. Special release G/273-7-D. 1L, 43% volume, in carton. DALWHINNIE 1988 DISTILLERS EDITION Single Highland malt whisky, bottled 2003. Special release D.SK.312. 70cl, 43% volume, in carton. DALWHINNIE 15 YEAR OLD CENTENARY EDITION Single Highland malt whisky. 70cl, 43% volume. In carton. 3 bottlesGLENKINCHIE 1986 DISTILLERS EDITION, DALWHINNIE 1988 DISTILLERS EDITION, SK, DALWHINNIE 15 YEAR OLD CENTENARY EDITION
2ARDBOGActive. Port Ellen, Islay.NCF. 70cl, 52.1% volume, in carton.ARDBOG, NCF
3ALISTAIR CUNNINGHAM'S 50 YEARSBlended Scotch WhiskyA blend of 50 malt whiskies and Dumbarton grain whisky to commemorate Alistair Cunningham's 50 years career with the Hiram Walker group. 75cl, 40% volume, in carton.ALISTAIR CUNNINGHAM'S 50 YEARS
4CRAWFORD'S 3 STARBlended Scotch Whisky75cl, 40% volume.OLD SMUGGLER ONE LITREBlended Scotch WhiskyOne litre, 43% volume.GLENDROSTANBlended Scotch Whisky70cl, 40% volume.Three bottles.CRAWFORD'S 3 STAR, OLD SMUGGLER ONE LITRE, GLENDROSTAN
5GORDON HIGHLANDERSBlended Scotch Whisky70cl, 40% volume, in carton.GRANTS FAMILY RESERVE - ONE LITREBlended Scotch WhiskyOne litre, 43% volume.BELL'S AGED 8 YEARS MINIATUREBlended Scotch Whisky5cl, 40% volume, in tube with branded glass.Two bottles & one miniature.GORDON HIGHLANDERS, GRANTS FAMILY RESERVE - ONE LITRE, BELL'S AGED 8 YEARS MINIATURE
6WHYTE & MACKAY 21 YEAR OLDBlended Scotch Whisky70cl, 43% volume, in box.WHYTE & MACKAY 21 YEAR OLD
7CUTTY SARK ART DECO AGED 33 YEARSBlended Scotch Whisky.700ml, 41.7% volume, in presentation box with two branded glasses and outer packaging.CUTTY SARK ART DECO AGED 33 YEARS
8JOHNNIE WALKER KILMARNOCK 400 AGED 15 YEARSBlended Scotch Whisky.Bottled to celebrate the 400th anniversary of the granting of burgh status to Kilmarnock.JOHNNIE WALKER KILMARNOCK 400 AGED 15 YEARS
9FLOWER OF SCOTLAND ROYAL CELEBRATIONBlended Scotch WhiskyBottle no. 123 of 741. 70cl, 40% volume, in wooden presentation box.FLOWER OF SCOTLAND ROYAL CELEBRATION
10WALLACE SINGLE MALT LIQUEUR 500MLSingle Malt Scotch Whisky Liquer500ml, 35% volume.WALLACE SINGLE MALT LIQUEUR 500ML
11JACK DANIELS ONE LITRETennessee Sour Mash Whiskey, USA.One litre, 43% volume, in carton.JACK DANIELS ONE LITRE, USA
12CUTTY SARK 25 YEARS OLDBlended Scotch Whisky700ml, 45.7% volume, in box.CUTTY SARK 25 YEARS OLD
13KING'S RANSOM AGED 12 YEARSBlended Scotch Whisky26 2/3 fl.oz, 75° proof.KING'S RANSOM AGED 12 YEARS
14OLD GLASGOWBlended Scotch Whisky75cl, 40% volume, in ceramic flagon in boxOLD GLASGOW
15OLD MULLBlended Scotch WhiskyNo capacity or strength stated, in ceramic flagonOLD MULL
16MACKINLAY'S LEGACY 12 YEARS OLDBlended Scotch Whisky75cl, 40% volume.MACKINLAY'S LEGACY 12 YEARS OLD
17GLAYVAScotch Whisky Liqueur.50cl, 35% volume, in tin, also includes a Drambuie Prince Charlie figure.GLAYVA
18J&B ULTIMABlended Scotch WhiskyA blend of 128 Scotch whiskies, bottle no. 401. 70cl, 43% volume, in carton.J&B ULTIMA
19CELTICBlended Scotch WhiskyBottled specially for the final Old Firm match of the century on 27 December 1999, bottle no. 439 of 600.. 70cl, 40% volume.CELTIC
20TRIUMPHBlended Malt Scotch WhiskyA blend of 27 malt whiskies, bottle no. 792. 70cl, 43% volume.TRIUMPH
21100 PIPERSBlended Scotch Whisky75cl, 40% volume.BLACK WATCHBlended Scotch Whisky70cl, 40% volume.Two bottles.PIPERS, BLACK WATCH
22USQUAEBACHRESERVEBlended Scotch Whisky750ml, 43% volume.USQUAEBACHRESERVE
Sheet2
Cell Formulas
RangeFormula
B1:B22B1=ExtractCapNum(A1)
 
Upvote 0
Solution
Thanks Peter. This has resulted in the best capture. I can see there are so many variables of character / number / case combinations. Really appreciate your support.
 
Upvote 0

Forum statistics

Threads
1,214,611
Messages
6,120,510
Members
448,967
Latest member
screechyboy79

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top