Understanding/using RegExp for string processing in VBA

JackDanIce

Well-known Member
Joined
Feb 3, 2010
Messages
9,922
Office Version
  1. 365
Platform
  1. Windows
Hi,

Imported data (csv file) has a column of strings and I need to extract 11 characters after a particular value, as long as those 11 characters are alpha-numeric, i.e. A-Z, a-z and 0-9 (no special characters).

The string possibly contains an ISIN code value preceded by the term "ANY." (constant in every string).

Example with required extraction in blue:
Rich (BB code):
RMDS_IDN_EM::IDN_RDF.ANY.AT000034269=TE.NaE
(I already have a check function to verify it's an ISIN and generate a 12th digit check digit)

However, the column can contain false ISIN values (in red) such as:
Rich (BB code):
RMDS_IDN_EM::IDN_RDF.ANY.ITEEU3Y=TE.NaE

I can already extract the 11 characters using a combination of MID and FIND and currently my code eliminates any extraction that contains the character "=".


I'd like to learn how Reg Exp object can be used to extract this value, I suspect it will shorten the code as well for the matching part.

Can you suggest the required code please?

Thank you in advance,
Jack

Alternatively, every string contains one and only one instance of "=", so another test might be extract all the characters between "ANY." and "=" (discard row if length <> 11 or contains symbols not alpha-numeric)
 
Last edited:
The blue line is causing an error because you have the return value of your function testRegExp as a string.

.test in regular expressions returns a boolean True or False, it tests whether there's a match for your pattern and simply returns True or False, which is not a string.
.execute creates the matches collection.
 
Upvote 0

Excel Facts

Which lookup functions find a value equal or greater than the lookup value?
MATCH uses -1 to find larger value (lookup table must be sorted ZA). XLOOKUP uses 1 to find values greater and does not need to be sorted.
Try this:

Code:
Function testRegExp(ByVal str As String) As String
    Dim matches, subMatches
    
    testRegExp = ""
    With CreateObject("VBScript.RegExp")
        .Pattern = "(.*ANY\.)([A-Z]{2}\d{9})(=.*)"
        .IgnoreCase = True
        .Global = True
    
        Set matches = .Execute(str)
        If matches.Count > 0 Then Set subMatches = matches(0).subMatches: testRegExp = subMatches(1)
    End With
End Function
 
Upvote 0
Well I still think you should use the look behind.

Regardless of what you use, there is no reason for unnecessary capturing groups, capture the data you want and use it.
 
Upvote 0
I know you said you wanted to practice using RegExp, but this problem is relatively simple to solve with using it. Consider the following UDF (which should be faster than the RegExp version)...

Code:
Function GetID(S As String) As String
  GetID = Split(Replace(S, ".ANY.", "="), "=")(1)
  If Not GetID Like "[A-Za-z][A-Za-z]#########" Then GetID = ""
End Function
 
Upvote 0
@skywriter now I understand for .test data type error, should have realised that, thank you (as well as perseverance with me to solve this)

I'm still unclear about why use lookbehind and capturing groups, I only want to extract a 'middle' part of a string that is inbetween two markers. The positions of the markers or length inbetween them are always variable. Some strings won't contain a full 11 chars for an ISIN (they maybe SEDOLs of 6 digits length) so it's capturing a single group that fits inside the markers AND is of length 11.

@V_Malkoti, that's what I wanted expression wise but now looking back I see why skywriter considers it as capturing groups (I'm assuming subMatches(0) is the left part of the string upto "ANY." and subMatches(2) is the right part of the string from "=" to the end.

The middle capture (subMatches(1)) can be of variable length so this will only return if the middle part is 11 characters in length, else a null string

@Rick grrrr! You are very good at providing alternative suggestions and it's a faster one that I see is easier to understand, thank you for providing :)

Enjoy your weekend all,
Jack
 
Upvote 0
@Rick One small adjustment, the 9 characters after the first two alpha characters are alphanumeric and not numeric alone (discovered after the initial post), I believe # denotes numeric chars only - is there a symbol for alphanumeric? Seaching "Custom Excel Cell Formatting" but not quite finding what I'm looking for (yet)..
 
Upvote 0
@Rick One small adjustment, the 9 characters after the first two alpha characters are alphanumeric and not numeric alone (discovered after the initial post), I believe # denotes numeric chars only - is there a symbol for alphanumeric? Seaching "Custom Excel Cell Formatting" but not quite finding what I'm looking for (yet)..
This should work...
Code:
[table="width: 500"]
[tr]
	[td]Function GetID(S As String) As String
  GetID = Split(Replace(S, ".ANY.", "="), "=")(1)
  If Not GetID Like "[A-Za-z][A-Za-z]" & Application.Rept("[A-Za-z0-9]", 9) Then GetID = ""
End Function[/td]
[/tr]
[/table]
 
Upvote 0
@V_Malkoti, that's what I wanted expression wise
Yes thank you Istvan, though ended up using Rick's suggestion, due to need for workbook to be shared with others and if others ever need to amend or change, it's a little easier for them to.
 
Upvote 0

Forum statistics

Threads
1,215,452
Messages
6,124,916
Members
449,195
Latest member
Stevenciu

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top