Regular expression pattern

danoneill

New Member
Joined
Mar 23, 2004
Messages
23
Can someone please advise what the correct pattern is to return the text between two search strings?

For example, in the html string below, I want to return the text between "selected>" and "<". In this example I want "NA".

HTML:
"<SELECT class=nfinput name=pace0><OPTION value=L>L</OPTION><OPTION value=P>P</OPTION><OPTION value=OP>OP</OPTION><OPTION value=M>M</OPTION><OPTION value=OM>OM</OPTION><OPTION value=BK>BK</OPTION><OPTION value=NA selected>NA</OPTION><OPTION value=SCR>SCR</OPTION></SELECT>"

I've tried using the lookahead/lookbehind;
(?<=selected>)(.*?)(?=<)
but it returns a syntax error in regular expression (5017).

I used Tushar Mehta's test website (http://www.tmehta.com/regexp/regexpfind.asp) as well as in VBA / Excel 2002 and get the same error.
 

Excel Facts

Test for Multiple Conditions in IF?
Use AND(test, test, test, test) or OR(test, test, test, ...) as the logical_test argument of IF.

cgcamal

Active Member
Joined
May 2, 2007
Messages
472
Hi danoneill,

You can try with the following to isolate part you want you can do:

1-) Search and replace with nothing this Regex
Code:
.*selected>

2-) Search and replace with nothing this Regex
Code:
<.*

3-) you'll get the text you want

Hope this helps,

Regards
 

pgc01

MrExcel MVP
Joined
Apr 25, 2006
Messages
19,884
Hi

Try:

Code:
Function GetSelected(s As String) As String
 
With CreateObject("VBSCript.RegExp")
    .Pattern = ".*selected>([^<]+)<.*"
    GetSelected = .Replace(s, "$1")
End With
End Function
 

danoneill

New Member
Joined
Mar 23, 2004
Messages
23
Thanks pgc01, that works.

Would you mind explaining the pattern and the .Replace?

Dan
 

pgc01

MrExcel MVP
Joined
Apr 25, 2006
Messages
19,884

ADVERTISEMENT

Hi Dan

I - the solution posted

The pattern translates to:

1 - any number of any characters

followed by

2 - selected>

followed by

3 - a string of characters not equal to the character "<"

followed by

4 - the character "<"

followed by

5 - any number of any characters

This pattern reproduces your string. It includes all the characters of the string.

I enclosed part 3 in parentheses. This means that I'm capturing this part into a submatch, in this case it's the first submatch.

I then use the .Replace() method of the RegExp object to replace the whole string by the first submatch and that's what the function returns.

Notice that the pattern assumes that the sequence "selected>" + "some characters" +"<" exists. If that's not the case the function returns the whole string. You may want to tweak the pattern so that if the sequence does not exist, then you return an empty string, or else test first if the there's a pattern match and only then use the .Replace().


II - Another solution

As usual there would be other approaches possible. For ex. in this next one I'm focusing only on the part "selected>" + "some characters" +"<". If this part exists I return the value of the first submatch.


Code:
Function GetSelected(s As String) As String
 
With CreateObject("VBSCript.RegExp")
    .Pattern = "selected>([^<]+)<"
    If .Test(s) Then GetSelected = .Execute(s)(0).submatches(0)
End With
End Function


Remark: in your post you say you tried a solution using a LookBehind. LookBehind is not yet implemented in the RegExp object you use in vba.

HTH
 

Forum statistics

Threads
1,136,852
Messages
5,678,120
Members
419,746
Latest member
tysonboy82

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top