Regular expression pattern

danoneill

New Member
Joined
Mar 23, 2004
Messages
23
Can someone please advise what the correct pattern is to return the text between two search strings?

For example, in the html string below, I want to return the text between "selected>" and "<". In this example I want "NA".

HTML:
"<SELECT class=nfinput name=pace0><OPTION value=L>L</OPTION><OPTION value=P>P</OPTION><OPTION value=OP>OP</OPTION><OPTION value=M>M</OPTION><OPTION value=OM>OM</OPTION><OPTION value=BK>BK</OPTION><OPTION value=NA selected>NA</OPTION><OPTION value=SCR>SCR</OPTION></SELECT>"

I've tried using the lookahead/lookbehind;
(?<=selected>)(.*?)(?=<)
but it returns a syntax error in regular expression (5017).

I used Tushar Mehta's test website (http://www.tmehta.com/regexp/regexpfind.asp) as well as in VBA / Excel 2002 and get the same error.
 

Excel Facts

Formula for Yesterday
Name Manager, New Name. Yesterday =TODAY()-1. OK. Then, use =YESTERDAY in any cell. Tomorrow could be =TODAY()+1.
Hi danoneill,

You can try with the following to isolate part you want you can do:

1-) Search and replace with nothing this Regex
Code:
.*selected>

2-) Search and replace with nothing this Regex
Code:
<.*

3-) you'll get the text you want

Hope this helps,

Regards
 
Upvote 0
Hi

Try:

Code:
Function GetSelected(s As String) As String
 
With CreateObject("VBSCript.RegExp")
    .Pattern = ".*selected>([^<]+)<.*"
    GetSelected = .Replace(s, "$1")
End With
End Function
 
Upvote 0
Hi Dan

I - the solution posted

The pattern translates to:

1 - any number of any characters

followed by

2 - selected>

followed by

3 - a string of characters not equal to the character "<"

followed by

4 - the character "<"

followed by

5 - any number of any characters

This pattern reproduces your string. It includes all the characters of the string.

I enclosed part 3 in parentheses. This means that I'm capturing this part into a submatch, in this case it's the first submatch.

I then use the .Replace() method of the RegExp object to replace the whole string by the first submatch and that's what the function returns.

Notice that the pattern assumes that the sequence "selected>" + "some characters" +"<" exists. If that's not the case the function returns the whole string. You may want to tweak the pattern so that if the sequence does not exist, then you return an empty string, or else test first if the there's a pattern match and only then use the .Replace().


II - Another solution

As usual there would be other approaches possible. For ex. in this next one I'm focusing only on the part "selected>" + "some characters" +"<". If this part exists I return the value of the first submatch.


Code:
Function GetSelected(s As String) As String
 
With CreateObject("VBSCript.RegExp")
    .Pattern = "selected>([^<]+)<"
    If .Test(s) Then GetSelected = .Execute(s)(0).submatches(0)
End With
End Function


Remark: in your post you say you tried a solution using a LookBehind. LookBehind is not yet implemented in the RegExp object you use in vba.

HTH
 
Upvote 0

Forum statistics

Threads
1,224,521
Messages
6,179,291
Members
452,902
Latest member
Knuddeluff

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top