Use RegEx to creat in cell fuctions that find substrings and modify them in the string.

dagda13

Board Regular
Joined
May 18, 2019
Messages
52
Hi,

So I have a list if alphanumeric IDs that I'd like to modify using RegEx to create an in-cell function. I've already enabled Microsoft VBScript Regular Expression 5.5, and I'm using the below code:

VBA Code:
Function simpleCellRegex(Myrange As Range) As String
    Set regEx = CreateObject("VBScript.RegExp")
    Dim strPattern As String
    Dim strPattern2 As String
    Dim strInput As String
    Dim strReplace As String
    Dim strOutput As String
    
    strPattern = "_[a-zA-Z][0-9]"
        
       If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = "_"

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
            
        End With

        If regEx.test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function

I have a list of IDs that use "_" as a delimiter:

lchr_01_b82_lc_enus
lchr_01_b50_lc_enus
lchr_01_a50_lc_enus
lchr_01_b01_lc_enus

Within each ID is a substring that has an alpha and a number between delimiters, ie., "_b82"_, "_b50_", "_a50_", "b01_".

Right now the above code will find the alphanumberic substrings (the string pattern " strPattern = "_[a-zA-Z][0-9]"), but I'd like to just delete the alpha and not anything else. I'm not sure how to make the strReplace value dynamic since the IDs are also dynamic. I'd also like to get rid of the the rest of the text after the last delimter. The desired output in the above example would be:

lchr_01_b82_lc_enuslchr_01_82_lc
lchr_01_b50_lc_enuslchr_01_50_lc
lchr_01_a50_lc_enuslchr_01_50_lc
lchr_01_b01_lc_enuslchr_01_01_lc

Is using RegEx the right way to do this, and if so, is the above the right way to go about this? Thanks in advance!


 

Excel Facts

Copy PDF to Excel
Select data in PDF. Paste to Microsoft Word. Copy from Word and paste to Excel.
Regex is not necessary if the ID's are always the same structure and length as your examples.

Worksheet formula :
=LEFT(SUBSTITUTE(A1,MID(A1,9,1),""),LEN(A1)-6)

VBA :
VBA Code:
Sub v()
Dim myRange As Range
Set myRange = [A1:A4] 'Or whatever
myRange = Evaluate(Replace("left(substitute(#,mid(#,9,1),""""),len(#)-6)", "#", myRange.Address))
End Sub



 
Upvote 0
Regex is not necessary if the ID's are always the same structure and length as your examples.

Worksheet formula :
=LEFT(SUBSTITUTE(A1,MID(A1,9,1),""),LEN(A1)-6)

VBA :
VBA Code:
Sub v()
Dim myRange As Range
Set myRange = [A1:A4] 'Or whatever
myRange = Evaluate(Replace("left(substitute(#,mid(#,9,1),""""),len(#)-6)", "#", myRange.Address))
End Sub


Hi Footoo,

Thanks very much for this solution, and so quickly. However, and this is my fault, yes, the IDs do have varying lengths. Another (better) sample is below:

bs_ald14_a01_enus
bs_ald14_a02_enus
bs_ald14_c03_enus
bs_oper46_a02_enus
bs_oper46_b03_enus
it_a01_dstrdwdj_enus
it_b01_dsudfddj_enus
it_c02_dsudfddj_enus

The only constant in all the IDs (and there are thousands) is the sub string in the IDs that has the pattern "_[a-z][0-9]_", ie., "_a01_", "_c03_"... it is a sub string that always has ony one letter and then only two numbers right after, with no spaces.

I'm sorry again for not having provided a better sample before. Thanks again.
 
Upvote 0
What about something like this?

VBA Code:
Function DelAlpha(s As String) As String
  Dim RX As Object
  
  Set RX = CreateObject("VBScript.Regexp")
  RX.IgnoreCase = True
  RX.Pattern = "(.*_)([a-z])(\d\d_.*)"
  DelAlpha = RX.Replace(s, "$1$3")
End Function

dagda13 2020-03-24 1.xlsm
AB
1bs_ald14_a01_enusbs_ald14_01_enus
2bs_ald14_a02_enusbs_ald14_02_enus
3bs_ald14_c03_enusbs_ald14_03_enus
4bs_oper46_a02_enusbs_oper46_02_enus
5bs_oper46_b03_enusbs_oper46_03_enus
6it_a01_dstrdwdj_enusit_01_dstrdwdj_enus
7it_b01_dsudfddj_enusit_01_dsudfddj_enus
8it_c02_dsudfddj_enusit_02_dsudfddj_enus
Sheet1
Cell Formulas
RangeFormula
B1:B8B1=DelAlpha(A1)
 
Upvote 0
In case you might like a non-RegExp solution, here is one you can consider...
VBA Code:
Function DelAlpha(S As String) As String
  Dim X As Long
  For X = 1 To Len(S)
    If Mid(S, X, 3) Like "_[A-Za-z]#" Then
      S = Application.Replace(S, X + 1, 1, "")
      Exit For
    End If
  Next
  DelAlpha = Left(S, InStrRev(S, "_") - 1)
End Function
 
Upvote 0
Just looking back over the thread after reading Rick's code, I realised that I missed ..
I'd also like to get rid of the the rest of the text after the last delimter.
.. though your samples make it look like the last delimiter should also be deleted, not just the text after it as stated.
The adjustment to my code is

VBA Code:
Function DelAlpha(s As String) As String
  Dim RX As Object
  
  Set RX = CreateObject("VBScript.Regexp")
  RX.Pattern = "(.*_)([a-zA-Z])(\d\d.*)"
  DelAlpha = RX.Replace(Left(s, InStrRev(s, "_") - 1), "$1$3")
End Function
 
Upvote 0
Possible alternative :

Worksheet formula :
=LEFT(REPLACE(A1,SEARCH("_???_",A1)+1,1,""),LEN(A1)-6)

VBA :
VBA Code:
Sub v()
Dim myRange As Range
Set myRange = [A1:A4] 'Or whatever
myRange = Evaluate(Replace("left(replace(#,search(""_???_"",#)+1,1,""""),len(#)-6)", "#", myRange.Address))
End Sub
 
Upvote 0
Just looking back over the thread after reading Rick's code, I realised that I missed ..
.. though your samples make it look like the last delimiter should also be deleted, not just the text after it as stated.
The adjustment to my code is

VBA Code:
Function DelAlpha(s As String) As String
  Dim RX As Object

  Set RX = CreateObject("VBScript.Regexp")
  RX.Pattern = "(.*_)([a-zA-Z])(\d\d.*)"
  DelAlpha = RX.Replace(Left(s, InStrRev(s, "_") - 1), "$1$3")
End Function
Hi Peter,

Thanks very much, this works perfectly! Just a quick question. Would it be possible to introduce a conditional depending upon the a particular pattern in the string, as to what gets deleted at the end. In other words, would it be possible to delete everything after the last 2 "_" delimiters if a particular string (ie., ?ID) has a particular pattern in a sub-string?

In this case, if an ID has a sub-string with pattern like the first 3 IDs, that contain "_[a-z][0-9][0-9]_[0-9][0-9]"? The first 3 IDs would lose everything past the last 2 delimiters, where as the second three would be like before, losing the last delimiter and the text after it?

So in other words, using a conditional depending upon a sub-string with ,ie, "a01_10)" (lose everything after the last two delimiters) or just "_a01", as already in the above function, where it's just the last delimiter?

ds_swaywapp_a01_10_enusds_swaywapp_01
ds_swaywapp_b02_09_enusds_swaywapp_02
ds_swaywapp_c03_09_enusds_swaywapp_03
ds_swayw10_a01_enusds_swayw10_01
ds_swayw10_b02_enusds_swayw10_02
ds_swayw10_c03_enusds_swayw10_03
 
Upvote 0
In case you might like a non-RegExp solution, here is one you can consider...
VBA Code:
Function DelAlpha(S As String) As String
  Dim X As Long
  For X = 1 To Len(S)
    If Mid(S, X, 3) Like "_[A-Za-z]#" Then
      S = Application.Replace(S, X + 1, 1, "")
      Exit For
    End If
  Next
  DelAlpha = Left(S, InStrRev(S, "_") - 1)
End Function
Thanks Rick, this works great.

Just a quick question. Would it be possible to introduce a conditional depending upon the a particular pattern in the string, as to what gets deleted at the end. In other words, would it be possible to delete everything after the last 2 "_" delimiters if a particular string (ie., ?ID) has a particular pattern in a sub-string?

In this case, if an ID has a sub-string with pattern like the first 3 IDs, that contain "_[a-z][0-9][0-9]_[0-9][0-9]"? The first 3 IDs would lose everything past the last 2 delimiters, where as the second three would be like before, losing the last delimiter and the text after it?

So in other words, using a conditional depending upon a sub-string with ,ie, "a01_10)" (lose everything after the last two delimiters) or just "_a01", as already in the above function, where it's just the last delimiter?


ds_swaywapp_a01_10_enusds_swaywapp_01
ds_swaywapp_b02_09_enusds_swaywapp_02
ds_swaywapp_c03_09_enusds_swaywapp_03
ds_swayw10_a01_enusds_swayw10_01
ds_swayw10_b02_enusds_swayw10_02
ds_swayw10_c03_enusds_swayw10_03
 
Upvote 0
The letter that we are removing... is it always followed by exactly two digits and an underline character? If so, is it always after that second digit that you want the text deleted (so that the resulting text always ends with the two-digit number after the underline character followed by a single letter? If so, give this function a try...
VBA Code:
Function DelAlpha(S As String) As String
  Dim X As Long
  For X = 1 To Len(S)
    If Mid(S, X, 3) Like "_[A-Za-z]#" Then
      S = Left(S, X) & Mid(S, X + 2, 2)
      Exit For
    End If
  Next
  DelAlpha = Left(S, InStrRev(S, "_") + 3)
End Function
 
Upvote 0

Forum statistics

Threads
1,213,534
Messages
6,114,184
Members
448,554
Latest member
Gleisner2

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top