Excel Pattern Matching

MartinAustin

New Member
Joined
Mar 31, 2006
Messages
10
Hello everyone --

I have a deadline on monday for a project that seems as though regular expressions would fit the bill (though I am willing to try anything at this point).

The issue is cleaning a certain column of data of erroneous characters. The column represents an inbound shipments PO number, and can only be comprised of 7 numerical characters in a row. Here are some examples of what we get in this field:

7777-1234567-99
77,1234567-00

Etc, etc. The 1234567 is the valid PO number, and all I want to do is find the first occurence of 7 numerical characters in a row, store it in a variable, erase the current cell value, and place the 7 digit PO number back into the same field.

Ones that do not match should be copied into another sheet in Excel (which I can do no problem), but I'm having a hell of a time figuring this one out.

Can someone out there get me started? Thanks, and God Bless!

[edit: updated title since it doesn't have to be a regular expression solution, as long as it works! :eek:)]
 

Excel Facts

Test for Multiple Conditions in IF?
Use AND(test, test, test, test) or OR(test, test, test, ...) as the logical_test argument of IF.
I'm sure there are more eloquent ways to approach this, but I think this is good enough. Download the example if you have any problems.

MartinAustin.zip

<table width="100%" border="1" bgcolor="White" style="filter:progid:DXImageTransform.Microsoft.Gradient(endColorstr='#C0CFE2', startColorstr='#FFFFFF', gradientType='0');"><tr><TD><font size="2" face=Courier New>  <font color="#0000A0">Sub</font> Example()
       <font color="#0000A0">Dim</font> MyPurchaseOrderRange <font color="#0000A0">As</font> Range, Ec <font color="#0000A0">As</font> Range

       <font color="#0000A0">Set</font> MyPurchaseOrderRange = Application.InputBox( _
           "Select the range containing your Purchase Order Id's", , , , , , , 8)

       <font color="#0000A0">If</font> MyPurchaseOrderRange <font color="#0000A0">Is</font> <font color="#0000A0">Nothing</font> <font color="#0000A0">Then</font> <font color="#0000A0">Exit</font> <font color="#0000A0">Sub</font>

       <font color="#0000A0">For</font> <font color="#0000A0">Each</font> Ec <font color="#0000A0">In</font> MyPurchaseOrderRange
           Ec = CleanPO(Ec.Text)
       <font color="#0000A0">Next</font>
  <font color="#0000A0">End</font> <font color="#0000A0">Sub</font>

  <font color="#0000A0">Function</font> CleanPO(DirtyPO <font color="#0000A0">As</font> String) <font color="#0000A0">As</font> <font color="#0000A0">String</font>

       <font color="#0000A0">Dim</font> CharPos <font color="#0000A0">As</font> Long, TestChar <font color="#0000A0">As</font> <font color="#0000A0">String</font>
       <font color="#0000A0">Dim</font> LenOfString <font color="#0000A0">As</font> Long, TestString <font color="#0000A0">As</font> <font color="#0000A0">String</font>

       LenOfString = Len(DirtyPO)

       <font color="#0000A0">For</font> CharPos = 1 <font color="#0000A0">To</font> LenOfString
           TestChar = Mid(DirtyPO, CharPos, 1)
           <font color="#0000A0">If</font> IsNumeric(TestChar) <font color="#0000A0">Then</font>
               TestString = TestString & "#"
           <font color="#0000A0">Else</font>
               TestString = TestString & "?"
           <font color="#0000A0">End</font> <font color="#0000A0">If</font>
       <font color="#0000A0">Next</font>

       CleanPO = Mid(DirtyPO, InStr(TestString, "#######"), 7)

  <font color="#0000A0">End</font> <font color="#0000A0">Function</font>
</FONT></td></tr></table><button onclick='document.all("9162006164535978").value=document.all("9162006164535978").value.replace(/<br \/>\s\s/g,"");document.all("9162006164535978").value=document.all("9162006164535978").value.replace(/<br \/>/g,"");window.clipboardData.setData("Text",document.all("9162006164535978").value);'>Copy to Clipboard</BUTTON><textarea style="position:absolute;visibility:hidden" name="9162006164535978" wrap="virtual">
Sub Example()
Dim MyPurchaseOrderRange As Range, Ec As Range

Set MyPurchaseOrderRange = Application.InputBox( _
"Select the range containing your Purchase Order Id's", , , , , , , 8)

If MyPurchaseOrderRange Is Nothing Then Exit Sub

For Each Ec In MyPurchaseOrderRange
Ec = CleanPO(Ec.Text)
Next
End Sub

Function CleanPO(DirtyPO As String) As String

Dim CharPos As Long, TestChar As String
Dim LenOfString As Long, TestString As String

LenOfString = Len(DirtyPO)

For CharPos = 1 To LenOfString
TestChar = Mid(DirtyPO, CharPos, 1)
If IsNumeric(TestChar) Then
TestString = TestString & "#"
Else
TestString = TestString & "?"
End If
Next

CleanPO = Mid(DirtyPO, InStr(TestString, "#######"), 7)

End Function</textarea>

MartinAustin.zip
 
Upvote 0
Hi

This CleanPO returns the first 7 digit sequence in the string. I used replace.

Code:
  Function CleanPO(DirtyPO As String) As String

      With CreateObject("VBSCRIPT.REGEXP")
          .Pattern = ".*?(\d{7}).*"
          If .test(DirtyPO) Then
              CleanPO = .Replace(DirtyPO, "$1")
          Else
              CleanPO = "Not found PO number in :" & DirtyPO
          End If
      End With
  End Function

HTH
PGC
 
Upvote 0
Is more concise but very slow.

Faster.
<table width="100%" border="1" bgcolor="White" style="filter:progid:DXImageTransform.Microsoft.Gradient(endColorstr='#C0CFE2', startColorstr='#FFFFFF', gradientType='0');"><tr><TD><font size="2" face=Courier New>  <font color="#0000A0">Private</font> ovr <font color="#0000A0">As</font> <font color="#0000A0">Object</font>

  <font color="#0000A0">Sub</font> Example()
       <font color="#0000A0">Dim</font> MyPurchaseOrderRange <font color="#0000A0">As</font> Range, Ec <font color="#0000A0">As</font> Range

       <font color="#0000A0">Set</font> MyPurchaseOrderRange = Application.InputBox( _
           "Select the range containing your Purchase Order Id's", , "$A$1:$A$3437", , , , , 8)

       <font color="#0000A0">If</font> MyPurchaseOrderRange <font color="#0000A0">Is</font> <font color="#0000A0">Nothing</font> <font color="#0000A0">Then</font> <font color="#0000A0">Exit</font> <font color="#0000A0">Sub</font>

       <font color="#0000A0">Set</font> ovr = CreateObject("VBSCRIPT.REGEXP")

       <font color="#0000A0">For</font> <font color="#0000A0">Each</font> Ec <font color="#0000A0">In</font> MyPurchaseOrderRange
           Ec = CleanPO(Ec.Text)
       <font color="#0000A0">Next</font>
  <font color="#0000A0">End</font> <font color="#0000A0">Sub</font>

  <font color="#0000A0">Function</font> CleanPO(DirtyPO <font color="#0000A0">As</font> String) <font color="#0000A0">As</font> <font color="#0000A0">String</font>

       <font color="#0000A0">With</font> ovr
           .Pattern = ".*?(\d{7}).*"
           <font color="#0000A0">If</font> .test(DirtyPO) <font color="#0000A0">Then</font>
               CleanPO = .Replace(DirtyPO, "$1")
           <font color="#0000A0">Else</font>
               CleanPO = "Not found PO number in :" & DirtyPO
           <font color="#0000A0">End</font> <font color="#0000A0">If</font>
       <font color="#0000A0">End</font> <font color="#0000A0">With</font>
  <font color="#0000A0">End</font> <font color="#0000A0">Function</font>

</FONT></td></tr></table><button onclick='document.all("9162006175523275").value=document.all("9162006175523275").value.replace(/<br \/>\s\s/g,"");document.all("9162006175523275").value=document.all("9162006175523275").value.replace(/<br \/>/g,"");window.clipboardData.setData("Text",document.all("9162006175523275").value);'>Copy to Clipboard</BUTTON><textarea style="position:absolute;visibility:hidden" name="9162006175523275" wrap="virtual">
Private ovr As Object

Sub Example()
Dim MyPurchaseOrderRange As Range, Ec As Range

Set MyPurchaseOrderRange = Application.InputBox( _
"Select the range containing your Purchase Order Id's", , "$A$1:$A$3437", , , , , 8)

If MyPurchaseOrderRange Is Nothing Then Exit Sub

Set ovr = CreateObject("VBSCRIPT.REGEXP")

For Each Ec In MyPurchaseOrderRange
Ec = CleanPO(Ec.Text)
Next
End Sub

Function CleanPO(DirtyPO As String) As String

With ovr
.Pattern = ".*?(\d{7}).*"
If .test(DirtyPO) Then
CleanPO = .Replace(DirtyPO, "$1")
Else
CleanPO = "Not found PO number in :" & DirtyPO
End If
End With
End Function
</textarea>

Is very fast.
Set a reference to Microsoft VBScript Regular Expressions 5.5
<table width="100%" border="1" bgcolor="White" style="filter:progid:DXImageTransform.Microsoft.Gradient(endColorstr='#C0CFE2', startColorstr='#FFFFFF', gradientType='0');"><tr><TD><font size="2" face=Courier New>  <font color="#0000A0">Sub</font> Example()
       <font color="#0000A0">Dim</font> MyPurchaseOrderRange <font color="#0000A0">As</font> Range, Ec <font color="#0000A0">As</font> Range
       <font color="#0000A0">Dim</font> VbRegEx <font color="#0000A0">As</font> VBScript_RegExp_55.RegExp

       <font color="#0000A0">Set</font> MyPurchaseOrderRange = Application.InputBox( _
           "Select the range containing your Purchase Order Id's", , "$A$1:$A$3437", , , , , 8)

       <font color="#0000A0">If</font> MyPurchaseOrderRange <font color="#0000A0">Is</font> <font color="#0000A0">Nothing</font> <font color="#0000A0">Then</font> <font color="#0000A0">Exit</font> <font color="#0000A0">Sub</font>

       <font color="#0000A0">Set</font> VbRegEx = <font color="#0000A0">New</font> VBScript_RegExp_55.RegExp

       <font color="#0000A0">For</font> <font color="#0000A0">Each</font> Ec <font color="#0000A0">In</font> MyPurchaseOrderRange
           Ec = CleanPO(Ec.Text, VbRegEx)
       <font color="#0000A0">Next</font>
  <font color="#0000A0">End</font> <font color="#0000A0">Sub</font>

  <font color="#0000A0">Function</font> CleanPO(DirtyPO <font color="#0000A0">As</font> String, VbRegEx <font color="#0000A0">As</font> VBScript_RegExp_55.RegExp) <font color="#0000A0">As</font> <font color="#0000A0">String</font>

       <font color="#0000A0">With</font> VbRegEx
           .Pattern = ".*?(\d{7}).*"
           <font color="#0000A0">If</font> .test(DirtyPO) <font color="#0000A0">Then</font>
               CleanPO = .Replace(DirtyPO, "$1")
           <font color="#0000A0">Else</font>
               CleanPO = "Not found PO number in :" & DirtyPO
           <font color="#0000A0">End</font> <font color="#0000A0">If</font>
       <font color="#0000A0">End</font> <font color="#0000A0">With</font>
  <font color="#0000A0">End</font> <font color="#0000A0">Function</font>
</FONT></td></tr></table><button onclick='document.all("916200618510243").value=document.all("916200618510243").value.replace(/<br \/>\s\s/g,"");document.all("916200618510243").value=document.all("916200618510243").value.replace(/<br \/>/g,"");window.clipboardData.setData("Text",document.all("916200618510243").value);'>Copy to Clipboard</BUTTON><textarea style="position:absolute;visibility:hidden" name="916200618510243" wrap="virtual">
Sub Example()
Dim MyPurchaseOrderRange As Range, Ec As Range
Dim VbRegEx As VBScript_RegExp_55.RegExp

Set MyPurchaseOrderRange = Application.InputBox( _
"Select the range containing your Purchase Order Id's", , "$A$1:$A$3437", , , , , 8)

If MyPurchaseOrderRange Is Nothing Then Exit Sub

Set VbRegEx = New VBScript_RegExp_55.RegExp

For Each Ec In MyPurchaseOrderRange
Ec = CleanPO(Ec.Text, VbRegEx)
Next
End Sub

Function CleanPO(DirtyPO As String, VbRegEx As VBScript_RegExp_55.RegExp) As String

With VbRegEx
.Pattern = ".*?(\d{7}).*"
If .test(DirtyPO) Then
CleanPO = .Replace(DirtyPO, "$1")
Else
CleanPO = "Not found PO number in :" & DirtyPO
End If
End With
End Function</textarea>
 
Upvote 0
Hi Tom

"Is more concise but very slow."

Sometimes regular expressions are slow. If badly specified (as in this case) they can be very slow.

In fact I posted a dumb solution. I should have used a simple pattern and search it.

Maybe faster (surely more logical):

Code:
  Function CleanPO(DirtyPO As String) As String
  Dim oMatches As Object
  
      With CreateObject("VBSCRIPT.REGEXP")
          .Pattern = "\d{7}"
          If .test(DirtyPO) Then
              Set oMatches = .Execute(DirtyPO)
              CleanPO = oMatches(0)
          Else
              CleanPO = "Not found PO number in :" & DirtyPO
          End If
      End With
  End Function


About you question: "Can you return a substring using a regular expression?"

Yes you can. Each Match has a Submatches collection associated. To get submatches just include inside the pattern substrings in parenthesis. They will be the submatches starting in 0.

I did a small example just to illustrate it's use.

Cheers
PGC

Code:
Sub SubmatchesTest()
Dim s As String, oMatches As Object

s = "Start date: 12:24, end date 14:52"

With CreateObject("VBSCRIPT.REGEXP")
    .Pattern = "(\d+):(\d+)"
    .Global = True
    Set oMatches = .Execute(s)
End With

MsgBox "Start date: " & oMatches(0).submatches(0) & ":" & oMatches(0).submatches(1)
MsgBox "End date: " & oMatches(1).submatches(0) & ":" & oMatches(1).submatches(1)

End Sub
 
Upvote 0
Thanks for the info on returning substrings. I need to familiarize myself with reg expressions. I know very little about them.
 
Upvote 0
If you are comfortable with regular expressions, you can use the UDFs at
http://www.tmehta.com/regexp/
specifically,
http://www.tmehta.com/regexp/add_code.htm

Hello everyone --

I have a deadline on monday for a project that seems as though regular expressions would fit the bill (though I am willing to try anything at this point).

The issue is cleaning a certain column of data of erroneous characters. The column represents an inbound shipments PO number, and can only be comprised of 7 numerical characters in a row. Here are some examples of what we get in this field:

7777-1234567-99
77,1234567-00

Etc, etc. The 1234567 is the valid PO number, and all I want to do is find the first occurence of 7 numerical characters in a row, store it in a variable, erase the current cell value, and place the 7 digit PO number back into the same field.

Ones that do not match should be copied into another sheet in Excel (which I can do no problem), but I'm having a hell of a time figuring this one out.

Can someone out there get me started? Thanks, and God Bless!

[edit: updated title since it doesn't have to be a regular expression solution, as long as it works! :eek:)]
 
Upvote 0
I'd like to thank you all for your help on this project. I can't express how relieved I am that this is going to be done on time.

By the way, all solutions seemed to work equally well, as I'm only parsing about 500 records per week.

Thanks again!
 
Upvote 0

Forum statistics

Threads
1,213,539
Messages
6,114,221
Members
448,554
Latest member
Gleisner2

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top