Excel Pattern Matching

MartinAustin

New Member
Joined
Mar 31, 2006
Messages
10
Hello everyone --

I have a deadline on monday for a project that seems as though regular expressions would fit the bill (though I am willing to try anything at this point).

The issue is cleaning a certain column of data of erroneous characters. The column represents an inbound shipments PO number, and can only be comprised of 7 numerical characters in a row. Here are some examples of what we get in this field:

7777-1234567-99
77,1234567-00

Etc, etc. The 1234567 is the valid PO number, and all I want to do is find the first occurence of 7 numerical characters in a row, store it in a variable, erase the current cell value, and place the 7 digit PO number back into the same field.

Ones that do not match should be copied into another sheet in Excel (which I can do no problem), but I'm having a hell of a time figuring this one out.

Can someone out there get me started? Thanks, and God Bless!

[edit: updated title since it doesn't have to be a regular expression solution, as long as it works! :eek:)]
 

Excel Facts

Did you know Excel offers Filter by Selection?
Add the AutoFilter icon to the Quick Access Toolbar. Select a cell containing Apple, click AutoFilter, and you will get all rows with Apple
L

Legacy 98055

Guest
I'm sure there are more eloquent ways to approach this, but I think this is good enough. Download the example if you have any problems.

MartinAustin.zip

<table width="100%" border="1" bgcolor="White" style="filter:progid:DXImageTransform.Microsoft.Gradient(endColorstr='#C0CFE2', startColorstr='#FFFFFF', gradientType='0');"><tr><TD><font size="2" face=Courier New>  <font color="#0000A0">Sub</font> Example()
       <font color="#0000A0">Dim</font> MyPurchaseOrderRange <font color="#0000A0">As</font> Range, Ec <font color="#0000A0">As</font> Range

       <font color="#0000A0">Set</font> MyPurchaseOrderRange = Application.InputBox( _
           "Select the range containing your Purchase Order Id's", , , , , , , 8)

       <font color="#0000A0">If</font> MyPurchaseOrderRange <font color="#0000A0">Is</font> <font color="#0000A0">Nothing</font> <font color="#0000A0">Then</font> <font color="#0000A0">Exit</font> <font color="#0000A0">Sub</font>

       <font color="#0000A0">For</font> <font color="#0000A0">Each</font> Ec <font color="#0000A0">In</font> MyPurchaseOrderRange
           Ec = CleanPO(Ec.Text)
       <font color="#0000A0">Next</font>
  <font color="#0000A0">End</font> <font color="#0000A0">Sub</font>

  <font color="#0000A0">Function</font> CleanPO(DirtyPO <font color="#0000A0">As</font> String) <font color="#0000A0">As</font> <font color="#0000A0">String</font>

       <font color="#0000A0">Dim</font> CharPos <font color="#0000A0">As</font> Long, TestChar <font color="#0000A0">As</font> <font color="#0000A0">String</font>
       <font color="#0000A0">Dim</font> LenOfString <font color="#0000A0">As</font> Long, TestString <font color="#0000A0">As</font> <font color="#0000A0">String</font>

       LenOfString = Len(DirtyPO)

       <font color="#0000A0">For</font> CharPos = 1 <font color="#0000A0">To</font> LenOfString
           TestChar = Mid(DirtyPO, CharPos, 1)
           <font color="#0000A0">If</font> IsNumeric(TestChar) <font color="#0000A0">Then</font>
               TestString = TestString & "#"
           <font color="#0000A0">Else</font>
               TestString = TestString & "?"
           <font color="#0000A0">End</font> <font color="#0000A0">If</font>
       <font color="#0000A0">Next</font>

       CleanPO = Mid(DirtyPO, InStr(TestString, "#######"), 7)

  <font color="#0000A0">End</font> <font color="#0000A0">Function</font>
</FONT></td></tr></table><button onclick='document.all("9162006164535978").value=document.all("9162006164535978").value.replace(/<br \/>\s\s/g,"");document.all("9162006164535978").value=document.all("9162006164535978").value.replace(/<br \/>/g,"");window.clipboardData.setData("Text",document.all("9162006164535978").value);'>Copy to Clipboard</BUTTON><textarea style="position:absolute;visibility:hidden" name="9162006164535978" wrap="virtual">
Sub Example()
Dim MyPurchaseOrderRange As Range, Ec As Range

Set MyPurchaseOrderRange = Application.InputBox( _
"Select the range containing your Purchase Order Id's", , , , , , , 8)

If MyPurchaseOrderRange Is Nothing Then Exit Sub

For Each Ec In MyPurchaseOrderRange
Ec = CleanPO(Ec.Text)
Next
End Sub

Function CleanPO(DirtyPO As String) As String

Dim CharPos As Long, TestChar As String
Dim LenOfString As Long, TestString As String

LenOfString = Len(DirtyPO)

For CharPos = 1 To LenOfString
TestChar = Mid(DirtyPO, CharPos, 1)
If IsNumeric(TestChar) Then
TestString = TestString & "#"
Else
TestString = TestString & "?"
End If
Next

CleanPO = Mid(DirtyPO, InStr(TestString, "#######"), 7)

End Function</textarea>

MartinAustin.zip
 

just_jon

Legend
Joined
Sep 3, 2002
Messages
10,473
Hiya, Tom - I like that approach ( esp. as I simply cannot remember regexp syntax. )
 
L

Legacy 98055

Guest
Hi Jon. Can you return a substring using a regular expression?
 

pgc01

MrExcel MVP
Joined
Apr 25, 2006
Messages
19,884

ADVERTISEMENT

Hi

This CleanPO returns the first 7 digit sequence in the string. I used replace.

Code:
  Function CleanPO(DirtyPO As String) As String

      With CreateObject("VBSCRIPT.REGEXP")
          .Pattern = ".*?(\d{7}).*"
          If .test(DirtyPO) Then
              CleanPO = .Replace(DirtyPO, "$1")
          Else
              CleanPO = "Not found PO number in :" & DirtyPO
          End If
      End With
  End Function

HTH
PGC
 
L

Legacy 98055

Guest
Is more concise but very slow.

Faster.
<table width="100%" border="1" bgcolor="White" style="filter:progid:DXImageTransform.Microsoft.Gradient(endColorstr='#C0CFE2', startColorstr='#FFFFFF', gradientType='0');"><tr><TD><font size="2" face=Courier New>  <font color="#0000A0">Private</font> ovr <font color="#0000A0">As</font> <font color="#0000A0">Object</font>

  <font color="#0000A0">Sub</font> Example()
       <font color="#0000A0">Dim</font> MyPurchaseOrderRange <font color="#0000A0">As</font> Range, Ec <font color="#0000A0">As</font> Range

       <font color="#0000A0">Set</font> MyPurchaseOrderRange = Application.InputBox( _
           "Select the range containing your Purchase Order Id's", , "$A$1:$A$3437", , , , , 8)

       <font color="#0000A0">If</font> MyPurchaseOrderRange <font color="#0000A0">Is</font> <font color="#0000A0">Nothing</font> <font color="#0000A0">Then</font> <font color="#0000A0">Exit</font> <font color="#0000A0">Sub</font>

       <font color="#0000A0">Set</font> ovr = CreateObject("VBSCRIPT.REGEXP")

       <font color="#0000A0">For</font> <font color="#0000A0">Each</font> Ec <font color="#0000A0">In</font> MyPurchaseOrderRange
           Ec = CleanPO(Ec.Text)
       <font color="#0000A0">Next</font>
  <font color="#0000A0">End</font> <font color="#0000A0">Sub</font>

  <font color="#0000A0">Function</font> CleanPO(DirtyPO <font color="#0000A0">As</font> String) <font color="#0000A0">As</font> <font color="#0000A0">String</font>

       <font color="#0000A0">With</font> ovr
           .Pattern = ".*?(\d{7}).*"
           <font color="#0000A0">If</font> .test(DirtyPO) <font color="#0000A0">Then</font>
               CleanPO = .Replace(DirtyPO, "$1")
           <font color="#0000A0">Else</font>
               CleanPO = "Not found PO number in :" & DirtyPO
           <font color="#0000A0">End</font> <font color="#0000A0">If</font>
       <font color="#0000A0">End</font> <font color="#0000A0">With</font>
  <font color="#0000A0">End</font> <font color="#0000A0">Function</font>

</FONT></td></tr></table><button onclick='document.all("9162006175523275").value=document.all("9162006175523275").value.replace(/<br \/>\s\s/g,"");document.all("9162006175523275").value=document.all("9162006175523275").value.replace(/<br \/>/g,"");window.clipboardData.setData("Text",document.all("9162006175523275").value);'>Copy to Clipboard</BUTTON><textarea style="position:absolute;visibility:hidden" name="9162006175523275" wrap="virtual">
Private ovr As Object

Sub Example()
Dim MyPurchaseOrderRange As Range, Ec As Range

Set MyPurchaseOrderRange = Application.InputBox( _
"Select the range containing your Purchase Order Id's", , "$A$1:$A$3437", , , , , 8)

If MyPurchaseOrderRange Is Nothing Then Exit Sub

Set ovr = CreateObject("VBSCRIPT.REGEXP")

For Each Ec In MyPurchaseOrderRange
Ec = CleanPO(Ec.Text)
Next
End Sub

Function CleanPO(DirtyPO As String) As String

With ovr
.Pattern = ".*?(\d{7}).*"
If .test(DirtyPO) Then
CleanPO = .Replace(DirtyPO, "$1")
Else
CleanPO = "Not found PO number in :" & DirtyPO
End If
End With
End Function
</textarea>

Is very fast.
Set a reference to Microsoft VBScript Regular Expressions 5.5
<table width="100%" border="1" bgcolor="White" style="filter:progid:DXImageTransform.Microsoft.Gradient(endColorstr='#C0CFE2', startColorstr='#FFFFFF', gradientType='0');"><tr><TD><font size="2" face=Courier New>  <font color="#0000A0">Sub</font> Example()
       <font color="#0000A0">Dim</font> MyPurchaseOrderRange <font color="#0000A0">As</font> Range, Ec <font color="#0000A0">As</font> Range
       <font color="#0000A0">Dim</font> VbRegEx <font color="#0000A0">As</font> VBScript_RegExp_55.RegExp

       <font color="#0000A0">Set</font> MyPurchaseOrderRange = Application.InputBox( _
           "Select the range containing your Purchase Order Id's", , "$A$1:$A$3437", , , , , 8)

       <font color="#0000A0">If</font> MyPurchaseOrderRange <font color="#0000A0">Is</font> <font color="#0000A0">Nothing</font> <font color="#0000A0">Then</font> <font color="#0000A0">Exit</font> <font color="#0000A0">Sub</font>

       <font color="#0000A0">Set</font> VbRegEx = <font color="#0000A0">New</font> VBScript_RegExp_55.RegExp

       <font color="#0000A0">For</font> <font color="#0000A0">Each</font> Ec <font color="#0000A0">In</font> MyPurchaseOrderRange
           Ec = CleanPO(Ec.Text, VbRegEx)
       <font color="#0000A0">Next</font>
  <font color="#0000A0">End</font> <font color="#0000A0">Sub</font>

  <font color="#0000A0">Function</font> CleanPO(DirtyPO <font color="#0000A0">As</font> String, VbRegEx <font color="#0000A0">As</font> VBScript_RegExp_55.RegExp) <font color="#0000A0">As</font> <font color="#0000A0">String</font>

       <font color="#0000A0">With</font> VbRegEx
           .Pattern = ".*?(\d{7}).*"
           <font color="#0000A0">If</font> .test(DirtyPO) <font color="#0000A0">Then</font>
               CleanPO = .Replace(DirtyPO, "$1")
           <font color="#0000A0">Else</font>
               CleanPO = "Not found PO number in :" & DirtyPO
           <font color="#0000A0">End</font> <font color="#0000A0">If</font>
       <font color="#0000A0">End</font> <font color="#0000A0">With</font>
  <font color="#0000A0">End</font> <font color="#0000A0">Function</font>
</FONT></td></tr></table><button onclick='document.all("916200618510243").value=document.all("916200618510243").value.replace(/<br \/>\s\s/g,"");document.all("916200618510243").value=document.all("916200618510243").value.replace(/<br \/>/g,"");window.clipboardData.setData("Text",document.all("916200618510243").value);'>Copy to Clipboard</BUTTON><textarea style="position:absolute;visibility:hidden" name="916200618510243" wrap="virtual">
Sub Example()
Dim MyPurchaseOrderRange As Range, Ec As Range
Dim VbRegEx As VBScript_RegExp_55.RegExp

Set MyPurchaseOrderRange = Application.InputBox( _
"Select the range containing your Purchase Order Id's", , "$A$1:$A$3437", , , , , 8)

If MyPurchaseOrderRange Is Nothing Then Exit Sub

Set VbRegEx = New VBScript_RegExp_55.RegExp

For Each Ec In MyPurchaseOrderRange
Ec = CleanPO(Ec.Text, VbRegEx)
Next
End Sub

Function CleanPO(DirtyPO As String, VbRegEx As VBScript_RegExp_55.RegExp) As String

With VbRegEx
.Pattern = ".*?(\d{7}).*"
If .test(DirtyPO) Then
CleanPO = .Replace(DirtyPO, "$1")
Else
CleanPO = "Not found PO number in :" & DirtyPO
End If
End With
End Function</textarea>
 

pgc01

MrExcel MVP
Joined
Apr 25, 2006
Messages
19,884

ADVERTISEMENT

Hi Tom

"Is more concise but very slow."

Sometimes regular expressions are slow. If badly specified (as in this case) they can be very slow.

In fact I posted a dumb solution. I should have used a simple pattern and search it.

Maybe faster (surely more logical):

Code:
  Function CleanPO(DirtyPO As String) As String
  Dim oMatches As Object
  
      With CreateObject("VBSCRIPT.REGEXP")
          .Pattern = "\d{7}"
          If .test(DirtyPO) Then
              Set oMatches = .Execute(DirtyPO)
              CleanPO = oMatches(0)
          Else
              CleanPO = "Not found PO number in :" & DirtyPO
          End If
      End With
  End Function


About you question: "Can you return a substring using a regular expression?"

Yes you can. Each Match has a Submatches collection associated. To get submatches just include inside the pattern substrings in parenthesis. They will be the submatches starting in 0.

I did a small example just to illustrate it's use.

Cheers
PGC

Code:
Sub SubmatchesTest()
Dim s As String, oMatches As Object

s = "Start date: 12:24, end date 14:52"

With CreateObject("VBSCRIPT.REGEXP")
    .Pattern = "(\d+):(\d+)"
    .Global = True
    Set oMatches = .Execute(s)
End With

MsgBox "Start date: " & oMatches(0).submatches(0) & ":" & oMatches(0).submatches(1)
MsgBox "End date: " & oMatches(1).submatches(0) & ":" & oMatches(1).submatches(1)

End Sub
 
L

Legacy 98055

Guest
Thanks for the info on returning substrings. I need to familiarize myself with reg expressions. I know very little about them.
 

tusharm

MrExcel MVP
Joined
May 28, 2002
Messages
11,029
If you are comfortable with regular expressions, you can use the UDFs at
http://www.tmehta.com/regexp/
specifically,
http://www.tmehta.com/regexp/add_code.htm

Hello everyone --

I have a deadline on monday for a project that seems as though regular expressions would fit the bill (though I am willing to try anything at this point).

The issue is cleaning a certain column of data of erroneous characters. The column represents an inbound shipments PO number, and can only be comprised of 7 numerical characters in a row. Here are some examples of what we get in this field:

7777-1234567-99
77,1234567-00

Etc, etc. The 1234567 is the valid PO number, and all I want to do is find the first occurence of 7 numerical characters in a row, store it in a variable, erase the current cell value, and place the 7 digit PO number back into the same field.

Ones that do not match should be copied into another sheet in Excel (which I can do no problem), but I'm having a hell of a time figuring this one out.

Can someone out there get me started? Thanks, and God Bless!

[edit: updated title since it doesn't have to be a regular expression solution, as long as it works! :eek:)]
 

MartinAustin

New Member
Joined
Mar 31, 2006
Messages
10
I'd like to thank you all for your help on this project. I can't express how relieved I am that this is going to be done on time.

By the way, all solutions seemed to work equally well, as I'm only parsing about 500 records per week.

Thanks again!
 

Forum statistics

Threads
1,136,427
Messages
5,675,777
Members
419,585
Latest member
popsin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top