Parsing Text Fields

myers_co

New Member
Joined
Mar 23, 2015
Messages
6
I have a column containing mixed data that I need to parse and extract a particular portion from but I've run out of brain power in figuring this one out. The data in the column is a combination of string codes and number codes that are joined by a varying number of spaces (no apparent rhyme or reason) and occasionally other symbols (like ~!@#$%^&*-_=+). The following list shows SOME of the possibilities of combinations that this column contains:

TRISTAR(15 spaces)M4 04333-0032
WIND ONE(11 spaces)K5
EXTRENDO(13 spaces)D1 92366
WILK17(17 spaces)K7 140 816
A-723(2 spaces)121-701
SAW SCAW 2222

It is a horrible mess! What I need to do is find a way to submit a pattern to a function and have the function return a zero (not found) or a numeric value (the position in the string) where the pattern is located. The pattern needs to allow for spaces, characters, digits and symbols (in fixed and variable quantities).

Examples:
=PARSE(A1, "[D5]-[D4]") should result in 26 for the first data example but return 0 for all others. The "[D5]-[D4]" pattern would describe 5 digits connected with a dash connected to 4 digits.

=PARSE(A1,"[SX][CD]") should result in 23 for the first example, 20 for the second, 22 for the third, 24 for the fourth, 0 for the fifth and 0 for the sixth. The "[SX][CD]" pattern would describe a variable number of spaces immediately followed by a character and digit combination.
 
Last edited:

Excel Facts

Formula for Yesterday
Name Manager, New Name. Yesterday =TODAY()-1. OK. Then, use =YESTERDAY in any cell. Tomorrow could be =TODAY()+1.
You will probably want to use the VBScript Regular Expressions library of tools. This link provides an overview.

https://msdn.microsoft.com/en-us/library/6wzad2b2(v=vs.84).aspx

I'd recommend using the standard matching characters for regEx instead of creating a custom mapping.

You can use regEx in a VBA User Defined Function and use that function in worksheet formulas.

Paste this code into a Standard Code Module in your workbook (like Module1)...
Code:
Function FindPatternStart(ByVal sInput As String, _
   ByVal sPattern As String, _
   Optional bIgnoreCase As Boolean = False) As Long

'--looks for first instance of RegEx pattern
'--if found, returns character position in string with
'    the first character index at 1.
'--if not found, returns 0

 Dim lReturn As Long
 Dim regEx As Object
 
 Set regEx = CreateObject("vbscript.regexp")

 With regEx
   .Global = True
   .MultiLine = True
   .IgnoreCase = bIgnoreCase
   .Pattern = "(" & sPattern & ")"
 End With

 If regEx.Test(sInput) Then
   lReturn = regEx.Execute(sInput)(0).FirstIndex + 1
 End If

 FindPatternStart = lReturn
End Function

Once that is in place, you can use a formula like this to accomplish your first example:
=FindPatternStart(A1,"\d{5}-\d{4}")
 
Upvote 0
Jerry has given a great starting point. If you want to be able to devise your own patterns you will certainly need to read up on Regular Expressions.
Otherwise, post back details of what is required and somebody here may be able to help.

I do have some comments about your second example though. You said the pattern was "a variable number of spaces immediately followed by a character and digit combination."
For your sample data, that pattern actually starts at position
8
9
9
7
0
0

See column C below for the pattern used for that.

What your expected results seem to be describing is rather where a single character followed by a digit starts, providing that follows any number of spaces.
You could still used Jerry's function for that, but a different pattern would be required, as well as a bit more formula work - see column D

Column B is the function used for the first problem, as described by Jerry.

Excel Workbook
ABCD
1TRISTAR M4 04333-003226823
2WIND ONE K50920
3EXTRENDO D1 923660922
4WILK17 K7 140 8160724
5A-723 121-701000
6SAW SCAW 2222000
Sheet1
 
Upvote 0
Actually, depending on what variations you will want, you may be able to avoid Regular Expressions if you so desire - but would require knowledge of "Like" patterns instead.

Here is another UDF that seems to cope with the 2 situations that you have described so far.

This is not case-sensitive. I have not tried to offer that option with this function.

Rich (BB code):
Function FindStart(s As String, p As String) As Long
  Dim i As Long, lp As Long
  Dim lcs As String
  
  lcs = LCase(s)
  lp = Len(Replace(p, "[a-z]", 1))
  For i = 1 To Len(s) - lp + 1
    If Mid(lcs, i, lp) Like p Then
      FindStart = i
      Exit For
    End If
  Next i
End Function


Excel Workbook
ABC
1TRISTAR M4 04333-00322623
2WIND ONE K5020
3EXTRENDO D1 92366022
4WILK17 K7 140 816024
5A-723 121-70100
6SAW SCAW 222200
Sheet2
 
Last edited:
Upvote 0
Here is another UDF that seems to cope with the 2 situations that you have described so far.

This is not case-sensitive. I have not tried to offer that option with this function.

Rich (BB code):
Function FindStart(s As String, p As String) As Long
  Dim i As Long, lp As Long
  Dim lcs As String
  
  lcs = LCase(s)
  lp = Len(Replace(p, "[a-z]", 1))
  For i = 1 To Len(s) - lp + 1
    If Mid(lcs, i, lp) Like p Then
      FindStart = i
      Exit For
    End If
  Next i
End Function
If you want your function to be able to include internal asterisks (wildcards), I think you can modify it like this...
Code:
Function FindStart(s As String, p As String) As Long
  Dim i As Long
  Dim lcs As String
  
  ' Remove any leading asterisks
  Do Until Left(p, 1) <> "*"
    p = Mid(p, 2)
  Loop
  
  lcs = LCase(s)
  For i = 1 To Len(s)
    If Mid(lcs, i) Like p & "*" Then
      FindStart = i
      Exit For
    End If
  Next i
  
End Function
 
Upvote 0
If you want your function to be able to include internal asterisks (wildcards),
Perhaps I'm not understanding your intent, but if your function is going to remove any leading wildcards from your pattern, why would you include leading wildcards in your pattern to start with?
 
Upvote 0
Perhaps I'm not understanding your intent, but if your function is going to remove any leading wildcards from your pattern, why would you include leading wildcards in your pattern to start with?
It is a protection against someone thinking they need to start the pattern with that wildcard in order to find the pattern internal to the text (for example, if the code asked the user, not the programmer, to provide a pattern)... if you left such an asterisk at the beginning of the pattern, it would screw up the search for the relevant part of the pattern as you marched down the characters of the text in the loop.
 
Upvote 0
Jerry, Peter and Rick...

You guys are awesome! I'd never heard of the regular expressions library so this could answer a lot of problems.

In reply to the reasoning for including leading wildcards, the data seems to exhibit variants of patterns - some of them start with one pattern and others with another but they imply different meanings to the code being sought. Unfortunately, I don't have access to the system (or the lunatic) that originated this mess so I can't ask why it was done. I can only attempt to "reverse engineer" the result.
 
Upvote 0
It is a protection against someone thinking they need to start the pattern with that wildcard in order to find the pattern internal to the text (for example, if the code asked the user, not the programmer, to provide a pattern)... if you left such an asterisk at the beginning of the pattern, it would screw up the search for the relevant part of the pattern as you marched down the characters of the text in the loop.
OK, fair enough but any user of the function is ..
a) going to have to know that it uses the Like operator, and
b) have a very good understanding of how the Like operator, and the overall functions works
.. if they are to devise any successful pattern for it.
 
Upvote 0

Forum statistics

Threads
1,215,140
Messages
6,123,269
Members
449,093
Latest member
Vincent Khandagale

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top