Pattern extraction UDF -- REGEX

ChrisOswald

Active Member
Joined
Jan 19, 2010
Messages
454
Hi,

I'm needing to extract from an input string a substring that fits a certain pattern. The pattern match I'm looking for is:

1) The substring is 5 characters long
2) The characters around the substring need to be non-alphanumeric (to bias the pattern match fails to false negatives)
3) The pattern is [1-2][0-9A-z][0-9A-z][0-9A-z][0-9A-z]
4) The string should be search from "Back to Front", and if more than 1 possibility is found the hindmost should be returned.
5) There is never more than 2 characters in the substring that are letters. (this is the bit I can't figure out without testing all 11 possible patterns, which is slightly less than elegant)
6) If no pattern match is found, the UDF returns ""

Thanks,
 

Excel Facts

Save Often
If you start asking yourself if now is a good time to save your Excel workbook, the answer is Yes
Hi Chris,

Would this work?

Code:
Function RegEx(strIn As String) As String
    Dim strInR As String
    Dim lngPos As Long, lngPosZ As Long
    Dim lngChar As Long, lngLtrs As Long
    On Error Resume Next
    strInR = reverse(strIn)
    lngPosZ = Len(strInR) - 4
    For lngPos = 1 To lngPosZ
        If Mid(strInR, lngPos, 5) Like _
            "[0-9A-z][0-9A-z][0-9A-z][0-9A-z][1-2]" Then
            If (lngPos = 1 Or _
                Mid(strInR, lngPos - 1, 1) Like "[!0-9A-z]") And _
                (lngPos = lngPosZ Or _
                Mid(strInR, lngPos + 5, 1) Like "[!0-9A-z]") Then
                    lngLtrs = 0
                    For lngChar = lngPos To lngPos + 4
                        If Mid(strInR, lngChar, 1) Like "[A-z]" Then _
                            lngLtrs = lngLtrs + 1
                    Next lngChar
                    If lngLtrs < 3 Then
                        RegEx = reverse(Mid(strInR, lngPos, 5))
                        Exit Function
                    End If
            End If
        End If
    Next lngPos
    RegEx = vbNullString
End Function
 
Private Function reverse(strIn As String) As String
    Dim strOut As String
    Dim lngPos As Long
    strOut = vbNullString
    For lngPos = Len(strIn) To 1 Step -1
        strOut = strOut + Mid(strIn, lngPos, 1)
    Next lngPos
    reverse = strOut
End Function
 
Upvote 0
Is there a way of adapting this for other types

I have an approaching project at work, a CAD log will be extracted and in that log is a free text entry

It will say "CRM", within the text but it would more likly be "CRM ALS" or "CRM BLS" (I'll need to be able to adjust for those). There is no way of fixing the actual location

once it has been identified I then need to search back through the log to identify the time stamp 00:00:00 plus 15? characters before (the time will always be exact)

using the identified time stamp from the string i would then like to slice out say 50 characters into a fresh cell, from the new start point (will also need to be able to cope if the length from beginning text is near the start of the record data)

I'm sure it can be coded, but just too many things for me to appreciate in a single hit.

I can't place a proper example as the data is sensitive and may be personal, so have to avoid breaching data protection act.

I'll try and find a record and fake the output so it can be seen
 
Upvote 0
Here's my need if anyone thinks they can help

AJAX log entry at 15/12/2010 08:03:18 by ajax1828 from extension 21333 4A06 advised of job lmj
message sent to resource to 3C03 at 15/12/2010 08:08:35 3E51 has booked Wednesday/at locale at 08:08
AJAX<b>log entry at <font color = red>15/12/2010 08:03:18</font> by ajax1828 from extension 21333 4A06 <font color = blue>CRM ALS</font> ETA 43221 ........ </b>

delivered as a string

I'm looking for the BLUE Item in the list, when found I need to track back to the RED time, and then pull the BOLD section, probably 150 --> 200 characters, the rest of the log will be not referenced.

I have to be able to extract the date and time for the CRM entry, thousands per day to look at

maybe an array can handle it, i don't know enough
 
Upvote 0
Hi Chris

Try:

Code:
Function GetStr(s As String) As String
Dim rxMatches As Object
 
With CreateObject("VBScript.RegExp")
    .Pattern = "\b(?!(\d*[A-Z]){3})[1-2][0-9A-Z][0-9A-Z][0-9A-z][0-9A-Z]\b"
    .Global = True
    .IgnoreCase = True
    Set rxMatches = .Execute(s)
    If rxMatches.Count > 0 Then GetStr = rxMatches(rxMatches.Count - 1).Value
End With
End Function
 
Upvote 0
This seems to work if the Criteria are always "ALS " & "CRM".
If it works for you, you could use it for the basis of an outer loop through all of your data.
Code:
[COLOR="Navy"]Sub[/COLOR] MG26Mar44
[COLOR="Navy"]Dim[/COLOR] Ray [COLOR="Navy"]As[/COLOR] Variant, n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long[/COLOR]
[COLOR="Navy"]Dim[/COLOR] fd [COLOR="Navy"]As[/COLOR] Boolean
Ray = Split(Range("A1"), " ")
    [COLOR="Navy"]For[/COLOR] n = UBound(Ray) To 1 [COLOR="Navy"]Step[/COLOR] -1
        [COLOR="Navy"]If[/COLOR] Ray(n) = "ALS" And Ray(n - 1) = "CRM" [COLOR="Navy"]Then[/COLOR]
            fd = True
        [COLOR="Navy"]End[/COLOR] If
        [COLOR="Navy"]If[/COLOR] fd = True And IsDate(Ray(n)) And IsDate(Ray(n - 1)) [COLOR="Navy"]Then[/COLOR]
            MsgBox Ray(n - 1) & " " & Ray(n)
            fd = False
        [COLOR="Navy"]End[/COLOR] If
    [COLOR="Navy"]Next[/COLOR] n
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 
Upvote 0
Mick, that's very good.

I'm just trying to construct the for next loop to walk down the page, and also then replace the message box with a pasted value in the right row.

This text "log entry at" will always appear to the left of the number and i want to pick that up and then copy from where log is used, probable better if i just use 30 words rather than 100 plus characters, i'm going to keep plugging at walking the data first
 
Upvote 0
Code:
Sub MG26Mar44()
    Dim Ray As Variant, n As Long
    Dim fd As Boolean
    Dim walking
    Dim LastRow
    LastRow = Cells.Find(What:="*", After:=[A1], _
                         SearchOrder:=xlByRows, _
                         SearchDirection:=xlPrevious).Row

    For walking = 1 To LastRow Step 1
        Ray = Split(Range("A" & walking), " ")
        For n = UBound(Ray) To 1 Step -1
            If Ray(n - 1) = "CRM" Then
                'If Ray(n) = "ALS" And Ray(n - 1) = "CRM" Then
                fd = True
            End If
            If fd = True And IsDate(Ray(n)) And IsDate(Ray(n - 1)) Then
                Range("B" & walking) = Ray(n - 1) & " " & Ray(n)
                'MsgBox Ray(n - 1) & " " & Ray(n)
                fd = False
            End If
        Next n
    Next walking
End Sub

Trimmed it for CRM only at this time
 
Upvote 0
anyone help on this bit

Code:
                Range("G" & walking) = Ray(n - 4) & " " & Ray(n - 3) & " " & Ray(n - 2) & " " & Ray(n - 1) & " " & Ray(n) _
                                     & " " & Ray(n + 1) & " " & Ray(n + 2) & " " & Ray(n + 3) & " " & Ray(n + 4) & " " & Ray(n + 5) & " " & Ray(n + 6) _
                                     & " " & Ray(n + 7) & " " & Ray(n + 8) & " " & Ray(n + 9) & " " & Ray(n + 10) & " " & Ray(n + 11) & " " & Ray(n + 12) _
                                     & " " & Ray(n + 13) & " " & Ray(n + 14) & " " & Ray(n + 15) & " " & Ray(n + 16) & " " & Ray(n + 17) & " " & Ray(n + 18)

my clumsy way of selecting the first word I want and then need to select the next 50ish
 
Upvote 0
Mole,
Since each record in your log begins with "log entry" and the time, I think i'd simplify the task by first splitting the file out into log entry records. Then you can take the ones that have CRM ALS (or whatever it is you are looking for). The time will always be at the beginning of the record, so its then very easy to find. This should be fairly accurate, perhaps one glitch would be if the words log entry appeared in a log entry itself - a further refinement would be to use the log entry + log entry time as the pattern for identifying each new record.
ξ
 
Upvote 0

Forum statistics

Threads
1,224,586
Messages
6,179,711
Members
452,939
Latest member
WCrawford

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top