Can VBA grab multiple integers from a string?

Sadsmileyface

New Member
Joined
Mar 21, 2013
Messages
11
I have a string which will include text, numbers and punctuation. It's free text so it's never constructed in a consistent fashion. Within that string, there will occur 4x 5-digit integers. Sometimes the integer will be preceded by a space or a symbol (like " 12345" or ":12345" or "-12345" etc) depending on how the user chose to write it. The only thing we know for sure it that 5-digit numbers will appear 4 separate times.


Would it be possible to use VBA to interrogate that string, then identify each of those 4 5-digit integers, then store them in separate columns on the spreadsheet? For simplicity, imagine cells A1:D1 being populated with each of the four integers.


As always, any insights are extremely appreciated.
 

Excel Facts

Do you hate GETPIVOTDATA?
Prevent GETPIVOTDATA. Select inside a PivotTable. In the Analyze tab of the ribbon, open the dropown next to Options and turn it off
You can setup a function in VBA to extract the numbers only, since there are 4x 5-digit numbers, you can then use a mid function and grab 1-5, 6-10, 11-15 & 16-20 from there on.

Code:
Function NumberExtract(txt As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "\D"
        NumberExtract = .Replace(txt, "")
    End With
End Function

To use:
=NumberExtract(A1)

Where A1 contains the string.
Then manipulate this with MID function.

EDIT: With string in A1 try these manipulated MID functions with NumberExtract:
A2:
=MID(numberextract($A$1),COLUMN()*5-4,5)

Then drag that across to D2.
 
Last edited:
Upvote 0
You can setup a function in VBA to extract the numbers only, since there are 4x 5-digit numbers, you can then use a mid function and grab 1-5, 6-10, 11-15 & 16-20 from there on.

Code:
Function NumberExtract(txt As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "\D"
        NumberExtract = .Replace(txt, "")
    End With
End Function

To use:
=NumberExtract(A1)

Where A1 contains the string.
Then manipulate this with MID function.

EDIT: With string in A1 try these manipulated MID functions with NumberExtract:
A2:
=MID(numberextract($A$1),COLUMN()*5-4,5)

Then drag that across to D2.

That looks interesting.

I was a bit sloppy when I was describing the string. In addition to the 5-digit integers peppered about, it also occasionally features other numbers... so sometimes you get something like "Entry 1: 12345". So the question also includes, is it possible to intelligently identify those 5-digit numbers while ignoring all other numbers which are not part of a 5 digit set?

I've been mulling over this for ages and keep coming back to the conclusion that Excel just doesn't have this kind of ability.
 
Upvote 0
I think it is possible with regular expressions, i'm not an expert on them so can't get it quite to work, in fact I have the opposite working currently!! What I have tried to write is removing the 5 digits; and I am not sure how to negate it exactly...
 
Upvote 0
Hi

I tweaked a bit tyija's code:

Code:
Function NumberExtract(txt As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = ".*?(?:^|\D)(\d{5})(?!\d)|.*"
        NumberExtract = Trim(.Replace(txt, " $1"))
    End With
End Function
 
Upvote 0
pgc,

That is a brilliant tweak!! I knew it was possible with regexp just didn't know how!! It has also answered the thread I recently put up about extracting 3 digits, so a big thank you.

I'd love to understand fully that .Pattern line you wrote, would you mind explaining it a bit? And also why the " $1" in the 2nd parameter of the replace function?

Thanks again!!
 
Upvote 0
If the OP (and other readers of this thread) are interested, we can also write the NumberExtract function without using Regular Expressions...
Code:
Function NumberExtract(ByVal S As String) As String
  Dim X As Long, Arr As Variant
  For X = 1 To Len(S)
    If Mid(S, X, 1) Like "[!0-9]" Then Mid(S, X) = " "
  Next
  Arr = Split(Application.Trim(S))
  For X = 0 To UBound(Arr)
    If Len(Arr(X)) <> 5 Then Arr(X) = ""
  Next
  NumberExtract = Application.Trim(Join(Arr))
End Function
 
Upvote 0
I'd love to understand fully that .Pattern line you wrote ...

I'll try to explain. Hope I'm clear.

This is a simpler form of the pattern

"(?:^|.*?\D)(\d{5})(?!\d)|.*"

and this is an explanation:

1-
What we want is a group of 5 consecutive digits.
We don't want more than 5 consecutive digits so either they start at the beginning of the string or they are not preceded by another digit

(?:^|.*?\D)

this means either the beginning of the string or a sequence of characters not ending in a digit

2 - we then want a sequence of 5 digits

(\d{5})

3 - After those 5 digits we don't want another digit (we want 5 and only 5 digits)

(?!\d)

this means: not followed by a digit

4 - if there's no match then get all the rest of the string

...|.*


5 - Example:

For ex., for the string "Entry 1: 12345, Entry 2: 67890 End"

First match: "Entry 1: 12345"
Second match: "Entry 2: 67890"
Third match: " End" (since there are no more matching sequences of numbers it gets the rest of the string)

Hope this helps
 
Last edited:
Upvote 0
Hi pgc,

Thanks for this detailed response, took a while to sink in to my head but I think I get it! Will take a while for me to learn how to write it for myself without looking at the notes though I am sure!

What was the $1 in the .Replace function meaning?
 
Upvote 0
What was the $1 in the .Replace function meaning?

The parentheses in a regex have 2 functions:

- group, like

(a|b)(c|d)

which matches ac, ac, bc and bd

- capture submatches

\d{3}(\d{4})\d{2}(\d{3})

This will match 12 consecutive digits like

123456789012

but, on top of that is also getting 2 submatches, the digits in positions 4-7 and 10-12

For ex., in the string:

"The code 125468597584 is valid"

Match: 125468597584
SubMatch1: 4685
SubMatch2: 584

In case you want to group but not capture a submatch you use (?: )

for ex.:

(?:a|c)(\d\d)

With the string

"the code c34 is valid"

Match: c34
Submatch1: 34

Notice that with the (?:a|c) the pattern matched the letter c but did not store it for a submatch.

Finally

With

"(?:^|.*?\D)(\d{5})(?!\d)|.*"

for the string: "Entry 1: 12345, Entry 2: 67890 End"



First match: "Entry 1: 12345", SubMatch 1: "12345"
Second match: "Entry 2: 67890", SubMatch 1: "67890"
Third match: " End" (since there are no more matching sequences of numbers it gets the rest of the string), SubMatch 1: ""

In the Replace, $1 means the first submatch, so the logic in replace with " $1" is

First match: Replace "Entry 1: 12345" with " 12345"
Second match: Replace "Entry 2: 67890" with " 67890"
Third match: Replace " End" with ""

The total result is " 12345 67890"

Hope it's clear
 
Upvote 0

Forum statistics

Threads
1,214,649
Messages
6,120,728
Members
448,987
Latest member
marion_davis

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top