Find and extract sequence within a text string

drousyblinx

New Member
Joined
Oct 7, 2015
Messages
13
Hi,

I have a long list of file names and within them sits a five digit sequence of numbers and letters. The problem is, the location of that sequence within the file name, differs for every file.

For example:
1) ThisisafilenameAB123.doc - sequence = AB123
2) ThisCD453isafilename.doc - sequence = CD453
3) ThisisEF713afilename.doc - sequence = EF713

I would be extremely grateful if anyone could provide me a formula that extracts these five digit sequences.
 

Excel Facts

Back into an answer in Excel
Use Data, What-If Analysis, Goal Seek to find the correct input cell value to reach a desired result
Hi,

I have a long list of file names and within them sits a five digit sequence of numbers and letters. The problem is, the location of that sequence within the file name, differs for every file.

For example:
1) ThisisafilenameAB123.doc - sequence = AB123
2) ThisCD453isafilename.doc - sequence = CD453
3) ThisisEF713afilename.doc - sequence = EF713

I would be extremely grateful if anyone could provide me a formula that extracts these five digit sequences.
Is your sequence always two uppercase letters followed by three digits?''

Also, will your filenames ever have digits in them besides the three digits in your sequence?
 
Last edited:
Upvote 0
Here's a UDF you can try. Assumes your 5 character sequence always begins with at least one capital letter.

To install the UDF:
1. With your workbook active press Alt and F11 keys. This will open the VBE window.
2. In the project tree on the left of the VBE window, find your project and click on it.
3. On the VBE menu: Insert>Module
4. Copy the code below from your browser window and paste it into the white space in the VBE window.
5. Close the VBE window and Save the workbook. If you are using Excel 2007 or a later version do a SaveAs and save it as a macro-enabled workbook (.xlsm file extension).
6. Press Alt+F8 keys to run the code
7. Make sure you have enabled macros whenever you open the file or the code will not run.
Code:
Function FiveCharacters(S As String) As String
With CreateObject("VBScript.Regexp")
       .Global = True
       .Pattern = "[A-Z]{1,3}\d{2,4}"
       If .test(S) Then
            FiveCharacters = .Execute(S)(0)
       Else
              FiveCharacters = ""
       End If
End With
End Function
Here's an example of use of the UDF:
Excel Workbook
AB
1ThisisafilenameAB123.docAB123
2ThisCD453isafilename.docCD453
3ThisisEF713afilename.docEF713
4ThisisE9713afilename.docE9713
Sheet7
 
Upvote 0
Is your sequence always two uppercase letters followed by three digits?''

Also, will your filenames ever have digits in them besides the three digits in your sequence?

Most of them are yes. I am looking to tackle these first but if you can assist with other sequences that will be great. (One other scenario is two uppercase letters + two digits).

Most file names do not have other digits in them, but there are a few which do. If it makes it any easier, a formula for those which do not have other digits, will still be a big help.
 
Upvote 0
Most of them are yes. I am looking to tackle these first but if you can assist with other sequences that will be great. (One other scenario is two uppercase letters + two digits).

Most file names do not have other digits in them, but there are a few which do. If it makes it any easier, a formula for those which do not have other digits, will still be a big help.
Did you try the UDF in post #3?
 
Upvote 0
Most of them are yes. I am looking to tackle these first but if you can assist with other sequences that will be great. (One other scenario is two uppercase letters + two digits).
Your answer muddied things up a little bit for me. When you say "most of them", does that mean some may have only one upper case with four digits or three upper case letters with two digits? Or were you referring to the what you wrote in parentheses as the exception? If the latter, can the two scenarios (two upper case letters followed by three digits and two upper case letters followed by two digits) both appear in your data? If so, do you want them both pulled out or only the five-character one? Please try and be clear on your scenarios... remember we here on the forum have no idea what your data looks like, we can only go on what you tell us... even though something is obvious to you, you cannot expect us to "figure it out".
 
Upvote 0
Wow thanks so much Joe! Works exactly how I wanted.

Sorry your reply came through as I was typing my initial response to Rick.
 
Upvote 0
Code:
Function FiveCharacters(S As String) As String
With CreateObject("VBScript.Regexp")
       .Global = True
       .Pattern = "[A-Z]{1,3}\d{2,4}"
       If .test(S) Then
            FiveCharacters = .Execute(S)(0)
       Else
              FiveCharacters = ""
       End If
End With
End Function
Probably immaterial for the OP's usage, but your UDF can return up to 7 characters. For example, feed this into your UDF...

ThisABC1234filenameXY123.doc

and it returns this...

ABC1234
 
Upvote 0
Probably immaterial for the OP's usage, but your UDF can return up to 7 characters. For example, feed this into your UDF...

ThisABC1234filenameXY123.doc

and it returns this...

ABC1234


Thanks Rick! I'm really impressed with the speed of everyone's responses. Wish I knew about this forum sooner.
 
Upvote 0
Thanks Rick! I'm really impressed with the speed of everyone's responses. Wish I knew about this forum sooner.
You are welcome. Just so you know, the UDF can also return three characters (a letter and two digits)...

ThisA12isafilename.doc

Any chance you can read Message #6 and try and clarify the limits for your scenarios for me?
 
Upvote 0

Forum statistics

Threads
1,215,603
Messages
6,125,784
Members
449,259
Latest member
rehanahmadawan

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top