Regular Expression (RegEx) User-Defined Function (UDF) to Extract Filenames from Network Path Strings

lneidorf

Board Regular
Joined
May 20, 2004
Messages
97
Office Version
  1. 365
Platform
  1. Windows
Hi there.

I've got a weekly work task involving the extraction of filenames from 10,000+ unwieldy and inconsistently-formatted network filepaths. Adding complication is the fact that these are multivalue: the same cells contain multiple filepaths\filenames separated by semicolons. I need to extract them all and have them display in the same cell separated by semicolons.

The complexity of the data makes a front-end formula impracticable. And so I've been trying to create a User-Defined Function (UDF) in order to use RegEx to accomplish this. Unfortunately, I've not been successful.

I'm not certain whether the issue is with my RegEx format, VBA, or both. Feedback on both fronts would be appreciated! I had been using code found here (VBA Express : Multiple Apps - UDF wrappers for RegExp Find and Replace methods) that creates a UDF allowing the user to specify the RegEx pattern on the front-end as an argument of the Function itself. This has worked wonderfully for other applications, but not here. So I've tried implementing this as a new UDF, but have not had success.


Ok, so here's what I've got:

SAMPLE DATA (contained in cell A2):
\\ln2-pv-nas1\AX_CD\MILVCMH-148_160\Homedirs\jeff.smith\Replication\Documents\Board meetings\2007 Meetings\Oct 12\sample_filename_1.pdf;\\ln2-pv-nas1\AX_CD\MILVCMH-78_90\Data\Files\Board meetings\2007 Meetings\sample_filename_2.xlsx;\\ln2-pv-nas1\AX_CD\MILVCMH-78_90\Users\jeff.smith\Documents\OJSC\meetings\2007\sample_filename_3.doc

DESIRED OUTPUT:
sample_filename_1.pdf; sample_filename_2.xlsx; sample_filename_3.doc

REGEX PATTERN (very possibly incorrect):
[^\\]*$


Keep in mind that I want to pick up ALL matching patterns in my string, and not just one. I've experimented with both the VBA code ".Global = True" as well as the RegEx "/g" operator to no avail.

Any thoughts would be most appreciated!
 

Excel Facts

Easy bullets in Excel
If you have a numeric keypad, press Alt+7 on numeric keypad to type a bullet in Excel.
This does it without a regex:
Code:
Public Function GetFileNames(filePaths As String)

    Dim files As Variant, i As Integer
    
    GetFileNames = ""
    files = Split(filePaths, ";")
    For i = 0 To UBound(files)
        GetFileNames = GetFileNames & Mid(files(i), InStrRev(files(i), "\") + 1) & "; "
    Next
    GetFileNames = Left(GetFileNames, Len(GetFileNames) - 2)

End Function
 
Upvote 0
Similar idea to John_w's function...
Code:
Function GetFileNames(S As String) As String
  Dim X As Long, Individuals() As String, Temp() As String
  Individuals = Split(S, ";")
  For X = 0 To UBound(Individuals)
    Temp = Split(Individuals(X), "\")
    GetFileNames = GetFileNames & "; " & Temp(UBound(Temp))
  Next
  GetFileNames = Mid(GetFileNames, 3)
End Function
 
Upvote 0
Rick, John_w:

Thanks to you both! Fantastic! They work perfectly for me. I'll spend some time studying to see if I can figure out what's happening with the code.

I am still interested in seeing if anyone can make any suggestions on a viable RegEx approach as that would give me some flexibility around matching different patterns. Also, it's possible that the filenames could occur without being immediately followed by a semi-colon, in which case these approaches may not work perfectly.
 
Upvote 0
Also, it's possible that the filenames could occur without being immediately followed by a semi-colon, in which case these approaches may not work perfectly.
The modification to my function should work correctly whether there is a semi-colon between the network file paths or not...
Code:
Function GetFileNames(S As String) As String
  Dim X As Long, Individuals() As String, Temp() As String
  Individuals = Split(S, "\\")
  For X = 1 To UBound(Individuals)
    Temp = Split(Individuals(X), "\")
    GetFileNames = GetFileNames & "; " & Temp(UBound(Temp))
  Next
  GetFileNames = Replace(Mid(Replace(GetFileNames, ";;", ";"), 3), " ;", ";")
End Function
 
Upvote 0
I am still interested in seeing if anyone can make any suggestions on a viable RegEx approach as that would give me some flexibility around matching different patterns. Also, it's possible that the filenames could occur without being immediately followed by a semi-colon, in which case these approaches may not work perfectly.

If the filenames are not immediately followed by a semi-colon, what character(s) immediately follow them? How do you know which is the last character of the filename?

Can you give examples different from the one you posted where looking for the ";" at the end of the address does not work?
 
Upvote 0
If the filenames are not immediately followed by a semi-colon, what character(s) immediately follow them? How do you know which is the last character of the filename?
By the way the OP wrote his new request, I assumed the file name were followed by either a semi-colon, a space or no character at all. Since these were network file paths, I assumed they all began with \\ and used that as the delimiter.
 
Upvote 0
By the way the OP wrote his new request, I assumed the file name were followed by either a semi-colon, a space or no character at all. Since these were network file paths, I assumed they all began with \\ and used that as the delimiter.

I agree, that's probably how it is. Just making sure the paths are not in the middle of some other text which may also have punctuation marks.
 
Upvote 0
Try this UDF with a regular expression:

Code:
Function GetFileName(s As String) As String
s = Replace(s, "\", "ß") & "ßß"
    With CreateObject("VBScript.Regexp")
        .Global = True
        .Pattern = "ß([^ß]+?)[ß]{2}|.+?"
        If .test(s) Then GetFileName = .Replace(s, "$1")
    End With
End Function
 
Upvote 0

Forum statistics

Threads
1,223,445
Messages
6,172,177
Members
452,446
Latest member
walkman99

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top