Add column to extract text after multiple possible delimiters

btwice

New Member
Joined
Dec 16, 2014
Messages
17
Office Version
  1. 365
Platform
  1. Windows
Hi,

Currently trying to pull some text out of another column in powerquery that has file names in it. The problem is these file names are not consistent and have multiple beginning 'codes' I need to look for to use as the starting point for extracting the text. An example of this is a file name contains "_35" so I would want to pull 8 characters after the "_", but it also contains "_" in other places in the file name, and other files may look like this " 35" instead of using "_". 35 is also not the only identifier, as there can be 31, 45, 46, and a few others. I actually can make this work with if statements after the query populates the data, but there are 50k+ rows so it bogs down Excel. Anyone have ideas on how to code this in powerquery as an added column? Was thinking a starting point would be making a table with the identifiers to use as a range. See below for file name examples, let me know if I need to clarify anything, thanks.

DI_44001147.pdf
DI_43979014.pdf
EB 35506863.pdf
MI 31607812.pdf
 

Excel Facts

Fastest way to copy a worksheet?
Hold down the Ctrl key while dragging tab for Sheet1 to the right. Excel will make a copy of the worksheet.
Maybe do a Replace Values and replace the space " " with an underscore "_".

You could also replace the ".pdf" with null and get the right 8 characters so that you wouldn't have to worry about the "_" or the " "
 
Upvote 0
So the replacing " " with "_" is great, that situates everything how I need, however those file name examples aren't full file names, should have posted those but didn't think of using .pdf to replace haha. So file names are actually like the below.

DI_31674137_127293026_2AA_ConPkgComp.pdf
DI Lafayette_45678921_125356078_Vendor_PreField.pdf

Now that i have everything with "_3" or "_4", what would be the code to pull 8 characters to the right for delimiters of "_3" OR "_4"? Can I do it that way?
Note the current formula I'm using to do this is in a cell after the query runs:

=ifs(isnumber(search("_4",[@[File Path]])),MID([@[File Path]],SEARCH("_4",[@[File Path]]),9),isnumber(search("_3",[@[File Path]])),MID([@[File Path]],SEARCH("_3",[@[File Path]]),9))
 
Last edited:
Upvote 0
Okay if it is always 8 digits, you can use this custom column in Power Query, and then you can Extract the values.

Power Query:
List.Select(Text.Split(
    Text.Select([Column1],{"0".."9","_"}),"_"),(x)
    =>Text.Length(x)=8)


Got this from here.
 
Upvote 0

Forum statistics

Threads
1,216,126
Messages
6,129,007
Members
449,480
Latest member
yesitisasport

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top