Extracting date from string of text

ESACAWIP

New Member
Joined
Nov 9, 2020
Messages
21
Office Version
  1. 365
Platform
  1. Windows

Data
John exp 052021_46857.pdf
Tom_46854.pdf
Jerry 5-31-2021_35585.pdf
Ford_Exp 05.31.2020_24707.pdf
David exp09302016_14115.pdf
Mike 2.28.2019_12495.pdf
Result:
John 05.20.21
Tom
Jerry 05.31.2021
Ford 05.31.2020
David 09.30.2016
Mike 02.28.2019


Hi, I am having some trouble extracting dates from these strings of text. In the raw data, I am given a mixture of dates written in various formats. Worse, there are numbers that I need to exclude that are non date values such as "46857" in the line "John exp 052021_46857.pdf".


I appreciate if anyone can help!
 

Excel Facts

Whats the difference between CONCAT and CONCATENATE?
The newer CONCAT function can reference a range of cells. =CONCATENATE(A1,A2,A3,A4,A5) becomes =CONCAT(A1:A5)
This might do about half of what you want. I decided to stop at this point, I may pick it back up later and add the date clean up code. It relies on the filename having 20xy somewhere in the name (x and y being digits 0-9). So if you've got dates in other centuries or people are just using the tens and ones places for the year this will be even less helpful.

Rich (BB code):
Option Explicit

Function ExtractNameAndDate(t As String) As String

    Dim FirstName As String, ti As String, k As Integer, yr As Integer, i As Integer
    Dim dtchr As String, dtstr As String
    
    k = 1
    ti = Mid(t, 1, 1)
    Do While UCase(ti) >= "A" And UCase(ti) <= "Z"
        FirstName = FirstName & ti
        k = k + 1
        ti = Mid(t, k, 1)
    Loop
    
    yr = InStr(1, t, "20")
    
    If yr = 0 Or yr > Len(t) - 2 Then
        ExtractNameAndDate = FirstName
        Exit Function
    End If
    
    If Not IsNumeric(Mid(t, yr + 2, 2)) Then
        ExtractNameAndDate = FirstName
        Exit Function
    End If
    
    i = yr - 1
    dtchr = Mid(t, i, 1)
    Do While IsNumeric(dtchr) Or dtchr = "." Or dtchr = "-" Or dtchr = "/"
        i = i - 1
        If i = 0 Then
            ExtractNameAndDate = FirstName
            Exit Function
        End If
        dtchr = Mid(t, i, 1)
    Loop
    i = i + 1
    dtstr = Mid(t, i, yr - i + 4)
    
    ExtractNameAndDate = FirstName & " " & dtstr
    
End Function

MrE20210127.xlsm
AB
1John exp 052021_46857.pdfJohn 052021
2Tom_46854.pdfTom
3Jerry 5-31-2021_35585.pdfJerry 5-31-2021
4Ford_Exp 05.31.2020_24707.pdfFord 05.31.2020
5David exp09302016_14115.pdfDavid 09302016
6Mike 2.28.2019_12495.pdfMike 2.28.2019
Sheet2
Cell Formulas
RangeFormula
B1:B6B1=ExtractNameAndDate(A1)
 
Upvote 0
Using Regular Expression

VBA Code:
Function ExtractNameDate(cell As Range) As String

Dim Element As Variant
Dim Name$, DT$, Pattern$, ArryPattern$()
Dim StrCapture As Object

Set RegEx = CreateObject("VBScript.RegExp")

Pattern = "\d{6},\d{8},\d?\d[(\-)|(\.)]\d{2}[(\-)|(\.)]\d{4}"
ArryPattern = Split(Pattern, ",")

With RegEx
    .Global = True
    .Pattern = "^[a-zA-Z]+"
End With
Name = RegEx.Execute(cell)(0)

For Each Element In ArryPattern
    With RegEx
        .Global = True
        .Pattern = Element
    End With
    If RegEx.Test(cell) Then
        DT = RegEx.Execute(cell)(0)
    End If
Next
ExtractNameDate = Name & " " & DT

End Function
 
Upvote 0
DataName.1Date
John exp 052021_46857.pdfJohn052021
Tom_46854.pdfTom
Jerry 5-31-2021_35585.pdfJerry5-31-2021
Ford_Exp 05.31.2020_24707.pdfFord05.31.2020
David exp09302016_14115.pdfDavid09302016
Mike 2.28.2019_12495.pdfMike2.28.2019

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    LC = Table.TransformColumns(Source,{{"Data", Text.Lower, type text}}),
    TBD = Table.AddColumn(LC, "Name", each Text.BeforeDelimiter([Data], "_", {0, RelativePosition.FromEnd}), type text),
    SplitAny = Table.SplitColumn(TBD, "Name", Splitter.SplitTextByAnyDelimiter({" ","exp","_"}, QuoteStyle.Csv)),
    TCC = Table.CombineColumns(SplitAny,{"Name.2", "Name.3", "Name.4"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Date"),
    Proper = Table.TransformColumns(TCC,{{"Name.1", Text.Proper, type text}}),
    RC = Table.RemoveColumns(Proper,{"Data"})
in
    RC
 
Upvote 0
or a bit modified
DataData.1Date
John exp 052021_46857.pdfJohn01/05/2021
Tom_46854.pdfTom
Jerry 5-31-2021_35585.pdfJerry31/05/2021
Ford_Exp 05.31.2020_24707.pdfFord31/05/2020
David exp09302016_14115.pdfDavid30/09/2016
Mike 2.28.2019_12495.pdfMike28/02/2019

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    LC = Table.TransformColumns(Source,{{"Data", Text.Lower, type text}}),
    TBD = Table.TransformColumns(LC, {{"Data", each Text.BeforeDelimiter(_, "_", {0, RelativePosition.FromEnd}), type text}}),
    SplitAny = Table.SplitColumn(TBD, "Data", Splitter.SplitTextByAnyDelimiter({" ","_","exp"}, QuoteStyle.Csv)),
    Proper = Table.TransformColumns(SplitAny,{{"Data.1", Text.Proper, type text}}),
    TCC = Table.CombineColumns(Proper,{"Data.2", "Data.3", "Data.4"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Mrg"),
    IF = Table.AddColumn(TCC, "Date", each if [Mrg] = "" then null else if not Text.StartsWith([Mrg], "0") then "0" & [Mrg] else if Text.Length([Mrg]) < 8 then Text.Insert([Mrg], 2, "/01/") else if Text.Length([Mrg]) = 8 then Text.Insert(Text.Insert([Mrg],4 , "/"), 2, "/") else [Mrg] ),
    DateUS = Table.TransformColumnTypes(IF, {{"Date", type date}}, "en-US"),
    RC = Table.RemoveColumns(DateUS,{"Mrg"}) 

in
    RC
 
Upvote 0
or a bit modified
DataData.1Date
John exp 052021_46857.pdfJohn01/05/2021
Tom_46854.pdfTom
Jerry 5-31-2021_35585.pdfJerry31/05/2021
Ford_Exp 05.31.2020_24707.pdfFord31/05/2020
David exp09302016_14115.pdfDavid30/09/2016
Mike 2.28.2019_12495.pdfMike28/02/2019

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    LC = Table.TransformColumns(Source,{{"Data", Text.Lower, type text}}),
    TBD = Table.TransformColumns(LC, {{"Data", each Text.BeforeDelimiter(_, "_", {0, RelativePosition.FromEnd}), type text}}),
    SplitAny = Table.SplitColumn(TBD, "Data", Splitter.SplitTextByAnyDelimiter({" ","_","exp"}, QuoteStyle.Csv)),
    Proper = Table.TransformColumns(SplitAny,{{"Data.1", Text.Proper, type text}}),
    TCC = Table.CombineColumns(Proper,{"Data.2", "Data.3", "Data.4"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Mrg"),
    IF = Table.AddColumn(TCC, "Date", each if [Mrg] = "" then null else if not Text.StartsWith([Mrg], "0") then "0" & [Mrg] else if Text.Length([Mrg]) < 8 then Text.Insert([Mrg], 2, "/01/") else if Text.Length([Mrg]) = 8 then Text.Insert(Text.Insert([Mrg],4 , "/"), 2, "/") else [Mrg] ),
    DateUS = Table.TransformColumnTypes(IF, {{"Date", type date}}, "en-US"),
    RC = Table.RemoveColumns(DateUS,{"Mrg"})

in
    RC
Query doesn't return a date if the month has two digits with the first being a 1 instead of 0. For example if David's file name is changed to "David exp12302016_14115.pdf". Can it be modified to account for Oct-Dec months?
 
Upvote 0
it works with given example (post#1) which should be representative
DataData.1Date
John exp 052021_46857.pdfJohn01/05/2021
Tom_46854.pdfTom
Jerry 5-31-2021_35585.pdfJerry31/05/2021
Ford_Exp 05.31.2020_24707.pdfFord31/05/2020
David exp09302016_14115.pdfDavid30/09/2016
Mike 12.28.2019_12495.pdfMike28/12/2019
Mike 2.28.2019_12495.pdfMike28/02/2019
Joe exp 62021_46857.pdfJoe01/06/2021
exp 82021_46857.pdf01/08/2021

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    LC = Table.TransformColumns(Source,{{"Data", Text.Lower, type text}}),
    TBD = Table.TransformColumns(LC, {{"Data", each Text.BeforeDelimiter(_, "_", {0, RelativePosition.FromEnd}), type text}}),
    SplitAny = Table.SplitColumn(TBD, "Data", Splitter.SplitTextByAnyDelimiter({" ","_","exp"}, QuoteStyle.Csv)),
    Proper = Table.TransformColumns(SplitAny,{{"Data.1", Text.Proper, type text}}),
    TCC = Table.CombineColumns(Proper,{"Data.2", "Data.3", "Data.4"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Mrg"),
    TS = Table.AddColumn(TCC, "TS", each if [Mrg] = "" then null else if Text.Length([Mrg]) < 6 then "0" & [Mrg] else Text.Select([Mrg], {"0".."9"})),
    IF = Table.AddColumn(TS, "IF", each if [Mrg] = "" then null else if Text.Length([TS]) <> 8 and not Text.StartsWith([TS], "0") then "0" & [TS] else if Text.Length([TS]) <= 6 then Text.Insert([TS], 2, "/01/") else [TS]),
    Date = Table.AddColumn(IF, "Date", each if [IF] = null then null else if Text.Length([IF]) > 8 then [IF] else Text.Insert(Text.Insert([IF],4 , "/"), 2, "/")),
    DateUS = Table.TransformColumnTypes(Date, {{"Date", type date}}, "en-US"),
    TSC = Table.SelectColumns(DateUS,{"Data.1", "Date"})
in
    TSC
 
Last edited:
Upvote 0

Forum statistics

Threads
1,214,959
Messages
6,122,476
Members
449,087
Latest member
RExcelSearch

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top