Extract Text String with PQ

AllisterB

Board Regular
Joined
Feb 22, 2019
Messages
120
Office Version
  1. 365
Platform
  1. Windows
I have a column that includes text consisting of 7 consecutive non-blank characters in the format AAANNNN (Alpha A-Z and Numeric 0-9).
I want to extract this field the 7 characters and place the extracted text string in a new Column. If the AAANNNN pattern does not exist on a rowthen no output is made.

The current M code does not do this. I have also tried an approach using by "Add Columns By Example".

Does anyone have any ideas?

Thank You



By Example (revised.xlsx
FGH
2ID
3(ABC:FGH4567)
4(ABC:HJK4567
5ABC:GH3456
6ABC:A KL5678
7:YUI4567
8ABC:HJKO1234
9GHK1234
10:GHY5672))
11:GHY567))
12
13
14
15
16IDCustom.1Comment
17(ABC:FGH4567)should be FGH4567
18(ABC:HJK4567HJK4567OK
19ABC:GH3456Correct treatment as not AAA1234 format
20ABC:A KL5678Correct Output as it contains a blank
21:YUI4567YUI4567OK
22ABC:HJKO1234Correct treatment as not AAA1234 format
23GHK1234GHK1234OK
24:GHY5672))should be GHY5672
25:GHY567))Correct treatment as not AAA1234 format
Sheet1


Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom.1", each if Text.Length(List.Last(Text.Split([ID], ":")))=7 then  List.Last(Text.Split([ID], ":")) else null)
in
    #"Added Custom"
 

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.
maybe
IDResult
(ABC:FGH4567)FGH4567
(ABC:HJK4567HJK4567
ABC:GH3456
ABC:A KL5678
:YUI4567YUI4567
ABC:HJKO1234
GHK1234GHK1234
:GHY5672))GHY5672
:GHY567))

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Trim = Table.AddColumn(Source, "Trim", each Text.Trim(Text.Trim([ID],")"),":")),
    IfSpace = Table.AddColumn(Trim, "IF", each if not Text.Contains([Trim], " ") then [Trim] else null),
    ETBD = Table.TransformColumns(IfSpace, {{"IF", each Text.AfterDelimiter(_, ":"), type text}}),
    If = Table.AddColumn(ETBD, "Result", each if Text.Length([IF]) = 7 then [IF] else if Text.Length([Trim]) = 7 then [Trim] else null),
    TSC = Table.SelectColumns(If,{"Result"})
in
    TSC
 
Upvote 0
Beat me to it Sandy.

Book1
A
1Custom.4
2FGH4567
3HJK4567
4
5
6YUI4567
7
8GHK1234
9GHY5672
10
Sheet3


Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Split = Table.SplitColumn(Source, "ID", Splitter.SplitTextByDelimiter(":", QuoteStyle.Csv), {"ID.1", "ID.2"}),
    fixSplit = Table.AddColumn(Split, "Custom", each if [ID.2] is null then [ID.1] else [ID.2]),
    ReplaceParen = Table.ReplaceValue(fixSplit,")","",Replacer.ReplaceText,{"Custom"}),
    RC0 = Table.RemoveColumns(ReplaceParen,{"ID.1", "ID.2"}),
    Trim = Table.TransformColumns(RC0,{{"Custom", Text.Trim, type text}}),
    SplitFirst = Table.SplitColumn(Trim, "Custom", Splitter.SplitTextByPositions({0, 3}, false), {"Custom.1", "Custom.2"}),
    TF = Table.AddColumn(SplitFirst, "Custom", each List.ContainsAll({"A".."Z"}, Text.ToList([Custom.1])) and List.Count(Text.ToList([Custom.1]))=3),
    Number = Table.AddColumn(TF, "Custom.3", each if Text.Length([Custom.2])=4 then try Int64.From([Custom.2]) otherwise null else null),
    Combo = Table.AddColumn(Number, "Custom.4", each if([Custom.3]<>null and [Custom]) then [Custom.1] & Text.From([Custom.2]) else null),
    RC1 = Table.RemoveColumns(Combo,{"Custom.1", "Custom.2", "Custom", "Custom.3"})
in
    RC1
 
Upvote 0
a little correction for post#2 because of string ABCD123 len = 7 but format should be 3|4 not 4|3 - example from post#1 doesn't contain such a case so example is not representative
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
    Trim = Table.AddColumn(Source, "Trim", each Text.Trim(Text.Trim([ID],")"),":")),
    IfSpace = Table.AddColumn(Trim, "IF", each if not Text.Contains([Trim], " ") then [Trim] else null),
    ETBD = Table.TransformColumns(IfSpace, {{"IF", each Text.AfterDelimiter(_, ":"), type text}}),
    If = Table.AddColumn(ETBD, "Result", each if Text.Length(Text.Select([IF],{"A".."Z"})) = 3 and Text.Length(Text.Select([IF],{"0".."9"})) = 4 then [IF] else if Text.Length(Text.Select([Trim],{"A".."Z"})) = 3 and Text.Length(Text.Select([Trim],{"0".."9"})) = 4 then [Trim] else null),
    TSC = Table.SelectColumns(If,{"Result"})
in
    TSC
IDResultResult
(ABC:FGH4567)FGH4567FGH4567
(ABC:HJK4567HJK4567HJK4567
ABC:GH3456
ABC:A KL5678
:YUI4567YUI4567YUI4567
ABC:HJKO1234
GHK1234GHK1234GHK1234
:GHY5672))GHY5672GHY5672
:GHY567))
ABCD123ABCD123
 
Last edited:
Upvote 0
Solution
Thank you @sandy666

I have just sen that my actual data has one instance of​

ABCHJFK1234)*B - No colon and trailing characters.

This string should output as JFK1234 as from the left there is a string in the field that contains 7consecutive characters like this AAANNN.

To clarify what I want - want the first instance of AAANNN that occurs within the string in the field irrespective of what comes before or after that occurrence of AAANNNN eg ABCHJFK1234)*B would output as JFK1234.

I have tested it using your code but I don't think it handles the instance above - Can PQ do this?

Allister
little correction for post#2 because of string ABCD123 len = 7 but format should be 3|4 not 4|3 - example from post#1 doesn't contain such a case so example is not representative
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
    Trim = Table.AddColumn(Source, "Trim", each Text.Trim(Text.Trim([ID],")"),":")),
    IfSpace = Table.AddColumn(Trim, "IF", each if not Text.Contains([Trim], " ") then [Trim] else null),
    ETBD = Table.TransformColumns(IfSpace, {{"IF", each Text.AfterDelimiter(_, ":"), type text}}),
    If = Table.AddColumn(ETBD, "Result", each if Text.Length(Text.Select([IF],{"A".."Z"})) = 3 and Text.Length(Text.Select([IF],{"0".."9"})) = 4 then [IF] else if Text.Length(Text.Select([Trim],{"A".."Z"})) = 3 and Text.Length(Text.Select([Trim],{"0".."9"})) = 4 then [Trim] else null),
    TSC = Table.SelectColumns(If,{"Result"})
in
    TSC
IDResultResult
(ABC:FGH4567)FGH4567FGH4567
(ABC:HJK4567HJK4567HJK4567
ABC:GH3456
ABC:A KL5678
:YUI4567YUI4567YUI4567
ABC:HJKO1234
GHK1234GHK1234GHK1234
:GHY5672))GHY5672GHY5672
:GHY567))
ABCD123ABCD123
[/RAN
 
Upvote 0
Column1Column1
ABCHJFK1234)*BJFK1234

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    ETBD = Table.TransformColumns(Source, {{"Column1", each Text.BeforeDelimiter(_, ")"), type text}}),
    TE = Table.TransformColumns(ETBD, {{"Column1", each Text.End(_, 7), type text}})
in
    TE
 
Upvote 0
Please try

Book1
ABCD
1IDIDResult
2(ABC:FGH4567)(ABC:FGH4567)FGH4567
3(ABC:HJK4567(ABC:HJK4567HJK4567
4ABC:GH3456ABC:GH3456
5ABC:A KL5678ABC:A KL5678
6:YUI4567:YUI4567YUI4567
7ABC:HJKO1234ABC:HJKO1234
8GHK1234GHK1234GHK1234
9:GHY5672)):GHY5672))GHY5672
10:GHY567)):GHY567))
11ABCHJFK1234)*BABCHJFK1234)*BJFK1234
Sheet1


Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Extract = Table.AddColumn(Source, "Result", each try let z = Text.Start(Text.Split([ID],":"){1},7)
    in if Text.Length(Text.Select(Text.Start(z,3),{"A".."Z"})) =3 and Text.Length(Text.Select(Text.End(z,4 ),{"0".."9"}))=4 then z else null
        otherwise let 
            a = Text.ToList([ID]), 
            b = List.Accumulate({0..List.Count(a)-7},{},(s,l)=> s&{Text.Combine(List.Range(a,l,7),"")} ),
            c = List.Select(b, each Text.Length(Text.Select(Text.Start(_,3),{"A".."Z"})) =3 and Text.Length(Text.Select(Text.End(_,4 ),{"0".."9"}))=4 )
            in if List.Count(c)>0 then c{0} else null         )
in
    Extract
 
Upvote 0
Could always just use VBA instead.

Book1
AB
1IDVBA
2(ABC:FGH4567)FGH4567
3(ABC:HJK4567HJK4567
4ABC:GH3456 
5ABC:A KL5678 
6:YUI4567YUI4567
7ABC:HJKO1234JKO1234
8GHK1234GHK1234
9:GHY5672))GHY5672
10:GHY567)) 
11ABCHJFK1234)*BJFK1234
Sheet1
Cell Formulas
RangeFormula
B2:B11B2=exbs(A2)


VBA Code:
Function EXBS(s As String)
With CreateObject("VBScript.RegExp")
    .Global = True
    .Pattern = "\w{3}\d{4}"
    If .test(s) Then
        EXBS = .Execute(s)(0)
    Else
        EXBS = vbNullString
    End If
End With
End Function
 
Upvote 0
Could always just use VBA instead.

Book1
AB
1IDVBA
2(ABC:FGH4567)FGH4567
3(ABC:HJK4567HJK4567
4ABC:GH3456 
5ABC:A KL5678 
6:YUI4567YUI4567
7ABC:HJKO1234JKO1234
8GHK1234GHK1234
9:GHY5672))GHY5672
10:GHY567)) 
11ABCHJFK1234)*BJFK1234
Sheet1
Cell Formulas
RangeFormula
B2:B11B2=exbs(A2)


VBA Code:
Function EXBS(s As String)
With CreateObject("VBScript.RegExp")
    .Global = True
    .Pattern = "\w{3}\d{4}"
    If .test(s) Then
        EXBS = .Execute(s)(0)
    Else
        EXBS = vbNullString
    End If
End With
End Function
Thank You

The VBA works graet . However I am not sure how I would use it with PQ. I simply want to take a table from the Excelworkbook and then run the PQ. How would automate the use of the VBA to cleanup the field prior to PQ being run?
 
Upvote 0

Forum statistics

Threads
1,215,972
Messages
6,128,014
Members
449,414
Latest member
sameri

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top