Dynamically insert new columns in Excel Power Query and sort column pairs horizontally

nik_l

New Member
Joined
Feb 24, 2019
Messages
2
I'm stumped with a requirement and can't seem to get past it (I've tried to scour the internet, and I'm unable to fix it myself, given my lesser experience with Power Query). I could accomplish this using VBA, but the number of records, that need to be processed are shy of half a million records! :eek:

So here's the simplified task at hand. There are two columns - "Substring" and "Main String", as shown below [Input]:

SubstringMain String
AB|CDEABCDEF
ABC|DE|GHABCDEFGHI
A|BC|X|YZABCDYZ

<tbody>
</tbody>
The Substring column has mini substrings separated by a delimiter (|). These Mini Substrings may or may not be found within the Main string. The output at the end should look like this [Expected Output]:

SubstringMain StringMSS1ML1MSS2ML2MSS3ML3MSS4ML4
AB|CDEABCDEFCDE3AB
ABC|DE|GHABCDEFGHIABC3DE2GH2
A|BC|X|YZABCDYZBC2YZ2A1X0

<tbody>
</tbody>
(Note: MSS and ML in the column headers stand for Mini Substring and Match Length respectively)

The Mini Substrings can vary in number, so the row that has the maximum number of Mini Substrings would define the number of MSS and ML columns. In this case, the third row has four of them, so there were four MSS and ML columns each.

Not just that, the sequence of these Mini Substrings has to be such that those with highest matching lengths should be placed on the leftmost MSS and ML pair of columns. And then those with the least matching lengths should go on the rightmost pair of MSS and ML columns. And everything in between should progress in that order.

I got until the stage where the current output looks like this [Interim Output]:

SubstringMain StringMSS1MSS2MSS3MSS4
AB|CDEABCDEFABCDE
ABC|DE|GHABCDEFGHIABCDEGH
A|BC|X|YZABCDYZABCXYZ

<tbody>
</tbody>
Dynamically inserting the ML columns just after the respective MSS columns is the challenge I'm unable to get past by. And I'm sure sorting them horizontally by the rank of matching lengths (i.e. lengths of those mini substrings that found a match in the main string) would also be another challenge. Here's the Power Query that I could come up with that generated the [Interim Output].

Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Sub String", type text}, {"Main String", type text}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Changed Type", "Sub String", "For Split"),
    #"Added Custom" = Table.AddColumn(#"Duplicated Column", "Split Count", each List.Count(Text.Split([Sub String],"|"))),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"Split Count", "Sub String", "Main String", "For Split"}),
    #"Max No Of Splittable Columns" = List.Max(#"Reordered Columns"[Split Count]),
    #"List Of MSS Columns" = List.Transform({1..#"Max No Of Splittable Columns"}, each "MSS"&Text.From(_)),
    #"Split Columns By Delimiter" = Table.SplitColumn(#"Reordered Columns","For Split",Splitter.SplitTextByDelimiter("|"), #"List Of MSS Columns"),
    #"Removed Columns" = Table.RemoveColumns(#"Split Columns By Delimiter",{"Split Count"})
in
    #"Removed Columns"

Could you please guide me out of this? You can easily tell I am new to Power Query, so I very much appreciate any help that I can get. Thank you!

Note: I've also posted this on a different forum, but awaiting a response there.
 

Some videos you may like

Excel Facts

Excel Joke
Why can't spreadsheets drive cars? They crash too often!

alansidman

Well-known Member
Joined
Feb 26, 2007
Messages
5,856
Office Version
  1. 2019
Platform
  1. Windows
Here is the Mcode for adding columns and inserting the length of each column and then arranging their location. I am unable to determine how to sort horizontally as you desire

Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Substring", type text}, {"Main String", type text}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Changed Type", "Substring", "Substring - Copy"),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Duplicated Column", "Substring - Copy", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Substring - Copy.1", "Substring - Copy.2", "Substring - Copy.3", "Substring - Copy.4"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Substring - Copy.1", type text}, {"Substring - Copy.2", type text}, {"Substring - Copy.3", type text}, {"Substring - Copy.4", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type1", "Custom", each Text.Length([#"Substring - Copy.1"])),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"Substring", "Main String", "Substring - Copy.1", "Custom", "Substring - Copy.2", "Substring - Copy.3", "Substring - Copy.4"}),
    #"Added Custom1" = Table.AddColumn(#"Reordered Columns", "Custom.1", each Text.Length([#"Substring - Copy.2"])),
    #"Reordered Columns1" = Table.ReorderColumns(#"Added Custom1",{"Substring", "Main String", "Substring - Copy.1", "Custom", "Substring - Copy.2", "Custom.1", "Substring - Copy.3", "Substring - Copy.4"}),
    #"Added Custom2" = Table.AddColumn(#"Reordered Columns1", "Custom.2", each Text.Length([#"Substring - Copy.3"])),
    #"Reordered Columns2" = Table.ReorderColumns(#"Added Custom2",{"Substring", "Main String", "Substring - Copy.1", "Custom", "Substring - Copy.2", "Custom.1", "Substring - Copy.3", "Custom.2", "Substring - Copy.4"}),
    #"Added Custom3" = Table.AddColumn(#"Reordered Columns2", "Custom.3", each Text.Length([#"Substring - Copy.4"]))
in
    #"Added Custom3"

To determine the length of a column use
Insert Custom column
Type = Text.Length([Column Name])
 

nik_l

New Member
Joined
Feb 24, 2019
Messages
2
Hi Alan,


Thanks for your reply!


This would have been good enough had the Substring column had a maximum of four predefined mini substrings. Unfortunately, that isn't the case. They can be 10, 15, or even 20. In essence, the Substring column can be split into a varying number of columns depending on the incoming data.


For instance, one of the biggest substrings can look like this: AA|AB|AC|AD|AE|AF|AG|AH|AI|AJ


Another instance can be that the biggest substring is just this: AA|AB


Sorry if I couldn't make this clear in my original post.


How do I handle such dynamism?

Note: I've just posted a reply and I don't see it reflecting here, so I'm repeating my reply
 

Watch MrExcel Video

Forum statistics

Threads
1,109,163
Messages
5,527,163
Members
409,750
Latest member
BorisYeltsin

This Week's Hot Topics

Top