Improving performance of List.PositionOf

jimrward · Sep 26, 2023

Complete novice here but bumbling along, I have around 200k lines of source data by 35 columns, of which one of the columns call it TargetID can have repeat rows from 1 to 60,
I am trying to add a custom column that flags 1 if its the first instance of the targetid sequence and 0 if it aint
what I have below appears to work albeit very slowly
I have seen references to List.Buffer and Table.buffer to load things into memory but need some guidance to get my head around it, here is my current code gleaned from various searches, source is an external CSV file

Power Query:

let
    Source = logExport_Current,
    #"Added Custom" = Table.AddColumn(Source, "Custom", each List.PositionOf( Source[TargetID], [TargetID]) ),
    #"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1, Int64.Type),
    #"Added Custom1" = Table.AddColumn(#"Added Index", "Flag", each Number.From([Custom]=[Index])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Custom", "Index"})
in
    #"Removed Columns"

JGordon11 · Sep 26, 2023

This took about five seconds for 200,000 rows:

Power Query:

let
    Source = logExport_Current,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    #"Grouped Rows" = Table.Group(#"Added Index", {"TargetID"}, {{"Idx", each List.Min([Index]), type number}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Flag", each 1),
    #"Merged Queries" = Table.NestedJoin(#"Added Index", {"Index"}, #"Added Custom", {"Idx"}, "Flag", JoinKind.LeftOuter),
    #"Expanded Flag" = Table.ExpandTableColumn(#"Merged Queries", "Flag", {"Flag"}),
    #"Sorted Rows" = Table.Sort(#"Expanded Flag",{{"Index", Order.Ascending}}),
    #"Replaced Value" = Table.ReplaceValue(#"Sorted Rows",null,0,Replacer.ReplaceValue,{"Flag"}),
    #"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Index"})
in
    #"Removed Columns"

smozgur · Sep 26, 2023

I wouldn't use List.PositionOf() method for this in the first place. However, if I have to, then you are right, List.Buffer() will be your best friend. You don't need to fetch TargetID from the Source in each iteration. Instead, try the following version:

Power Query:

let
Source = logExport_Current,
BuffList = List.Buffer(Source[TargetID]),
#"Added Custom" = Table.AddColumn(Source, "Custom", each List.PositionOf(BuffList, [TargetID]) ),
#"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1, Int64.Type),
#"Added Custom1" = Table.AddColumn(#"Added Index", "Flag", each Number.From([Custom]=[Index])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Custom", "Index"})
in
#"Removed Columns"

Note that I fetched the TargetID column as a list once, and buffered in memory, and then I used that list as the List.PositionOf() function parameter. This should radically improve the speed.

On the other hand, I would use another method. I will first add an index, group by the TargetID column by aggregating on the Index column to get the minimum index number as Min column, then expand the grouped tables back and finally compare the Index column with the Min column. Here is the code, should be easier to follow while it is working. And I believe this version should work much faster than the previously improved List.PositionOf() version. (4x maybe?)

Power Query:

let
    Source = logExport_Current,
    AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    GroupTable = Table.Group(AddIndex, {"TargetID"}, {{"Table", each _, type table}, {"Min", each List.Min([Index]), type number}}),
    RemoveColumn = Table.RemoveColumns(GroupTable,{"TargetID"}),
    ExpandTable = Table.ExpandTableColumn(RemoveColumn, "Table", Table.ColumnNames(AddIndex)),
    SortRows = Table.Sort(ExpandTable,{{"Index", Order.Ascending}}),
    AddFlag = Table.AddColumn(SortRows, "Flag", each if [Index] = [Min] then 1 else 0),
    RemoveColumns = Table.RemoveColumns(AddFlag,{"Index", "Min"})
in
    RemoveColumns

jimrward · Sep 27, 2023

Many thanks both for the quick replies, I will step thru them line by line in situ to see each magical step
in my mind doing it manually I thought filter to remove duplicates, tag a row, unfiltered to return to initial list
when I tried the list.buffer it looks like I did not fully qualify the TargetID by wrapping Source( ) around it, without list.buffer as posted it was getting to nearly 2 hours of processing but it was getting late at the end of a long day, so it was me.beer or list.buffer guess what won

jimrward · Sep 27, 2023

ok, after some brief testing in between other things
@smozgur method 1 using List.Buffer and List.Positionof 3 mins and 35 secs, massive increase on my effort
@smozgur method 2 using index, min, group etc 15 seconds
I will test @JGordon11 suggested solution later and report back timings

just got to factor in the above into the rest of my routines, this prompts a further question

My routines use the same core data block once read in, is PQ clever enough when you use the original load as reference to treat that as static or will all child dependency routines refresh the master every time they execute, I only want to apply the extra flag column once

jimrward · Sep 27, 2023

ignore the last paragraph above, I have pointed my child routines at the Table produced by the above.
One observation I did make was in my original import I changed the Data Type of one of the columns to DATE when I make reference to that when adding in the flag using one of the above methods I have to change the data type to DATE again, not a show stopper

Improving performance of List.PositionOf

jimrward

Well-known Member

Excel Facts

JGordon11

Well-known Member

smozgur

BatCoder

jimrward

Well-known Member

jimrward

Well-known Member

jimrward

Well-known Member

Similar threads

Forum statistics

Share this page

Improving performance of List.PositionOf

jimrward

Well-known Member

Excel Facts

JGordon11

Well-known Member

smozgur

BatCoder

jimrward

Well-known Member

jimrward

Well-known Member

jimrward

Well-known Member

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock