Improving performance of List.PositionOf

jimrward

Well-known Member
Joined
Feb 24, 2003
Messages
1,895
Office Version
  1. 2021
  2. 2019
  3. 2016
  4. 2013
  5. 2011
  6. 2010
  7. 2007
  8. 2003 or older
Platform
  1. Windows
Complete novice here but bumbling along, I have around 200k lines of source data by 35 columns, of which one of the columns call it TargetID can have repeat rows from 1 to 60,
I am trying to add a custom column that flags 1 if its the first instance of the targetid sequence and 0 if it aint
what I have below appears to work albeit very slowly
I have seen references to List.Buffer and Table.buffer to load things into memory but need some guidance to get my head around it, here is my current code gleaned from various searches, source is an external CSV file

Power Query:
let
    Source = logExport_Current,
    #"Added Custom" = Table.AddColumn(Source, "Custom", each List.PositionOf( Source[TargetID], [TargetID]) ),
    #"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1, Int64.Type),
    #"Added Custom1" = Table.AddColumn(#"Added Index", "Flag", each Number.From([Custom]=[Index])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Custom", "Index"})
in
    #"Removed Columns"
 
Last edited by a moderator:

Excel Facts

Highlight Duplicates
Home, Conditional Formatting, Highlight Cells, Duplicate records, OK to add pink formatting to any duplicates in selected range.
This took about five seconds for 200,000 rows:

Power Query:
let
    Source = logExport_Current,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    #"Grouped Rows" = Table.Group(#"Added Index", {"TargetID"}, {{"Idx", each List.Min([Index]), type number}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Flag", each 1),
    #"Merged Queries" = Table.NestedJoin(#"Added Index", {"Index"}, #"Added Custom", {"Idx"}, "Flag", JoinKind.LeftOuter),
    #"Expanded Flag" = Table.ExpandTableColumn(#"Merged Queries", "Flag", {"Flag"}),
    #"Sorted Rows" = Table.Sort(#"Expanded Flag",{{"Index", Order.Ascending}}),
    #"Replaced Value" = Table.ReplaceValue(#"Sorted Rows",null,0,Replacer.ReplaceValue,{"Flag"}),
    #"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Index"})
in
    #"Removed Columns"
 
Upvote 0
I wouldn't use List.PositionOf() method for this in the first place. However, if I have to, then you are right, List.Buffer() will be your best friend. You don't need to fetch TargetID from the Source in each iteration. Instead, try the following version:

Power Query:
let
Source = logExport_Current,
BuffList = List.Buffer(Source[TargetID]),
#"Added Custom" = Table.AddColumn(Source, "Custom", each List.PositionOf(BuffList, [TargetID]) ),
#"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1, Int64.Type),
#"Added Custom1" = Table.AddColumn(#"Added Index", "Flag", each Number.From([Custom]=[Index])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Custom", "Index"})
in
#"Removed Columns"

Note that I fetched the TargetID column as a list once, and buffered in memory, and then I used that list as the List.PositionOf() function parameter. This should radically improve the speed.

On the other hand, I would use another method. I will first add an index, group by the TargetID column by aggregating on the Index column to get the minimum index number as Min column, then expand the grouped tables back and finally compare the Index column with the Min column. Here is the code, should be easier to follow while it is working. And I believe this version should work much faster than the previously improved List.PositionOf() version. (4x maybe?)

Power Query:
let
    Source = logExport_Current,
    AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    GroupTable = Table.Group(AddIndex, {"TargetID"}, {{"Table", each _, type table}, {"Min", each List.Min([Index]), type number}}),
    RemoveColumn = Table.RemoveColumns(GroupTable,{"TargetID"}),
    ExpandTable = Table.ExpandTableColumn(RemoveColumn, "Table", Table.ColumnNames(AddIndex)),
    SortRows = Table.Sort(ExpandTable,{{"Index", Order.Ascending}}),
    AddFlag = Table.AddColumn(SortRows, "Flag", each if [Index] = [Min] then 1 else 0),
    RemoveColumns = Table.RemoveColumns(AddFlag,{"Index", "Min"})
in
    RemoveColumns
 
Upvote 0
Solution
Many thanks both for the quick replies, I will step thru them line by line in situ to see each magical step
in my mind doing it manually I thought filter to remove duplicates, tag a row, unfiltered to return to initial list
when I tried the list.buffer it looks like I did not fully qualify the TargetID by wrapping Source( ) around it, without list.buffer as posted it was getting to nearly 2 hours of processing but it was getting late at the end of a long day, so it was me.beer or list.buffer guess what won
 
Upvote 0
ok, after some brief testing in between other things
@smozgur method 1 using List.Buffer and List.Positionof 3 mins and 35 secs, massive increase on my effort
@smozgur method 2 using index, min, group etc 15 seconds
I will test @JGordon11 suggested solution later and report back timings

just got to factor in the above into the rest of my routines, this prompts a further question

My routines use the same core data block once read in, is PQ clever enough when you use the original load as reference to treat that as static or will all child dependency routines refresh the master every time they execute, I only want to apply the extra flag column once
 
Upvote 0
ignore the last paragraph above, I have pointed my child routines at the Table produced by the above.
One observation I did make was in my original import I changed the Data Type of one of the columns to DATE when I make reference to that when adding in the flag using one of the above methods I have to change the data type to DATE again, not a show stopper
 
Upvote 0

Forum statistics

Threads
1,223,098
Messages
6,170,100
Members
452,301
Latest member
QualityAssurance

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top