Faster Alternative to Conditional Running Total than List.Sum

takoyaki

New Member
Joined
Oct 24, 2022
Messages
30
Office Version
  1. 365
Platform
  1. Windows
Hello everyone, I'm trying to add a column that will sum up values in another column based on a few conditions. The code I currently have uses List.Sum. I have to do a similar conditional running total calculation several times in my query along with a few other transformations. The query currently takes about 7.5 minutes to evaluate and I suspect that this code is the culprit. Is there a better way I could handle the code below?

I am aware that there are ways to use a custom function based on List.Generate but I'm not sure how to customize the custom function to exclude values that don't meet desired conditions.

Power Query:
ConditionalRT = Table.AddColumn(Source, "Conditional RT", (x)=> List.Sum(Table.SelectRows(Source, each _[Condition1] = x[Condition1] and _[Condition2] < x[Condition2] and _[Condition3] = "Condition3")[Values]), type number),

Thank you again in advance for your help.
 

Excel Facts

How to change case of text in Excel?
Use =UPPER() for upper case, =LOWER() for lower case, and =PROPER() for proper case. PROPER won't capitalize second c in Mccartney
Thank you so much! I'm just wondering - it looks like the conditions you used are hard-coded constants. The conditions in my example compare the row's value to the entire column's value (for instance if the condition is for the Year, the goal is to sum up the column's values if they match the row's Year). The only constant condition in my example is Condition3. I am also curious about choosing 300 for lst1. Am I misunderstanding something or do I need to clarify my goal better?
 
Upvote 0
If you write a TableAddColumn step that adds a helper column with [Value] if conditions are met and zero otherwise. That will be used in tbl step. I used dummy conditions to test the speed.
300 is not a maximum it is used to evaluate the data in 300 row chunks which improves the speed greatly

this is the amended code. you need to change the tbl step to whatever conditions you have.

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    tbl = Table.AddColumn(Source, "Helper", each if [Condition1]>3 and [Condition2]<12 and [Condition3] = "A" then [Value] else 0),
    RunningTotal = (lst as list) as list => 
        let 
            lst1 = List.Split(lst, 300),
            lst2 = List.Zip({lst1,{0..List.Count(lst1)-1}}),
            lst3 = List.Accumulate(lst2, {}, (ss,cc)=> ss & {List.Skip(List.Accumulate(cc{0}, (if cc{1} = 0 then {0} else {List.Last(List.Last(ss))}), (s,c)=> s & {List.Last(s) + c}))}),
            lst4 = List.Combine(lst3)
        in   
            lst4,
    lst = RunningTotal(tbl[Helper]),
    Result = Table.FromColumns(Table.ToColumns(Source) & {lst}, Table.ColumnNames(Source) & {"ConditionalRT"})
in
    Result


PQ Table_AddRunningTotal.xlsx
ABCDEFGHIJ
1Condition1Condition2Condition3ValueCondition1Condition2Condition3ValueConditionalRT
2611A8000611A80008000
3174B1000174B10008000
41516A70001516A70008000
5107B3000107B30008000
6129A6000129A600014000
725B400025B400014000
8188A9000188A900023000
931B200031B200023000
101314A50001314A500023000
Sheet2
 
Upvote 0
If you write a TableAddColumn step that adds a helper column with [Value] if conditions are met and zero otherwise. That will be used in tbl step. I used dummy conditions to test the speed.
300 is not a maximum it is used to evaluate the data in 300 row chunks which improves the speed greatly

this is the amended code. you need to change the tbl step to whatever conditions you have.

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    tbl = Table.AddColumn(Source, "Helper", each if [Condition1]>3 and [Condition2]<12 and [Condition3] = "A" then [Value] else 0),
    RunningTotal = (lst as list) as list =>
        let
            lst1 = List.Split(lst, 300),
            lst2 = List.Zip({lst1,{0..List.Count(lst1)-1}}),
            lst3 = List.Accumulate(lst2, {}, (ss,cc)=> ss & {List.Skip(List.Accumulate(cc{0}, (if cc{1} = 0 then {0} else {List.Last(List.Last(ss))}), (s,c)=> s & {List.Last(s) + c}))}),
            lst4 = List.Combine(lst3)
        in  
            lst4,
    lst = RunningTotal(tbl[Helper]),
    Result = Table.FromColumns(Table.ToColumns(Source) & {lst}, Table.ColumnNames(Source) & {"ConditionalRT"})
in
    Result


PQ Table_AddRunningTotal.xlsx
ABCDEFGHIJ
1Condition1Condition2Condition3ValueCondition1Condition2Condition3ValueConditionalRT
2611A8000611A80008000
3174B1000174B10008000
41516A70001516A70008000
5107B3000107B30008000
6129A6000129A600014000
725B400025B400014000
8188A9000188A900023000
931B200031B200023000
101314A50001314A500023000
Sheet2
Thank you so much! I think this is definitely on the right track. I always appreciate your help!
 
Upvote 0
If you write a TableAddColumn step that adds a helper column with [Value] if conditions are met and zero otherwise. That will be used in tbl step. I used dummy conditions to test the speed.
300 is not a maximum it is used to evaluate the data in 300 row chunks which improves the speed greatly

this is the amended code. you need to change the tbl step to whatever conditions you have.

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    tbl = Table.AddColumn(Source, "Helper", each if [Condition1]>3 and [Condition2]<12 and [Condition3] = "A" then [Value] else 0),
    RunningTotal = (lst as list) as list =>
        let
            lst1 = List.Split(lst, 300),
            lst2 = List.Zip({lst1,{0..List.Count(lst1)-1}}),
            lst3 = List.Accumulate(lst2, {}, (ss,cc)=> ss & {List.Skip(List.Accumulate(cc{0}, (if cc{1} = 0 then {0} else {List.Last(List.Last(ss))}), (s,c)=> s & {List.Last(s) + c}))}),
            lst4 = List.Combine(lst3)
        in  
            lst4,
    lst = RunningTotal(tbl[Helper]),
    Result = Table.FromColumns(Table.ToColumns(Source) & {lst}, Table.ColumnNames(Source) & {"ConditionalRT"})
in
    Result


PQ Table_AddRunningTotal.xlsx
ABCDEFGHIJ
1Condition1Condition2Condition3ValueCondition1Condition2Condition3ValueConditionalRT
2611A8000611A80008000
3174B1000174B10008000
41516A70001516A70008000
5107B3000107B30008000
6129A6000129A600014000
725B400025B400014000
8188A9000188A900023000
931B200031B200023000
101314A50001314A500023000
Sheet2
Hi JGordon11, I really appreciate all of your help. I've been trying to apply your solution to my scenario for the past few weeks and am not having any luck. I understand your intention with the helper column, but due to the way I am trying to sum my values I don't know of any way to filter my table before using your function. Here is an example of the result I am trying to produce, using Excel:

Sumproduct.jpg


The SUMPRODUCT conditions being:
Excel Formula:
=SUMPRODUCT(([Year]=[@Year])*([Month]<[@Month])*([Value]))

I don't believe a helper column will be possible in this scenario. Doing this sum is very simple in Excel but very difficult in Power Query. The below M-code gets the desired result, but is very time consuming for the query to calculate.

Power Query:
ConditionalRT = Table.AddColumn(Source, "Conditional RT", (x)=> List.Sum(Table.SelectRows(Source, each _[Year] = x[Year] and _[Month] < x[Month])[Value]), type number)

I'm creating a tool that will be fed thousands of rows of data potentially and will be providing this tool for others on my team to use so I'd like it to be as efficient as possible.
 
Upvote 0
I ran this on 20,000 rows and it took about 30 seconds

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    Sort = Table.Sort(AddIndex,{{"Year", Order.Descending}, {"Month", Order.Descending}}),
    GroupLocal = Table.Buffer(Table.Group(Sort, {"Year"}, {{"All", each _}},0)),
    TransformAll = Table.TransformColumns(GroupLocal, {"All", each    
                    Table.AddColumn(_, "Conditional RT", (x)=> List.Sum(Table.SelectRows(_, (y)=> y[Month] < x[Month])[Value]), type number)}),
    Expand = Table.ExpandTableColumn(TransformAll, "All", {"Month", "Value", "Sumproduct", "Index", "Conditional RT"}),
    #"Re-Sort" = Table.Sort(Expand,{{"Index", Order.Ascending}}),
    RemoveIndex = Table.RemoveColumns(#"Re-Sort",{"Index"})
in
    RemoveIndex

Book4
ABCDEFGHIJK
1YearMonthValueSumproductYearMonthValueSumproductConditional RT
2202251353233720225135323372337
320201215994428202012159944284428
4202071968369202071968369369
5202171722295220217172229522952
62021612328292021612328292829
72021598418452021598418451845
820233147602023314760
9202117380202117380
1020231020915535202310209155355535
1120233221402023322140
122023461536902023461536903690
132022949236902022949236903690
14202013690202013690
15202111861467420211186146744674
162020724636920207246369369
17202081845258320208184525832583
1820224233702022423370
19202131107738202131107738738
20202351230430520235123043054305
21
Sheet1
Cell Formulas
RangeFormula
D2:D20D2=SUMPRODUCT(([Year]=[@Year])*([Month]<[@Month])*([Value]))
 
Upvote 1
Solution
I ran this on 20,000 rows and it took about 30 seconds

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    Sort = Table.Sort(AddIndex,{{"Year", Order.Descending}, {"Month", Order.Descending}}),
    GroupLocal = Table.Buffer(Table.Group(Sort, {"Year"}, {{"All", each _}},0)),
    TransformAll = Table.TransformColumns(GroupLocal, {"All", each   
                    Table.AddColumn(_, "Conditional RT", (x)=> List.Sum(Table.SelectRows(_, (y)=> y[Month] < x[Month])[Value]), type number)}),
    Expand = Table.ExpandTableColumn(TransformAll, "All", {"Month", "Value", "Sumproduct", "Index", "Conditional RT"}),
    #"Re-Sort" = Table.Sort(Expand,{{"Index", Order.Ascending}}),
    RemoveIndex = Table.RemoveColumns(#"Re-Sort",{"Index"})
in
    RemoveIndex

Book4
ABCDEFGHIJK
1YearMonthValueSumproductYearMonthValueSumproductConditional RT
2202251353233720225135323372337
320201215994428202012159944284428
4202071968369202071968369369
5202171722295220217172229522952
62021612328292021612328292829
72021598418452021598418451845
820233147602023314760
9202117380202117380
1020231020915535202310209155355535
1120233221402023322140
122023461536902023461536903690
132022949236902022949236903690
14202013690202013690
15202111861467420211186146744674
162020724636920207246369369
17202081845258320208184525832583
1820224233702022423370
19202131107738202131107738738
20202351230430520235123043054305
21
Sheet1
Cell Formulas
RangeFormula
D2:D20D2=SUMPRODUCT(([Year]=[@Year])*([Month]<[@Month])*([Value]))
You're incredible. I implemented your solution into my query and it's almost instantaneous.

Thank you so, so much!
 
Upvote 0

Forum statistics

Threads
1,214,792
Messages
6,121,612
Members
449,038
Latest member
apwr

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top