Is Take() the best way to calculate a moving sum?

JenniferMurphy · Apr 5, 2024

About a week ago, Cubist introduced me to the Take function. I have made great use of it.

I just had a situation where I needed to calculate a moving average. The Take function seems like the perfect solution, but I would appreciate any comments.

Here are several examples. The mini-sheet is below.

Notice that I can't easily get the moving average from the moving sum without calculating the number of values in the sum, something that Take does for me.

Take Function.xlsx

	C	D	E
5	N=5
6	Value	Sum of Last N	Average of Last N
7	3	3	3.00
8	6	9	4.50
9	1	10	3.33
10	7	17	4.25
11	3	20	4.00
12	1	18	3.60
13	2	14	2.80
14	5	18	3.60
15	3	14	2.80
16	8	19	3.80

Moving Sum

Cell Formulas
Range	Formula
D7:D16	D7	=SUM(TAKE(OFFSET(Table2[[#Headers],[Value]],1,0):[@Value],-Num))
E7:E16	E7	=AVERAGE(TAKE(OFFSET(Table2[[#Headers],[Value]],1,0):[@Value],-Num))

Named Ranges
Name	Refers To	Cells
'Moving Sum'!Num	='Moving Sum'!$C$5	D7:E16
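
For anyone who wants to cross-check the numbers outside Excel, here is a minimal Python sketch of the same calculation. It is not a translation of the TAKE formula itself, and the function name `moving_sum_and_average` is made up for illustration. It assumes the same behavior as the mini-sheet: use the last N values, or however many exist so far while the window is still filling.

```python
def moving_sum_and_average(values, n):
    """Return (sums, averages) over the last n values of each prefix.

    While fewer than n values are available, the window is simply
    everything seen so far, as in the mini-sheet above.
    """
    sums, averages = [], []
    for i in range(len(values)):
        window = values[max(0, i + 1 - n): i + 1]  # last n values up to row i
        sums.append(sum(window))
        averages.append(sum(window) / len(window))  # len(window) is the count
    return sums, averages


values = [3, 6, 1, 7, 3, 1, 2, 5, 3, 8]  # column C of the mini-sheet
sums, averages = moving_sum_and_average(values, 5)
print(sums)
print([round(a, 2) for a in averages])
```

The count of values in each window falls out of the slice, which is the same convenience that AVERAGE(TAKE(...)) provides in the worksheet.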

JenniferMurphy · Apr 6, 2024

KRice said:
There are many ways to tackle the moving average.

For sure. The rest of your post makes a lot of sense. I need to study it and get up to speed on LAMBDA.

Thanks

KRice · Apr 6, 2024

Happy to help. For your consideration, here are a few other approaches. The orange cells build the range with the same DROP/TAKE methodology described earlier, but we need to know which row index we're on. So ridx is a LET variable that subtracts the worksheet row number of the column header from the current row number, giving row indexes of 1, then 2, and so on.

The green cells use a more conventional INDEX approach, but the rows to use are specified by the SEQUENCE function, which starts at the current row index and goes backward MIN(ridx, Num) rows.

I still think my preference is a LAMBDA. The pink cells use the MAP helper function with a LAMBDA, and in it the row indexes are created using the SEQUENCE function. We step along each row index (called r inside the LAMBDA, an arbitrary designation) and use the same DROP/TAKE method for efficiently selecting the desired portion of the data range. This is a variation on the earlier LAMBDA approach: a bit shorter, and still only one formula to spill the results.

MrExcel_20240405.xlsx

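	C	D	E	F	G	H	I
5
6	Value	Sum of Last N	Average of Last N	Sum of Last N	Average of Last N	Sum of Last N	Average of Last N
7	3	3.00	3.00	3.00	3.00	3.00	3.00
8	6	9.00	4.50	9.00	4.50	9.00	4.50
9	1	10.00	3.33	10.00	3.33	10.00	3.33
10	7	17.00	4.25	17.00	4.25	17.00	4.25
11	3	20.00	4.00	20.00	4.00	20.00	4.00
12	1	18.00	3.60	18.00	3.60	18.00	3.60
13	2	14.00	2.80	14.00	2.80	14.00	2.80
14	5	18.00	3.60	18.00	3.60	18.00	3.60
15	3	14.00	2.80	14.00	2.80	14.00	2.80
16	8	19.00	3.80	19.00	3.80	19.00	3.80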

Moving Sum

Cell Formulas
Range	Formula
D7:D16	D7	=LET(ridx,ROW()-ROW(Table2[#Headers]),SUM(DROP(TAKE(Table2[Value],ridx),MAX(0,ridx-Num))))
E7:E16	E7	=LET(ridx,ROW()-ROW(Table2[#Headers]),AVERAGE(DROP(TAKE(Table2[Value],ridx),MAX(0,ridx-Num))))
F7:F16	F7	=LET(ridx,ROW()-ROW(Table2[#Headers]),SUM(INDEX(Table2[Value],SEQUENCE(MIN(ridx,Num),,ridx,-1))))
G7:G16	G7	=LET(ridx,ROW()-ROW(Table2[#Headers]),AVERAGE(INDEX(Table2[Value],SEQUENCE(MIN(ridx,Num),,ridx,-1))))
H7:H16	H7	=MAP(SEQUENCE(ROWS(Table2)),LAMBDA(r,SUM(DROP(TAKE(Table2[Value],r),MAX(0,r-Num)))))
I7:I16	I7	=MAP(SEQUENCE(ROWS(Table2)),LAMBDA(r,AVERAGE(DROP(TAKE(Table2[Value],r),MAX(0,r-Num)))))
Dynamic array formulas.

Named Ranges
Name	Refers To	Cells
Num	='Moving Sum'!$C$5	H7:I7, D7:G16
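
The window selection in the formulas above can also be sketched in Python, purely as an illustration of the logic. The helpers `take`, `drop`, `window_drop_take`, and `window_index_sequence` are hypothetical names that only mimic the parts of TAKE, DROP, INDEX, and SEQUENCE used here; they are not Excel's full behavior.

```python
def take(arr, k):
    """Mimic TAKE with a positive count: keep the first k items."""
    return arr[:k]


def drop(arr, k):
    """Mimic DROP with a non-negative count: discard the first k items."""
    return arr[k:]


def window_drop_take(values, r, n):
    # r is the 1-based row index, like ridx or the LAMBDA parameter r.
    return drop(take(values, r), max(0, r - n))


def window_index_sequence(values, r, n):
    # SEQUENCE(MIN(r, n), , r, -1) yields the 1-based rows r, r-1, ...
    rows = [r - step for step in range(min(r, n))]
    return [values[row - 1] for row in rows]


values = [3, 6, 1, 7, 3, 1, 2, 5, 3, 8]
n = 5

# The list comprehension plays the role of MAP over SEQUENCE(ROWS(Table2)).
map_sums = [sum(window_drop_take(values, r, n)) for r in range(1, len(values) + 1)]
index_sums = [sum(window_index_sequence(values, r, n)) for r in range(1, len(values) + 1)]
print(map_sums)
print(map_sums == index_sums)
```

Both selections pick the same cells for each row. The INDEX/SEQUENCE version simply visits them in reverse order, which does not matter for SUM or AVERAGE.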

JenniferMurphy · Apr 6, 2024

Wow!

That is amazing and very helpful. It will give me something to study. Thank you.

I'm going to mark this as the solution. Even though my original solution works, these additional solutions have other advantages.

KRice · Apr 6, 2024

Thanks...these others avoid OFFSET to form the array of interest, but there is nothing really wrong with the original ideas either. Once any of these are applied to a larger data set, I'm curious about whether you notice any performance changes. Let me know if I might be able to clarify anything with the LAMBDAs.
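
On the performance question, nothing here has been measured against the worksheet formulas. As a rough illustration of why the amount of work can differ, here is a Python sketch of a different technique, running (prefix) sums, which is not used in any formula in this thread. Each window approach above re-adds up to N values per row, while a prefix-sum approach does one subtraction per row. The function name `moving_sum_prefix` is made up for the example.

```python
from itertools import accumulate


def moving_sum_prefix(values, n):
    """Moving sum of the last n values using running totals.

    prefix[i] holds the sum of the first i values, so the sum of rows
    (r-n+1)..r is prefix[r] - prefix[r-n], clamped at the first row.
    """
    prefix = [0] + list(accumulate(values))
    return [prefix[r] - prefix[max(0, r - n)] for r in range(1, len(values) + 1)]


values = [3, 6, 1, 7, 3, 1, 2, 5, 3, 8]
print(moving_sum_prefix(values, 5))
```

Whether anything similar would be noticeably faster in Excel is an open assumption. It would depend on the data size and on how Excel evaluates each formula.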

Cubist · Apr 6, 2024

JenniferMurphy said:
We may be talking apples and oranges here. I am not trying to "move 5 averages". I'm not even sure what that means.

I am using the definition of a "simple moving average" as defined in several sources. Here are just two:

Moving Average (MA): Purpose, Uses, Formula, and Examples

A moving average (MA) is a technical analysis indicator that helps level price action by filtering out the noise from random price fluctuations.

www.investopedia.com

Moving average - Wikipedia

en.wikipedia.org

In my solution, a simple moving average of length N is the simple average of the last N values. But if there are fewer than N values, then it's the average of as many values as there are. For the first N-1 values, the moving average is the simple average of as many values as there are. As you can see in my examples, the moving average of the first value is just that value. The moving average of the first 2 values is the sum of those values divided by 2. And so on.

Now there are other types of moving averages, such as weighted and exponential. These take into account more values and they have different uses.

Did you look at the formula from the Wiki page you linked? The definition: "simple moving average (SMA) is the unweighted mean of the previous k data points."
When there are fewer than k points, it's not a moving average. By definition, the first 4 averages in your example are not moving averages. But if that's what you intend for your purpose, that is fine.

Screen Shot 2024-04-06 at 6.38.37 AM.png

In your example, there are 10 data points (n=10) and a 5-point moving average (k=5). The first data point in the sum is at index i = n-k+1, or i = 10-5+1 = 6. There is a reason the index doesn't start from 1: that would make it a simple average, not a moving average. I'm not trying to be overly pedantic. Just so you're aware, someone with a technical background might call you out on that.
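
Written out, the definition and the arithmetic for this example look like the following. The values p_6 through p_10 are the last five entries of the Value column in the mini-sheet (1, 2, 5, 3, 8).

```latex
\mathrm{SMA}_k = \frac{1}{k}\sum_{i=n-k+1}^{n} p_i
\qquad\Longrightarrow\qquad
\mathrm{SMA}_5 = \frac{1}{5}\sum_{i=6}^{10} p_i
             = \frac{1+2+5+3+8}{5}
             = \frac{19}{5}
             = 3.8
```

That agrees with the last row of the mini-sheet (sum 19, average 3.80), where the window is full.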

JenniferMurphy · Apr 14, 2024

I think my point is stated in this paragraph from the Wiki article:

During the initial filling (emphasis mine) of the FIFO / circular buffer the sampling window is equal to the data-set size thus, k=n, and the average calculation is performed as a cumulative moving average.

I think this is saying that when the number of values is less than the intended sample size, the moving average is the average of however many values there are. The moving average of the 1st value is that value. The moving average of the first 2 values is the sum of those 2 values divided by 2. And so on until the intended average length is achieved. I agree that the first n-1 moving averages have less "validity?" than those with the full n values. But then they are probably not relied on much either. The alternative would be to show no values at all for the moving average until n values are present.
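
The two conventions discussed here can be put side by side in a short Python sketch. The function name `simple_moving_average` and the flag `allow_partial` are made up for illustration. With the flag on, the first n-1 results are averages of however many values exist; with it off, those positions are left empty (None) until n values are present.

```python
def simple_moving_average(values, n, allow_partial=True):
    """Average of the last n values for each row.

    allow_partial=True : average whatever is available while filling.
    allow_partial=False: return None until a full window of n exists.
    """
    result = []
    for r in range(1, len(values) + 1):  # r is the 1-based row index
        if r < n and not allow_partial:
            result.append(None)
            continue
        window = values[max(0, r - n): r]
        result.append(sum(window) / len(window))
    return result


values = [3, 6, 1, 7, 3, 1, 2, 5, 3, 8]
partial = simple_moving_average(values, 5)
strict = simple_moving_average(values, 5, allow_partial=False)
print(partial)
print(strict)
```

From the fifth row onward the two lists are identical, so the choice only affects the initial filling period.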

JenniferMurphy · Apr 14, 2024

I have been playing with the various solutions offered here and am grateful for all contributions. The solution I have come up with for my current need is in the minisheet below. This solution eliminates the use of Offset.

Here are a couple of sample results. The number in the yellow cell is the number of values in the moving average.