Percentile of n most recent values

Fitzy22

New Member
Joined
Jan 19, 2022
Messages
17
Office Version
  1. 365
Platform
  1. Windows
Hi,

Please see simplified minisheet of my data below.

I would like to calculate the 80th percentile of the most recent (i.e. dates in Column A) 24 values (i.e. values in Column C) for each site (i.e. x and y in Column B). Where there is no value for a particular date it means there is no value recorded and therefore needs to be ignored and doesn't from part of the percentile calculation. So only the 24 most recent "values" should be used.

I hope this adequately explains my requirements, but if there are any questions please ask.

Thanks in advance...

Percentile_calculation.xlsx
ABC
1DateSiteValue
225/12/2006x
321/12/2007x1
421/12/2007x2
518/11/2008x
618/11/2008x
728/11/2008y
828/11/2008y1
928/11/2008y2
1028/11/2008y
1128/11/2008y
1228/11/2008y
1328/11/2008x
1412/01/2009x
1512/01/2009x1
1612/01/2009y1
1712/01/2009y1
1812/01/2009y
1912/01/2009x
2012/01/2009y
2112/01/2009x
2212/01/2009y
2312/01/2009x
2417/01/2009x
2517/01/2009x2
2617/01/2009y
2717/01/2009y
2817/01/2009y
2917/01/2009x
301/01/2010x
311/01/2010x10
3215/02/2010x
3315/02/2010x
3415/02/2010y3
3518/02/2010x
3618/02/2010x
3718/02/2010x
3818/02/2010x
3918/02/2010x
4017/01/2011x4
419/03/2011x8
429/05/2011x3
437/03/2012x1
4426/02/2013x2
457/03/2013x1
461/03/2014x2.1
474/12/2014x3.4
484/12/2014y2.4
497/12/2014y1.2
507/12/2014y1.5
5114/12/2014x2
5214/12/2014x1.8
5329/12/2014x2.7
5416/01/2015x2.3
557/11/2015x
566/02/2016x2.3
5713/05/2016x3.2
5813/05/2016x2.5
5927/06/2016y1.1
6024/09/2016y1.4
6124/09/2016y1
6224/02/2017x1.1
633/03/2018x4.1
645/03/2018x9.1
655/03/2018x9
665/03/2018x7.8
676/03/2018x2.8
688/03/2018x5.3
698/03/2018x5.4
708/03/2018x3.3
719/03/2018x7.8
7224/02/2019x1.6
7324/02/2019x3.2
7431/03/2019y1.6
7531/03/2019y3.7
7625/01/2020y1.9
7727/01/2020x6.7
7827/01/2020x2.7
7927/01/2020x1.3
805/02/2020x4.6
818/03/2020y4.4
828/03/2020x0.8
8326/09/2020x1.6
8426/09/2020x2.2
8525/12/2020x1.7
8610/03/2021x14
8723/03/2021x1.4
8824/06/2021x2.4
8929/10/2021x4.2
9021/01/2022x7.5
9124/01/2022x12
9226/01/2022x2.2
Sheet1
 
That’s good news, and we’re happy to help.
 
Upvote 0

Excel Facts

Last used cell?
Press Ctrl+End to move to what Excel thinks is the last used cell.
Hi again @KRice.

I am having issues with some of the calculations still and I'm not sure why.

The formula I'm using is:

=PERCENTILE.INC(INDEX(SORT(FILTER(FILTER(A9:BD273,($D$9:$D$273=F2)*($K$9:$K$273<>"")),{1,0,0,0,0,0,0,0,0,0,1}),1,-1),SEQUENCE(MIN(24,SUM(($D$9:$D$273=F2)*(K9:K273<>"")))),11),0.8)

In this case the spreadsheet columns are:

- Entire data array - A9:BD273
- Dates - A9:A273
- Site - D9:D273
- Values - K9:K273
- Site reference - F2

There are 9 intervening columns between the Date (Column A - Column 1) and the Values (Column K - Column 11) which are represented by the 9 zeros in the formula.

Apologies if this is a hassle. I cannot understand why its not working because it works on one spreadsheet and not another.

Regards,
 
Upvote 0
This part (A9:BD273) of your inner FILTER says to return information from columns A:BD...that's a lot. Yet your outer FILTER says to turn on/off this many columns {1,0,0,0,0,0,0,0,0,0,1}, which means to turn on, or keep, columns A and K. So change the first reference to A9:K273. So the outer FILTER isn't receiving enough information to know what to do with columns L:BD.
 
Upvote 0
If your table structure is such that the Date, Site, and Value columns are far apart and constructing the outer FILTER column "on/off" array is tedious (lots of 0's), there is another approach. Rather than specifying a very large contiguous range in the inner FILTER, you could feed that large contiguous range into an INDEX function and have it return only the three columns that you need. Here, I've separated Date (A) and Site (C) and Value (G) so that they are no longer contiguous, and suppose I do not want to write the awkward {1,0,0,0,0,0,1} outer filter column on/off array. Instead, I could use this:
Excel Formula:
INDEX($A$2:$G$92,SEQUENCE(COUNT($A$2:$A$92)),{1,3,7})
...in the inner FILTER to consolidate columns 1,3, and 7 (A,C,G) into a range that FILTER operates on just like before.
MrExcel_20220629b.xlsx
ABCDEFGHI
1DateSiteValue47
225/12/2006xx
321/12/2007x17.62
421/12/2007x2
518/11/2008x
618/11/2008x
728/11/2008y
828/11/2008y1
928/11/2008y2
Sheet1 (2)
Cell Formulas
RangeFormula
I1I1=SUM(($C$2:$C$92=I$2)*($G$2:$G$92<>""))
I3I3=PERCENTILE.INC(INDEX(SORT(FILTER(FILTER(INDEX($A$2:$G$92,SEQUENCE(COUNT($A$2:$A$92)),{1,3,7}), ($C$2:$C$92=I$2)*($G$2:$G$92<>"") ),{1,0,1}),1,-1),SEQUENCE(MIN(24,SUM( ($C$2:$C$92=I$2)*($G$2:$G$92<>"") ))),2),0.8)
 
Upvote 0
Thanks @KRice,

It's working perfectly now. I have set the formula up in such a way that I can fill right and just change the number of the Values column. I think the biggest hassle will be once I add additional data to the bottom of the spreadsheet and I have to extend the selection of the various columns.
 
Upvote 0
You have a couple of options to deal with extending the ranges. You could convert your data table into an official Excel table and use structured references. By doing so, Excel will automatically account for data added to the end of the table. The other option is to make the range overly large right now...perhaps you anticipate this doubling in size, so instead of rows 2:92, you could make it 2:1000 to accommodate anticipated growth and more. The FILTER functions will still work fine, even with no data present...although if you do this and you are using the INDEX variation I mentioned in post #14, change the COUNT function to ROWS($A$2:$A$1000), as the COUNT function will not consider the spare blank cells, whereas the ROWS function will simply count the rows.
 
Last edited:
Upvote 0
Thanks @KRice, I'll look into both of those options. I may start out initially with making the range larger and then convert to the excel table option.
 
Upvote 0
Post back if you run into any difficulties. Here is an example of an expanded range using non-contiguous data columns (therefore the INDEX variation is used), and the ROWS functions is used instead of the COUNT function:
Excel Formula:
=PERCENTILE.INC(INDEX(SORT(FILTER(FILTER(INDEX($A$2:$G$1000,SEQUENCE(ROWS($A$2:$A$1000)),{1,3,7}), ($C$2:$C$1000=I$2)*($G$2:$G$1000<>"") ),{1,0,1}),1,-1),SEQUENCE(MIN(24,SUM( ($C$2:$C$1000=I$2)*($G$2:$G$1000<>"") ))),2),0.8)
...or slightly shortening it with the LET function:
Excel Formula:
=LET(fcrit,($C$2:$C$1000=I$2)*($G$2:$G$1000<>""),PERCENTILE.INC(INDEX(SORT(FILTER(FILTER(INDEX($A$2:$G$1000,SEQUENCE(ROWS($A$2:$A$1000)),{1,3,7}),fcrit),{1,0,1}),1,-1),SEQUENCE(MIN(24,SUM(fcrit))),2),0.8))
 
Upvote 0
Solution
Brilliant, you've been such a great help with this!! Thanks for your patience.
 
Upvote 0

Forum statistics

Threads
1,215,523
Messages
6,125,317
Members
449,218
Latest member
Excel Master

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top