Rank (Spearman) Correlation of Non-Adjacent Values in Excel in Single Formula

BuckChuker

New Member
Joined
Apr 30, 2020
Messages
3
Office Version
2010
Platform
Windows
Rank (Spearman) Correlation of Non-Adjacent Values in Excel in Single Formula

1. In a single formula, I am trying to get the Rank (Spearman) correlation between two arrays. One of the arrays contains non-adjacent values. I can get the Pearson correlation with this formula:

=CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)

2. The formula in #1 works because CHOOSE creates an array of the 3 non-adjacent cells and effectively makes them appear ‘adjacent’ to each other.

3. The Rank correlation is just the correlation of the ranks, and to get the Rank correlation, I just need to convert the values in cells U7, AM7, and BE7 to ranks. (The values in HR1:HT1 are already ranks.)

4. If the non-adjacent values in U7, AM7, and BE7 were adjacent to each other (e.g., in cells U7:U9), I could convert the values to their respective ranks and get the Rank correlation like this:

=CORREL(RANK.AVG(U7:U9,U7:U9),HR1:HT1)

5. Consequently, I thought I could simply substitute the adjacent values created by “CHOOSE({1;2;3},U7,AM7,BE7)” in place of the naturally occurring adjacent values in (U7:U9) in the formula in #4 to get the Rank correlation, something like this:

=CORREL(RANK.AVG(CHOOSE({1;2;3},U7,AM7,BE7),CHOOSE({1;2;3},U7,AM7,BE7)),HR1:HT1)

Unfortunately, this doesn’t work, either as a regular formula or an array formula. Nor do these ‘unions’ work:

=CORREL(CHOOSE({1;2;3},RANK((U7,AM7,BE7),(U7,AM7,BE7))),HR1:HT1)

=CORREL(RANK.AVG((U7,AM7,BE7),(U7,AM7,BE7)),HR1:HT1)


I could get the answer by adding 10 more columns to my spreadsheet, but I don't want to do that. I need to get the Rank Correlation in one formula.

Can anyone figure out how to write such a formula?

Thanks for your help.
 

Some videos you may like

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.

Eric W

MrExcel MVP
Joined
Aug 18, 2015
Messages
9,750
Welcome to the forum.

Excel has its issues with disjoint ranges. Most functions are not set up to handle them. In some cases, like CORREL, it will accept an array instead of a range. Using your CHOOSE construct converts the range into a contiguous array. Some functions, like RANK.AVG requires a range, so the CHOOSE trick won't work. But there are some other tricks. It's possible to write a version of RANK.AVG like this:

=SUMPRODUCT(--(H$3:H$9>H3))+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2+1

You could (with some effort) convert that to a disjoint RANK.AVG function which you could use in your formula. Similarly, you could go back to the definition of CORREL (see CORREL function ) and use the formula directly in some manner. Before I work on this any more though, are you looking at exactly 3 cells, or is this just an example? With only 3 cells, there are some shortcuts we could make. If you have more than 3 cells, are they always 18 columns apart on the same row?
 

BuckChuker

New Member
Joined
Apr 30, 2020
Messages
3
Office Version
2010
Platform
Windows
Welcome to the forum.

Excel has its issues with disjoint ranges. Most functions are not set up to handle them. In some cases, like CORREL, it will accept an array instead of a range. Using your CHOOSE construct converts the range into a contiguous array. Some functions, like RANK.AVG requires a range, so the CHOOSE trick won't work. But there are some other tricks. It's possible to write a version of RANK.AVG like this:

=SUMPRODUCT(--(H$3:H$9>H3))+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2+1

You could (with some effort) convert that to a disjoint RANK.AVG function which you could use in your formula. Similarly, you could go back to the definition of CORREL (see CORREL function ) and use the formula directly in some manner. Before I work on this any more though, are you looking at exactly 3 cells, or is this just an example? With only 3 cells, there are some shortcuts we could make. If you have more than 3 cells, are they always 18 columns apart on the same row?
Eric - thanks much for your help. The three cells were just an example. There are 10 cells, and they are all 18 columns apart. I will also need another similar correlation formula that also has 10 non-adjacent cells, but those cells are all 22 columns apart. I would love to use the correl function directly.

Here are examples of the formulas with which I am returning the Pearson correlations (one 18 columns apart and one 22 columns apart):

=CORREL(CHOOSE({1;2;3;4;5;6;7;8;9;10},C6,Y6,AU6,BQ6,CM6,DI6,EE6,FA6,FW6,GS6),$HR$1:$IA$1)

=CORREL(CHOOSE({1;2;3;4;5;6;7;8;9;10},U6,AM6,BE6,BW6,CO6,DG6,DY6,EQ6,FI6,GA6),$HR$1:$IA$1)

Let me know if there is any other information I can provide.
 

Eric W

MrExcel MVP
Joined
Aug 18, 2015
Messages
9,750
Here's my first stab at it. I was a bit surprised it worked.

Book1
HIJKTUAMBEBFHQHRHSHTHU
1111317
2RANK.AVGSUMPRODUCT
31
7
7
312
-0.32733
45
3.5
3.5
21
39
57
55
3.5
3.5
64
5
5
72
6
6
5137
86
2
2
97
1
1
0.052414
10
0.052414
11
-0.32733
12
Sheet19
Cell Formulas
RangeFormula
I3:I9I3=RANK.AVG(H3,H$3:H$9)
J3:J9J3=SUMPRODUCT(--(H$3:H$9>H3))+1+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2
HU3HU3=CORREL(HR3:HT3,HR1:HT1)
U4,BE4,AM4U4=COLUMN(U4)
U9U9=CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)
U10U10=CORREL(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18)),HR1:HT1)
U11U11=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))>TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})-1)/2+1,HR1:HT1)
Press CTRL+SHIFT+ENTER to enter array formulas surrounded with curly braces.


Your original formula is in U9. The U10 formula is how I rewrote it using SUBTOTAL(OFFSET instead of CHOOSE. You should be able to see how to change the array constant {0,1,2} to {0,1,2,3,4,5,6,7,8,9} for 10 values.

Over in column I is the RANK.AVG formula, and column J shows that the SUMPRODUCT formula I derived gives the same results.

Then the U11 formula is the RANK CORRELATION formula. You can see how I used the same SUBTOTAL(OFFSET structure. Notice that the SUMPRODUCT formula in J3 has 7 comparisons, 1 for each row. Each of 7 rows has 7 comparisons, for a total of 49 (n squared). The RANGE() > TRANSPOSE(RANGE()) performs all n squared comparisons, then the MMULT sums up each individual column to get the value for each row. You should also be able to see that you'll need to change the {1;1;1} array constant to have 10 1s in it to work with 10 values. The second MMULT construct is equivalent to the second SUMPRODUCT in the J3 formula. Et voila! You have your rankings. Then CORREL gives the correlation. The HU3 formula is just to show I got it right.

There might be ways to simplify this, I'll look at it a bit more. But it shows the lengths you have to go to to use disjoint ranges. I might have been able to shorten it a bit by using the Correlation formula, but you said you wanted the function. Anyway, have a look and see what you think.
 

BuckChuker

New Member
Joined
Apr 30, 2020
Messages
3
Office Version
2010
Platform
Windows
Here's my first stab at it. I was a bit surprised it worked.

Book1
HIJKTUAMBEBFHQHRHSHTHU
1111317
2RANK.AVGSUMPRODUCT
31
7
7
312
-0.32733
45
3.5
3.5
21
39
57
55
3.5
3.5
64
5
5
72
6
6
5137
86
2
2
97
1
1
0.052414
10
0.052414
11
-0.32733
12
Sheet19
Cell Formulas
RangeFormula
I3:I9I3=RANK.AVG(H3,H$3:H$9)
J3:J9J3=SUMPRODUCT(--(H$3:H$9>H3))+1+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2
HU3HU3=CORREL(HR3:HT3,HR1:HT1)
U4,BE4,AM4U4=COLUMN(U4)
U9U9=CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)
U10U10=CORREL(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18)),HR1:HT1)
U11U11=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))>TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})-1)/2+1,HR1:HT1)
Press CTRL+SHIFT+ENTER to enter array formulas surrounded with curly braces.


Your original formula is in U9. The U10 formula is how I rewrote it using SUBTOTAL(OFFSET instead of CHOOSE. You should be able to see how to change the array constant {0,1,2} to {0,1,2,3,4,5,6,7,8,9} for 10 values.

Over in column I is the RANK.AVG formula, and column J shows that the SUMPRODUCT formula I derived gives the same results.

Then the U11 formula is the RANK CORRELATION formula. You can see how I used the same SUBTOTAL(OFFSET structure. Notice that the SUMPRODUCT formula in J3 has 7 comparisons, 1 for each row. Each of 7 rows has 7 comparisons, for a total of 49 (n squared). The RANGE() > TRANSPOSE(RANGE()) performs all n squared comparisons, then the MMULT sums up each individual column to get the value for each row. You should also be able to see that you'll need to change the {1;1;1} array constant to have 10 1s in it to work with 10 values. The second MMULT construct is equivalent to the second SUMPRODUCT in the J3 formula. Et voila! You have your rankings. Then CORREL gives the correlation. The HU3 formula is just to show I got it right.

There might be ways to simplify this, I'll look at it a bit more. But it shows the lengths you have to go to to use disjoint ranges. I might have been able to shorten it a bit by using the Correlation formula, but you said you wanted the function. Anyway, have a look and see what you think.
Maybe I mis-spoke. I would prefer to use the correl formula.

Thanks.
 

KRice

Well-known Member
Joined
Dec 9, 2003
Messages
714
Office Version
2019
Platform
Windows
If I'm not mistaken, there are few tweaks necessary to give the Spearman rho coefficient, as the rank of x and the rank of y are to be correlated. Here's a short example using the three x,y pairs. The table develops the various quanities that are eventually used by the "basic" formula in HS16. The shorter version in HS17 grabs the ranks directly and submits them as arguments to the CORREL function.
MrExcelBook20200430.xlsx
AMBEHQHRHSHTHUHVHWHXHYHZIA
16Spearman rho long basic
0.5
17Spearman rho with RankX vs Rank Y
0.5
18
19rxbarrybarsum
20
2
2
2
1
2
2
21xyrxryrx-ry(rx-ry)2rx-rxbarry-rybar(rx-rxbar)* (ry-rybar)(rx-rxbar)2(ry-rybar)2
22115
1
1
0
0
-1
-1
1
1
1
231313
2
3
-1
1
0
1
0
0
1
24177
3
2
1
1
1
0
0
1
0
Sheet3
Cell Formulas
RangeFormula
HS16HS16=HY20/SQRT(HZ20*IA20)
HS17HS17=CORREL(HS22:HS24,HT22:HT24)
HS20:HT20HS20=SUBTOTAL(1,HS22:HS24)
HV20,HY20:IA20HV20=SUBTOTAL(9,HV22:HV24)
HS22:HT22HS22=IFERROR(RANK.AVG(HQ22,HQ$22:HQ$24,1),"")
HU22:HU24HU22=IFERROR(HS22-HT22,"")
HV22:HV24HV22=IFERROR(HU22^2,"")
HW22:HX24HW22=IFERROR(HS22-HS$20,"")
HY22:HY24HY22=IFERROR(HW22*HX22,"")
HZ22:IA24HZ22=IFERROR(HW22^2,"")
HS23:HT24HS23=IFERROR(RANK.AVG(HQ23,HQ$15:HQ$24,1),"")


If this is correct, then Eric's beautiful and complex formula might need to become even more complex, like that shown in J11 below, as some scheme is needed to pass the ranks of the second array to CORREL. I changed one thing in Eric's formula...I changed the >TRANSPOSE to a <TRANSPOSE to make the ranks ascending from low value (1) and up.
MrExcelBook20200430.xlsx
HIJTUAMBEHQHRHSHTHU
1111317
2RANK.AVGSUMPRODUCT
31
7
7
312
-0.32732684
45
3.5
3.5
21
39
57
55
3.5
3.5
64
5
5
72
6
6
5137
86
2
2
97
1
1
0.052414242
10
0.052414242
11Spearman rho
0.5
-0.32732684
Sheet3
Cell Formulas
RangeFormula
I3:I9I3=RANK.AVG(H3,H$3:H$9)
J3:J9J3=SUMPRODUCT(--(H$3:H$9>H3))+1+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2
HU3HU3=CORREL(HR3:HT3,HR1:HT1)
U4,BE4,AM4U4=COLUMN(U4)
J11J11=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))<TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})-1)/2+1, MMULT((--(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))<TRANSPOSE(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))=TRANSPOSE(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))))),{1;1;1})-1)/2+1)
U9U9=CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)
U10U10=CORREL(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18)),HR1:HT1)
U11U11=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))>TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})-1)/2+1,HR1:HT1)
 

KRice

Well-known Member
Joined
Dec 9, 2003
Messages
714
Office Version
2019
Platform
Windows
To better illustrate this over 10 pairs of (x,y) data, here's an example drawn from Spearman's rank correlation coefficient - Wikipedia with a reported Spearman's coefficient of −0.175757575. In this example, the starting points for the x and y arrays are in different rows and columns, and the spacings between data points are also different, with the x points shown in every other column, while the y points are in contiguous columns. To clarify this, the first few points then are (106,7), (100,27), and (86,2). The formula has two main components: one for ranking the 1st array and the other for ranking the 2nd array. To adapt this for use, the starting reference location in the OFFSET formulas point to the first data point in the relevant array, and the "*1" or "*2" placeholders are changed to reflect the column spacing for that array (each of these is done in 4 places for each X and Y array). In this example, every other column means "*2" for the first array and continuous columns means "*1" for the second array.

@BuckChuker Earlier you mentioned that you preferred to use the CORREL function. These solutions do use that function, but the main issue is finding some way to create an array of rankings that will be accepted by the CORREL function. Eric's proposed solution manages to do that.
MrExcelBook20200430.xlsx
TUAMBEHQHRHSHTHUHVHWHXHYHZIAIBICIDIEIFIGIHIIIJ
32Spearman's rho
-0.17575758
X_i -->106100861019910397113112110
33Y_i ->72725028292012617
Sheet3
Cell Formulas
RangeFormula
U32U32=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}*2))<TRANSPOSE(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}*2))))),{1;1;1;1;1;1;1;1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}*2))=TRANSPOSE(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}*2))))),{1;1;1;1;1;1;1;1;1;1})-1)/2+1, MMULT((--(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}*1))<TRANSPOSE(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}*1))))),{1;1;1;1;1;1;1;1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}*1))=TRANSPOSE(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}*1))))),{1;1;1;1;1;1;1;1;1;1})-1)/2+1)
 

Eric W

MrExcel MVP
Joined
Aug 18, 2015
Messages
9,750
KRice, thank you for the deeper dive into the actual statistics part of the question. My statistics classes were years ago, and although some of the terminology is familiar, it would take a lot for me to get good at it again. As you suggested, my main contribution here is to rank disjoint values, and get them into a format that CORREL can handle.

BuckChuker, I originally thought that I could use this formula:

Equation


from the Microsoft CORREL help page to calculate the rank correlation, possibly easier than using CORREL itself, possibly with using other stats functions like DEVSQ or AVEDEV. Possibly with only 3 values instead of 10 it could have worked, but after looking at it some more, the CORREL function is the way to go. The largest part of the function is the double MMULT trick to create the rankings, and that's not going to get any shorter. So at this point, I'd stick with the current formula. Make sure you test it with known values to see that you get the right answers. See if KRice's comments are something you need to consider.
 

KRice

Well-known Member
Joined
Dec 9, 2003
Messages
714
Office Version
2019
Platform
Windows
I'm glad to help, Eric. I too think the CORREL function is the way to go...if only there were a more direct way to compile a rankings array for disjoint data sets. It hurts my head to look at it, but your formula for doing so is clever and it works. I also wondered about using the basic function you've shown---that is what the first table in my post #6 was exploring, and the formula in HS16 pulls the various summation terms into that formula. The problem, of course, is that the x_i, y_i, xbar, and ybar terms are all derived from the rankings arrays, and if we have those arrays, we're better off taking the shortcut and simply feeding them directly into CORREL.
 

Watch MrExcel Video

Forum statistics

Threads
1,099,044
Messages
5,466,209
Members
406,473
Latest member
Dunno123

This Week's Hot Topics

Top