Rank (Spearman) Correlation of Non-Adjacent Values in Excel in Single Formula

BuckChuker · Apr 30, 2020

Rank (Spearman) Correlation of Non-Adjacent Values in Excel in Single Formula

1. In a single formula, I am trying to get the Rank (Spearman) correlation between two arrays. One of the arrays contains non-adjacent values. I can get the Pearson correlation with this formula:

=CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)

2. The formula in #1 works because CHOOSE creates an array of the 3 non-adjacent cells and effectively makes them appear ‘adjacent’ to each other.

3. The Rank correlation is just the correlation of the ranks, and to get the Rank correlation, I just need to convert the values in cells U7, AM7, and BE7 to ranks. (The values in HR1:HT1 are already ranks.)

4. If the non-adjacent values in U7, AM7, and BE7 were adjacent to each other (e.g., in cells U7:U9), I could convert the values to their respective ranks and get the Rank correlation like this:

=CORREL(RANK.AVG(U7:U9,U7:U9),HR1:HT1)

5. Consequently, I thought I could simply substitute the adjacent values created by “CHOOSE({1;2;3},U7,AM7,BE7)” in place of the naturally occurring adjacent values in (U7:U9) in the formula in #4 to get the Rank correlation, something like this:

=CORREL(RANK.AVG(CHOOSE({1;2;3},U7,AM7,BE7),CHOOSE({1;2;3},U7,AM7,BE7)),HR1:HT1)

Unfortunately, this doesn’t work, either as a regular formula or an array formula. Nor do these ‘unions’ work:

=CORREL(CHOOSE({1;2;3},RANK((U7,AM7,BE7),(U7,AM7,BE7))),HR1:HT1)

=CORREL(RANK.AVG((U7,AM7,BE7),(U7,AM7,BE7)),HR1:HT1)

I could get the answer by adding 10 more columns to my spreadsheet, but I don't want to do that. I need to get the Rank Correlation in one formula.

Can anyone figure out how to write such a formula?

Thanks for your help.

Eric W · Apr 30, 2020

Welcome to the forum.

Excel has its issues with disjoint ranges. Most functions are not set up to handle them. In some cases, like CORREL, it will accept an array instead of a range. Using your CHOOSE construct converts the range into a contiguous array. Some functions, like RANK.AVG requires a range, so the CHOOSE trick won't work. But there are some other tricks. It's possible to write a version of RANK.AVG like this:

=SUMPRODUCT(--(H$3:H$9>H3))+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2+1

You could (with some effort) convert that to a disjoint RANK.AVG function which you could use in your formula. Similarly, you could go back to the definition of CORREL (see CORREL function ) and use the formula directly in some manner. Before I work on this any more though, are you looking at exactly 3 cells, or is this just an example? With only 3 cells, there are some shortcuts we could make. If you have more than 3 cells, are they always 18 columns apart on the same row?

BuckChuker · Apr 30, 2020

Eric W said:
Welcome to the forum.

Excel has its issues with disjoint ranges. Most functions are not set up to handle them. In some cases, like CORREL, it will accept an array instead of a range. Using your CHOOSE construct converts the range into a contiguous array. Some functions, like RANK.AVG requires a range, so the CHOOSE trick won't work. But there are some other tricks. It's possible to write a version of RANK.AVG like this:

=SUMPRODUCT(--(H$3:H$9>H3))+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2+1

You could (with some effort) convert that to a disjoint RANK.AVG function which you could use in your formula. Similarly, you could go back to the definition of CORREL (see CORREL function ) and use the formula directly in some manner. Before I work on this any more though, are you looking at exactly 3 cells, or is this just an example? With only 3 cells, there are some shortcuts we could make. If you have more than 3 cells, are they always 18 columns apart on the same row?

Eric - thanks much for your help. The three cells were just an example. There are 10 cells, and they are all 18 columns apart. I will also need another similar correlation formula that also has 10 non-adjacent cells, but those cells are all 22 columns apart. I would love to use the correl function directly.

Here are examples of the formulas with which I am returning the Pearson correlations (one 18 columns apart and one 22 columns apart):

=CORREL(CHOOSE({1;2;3;4;5;6;7;8;9;10},C6,Y6,AU6,BQ6,CM6,DI6,EE6,FA6,FW6,GS6),$HR$1:$IA$1)

=CORREL(CHOOSE({1;2;3;4;5;6;7;8;9;10},U6,AM6,BE6,BW6,CO6,DG6,DY6,EQ6,FI6,GA6),$HR$1:$IA$1)

Let me know if there is any other information I can provide.

Eric W · Apr 30, 2020

Here's my first stab at it. I was a bit surprised it worked.

Book1

H

I

J

K

T

U

AM

BE

BF

HQ

HR

HS

HT

HU

1

11

13

17

2

RANK.AVG

SUMPRODUCT

3

1

7

3

1

2

-0.32733

4

5

5

6

4

5

7

2

6

5

13

7

8

6

2

9

7

10

11

12

Sheet19

Cell Formulas
Range	Formula
I3:I9	I3	=RANK.AVG(H3,H$3:H$9)
J3:J9	J3	=SUMPRODUCT(--(H$3:H$9>H3))+1+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2
HU3	HU3	=CORREL(HR3:HT3,HR1:HT1)
U4,BE4,AM4	U4	=COLUMN(U4)
U9	U9	=CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)
U10	U10	=CORREL(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18)),HR1:HT1)
U11	U11	=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))>TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))))),{1;1;1})-1)/2+1,HR1:HT1)
Press CTRL+SHIFT+ENTER to enter array formulas surrounded with curly braces.

Your original formula is in U9. The U10 formula is how I rewrote it using SUBTOTAL(OFFSET instead of CHOOSE. You should be able to see how to change the array constant {0,1,2} to {0,1,2,3,4,5,6,7,8,9} for 10 values.

Over in column I is the RANK.AVG formula, and column J shows that the SUMPRODUCT formula I derived gives the same results.

Then the U11 formula is the RANK CORRELATION formula. You can see how I used the same SUBTOTAL(OFFSET structure. Notice that the SUMPRODUCT formula in J3 has 7 comparisons, 1 for each row. Each of 7 rows has 7 comparisons, for a total of 49 (n squared). The RANGE() > TRANSPOSE(RANGE()) performs all n squared comparisons, then the MMULT sums up each individual column to get the value for each row. You should also be able to see that you'll need to change the {1;1;1} array constant to have 10 1s in it to work with 10 values. The second MMULT construct is equivalent to the second SUMPRODUCT in the J3 formula. Et voila! You have your rankings. Then CORREL gives the correlation. The HU3 formula is just to show I got it right.

There might be ways to simplify this, I'll look at it a bit more. But it shows the lengths you have to go to to use disjoint ranges. I might have been able to shorten it a bit by using the Correlation formula, but you said you wanted the function. Anyway, have a look and see what you think.

BuckChuker · May 1, 2020

Eric W said:
Here's my first stab at it. I was a bit surprised it worked.

Book1
H I J K T U AM BE BF HQ HR HS HT HU
1 11 13 17
2 RANK.AVG SUMPRODUCT
3 1 7 7 3 1 2 -0.32733
4 5 3.5 3.5 21 39 57
5 5 3.5 3.5
6 4 5 5
7 2 6 6 5 13 7
8 6 2 2
9 7 1 1 0.052414
10 0.052414
11 -0.32733
12
Sheet19
Cell Formulas
Range Formula
I3:I9 I3 =RANK.AVG(H3,H$3:H$9)
J3:J9 J3 =SUMPRODUCT(--(H$3:H$9>H3))+1+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2
HU3 HU3 =CORREL(HR3:HT3,HR1:HT1)
U4,BE4,AM4 U4 =COLUMN(U4)
U9 U9 =CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)
U10 U10 =CORREL(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18)),HR1:HT1)
U11 U11 =CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))>TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})-1)/2+1,HR1:HT1)
Press CTRL+SHIFT+ENTER to enter array formulas surrounded with curly braces.

Your original formula is in U9. The U10 formula is how I rewrote it using SUBTOTAL(OFFSET instead of CHOOSE. You should be able to see how to change the array constant {0,1,2} to {0,1,2,3,4,5,6,7,8,9} for 10 values.

Over in column I is the RANK.AVG formula, and column J shows that the SUMPRODUCT formula I derived gives the same results.

Then the U11 formula is the RANK CORRELATION formula. You can see how I used the same SUBTOTAL(OFFSET structure. Notice that the SUMPRODUCT formula in J3 has 7 comparisons, 1 for each row. Each of 7 rows has 7 comparisons, for a total of 49 (n squared). The RANGE() > TRANSPOSE(RANGE()) performs all n squared comparisons, then the MMULT sums up each individual column to get the value for each row. You should also be able to see that you'll need to change the {1;1;1} array constant to have 10 1s in it to work with 10 values. The second MMULT construct is equivalent to the second SUMPRODUCT in the J3 formula. Et voila! You have your rankings. Then CORREL gives the correlation. The HU3 formula is just to show I got it right.

There might be ways to simplify this, I'll look at it a bit more. But it shows the lengths you have to go to to use disjoint ranges. I might have been able to shorten it a bit by using the Correlation formula, but you said you wanted the function. Anyway, have a look and see what you think.

Maybe I mis-spoke. I would prefer to use the correl formula.

Thanks.

KRice · May 1, 2020

If I'm not mistaken, there are few tweaks necessary to give the Spearman rho coefficient, as the rank of x and the rank of y are to be correlated. Here's a short example using the three x,y pairs. The table develops the various quanities that are eventually used by the "basic" formula in HS16. The shorter version in HS17 grabs the ranks directly and submits them as arguments to the CORREL function.

MrExcelBook20200430.xlsx

AM

BE

HQ

HR

HS

HT

HU

HV

HW

HX

HY

HZ

IA

16

Spearman rho long basic

0.5

17

Spearman rho with RankX vs Rank Y

0.5

18

19

rxbar

rybar

sum

20

2

1

2

21

x

y

rx

ry

rx-ry

(rx-ry)2

rx-rxbar

ry-rybar

(rx-rxbar)* (ry-rybar)

(rx-rxbar)2

(ry-rybar)2

22

11

5

23

13

24

17

7

Sheet3

Cell Formulas
Range	Formula
HS16	HS16	=HY20/SQRT(HZ20*IA20)
HS17	HS17	=CORREL(HS22:HS24,HT22:HT24)
HS20:HT20	HS20	=SUBTOTAL(1,HS22:HS24)
HV20,HY20:IA20	HV20	=SUBTOTAL(9,HV22:HV24)
HS22:HT22	HS22	=IFERROR(RANK.AVG(HQ22,HQ$22:HQ$24,1),"")
HU22:HU24	HU22	=IFERROR(HS22-HT22,"")
HV22:HV24	HV22	=IFERROR(HU22^2,"")
HW22:HX24	HW22	=IFERROR(HS22-HS$20,"")
HY22:HY24	HY22	=IFERROR(HW22*HX22,"")
HZ22:IA24	HZ22	=IFERROR(HW22^2,"")
HS23:HT24	HS23	=IFERROR(RANK.AVG(HQ23,HQ$15:HQ$24,1),"")

If this is correct, then Eric's beautiful and complex formula might need to become even more complex, like that shown in J11 below, as some scheme is needed to pass the ranks of the second array to CORREL. I changed one thing in Eric's formula...I changed the >TRANSPOSE to a <TRANSPOSE to make the ranks ascending from low value (1) and up.

MrExcelBook20200430.xlsx

H

I

J

T

U

AM

BE

HQ

HR

HS

HT

HU

1

11

13

17

2

RANK.AVG

SUMPRODUCT

3

1

7

3

1

2

-0.32732684

4

5

5

6

4

5

7

2

6

5

13

7

8

6

2

9

7

1

0.052414242

10

0.052414242

11

Spearman rho

0.5

-0.32732684

Sheet3

Cell Formulas
Range	Formula
I3:I9	I3	=RANK.AVG(H3,H$3:H$9)
J3:J9	J3	=SUMPRODUCT(--(H$3:H$9>H3))+1+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2
HU3	HU3	=CORREL(HR3:HT3,HR1:HT1)
U4,BE4,AM4	U4	=COLUMN(U4)
J11	J11	=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))<TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))))),{1;1;1})-1)/2+1, MMULT((--(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))<TRANSPOSE(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))=TRANSPOSE(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))))),{1;1;1})-1)/2+1)
U9	U9	=CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)
U10	U10	=CORREL(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18)),HR1:HT1)
U11	U11	=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))>TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}18))))),{1;1;1})-1)/2+1,HR1:HT1)

KRice · May 1, 2020

To better illustrate this over 10 pairs of (x,y) data, here's an example drawn from Spearman's rank correlation coefficient - Wikipedia with a reported Spearman's coefficient of −0.175757575. In this example, the starting points for the x and y arrays are in different rows and columns, and the spacings between data points are also different, with the x points shown in every other column, while the y points are in contiguous columns. To clarify this, the first few points then are (106,7), (100,27), and (86,2). The formula has two main components: one for ranking the 1st array and the other for ranking the 2nd array. To adapt this for use, the starting reference location in the OFFSET formulas point to the first data point in the relevant array, and the "*1" or "*2" placeholders are changed to reflect the column spacing for that array (each of these is done in 4 places for each X and Y array). In this example, every other column means "*2" for the first array and continuous columns means "*1" for the second array.

@BuckChuker Earlier you mentioned that you preferred to use the CORREL function. These solutions do use that function, but the main issue is finding some way to create an array of rankings that will be accepted by the CORREL function. Eric's proposed solution manages to do that.

MrExcelBook20200430.xlsx

T

U

AM

BE

HQ

HR

HS

HT

HU

HV

HW

HX

HY

HZ

IA

IB

IC

ID

IE

IF

IG

IH

II

IJ

32

Spearman's rho

-0.17575758

X_i -->

106

100

86

101

99

103

97

113

112

110

33

Y_i ->

7

27

2

50

28

29

20

12

6

17

Sheet3

Cell Formulas
Range	Formula
U32	U32	=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}2))<TRANSPOSE(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}2))))),{1;1;1;1;1;1;1;1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}2))=TRANSPOSE(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}2))))),{1;1;1;1;1;1;1;1;1;1})-1)/2+1, MMULT((--(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}1))<TRANSPOSE(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}1))))),{1;1;1;1;1;1;1;1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}1))=TRANSPOSE(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}1))))),{1;1;1;1;1;1;1;1;1;1})-1)/2+1)

Eric W · May 1, 2020

KRice, thank you for the deeper dive into the actual statistics part of the question. My statistics classes were years ago, and although some of the terminology is familiar, it would take a lot for me to get good at it again. As you suggested, my main contribution here is to rank disjoint values, and get them into a format that CORREL can handle.

BuckChuker, I originally thought that I could use this formula:

from the Microsoft CORREL help page to calculate the rank correlation, possibly easier than using CORREL itself, possibly with using other stats functions like DEVSQ or AVEDEV. Possibly with only 3 values instead of 10 it could have worked, but after looking at it some more, the CORREL function is the way to go. The largest part of the function is the double MMULT trick to create the rankings, and that's not going to get any shorter. So at this point, I'd stick with the current formula. Make sure you test it with known values to see that you get the right answers. See if KRice's comments are something you need to consider.

KRice · May 1, 2020

I'm glad to help, Eric. I too think the CORREL function is the way to go...if only there were a more direct way to compile a rankings array for disjoint data sets. It hurts my head to look at it, but your formula for doing so is clever and it works. I also wondered about using the basic function you've shown---that is what the first table in my post #6 was exploring, and the formula in HS16 pulls the various summation terms into that formula. The problem, of course, is that the x_i, y_i, xbar, and ybar terms are all derived from the rankings arrays, and if we have those arrays, we're better off taking the shortcut and simply feeding them directly into CORREL.

Rank (Spearman) Correlation of Non-Adjacent Values in Excel in Single Formula

BuckChuker

New Member

Excel Facts

Eric W

MrExcel MVP

BuckChuker

New Member

Eric W

MrExcel MVP

BuckChuker

New Member

KRice

Well-known Member

KRice

Well-known Member

Eric W

MrExcel MVP

KRice

Well-known Member

Similar threads

Forum statistics

Share this page

Rank (Spearman) Correlation of Non-Adjacent Values in Excel in Single Formula

New Member

Excel Facts

MrExcel MVP

New Member

MrExcel MVP

New Member

Well-known Member

Well-known Member

MrExcel MVP

Well-known Member

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock