Rank (Spearman) Correlation of Non-Adjacent Values in Excel in Single Formula

BuckChuker

New Member
Joined
Apr 30, 2020
Messages
3
Office Version
  1. 2010
Platform
  1. Windows
Rank (Spearman) Correlation of Non-Adjacent Values in Excel in Single Formula

1. In a single formula, I am trying to get the Rank (Spearman) correlation between two arrays. One of the arrays contains non-adjacent values. I can get the Pearson correlation with this formula:

=CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)

2. The formula in #1 works because CHOOSE creates an array of the 3 non-adjacent cells and effectively makes them appear ‘adjacent’ to each other.

3. The Rank correlation is just the correlation of the ranks, and to get the Rank correlation, I just need to convert the values in cells U7, AM7, and BE7 to ranks. (The values in HR1:HT1 are already ranks.)

4. If the non-adjacent values in U7, AM7, and BE7 were adjacent to each other (e.g., in cells U7:U9), I could convert the values to their respective ranks and get the Rank correlation like this:

=CORREL(RANK.AVG(U7:U9,U7:U9),HR1:HT1)

5. Consequently, I thought I could simply substitute the adjacent values created by “CHOOSE({1;2;3},U7,AM7,BE7)” in place of the naturally occurring adjacent values in (U7:U9) in the formula in #4 to get the Rank correlation, something like this:

=CORREL(RANK.AVG(CHOOSE({1;2;3},U7,AM7,BE7),CHOOSE({1;2;3},U7,AM7,BE7)),HR1:HT1)

Unfortunately, this doesn’t work, either as a regular formula or an array formula. Nor do these ‘unions’ work:

=CORREL(CHOOSE({1;2;3},RANK((U7,AM7,BE7),(U7,AM7,BE7))),HR1:HT1)

=CORREL(RANK.AVG((U7,AM7,BE7),(U7,AM7,BE7)),HR1:HT1)


I could get the answer by adding 10 more columns to my spreadsheet, but I don't want to do that. I need to get the Rank Correlation in one formula.

Can anyone figure out how to write such a formula?

Thanks for your help.
 

Excel Facts

Which lookup functions find a value equal or greater than the lookup value?
MATCH uses -1 to find larger value (lookup table must be sorted ZA). XLOOKUP uses 1 to find values greater and does not need to be sorted.
Welcome to the forum.

Excel has its issues with disjoint ranges. Most functions are not set up to handle them. In some cases, like CORREL, it will accept an array instead of a range. Using your CHOOSE construct converts the range into a contiguous array. Some functions, like RANK.AVG requires a range, so the CHOOSE trick won't work. But there are some other tricks. It's possible to write a version of RANK.AVG like this:

=SUMPRODUCT(--(H$3:H$9>H3))+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2+1

You could (with some effort) convert that to a disjoint RANK.AVG function which you could use in your formula. Similarly, you could go back to the definition of CORREL (see CORREL function ) and use the formula directly in some manner. Before I work on this any more though, are you looking at exactly 3 cells, or is this just an example? With only 3 cells, there are some shortcuts we could make. If you have more than 3 cells, are they always 18 columns apart on the same row?
 
Upvote 0
Welcome to the forum.

Excel has its issues with disjoint ranges. Most functions are not set up to handle them. In some cases, like CORREL, it will accept an array instead of a range. Using your CHOOSE construct converts the range into a contiguous array. Some functions, like RANK.AVG requires a range, so the CHOOSE trick won't work. But there are some other tricks. It's possible to write a version of RANK.AVG like this:

=SUMPRODUCT(--(H$3:H$9>H3))+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2+1

You could (with some effort) convert that to a disjoint RANK.AVG function which you could use in your formula. Similarly, you could go back to the definition of CORREL (see CORREL function ) and use the formula directly in some manner. Before I work on this any more though, are you looking at exactly 3 cells, or is this just an example? With only 3 cells, there are some shortcuts we could make. If you have more than 3 cells, are they always 18 columns apart on the same row?
Eric - thanks much for your help. The three cells were just an example. There are 10 cells, and they are all 18 columns apart. I will also need another similar correlation formula that also has 10 non-adjacent cells, but those cells are all 22 columns apart. I would love to use the correl function directly.

Here are examples of the formulas with which I am returning the Pearson correlations (one 18 columns apart and one 22 columns apart):

=CORREL(CHOOSE({1;2;3;4;5;6;7;8;9;10},C6,Y6,AU6,BQ6,CM6,DI6,EE6,FA6,FW6,GS6),$HR$1:$IA$1)

=CORREL(CHOOSE({1;2;3;4;5;6;7;8;9;10},U6,AM6,BE6,BW6,CO6,DG6,DY6,EQ6,FI6,GA6),$HR$1:$IA$1)

Let me know if there is any other information I can provide.
 
Upvote 0
Here's my first stab at it. I was a bit surprised it worked.

Book1
HIJKTUAMBEBFHQHRHSHTHU
1111317
2RANK.AVGSUMPRODUCT
3177312-0.32733
453.53.5213957
553.53.5
6455
72665137
8622
97110.052414
100.052414
11-0.32733
12
Sheet19
Cell Formulas
RangeFormula
I3:I9I3=RANK.AVG(H3,H$3:H$9)
J3:J9J3=SUMPRODUCT(--(H$3:H$9>H3))+1+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2
HU3HU3=CORREL(HR3:HT3,HR1:HT1)
U4,BE4,AM4U4=COLUMN(U4)
U9U9=CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)
U10U10=CORREL(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18)),HR1:HT1)
U11U11=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))>TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})-1)/2+1,HR1:HT1)
Press CTRL+SHIFT+ENTER to enter array formulas surrounded with curly braces.


Your original formula is in U9. The U10 formula is how I rewrote it using SUBTOTAL(OFFSET instead of CHOOSE. You should be able to see how to change the array constant {0,1,2} to {0,1,2,3,4,5,6,7,8,9} for 10 values.

Over in column I is the RANK.AVG formula, and column J shows that the SUMPRODUCT formula I derived gives the same results.

Then the U11 formula is the RANK CORRELATION formula. You can see how I used the same SUBTOTAL(OFFSET structure. Notice that the SUMPRODUCT formula in J3 has 7 comparisons, 1 for each row. Each of 7 rows has 7 comparisons, for a total of 49 (n squared). The RANGE() > TRANSPOSE(RANGE()) performs all n squared comparisons, then the MMULT sums up each individual column to get the value for each row. You should also be able to see that you'll need to change the {1;1;1} array constant to have 10 1s in it to work with 10 values. The second MMULT construct is equivalent to the second SUMPRODUCT in the J3 formula. Et voila! You have your rankings. Then CORREL gives the correlation. The HU3 formula is just to show I got it right.

There might be ways to simplify this, I'll look at it a bit more. But it shows the lengths you have to go to to use disjoint ranges. I might have been able to shorten it a bit by using the Correlation formula, but you said you wanted the function. Anyway, have a look and see what you think.
 
Upvote 0
Here's my first stab at it. I was a bit surprised it worked.

Book1
HIJKTUAMBEBFHQHRHSHTHU
1111317
2RANK.AVGSUMPRODUCT
3177312-0.32733
453.53.5213957
553.53.5
6455
72665137
8622
97110.052414
100.052414
11-0.32733
12
Sheet19
Cell Formulas
RangeFormula
I3:I9I3=RANK.AVG(H3,H$3:H$9)
J3:J9J3=SUMPRODUCT(--(H$3:H$9>H3))+1+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2
HU3HU3=CORREL(HR3:HT3,HR1:HT1)
U4,BE4,AM4U4=COLUMN(U4)
U9U9=CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)
U10U10=CORREL(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18)),HR1:HT1)
U11U11=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))>TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})-1)/2+1,HR1:HT1)
Press CTRL+SHIFT+ENTER to enter array formulas surrounded with curly braces.


Your original formula is in U9. The U10 formula is how I rewrote it using SUBTOTAL(OFFSET instead of CHOOSE. You should be able to see how to change the array constant {0,1,2} to {0,1,2,3,4,5,6,7,8,9} for 10 values.

Over in column I is the RANK.AVG formula, and column J shows that the SUMPRODUCT formula I derived gives the same results.

Then the U11 formula is the RANK CORRELATION formula. You can see how I used the same SUBTOTAL(OFFSET structure. Notice that the SUMPRODUCT formula in J3 has 7 comparisons, 1 for each row. Each of 7 rows has 7 comparisons, for a total of 49 (n squared). The RANGE() > TRANSPOSE(RANGE()) performs all n squared comparisons, then the MMULT sums up each individual column to get the value for each row. You should also be able to see that you'll need to change the {1;1;1} array constant to have 10 1s in it to work with 10 values. The second MMULT construct is equivalent to the second SUMPRODUCT in the J3 formula. Et voila! You have your rankings. Then CORREL gives the correlation. The HU3 formula is just to show I got it right.

There might be ways to simplify this, I'll look at it a bit more. But it shows the lengths you have to go to to use disjoint ranges. I might have been able to shorten it a bit by using the Correlation formula, but you said you wanted the function. Anyway, have a look and see what you think.
Maybe I mis-spoke. I would prefer to use the correl formula.

Thanks.
 
Upvote 0
If I'm not mistaken, there are few tweaks necessary to give the Spearman rho coefficient, as the rank of x and the rank of y are to be correlated. Here's a short example using the three x,y pairs. The table develops the various quanities that are eventually used by the "basic" formula in HS16. The shorter version in HS17 grabs the ranks directly and submits them as arguments to the CORREL function.
MrExcelBook20200430.xlsx
AMBEHQHRHSHTHUHVHWHXHYHZIA
16Spearman rho long basic0.5
17Spearman rho with RankX vs Rank Y0.5
18
19rxbarrybarsum
20222122
21xyrxryrx-ry(rx-ry)2rx-rxbarry-rybar(rx-rxbar)* (ry-rybar)(rx-rxbar)2(ry-rybar)2
221151100-1-1111
23131323-1101001
24177321110010
Sheet3
Cell Formulas
RangeFormula
HS16HS16=HY20/SQRT(HZ20*IA20)
HS17HS17=CORREL(HS22:HS24,HT22:HT24)
HS20:HT20HS20=SUBTOTAL(1,HS22:HS24)
HV20,HY20:IA20HV20=SUBTOTAL(9,HV22:HV24)
HS22:HT22HS22=IFERROR(RANK.AVG(HQ22,HQ$22:HQ$24,1),"")
HU22:HU24HU22=IFERROR(HS22-HT22,"")
HV22:HV24HV22=IFERROR(HU22^2,"")
HW22:HX24HW22=IFERROR(HS22-HS$20,"")
HY22:HY24HY22=IFERROR(HW22*HX22,"")
HZ22:IA24HZ22=IFERROR(HW22^2,"")
HS23:HT24HS23=IFERROR(RANK.AVG(HQ23,HQ$15:HQ$24,1),"")


If this is correct, then Eric's beautiful and complex formula might need to become even more complex, like that shown in J11 below, as some scheme is needed to pass the ranks of the second array to CORREL. I changed one thing in Eric's formula...I changed the >TRANSPOSE to a <TRANSPOSE to make the ranks ascending from low value (1) and up.
MrExcelBook20200430.xlsx
HIJTUAMBEHQHRHSHTHU
1111317
2RANK.AVGSUMPRODUCT
3177312-0.32732684
453.53.5213957
553.53.5
6455
72665137
8622
97110.052414242
100.052414242
11Spearman rho0.5-0.32732684
Sheet3
Cell Formulas
RangeFormula
I3:I9I3=RANK.AVG(H3,H$3:H$9)
J3:J9J3=SUMPRODUCT(--(H$3:H$9>H3))+1+(SUMPRODUCT(--(H$3:H$9=H3))-1)/2
HU3HU3=CORREL(HR3:HT3,HR1:HT1)
U4,BE4,AM4U4=COLUMN(U4)
J11J11=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))<TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})-1)/2+1, MMULT((--(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))<TRANSPOSE(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))=TRANSPOSE(SUBTOTAL(9,OFFSET(HR1,0,{0,1,2}))))),{1;1;1})-1)/2+1)
U9U9=CORREL(CHOOSE({1;2;3},U7,AM7,BE7),HR1:HT1)
U10U10=CORREL(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18)),HR1:HT1)
U11U11=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))>TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))=TRANSPOSE(SUBTOTAL(9,OFFSET(U7,0,{0,1,2}*18))))),{1;1;1})-1)/2+1,HR1:HT1)
 
Upvote 0
To better illustrate this over 10 pairs of (x,y) data, here's an example drawn from Spearman's rank correlation coefficient - Wikipedia with a reported Spearman's coefficient of −0.175757575. In this example, the starting points for the x and y arrays are in different rows and columns, and the spacings between data points are also different, with the x points shown in every other column, while the y points are in contiguous columns. To clarify this, the first few points then are (106,7), (100,27), and (86,2). The formula has two main components: one for ranking the 1st array and the other for ranking the 2nd array. To adapt this for use, the starting reference location in the OFFSET formulas point to the first data point in the relevant array, and the "*1" or "*2" placeholders are changed to reflect the column spacing for that array (each of these is done in 4 places for each X and Y array). In this example, every other column means "*2" for the first array and continuous columns means "*1" for the second array.

@BuckChuker Earlier you mentioned that you preferred to use the CORREL function. These solutions do use that function, but the main issue is finding some way to create an array of rankings that will be accepted by the CORREL function. Eric's proposed solution manages to do that.
MrExcelBook20200430.xlsx
TUAMBEHQHRHSHTHUHVHWHXHYHZIAIBICIDIEIFIGIHIIIJ
32Spearman's rho-0.17575758X_i -->106100861019910397113112110
33Y_i ->72725028292012617
Sheet3
Cell Formulas
RangeFormula
U32U32=CORREL(MMULT((--(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}*2))<TRANSPOSE(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}*2))))),{1;1;1;1;1;1;1;1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}*2))=TRANSPOSE(SUBTOTAL(9,OFFSET(HR32,0,{0,1,2,3,4,5,6,7,8,9}*2))))),{1;1;1;1;1;1;1;1;1;1})-1)/2+1, MMULT((--(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}*1))<TRANSPOSE(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}*1))))),{1;1;1;1;1;1;1;1;1;1})+(MMULT((--(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}*1))=TRANSPOSE(SUBTOTAL(9,OFFSET(HS33,0,{0,1,2,3,4,5,6,7,8,9}*1))))),{1;1;1;1;1;1;1;1;1;1})-1)/2+1)
 
Upvote 0
KRice, thank you for the deeper dive into the actual statistics part of the question. My statistics classes were years ago, and although some of the terminology is familiar, it would take a lot for me to get good at it again. As you suggested, my main contribution here is to rank disjoint values, and get them into a format that CORREL can handle.

BuckChuker, I originally thought that I could use this formula:

Equation


from the Microsoft CORREL help page to calculate the rank correlation, possibly easier than using CORREL itself, possibly with using other stats functions like DEVSQ or AVEDEV. Possibly with only 3 values instead of 10 it could have worked, but after looking at it some more, the CORREL function is the way to go. The largest part of the function is the double MMULT trick to create the rankings, and that's not going to get any shorter. So at this point, I'd stick with the current formula. Make sure you test it with known values to see that you get the right answers. See if KRice's comments are something you need to consider.
 
Upvote 0
I'm glad to help, Eric. I too think the CORREL function is the way to go...if only there were a more direct way to compile a rankings array for disjoint data sets. It hurts my head to look at it, but your formula for doing so is clever and it works. I also wondered about using the basic function you've shown---that is what the first table in my post #6 was exploring, and the formula in HS16 pulls the various summation terms into that formula. The problem, of course, is that the x_i, y_i, xbar, and ybar terms are all derived from the rankings arrays, and if we have those arrays, we're better off taking the shortcut and simply feeding them directly into CORREL.
 
Upvote 0

Forum statistics

Threads
1,214,421
Messages
6,119,392
Members
448,891
Latest member
tpierce

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top