Dynamic multi-level sorting with array formulas

squishy2010

New Member
Joined
Oct 22, 2018
Messages
4
In Excel, there is a nifty sort feature you can use to sort by several columns of information. For example, if your data looks like this:

NameType
appletree
gingerroot
peartree
grapevine
apricottree
carrotroot
bananatree
muscadinevine
booberryvine
onionroot

<colgroup><col><col></colgroup><tbody>
</tbody>

You can do a sort by Type, and add a level to include Name. It then will look like this:

NameType
carrotroot
gingerroot
onionroot
appletree
apricottree
bananatree
peartree
booberryvine
grapevine
muscadinevine

<colgroup><col><col></colgroup><tbody>
</tbody>

As you can see, it groups the list into three groups, and sorts each group independently. Roots are sorted by name first, then Trees, then Vines.

Let's say you want to do this with an array formula. The first thing you try is some SMALL(IF array formulas. Then you have to work in some MIN(SMALL(IF(COUNTIF formulas. Then you get completely nuts and go with MMULT array constructs, but to no avail. You try posting on forums, and all you get are clever responses like:

1. WHY DON'T YOU JUST CLICK THE SORT BUTTON?

A: Because I don't want to click the sort button. I want an array formula that sorts the list dynamically. That means it updates with no user interaction.

2. WHY DON'T YOU USE PIVOT TABLES?

A: Because I don't want to use pivot tables. Rather than go into the 1000 reasons why, hows about we try to find a solution with an array formula? I don't like the caching, the refreshing, etc. that comes with pivot tables.

3. WHY DON'T YOU USE HELPER COLUMNS?

A: sigh. YES, I know I could use helper columns. I DON'T WANT to use helper columns. I want to use a nice, tidy array formula. Why is this so hard to do?

4. WHY DON'T YOU BUY THIS EXCEL EXTENSION?

5. WHY DON'T YOU WRITE A CUSTOM BUBBLE SORT FUNCTION IN VBA?

Look, I know I can do that stuff. I have done that stuff. What I want to do is use an array formula to do a multi-level sort in Excel. Exhaustive searches have failed to find a solution. Would anyone on this magnificent forum have a clue how to do this in an array formula?

Many thanks- I have a feeling I'm not alone in searching for a solution to this problem. This could be the only forum on the internet that provides a solution to this if we can come up with an answer.
 

Excel Facts

Fastest way to copy a worksheet?
Hold down the Ctrl key while dragging tab for Sheet1 to the right. Excel will make a copy of the worksheet.
Hi,

Assuming your original table is in A1:B11 (with headers in row 1), array formula** (in e.g. D2):

=LOOKUP(1,0/FREQUENCY(ROWS(A$2:A2),MMULT(N($B$2:$B$11&"|"&$A$2:$A$11>=TRANSPOSE($B$2:$B$11&"|"&$A$2:$A$11)),ROW($A$2:$A$11)^0)),A$2:A$11)

and copied across and down (to e.g. D2:E11).

Regards
 
Upvote 0
XOR LX: Your solution works perfectly. Please note to anyone googling this later, you must enter the formula as an array formula (CTRL+SHIFT+ENTER) and then copy down.

Once I decypher how the formula works, I will post an explanation.
 
Upvote 0
Explanation, best I can figure:

First, you take your data and format it such that you join each field using a pipe (|). This makes an array that looks like:

tree|apple
root|ginger
tree|pear
vine|grape
tree|apricot
root|carrot
tree|banana
vine|musc.
vine|berry
root|onion

<tbody>
</tbody>

Why you ask? Well, we're doing a multi-level sort. That means the type must be sorted first, followed by the name. By putting the type as a prefix, they all get grouped together in a list. That list is then sorted, just like a multi-level sort! So you should just be able to do a RANK of those values and you're done! Oh wait... the Rank function does not allow for array formulas. We must hack the system to do what we want.

To get the ranking, you must compare each item to every other item in the list using the transpose operator and the ">=" operator. This results in a 10x10 array that says if that row's text value is > or = to the column's text value. (e.g. is tree|apple >= tree|apple? TRUE.) The 10x10 array then looks like this:

tree|appleroot|gingertree|pearvine|grapetree|apricotroot|carrottree|bananavine|musc.vine|berryroot|onion
tree|appleTRUETRUEFALSEFALSEFALSETRUEFALSEFALSEFALSETRUE
root|gingerFALSETRUEFALSEFALSEFALSETRUEFALSEFALSEFALSEFALSE
tree|pearTRUETRUETRUEFALSETRUETRUETRUEFALSEFALSETRUE
vine|grapeTRUETRUETRUETRUETRUETRUETRUEFALSETRUETRUE
tree|apricotTRUETRUEFALSEFALSETRUETRUEFALSEFALSEFALSETRUE
root|carrotFALSEFALSEFALSEFALSEFALSETRUEFALSEFALSEFALSEFALSE
tree|bananaTRUETRUEFALSEFALSETRUETRUETRUEFALSEFALSETRUE
vine|musc.TRUETRUETRUETRUETRUETRUETRUETRUETRUETRUE
vine|berryTRUETRUETRUEFALSETRUETRUETRUEFALSETRUETRUE
root|onionFALSETRUEFALSEFALSEFALSETRUEFALSEFALSEFALSETRUE

<tbody>
</tbody>

I think this is the foundation for XOR's understanding of how to do these sorts of data comparisons. Once you get the data into this form, you can then massage it into a form you want to display.

So what he does next, is convert this matrix to 1's for false, and 0's for true, using the N formula:

tree|appleroot|gingertree|pearvine|grapetree|apricotroot|carrottree|bananavine|musc.vine|berryroot|onion
tree|apple1100010001
root|ginger0100010000
tree|pear1110111001
vine|grape1111111011
tree|apricot1100110001
root|carrot0000010000
tree|banana1100111001
vine|musc.1111111111
vine|berry1110111011
root|onion0100010001

<tbody>
</tbody>

This is multiplied by 1 1x10 array of all 1's. Why? Basically, it sums the "1's" across each row. In other words, how many other values are greater than this one? If the answer is 1, then it is at the top of the list (remember the value is compared against itself, so an answer of 1 really means there are 0 other values greater). Tree|apple has 4 entries which are greater or equal to itself. So it would come in 4th on a sorted list.

This is done using the MMULT (matrix multiply) formula. The first array is the comparison array above, the second is the vector of 1's. The 1's vector is created by row($A$2:$A$11)^0. The "row" formula is not important because it's just a place holder to construct the 1x10 vector. Any number to the 0 power is 1, so you could put any property of that range here to get the vector.

After multiplying, you get:

tree|apple4
root|ginger2
tree|pear7
vine|grape9
tree|apricot5
root|carrot1
tree|banana6
vine|musc.10
vine|berry8
root|onion3

<tbody>
</tbody>

Does this familiar? This is just like using the RANK formula! But you can't use RANK in array formulas in Excel! So this is a workaround to achieve the same result using some clever constructs.

From here, we lookup the result value by any formula you are comfortable with. XOR likes the FREQUENCY formula, which does this:

0/Frequency(data,bins) returns an error for each number in the array that does not match. By incrementing the rows array as the data array, you are first searching for a "1" in the MMULT array (in other words, which is the item ranked #1 in the list?) then the 2nd ranked item in the next row, etc. The LOOKUP function will see an array of errors like {#DIV/0;#DIV/0;#DIV/0;#DIV/0;#DIV/0; 1 ;#DIV/0;#DIV/0;#DIV/0;#DIV/0;} and will return a match on the 6th value for the 1st ranked item. The result array is just the name array, indexed to the 6th entry, which is "carrot".

This makes my brain hurt. Yes, it is a compact formula that works, but trying to see it working in my head makes me want to impale myself on a sword.

Instead, I like this solution:

{=INDEX(fruit,MATCH(ROW()-firstrow,MMULT(N(type&"|"&fruit>=TRANSPOSE(fruit&"|"&type)),ROW($A$2:$A$11)^0),0))}

See what I did there? The ROW()-firstrow just increments 1,2,3,4,.. as you copy down. I'm just doing an INDEX/MATCH on that number to the array I'm indexing. If I named the MMULT array with the name "rank", it would look like this:

{=INDEX(fruit,MATCH(ROW()-firstrow,rank,0))}

But that would require a helper column, so I just lump it all together and it works.

Anyway, hope that helps someone. Explaining it helps me see how it works, so I can modify it to other applications.
 
Upvote 0
Couldn't have explained it better myself! :)

Excellent post, and quite some effort to write I would imagine. I'm sure it will be appreciated by those who come across this thread.

I might just mention that the issue with RANK is not so much that it "does not allow for array formulas" (not sure what you mean here: something like =SUMPRODUCT(RANK(A1:A4,B1:B4)) is a perfectly valid array construction), but rather that it operates only on numerics, not text. In fact, COUNTIF(S) is the usual way to 'rank' here, though that function does not allow us to manipulate a range (in this case concatenating with another) prior to passing as the criteria_range. Example here if you're interested:

https://excelxor.com/2015/01/22/sorting-a-list-alphabetically-without-filters/


This makes my brain hurt. Yes, it is a compact formula that works, but trying to see it working in my head makes me want to impale myself on a sword.

THAT bad, eh? :LOL:

Instead, I like this solution:

{=INDEX(fruit,MATCH(ROW()-firstrow,MMULT(N(type&"|"&fruit>=TRANSPOSE(fruit&"|"&type)),ROW($A$2:$A$11)^0),0))}

That'll do the job as well, though for the record I'm not a big fan of these ROW constructions (https://excelxor.com/2014/08/25/row-vs-rows-for-consecutive-integer-generation/).

Cheers
 
Upvote 0
Interesting... somewhere in my brain I recall tinkering with Rank and being frustrated at not getting what I needed. Maybe it was the text thing rather than the array thing. Anyway I wound up using the LOOKUP(1,0/FREQUENCY method because it's easier to construct a catch if the range you're pasting to has more rows than sortable data. Good thing it's Oktoberfest month... my brain needs a rest today.
 
Upvote 0

Forum statistics

Threads
1,215,064
Messages
6,122,941
Members
449,094
Latest member
teemeren

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top