What if analysis in Median (formula help required)

aravindhan_31

Well-known Member
Joined
Apr 11, 2006
Messages
672
Office Version
  1. 365
  2. 2019
  3. 2016
Platform
  1. Windows
Hi,

I am using a simple median formula in cell D1 = Median(C:C), there are about 350000 rows of employee numbers in column A & Names in Col B & Salaries in Col C .

Assume the median value is 18567 in Cell D1. now if I want to get median value to become 20000, how many more employee to be recruited ( rows to be added) and with what salary.

Not sure if the is easy or difficult to do in formulas.

Thanks for your help

Regards
Arvind
 

Excel Facts

Will the fill handle fill 1, 2, 3?
Yes! Type 1 in a cell. Hold down Ctrl while you drag the fill handle.
I am using a simple median formula in cell D1 = Median(C:C), there are about 350000 rows of employee numbers in column A & Names in Col B & Salaries in Col C .

If you have "only" 350,000 rows, it would be better to write =MEDIAN(C1:C350000).

If you use C:C, Excel might have to look at all 1,048,576 rows. This becomes more important if you use the form C:C in array-entered formulas.

Assume the median value is 18567 in Cell D1. now if I want to get median value to become 20000, how many more employee to be recruited ( rows to be added) and with what salary.

There is no single answer. It is easy to calculate the minimum additional employees; but you add more. However, there might be no limit on the salaries.

Let me explain....

The median simply means that 50% of the data lies below and 50% of the data lies above.

So we can create a median of 18567 as follows:

1. Select A1:A175000, type =RANDBETWEEN(1,18566), press ctrl+Enter (not ctrl+shift+Enter), then copy and paste-value back into the selected range.

If you prefer, simply type any number less than 18567.

2. Select A175001:A350000, type =RANDBETWEEN(18568,1000000), press ctrl+Enter, then copy and paste-value back into the selected range.

If you prefer, simply type any number greater than 18567.

Note that 1,000,000 is an arbitrary upper limit. It can be as high or low as you wish, as long as it 18658 or more.

3. Enter 18567 into A350001.

Then =MEDIAN(A1:A350001) returns 18567.


4. Enter =COUNTIF(A1:A350001,"<20000") into B1. Enter =COUNTIF(A1:A350001,">20000") into B2.

5. Enter =ABS(B2-B1) into B3. This (plus one?) is the minimum additional employees.

6. Enter 20000 into A350002 to be sure there is at least one 20000.

This is unnecessary if you want to verify that =COUNTIF(A1:A350001,20000) is not zero.

7. Select the number of cells below A350002 indicated by B3.

For example, if B3 is 545, select A350003:A350547.

7. If B1 < B2 (more values over 20000), type =RANDBETWEEN(1,19999), press ctrl+Enter, then copy and paste-value back into the selected range.

If you prefer, simply type any number less than 20000.

8. If B1 > B2 (more values under 20000), type =RANDBETWEEN(20001,1000000), press ctrl+Enter, then copy and paste-value back into the selected range.

If you prefer, simply type any number more than 20000.

Then =MEDIAN(A1:A350547) returns 20000.

Of course, "A350547" should be replaced by the last row number that you selected in step #7.
 
Upvote 0
If you use C:C, Excel might have to look at all 1,048,576 rows. This becomes more important if you use the form C:C in array-entered formulas.

Actually, as I understand it, if the range being passed is not one which is being queried on a cell-by-cell basis (such would be the case if, as you say, we were to pass it to some appropriate array-processing construction), then the use of entire column references has virtually no detriment to calculation performance.

So, for example, with just A1 and A2 non-empty within column A:

=MEDIAN(A1:A2)

and:

=MEDIAN(A:A)

should have near-identical calculation times, despite the fact that the latter is effectively querying half-a-million more cells than the former.

I can only surmise that, when not being processed on a cell-by-cell (i.e. array) basis, any range passed is first redimensioned to that which comprises the used range only.

Your second point is the correct and crucial one, however, and one which is greatly misunderstood and misused by many. And that is that, with a seemingly innocent change to the above, this time the difference in terms of performance between the (array-entered):

=MEDIAN(IF(B1:B2="X",A1:A2))

and:

=MEDIAN(IF(B:B="X",A:A))

(with, again, only the first two cells in both columns populated), is indeed enormous, to the effect that the latter is approximately 500,000 times less efficient than the former.

Regards
 
Last edited:
Upvote 0
Errata....
There is no single answer.

I still believe that much is correct.

4. Enter =COUNTIF(A1:A350001,"<20000") into B1. Enter =COUNTIF(A1:A350001,">20000") into B2.
5. Enter =ABS(B2-B1) into B3. This (plus one?) is the minimum additional employees.

What a bunch of malarkey! I wish this forum allowed us to remove postings or delete text to correct such misdirection.

To demonstrate the fallacy of my assumptions, consider the following counter-example.

1. Select A1:A175000, type 17132, and press ctrl+Enter (not ctrl+shift+Enter).
2. Select A175001:A350000, type 20002, and press ctrl+Enter.
Note that 17132 is below 20000, 20002 is above 20000, and AVERAGE(17132,20002) is 18567. ;)
3. Note that =MEDIAN(A1:A350000) is 18567.
4. Also note that =COUNTIF(A1:A350000,"<20000") is 175,000, and so is =COUNTIF(A1:A350000,">20000").

According to my screwy logic, we do not need to add any employees in order for the median to be 20000. Obviously wrong! QED.

Back to the drawing board....
 
Last edited:
Upvote 0
PS (too late to edit)....
What a bunch of malarkey!

The real point is: it is one thing to construct data and a process to meet the requirements; it is a very different thing to develop a process for arbitrary given data. My bad!
 
Upvote 0
Sorry, but no errata was needed....
4. Enter =COUNTIF(A1:A350001,"<20000") into B1. Enter =COUNTIF(A1:A350001,">20000") into B2.
5. Enter =ABS(B2-B1) into B3. This (plus one?) is the minimum additional employees.
6. Enter 20000 into A350002 to be sure there is at least one 20000.
7. Select the number of cells below A350002 indicated by B3.
8. If B1 < B2 (more values over 20000), type =RANDBETWEEN(1,19999), press ctrl+Enter [....]
9. If B1 > B2 (more values under 20000), type =RANDBETWEEN(20001,1000000), press ctrl+Enter [....]
Then =MEDIAN(A1:A350547) returns 20000.

That is correct, after all. I forgot that I added step #6. Technically, step 6 is not needed if COUNTIF(range,20000)<>0.

Bottom line: as long as COUNTIF(range,"<k") = COUNTIF(range,">k") and there is at least one k in range, the median is indeed k. That is the definition of MEDIAN.

That is a sufficient, but not a necessary condition. That is, there are contents of "range" that can result in a median of k.

Sorry for all the back-and-forth rethinking.
 
Upvote 0
Hi,

I am using a simple median formula in cell D1 = Median(C:C), there are about 350000 rows of employee numbers in column A & Names in Col B & Salaries in Col C .

Assume the median value is 18567 in Cell D1. now if I want to get median value to become 20000, how many more employee to be recruited ( rows to be added) and with what salary.

Not sure if the is easy or difficult to do in formulas.

Thanks for your help

Regards
Arvind

I need another help from you , with ref to the above, what if I have to do this vice versa. eg? now I am changing the median value and its giving me number of employee that is required to get the anticipated median.
but not if enter projected number of employees that I would recruit say another 5000 employees at a salary of 15000 each, what would be my median?

Thanks for your support
Regards
Arvind
 
Upvote 0

Forum statistics

Threads
1,216,113
Messages
6,128,904
Members
449,477
Latest member
panjongshing

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top