Getting STDEV from sumproduct() ...

just_matt

New Member
Joined
Aug 10, 2018
Messages
7
I am struggling with standard deviation in a sumproduct() output.

I have developed a matrix that tracks stuff by hour of day and day of week over the course of a great length of time. This matrix is specific and by design.

Right now, I am able to achieve a total count, and an average count (details below). I need to figure out if I can leverage the same formulas to get standard deviation, min, and max.

I do have this answer elsewhere, however, I did it "the hard way" by creating massive tables and datasets to perform DAVERAGE() and DSTDEV()'s against.

Here's where I'm at with my "to-be"

The matrix is driven by a dataset that is about 38,000 rows.

So for my total count matrix, I have:

2 X24345671
MTWThFSSu
TimeMTWThFSSu
12:00 AM


W27
504641545463
5:00 AM26425525729124741
6:00 AM53759859764051391
7:00 AM72680778368974479
8:00 AM785865795818742107

<colgroup><col span="2"><col span="6"></colgroup><tbody>
</tbody>


(snipped for the sake of brevity)
(apologies for the garbage/duplicate rows)

In each ## cell, the following formula exists:
=SUMPRODUCT(--(WEEKDAY(Open)=X$24),(HOUR(Open)>=HOUR($W27))*(HOUR(Open)<HOUR($W28)))

"Open" is the column within my raw data which contain a date. Lots and lots of data here to be counted, averaged, and hopefully STDEV'd!
X24 is "2" (above the M)
W27 is "12:00AM"

In order to get an average, I need to take into account the fact that my data source only records when events happen, and not when they don't. e.g. an event at 6:30AM may occur one week and be counted, but not occur the following week. If my sample size were 2, I'd need that total number to calculate my average.

Therefore, to calculate averages, I update that formula to:

=(SUMPRODUCT(--(WEEKDAY(Open)=X$24),(HOUR(Open)>=HOUR($W27))*(HOUR(Open)<HOUR($W28))))/(DAYS(MAX(Open),MIN(Open))/7)


So far, this works fine, and I have validated it against "the long way" which was mentioned above.

My Dilemma:
Where I'm getting stuck is figuring out how/where to apply the standard deviation equation in the SUMPRODUCT formula, or if this is even possible based on how I've derived the total and average counts.

If you really want to play with what's in "Open" you can do a =randbetween(42370375,43465625)/1000 for a few thousand rows to simulate what I've got.

If I may clarify what I've stated above please let me know.

Thank you in advance!
 

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
Unfortunately, this forum's UI broke the presentation of your formulas, I think. Put spaces around " > " and " < " to be sure.

In any case, I wonder if you need SUMPRODUCT at all.

Maybe the following paradigm will work for you, array-entered (press ctrl+shift+Enter instead of just Enter):

STDEV(IF(cond1,IF(cond2,IF(cond3,range))))

And I wonder if you should use STDEVP instead.

The difference is: STDEVP is the actual std dev for "range", whereas STDEV is an approximate std dev of a larger data set (population), assuming "range" is sample.
 
Upvote 0
Unfortunately, this forum's UI broke the presentation of your formulas, I think. Put spaces around " > " and " < " to be sure.

In any case, I wonder if you need SUMPRODUCT at all.

Maybe the following paradigm will work for you, array-entered (press ctrl+shift+Enter instead of just Enter):

STDEV(IF(cond1,IF(cond2,IF(cond3,range))))

And I wonder if you should use STDEVP instead.

The difference is: STDEVP is the actual std dev for "range", whereas STDEV is an approximate std dev of a larger data set (population), assuming "range" is sample.

I'll have to go back and look at the formula but WOW you're right - that is not exactly what I cut & paste into the browser.

{STDEV.P(IF(cond1....} won't work as I need to "manually" compute the mean (instead of average() ) ... of course if my formula would have pasted correctly, you'd see where this happens.

Let me think about setting up the entire standard deviation formula via a giant {IF()} array and see if that will work. I want to say there were "reasons" that approach was not taken, possibly simplicity, possibly lack of compatibility with other functions I was trying to hammer into the cell (stuff that was irrelevant to the problem I was trying to solve, but may be relevant in the grand scheme of things now that I take a step back and see it differently).

Thanks for getting me to think about this differently. I appreciate it. I'll update those formulas soon, but... I think this is what it was (off of the top of my head)

In each ## cell, the following formula exists:
=SUMPRODUCT(--(WEEKDAY(Open)=X$24),(HOUR(Open)>=HOUR($W27))*(HOUR(Open)=HOUR($W28)

Therefore, to calculate averages, I update that formula to:

=(SUMPRODUCT(--(WEEKDAY(Open)=X$24),(HOUR(Open)>=HOUR($W27))*(HOUR(Open)
=HOUR($W28))/DAYS(MAX(Open),MIN(Open))/7



<hour($w28)))) (days(max(open),min(open))="" 7)
[I promise I'll update this soon with what I'm actually using but I think you should get the gist... ]</hour($w28))))>

 
Upvote 0
{STDEV.P(IF(cond1....} won't work as I need to "manually" compute the mean (instead of average() )
[....]
I'll update those formulas soon, but... I think this is what it was (off of the top of my head)
[....]
=SUMPRODUCT(--(WEEKDAY(Open)=X$24),(HOUR(Open)>=HOUR($W27))*(HOUR(Open)=HOUR($W28)

Again, I'm not seeing enough of a formula (due to GUI mangling?) to provide a turnkey solution with SUMPRODUCT.

Moreover, it is still unclear why you think you need a "manually-computed" mean. I presume the conditions for including data in the calculation of the mean are the same as the conditions for including data in the calculation of the std dev. At least, that is usually the case. So:

=AVERAGE(IF(cond1,IF(cond2,IF(cond3,range))))
and
=STDEVP(IF(cond1,IF(cond2,IF(cond3,range))))

should be copacetic. Both formulas are array-entered, pressing ctrl+shift+Enter instead of just Enter.

That said, I myself encountered an instance where I thought I wanted different conditions for selecting data for the mean and std dev calculations. I don't remember why; and I might have abandoned the approach after having second thoughts.

Be that as it may, the general form of the formula for STDEVP is:

=SQRT( SUMPRODUCT((cond1)*(cond2)*(cond3)*(range-mean)^2) / COUNTIFS(cond1,cond2,cond3) )
or
=SQRT( SUMPRODUCT((cond1)*(cond2)*(cond3)*(range-mean)^2) / SUMPRODUCT((cond1)*(cond2)*(cond3)) )

The latter form might be necessary if we are not ANDing all the conditions, or if for some other reason we cannot use COUNTIFS.

For STDEV, we subtract 1 in the denominator (divisor).

Hope that helps.
 
Upvote 0
Ugh. This board's handling of text is beyond frustrating.

To answer the first question a different way (why can't I just use stdev or average) it's a basic mathematics concept. Take a step back, go populate sample data as I suggested.

Now look at the matrix I am trying to fill.

I need to average things like counts of events occurring during the 8am hour.

Average() on my dataset will only take into consideration the number of times the EVENTS occur, not the total number of times the 8am hour on a Monday occur. Therefore, when I take the SUM of my numbers and divide them by N of numbers, my answer is incorrect with average() as N is an incorrect value. My event data, like the simtable I offered, do not indicate null counts.

To overcome this, I've derived N from the dataset. That part of the formula may have gotten mangled, but it's a DAYS(MAX(OPEN),MIN(OPEN))/7 that returns the correct value.

Let's pause here with that explanation. Did I miss an easier way to extract N from the dataset? In my assessment, the only other way to get N was to manually add 0 values for dates where an event did not occur.
 
Upvote 0
No time to follow all of the details. Your explanations are not as clear as I would like, at least for a quick read. I suspect this forum's mangling has a lot to do with my failure to follow quickly.

But if you are trying to calculate weighted average and weighted std dev, I agree that AVERAGE and STDEVP do not work for you, and SUMPRODUCT is the way that I would go. Sorry for the misdirection.

Hopefully, someone with more time will jump in.
 
Upvote 0
=SQRT( SUMPRODUCT((cond1)*(cond2)*(cond3)*(range-mean)^2) / COUNTIFS(cond1,cond2,cond3) )
or
=SQRT( SUMPRODUCT((cond1)*(cond2)*(cond3)*(range-mean)^2) / SUMPRODUCT((cond1)*(cond2)*(cond3)) )
I'm having a similar issue.

I have a massive Likert table and I used SumProduct to get a nice summary - X answered "1", Y answered "2", and so on.

I know you shouldn't use statistical formulas on Likert Data. That is a battle I have lost many times over.

I'd like to use SumProduct to calc the deviation - array formulas make me uncomfortable (I am handing file over to someone else).

So my question is - what is "(range-mean)" in your example above?
 
Upvote 0
=SQRT( SUMPRODUCT((cond1)*(cond2)*(cond3)*(range-mean)^2) / COUNTIFS(cond1,cond2,cond3) )
or
=SQRT( SUMPRODUCT((cond1)*(cond2)*(cond3)*(range-mean)^2) / SUMPRODUCT((cond1)*(cond2)*(cond3)) )
what is "(range-mean)" in your example above?

The quick answer is: read that as "range minus mean". "Range" is the cell reference (e.g. A1:A10000) for the data. "Mean" is the cell reference (e.g. M1) for the specially-calculated average of the data.

(I am biting my tongue about using such statistics for Likert scale data.)
 
Last edited:
Upvote 0
The quick answer is: read that as "range minus mean". "Range" is the cell reference (e.g. A1:A10000) for the data. "Mean" is the cell reference (e.g. M1) for the specially-calculated average of the data.
Thank you. A little lost on actual application.
Excel Formula:
J358 = (SUMPRODUCT((Year=$H358)*(Question=$C358)*(Title=E$1)*Responses)/(SUMPRODUCT((Year=$H358)*(Question=$C358)*(Title=E$1)))
Gives me my "average" for the question from the year in that row, for the position in that column.

What would I add for the "Range-Mean" in the formula? (The column has answers for "average" in J2 through J 360.)

(I am biting my tongue about using such statistics for Likert scale data.)
I know. I have presented on this. I have written papers and blog articles on this. I have fought the people who create the surveys and get paid lots of money to present the "analysis" that it's garbage. I have made the "interval data" vs. "ordinal data" charts. And yet.
 
Upvote 0

Forum statistics

Threads
1,215,429
Messages
6,124,838
Members
449,193
Latest member
MikeVol

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top