Comparing errors

JenniferMurphy

Well-known Member
Joined
Jul 23, 2011
Messages
1,338
Office Version
  1. 365
  2. 2016
  3. 2007
Platform
  1. Windows
This is really a statistics question. If this is inappropriate, let me know. I've tried getting help in a couple of statistics forums, but I always get better answers here and I know Excel is often used for statistical calculations.

I am interested in comparing the "tally errors" is different "win streaks" in a random event, like a game, with a given probability of a win. The table shows results for win streaks of various lengths (Row 5). Columns D-G contain data for these 4 streak lengths over 4,000 games with an overall winning % (Wins/Games) of 60%.

Col H shows the formulas used for the calculations.

R/CCDEFGH
3#Games4000400040004000D3: 4000
4Win %60.00%60.00%60.00%60.00%D4: 0.6
5Streak Length1234D5: 1
6Exp Streak %24.00%14.40%8.64%5.18%D6: =(D4^D5)*(1-D4)
7Expected Tally384.00230.40138.2482.94D7: =D6*D3*(1-D4)
8Actual Tally39022714181D8: 390
9Error (Act-Exp)6.00-3.402.76-1.94D9: =D8-D7
10Error %1.54%-1.50%1.96%-2.40%D10: =D9/D8

<tbody>
</tbody>

Row 6 shows the expected probability for each streak length. Row 7 shows the expected tally for each streak length. Row 8 shows the actual tallies. Row 9 shows the difference in these percentages and Row 10 shows the percentage error.

My question is this: Are the values in Row 10 the best measures of how closely the actual tallies (Row 8) are to the expected tallies (Row 7)? If there is a better measure, I would appreciate some help understanding what it is and how to calculate it. I am looking for a measure that will allow me to rank the various tallies according to the error from the expected.

I hope I explained that adequately.
 

Some videos you may like

Excel Facts

Copy formula down without changing references
If you have =SUM(F2:F49) in F50; type Alt+' in F51 to copy =SUM(F2:F49) to F51, leaving the formula in edit mode. Change SUM to COUNT.

BrianJN1

Well-known Member
Joined
Jun 15, 2017
Messages
603
Jennifer,
I'm not sure if this is appropriate, ie, does my logic hold true?

You have 4 different events (by category, 1, 2, 3, 4).
Each event differs from zero by the Error %.

I think that using a Standard Deviation might be the better way.
1.54-1.51.9-2.4
2.37162.253.615.7613.9916
0.3302670.3702670.0297330.5297331.870267

<tbody>
</tbody>

Row 1 is your error %. Row 2 is the Square of those values with the sum of the 4 at the end.
Beneath that sum is the STD DEV = Sqrt(SUM/4)
The other 4 values in row 3 are how far the absolute values sit from 1 Standard deviation.
My formula in the first cell of row 3 is: =ABS(ABS(A1)-$E3)

Be careful. I have a rather limited concept of statistics; as I said, my logic may NOT be exactly right.
 
Last edited:

JenniferMurphy

Well-known Member
Joined
Jul 23, 2011
Messages
1,338
Office Version
  1. 365
  2. 2016
  3. 2007
Platform
  1. Windows
Jennifer,
I'm not sure if this is appropriate, ie, does my logic hold true?

You have 4 different events (by category, 1, 2, 3, 4).
Each event differs from zero by the Error %.

I think that using a Standard Deviation might be the better way.

I wanted to use the std dev, but couldn't figure out how to calculate it.

1.54-1.51.9-2.4
2.37162.253.615.7613.9916
0.3302670.3702670.0297330.5297331.870267

<tbody>
</tbody>

Row 1 is your error %. Row 2 is the Square of those values with the sum of the 4 at the end.
Beneath that sum is the STD DEV = Sqrt(SUM/4)

The other 4 values in row 3 are how far the absolute values sit from 1 Standard deviation.
My formula in the first cell of row 3 is: =ABS(ABS(A1)-$E3)

Be careful. I have a rather limited concept of statistics; as I said, my logic may NOT be exactly right.

Hmmm... Is that really the std dev? And do we want to find out how far a value is from one std dev or how many std devs it is from the mean (z-score)?

How about this? I've added the mean and std dev to rows 9 & 11. I'm not sure if these are valid measures because I'm not sure if they are from the same population.

Then in rows 10 & 12, I calculate the Z score based on those values.

R/CCDEFGIJKL
3Streak Length1234MeanFormulaStd DevFormula
4#Games4000400040004000
5Win %60.00%60.00%60.00%60.00%
6Exp Streak %24.00%14.40%8.64%5.18%D6: =(D5^D3)*(1-D5)
7Expected Tally384.00230.40138.2482.94D7: =D6*D4*(1-D5)
8Actual Tally39022714181
9Error (Act-Exp)6.00-3.402.76-1.940.8540I9: =AVERAGE(D9:G9)4.32K9: =STDEV(D9:G9)
10Z-Score1.19-0.980.44-0.65D10: =(D9-$I9)/$K9
11Error %1.54%-1.50%1.96%-2.40%-0.1005%I11: =AVERAGE(D11:G11)2.17%K11: =STDEV(D11:G11)
12Z-Score0.75-0.640.95-1.06D12: =(D11-$I11)/$K11

<tbody>
</tbody>

Is any of this meaningful?
 

BrianJN1

Well-known Member
Joined
Jun 15, 2017
Messages
603
Jennifer,
I recognise two things in which I was probably was in error. I made the assumption that the "mean" would be zero, ie, no error. Secondly my divisor should have been 1 less that the values, ie, 3.

What are you doubting about "same population"? Are not the 4 categories of errors coming from the same 4000 game population?
That said, and you'd be more conversant than I, the most accurate means to report deviations would be based upon STDs and Z-scores are a more meaningful way to interpret them.

Yes, your table is meaningful to me and would seem to be the way to go.
 

JenniferMurphy

Well-known Member
Joined
Jul 23, 2011
Messages
1,338
Office Version
  1. 365
  2. 2016
  3. 2007
Platform
  1. Windows
Jennifer,
I recognise two things in which I was probably was in error. I made the assumption that the "mean" would be zero, ie, no error. Secondly my divisor should have been 1 less that the values, ie, 3.
I think the mean should be zero, if this is a true random variable, but since it's about human behavior, it's almost certainly not perfectly random. Does a human play the same after a loss as after 10 wins in a row? I think not.

What are you doubting about "same population"? Are not the 4 categories of errors coming from the same 4000 game population?
Yes, all of the results come from the same 4000 games. But it seems to me that by tallying the number the different streaks separately, I may have created different (sub)populations. This is over my skill level.

That said, and you'd be more conversant than I, the most accurate means to report deviations would be based upon STDs and Z-scores are a more meaningful way to interpret them.

Yes, your table is meaningful to me and would seem to be the way to go.
I think I'll try this on one of the stats forums.

Thanks
 

BrianJN1

Well-known Member
Joined
Jun 15, 2017
Messages
603
Ah well, if we didn't get to "the prize" then hopefully our exchange has cleared some thoughts or at least offered some new directions.
All the best.
 

Watch MrExcel Video

Forum statistics

Threads
1,108,525
Messages
5,523,384
Members
409,514
Latest member
MarkZuckerberg

This Week's Hot Topics

Top