# Comparing errors

#### JenniferMurphy

##### Well-known Member
This is really a statistics question. If this is inappropriate, let me know. I've tried getting help in a couple of statistics forums, but I always get better answers here and I know Excel is often used for statistical calculations.

I am interested in comparing the "tally errors" is different "win streaks" in a random event, like a game, with a given probability of a win. The table shows results for win streaks of various lengths (Row 5). Columns D-G contain data for these 4 streak lengths over 4,000 games with an overall winning % (Wins/Games) of 60%.

Col H shows the formulas used for the calculations.

 R/C C D E F G H 3 #Games 4000 4000 4000 4000 D3: 4000 4 Win % 60.00% 60.00% 60.00% 60.00% D4: 0.6 5 Streak Length 1 2 3 4 D5: 1 6 Exp Streak % 24.00% 14.40% 8.64% 5.18% D6: =(D4^D5)*(1-D4) 7 Expected Tally 384.00 230.40 138.24 82.94 D7: =D6*D3*(1-D4) 8 Actual Tally 390 227 141 81 D8: 390 9 Error (Act-Exp) 6.00 -3.40 2.76 -1.94 D9: =D8-D7 10 Error % 1.54% -1.50% 1.96% -2.40% D10: =D9/D8

<tbody>
</tbody>

Row 6 shows the expected probability for each streak length. Row 7 shows the expected tally for each streak length. Row 8 shows the actual tallies. Row 9 shows the difference in these percentages and Row 10 shows the percentage error.

My question is this: Are the values in Row 10 the best measures of how closely the actual tallies (Row 8) are to the expected tallies (Row 7)? If there is a better measure, I would appreciate some help understanding what it is and how to calculate it. I am looking for a measure that will allow me to rank the various tallies according to the error from the expected.

I hope I explained that adequately.

### Excel Facts

Copy formula down without changing references
If you have =SUM(F2:F49) in F50; type Alt+' in F51 to copy =SUM(F2:F49) to F51, leaving the formula in edit mode. Change SUM to COUNT.

#### BrianJN1

##### Well-known Member
Jennifer,
I'm not sure if this is appropriate, ie, does my logic hold true?

You have 4 different events (by category, 1, 2, 3, 4).
Each event differs from zero by the Error %.

I think that using a Standard Deviation might be the better way.
 1.54 -1.5 1.9 -2.4 2.3716 2.25 3.61 5.76 13.9916 0.330267 0.370267 0.029733 0.529733 1.870267

<tbody>
</tbody>

Row 1 is your error %. Row 2 is the Square of those values with the sum of the 4 at the end.
Beneath that sum is the STD DEV = Sqrt(SUM/4)
The other 4 values in row 3 are how far the absolute values sit from 1 Standard deviation.
My formula in the first cell of row 3 is: =ABS(ABS(A1)-\$E3)

Be careful. I have a rather limited concept of statistics; as I said, my logic may NOT be exactly right.

Last edited:

#### JenniferMurphy

##### Well-known Member
Jennifer,
I'm not sure if this is appropriate, ie, does my logic hold true?

You have 4 different events (by category, 1, 2, 3, 4).
Each event differs from zero by the Error %.

I think that using a Standard Deviation might be the better way.

I wanted to use the std dev, but couldn't figure out how to calculate it.

 1.54 -1.5 1.9 -2.4 2.3716 2.25 3.61 5.76 13.9916 0.330267 0.370267 0.029733 0.529733 1.870267

<tbody>
</tbody>

Row 1 is your error %. Row 2 is the Square of those values with the sum of the 4 at the end.
Beneath that sum is the STD DEV = Sqrt(SUM/4)

The other 4 values in row 3 are how far the absolute values sit from 1 Standard deviation.
My formula in the first cell of row 3 is: =ABS(ABS(A1)-\$E3)

Be careful. I have a rather limited concept of statistics; as I said, my logic may NOT be exactly right.

Hmmm... Is that really the std dev? And do we want to find out how far a value is from one std dev or how many std devs it is from the mean (z-score)?

How about this? I've added the mean and std dev to rows 9 & 11. I'm not sure if these are valid measures because I'm not sure if they are from the same population.

Then in rows 10 & 12, I calculate the Z score based on those values.

 R/C C D E F G I J K L 3 Streak Length 1 2 3 4 Mean Formula Std Dev Formula 4 #Games 4000 4000 4000 4000 5 Win % 60.00% 60.00% 60.00% 60.00% 6 Exp Streak % 24.00% 14.40% 8.64% 5.18% D6: =(D5^D3)*(1-D5) 7 Expected Tally 384.00 230.40 138.24 82.94 D7: =D6*D4*(1-D5) 8 Actual Tally 390 227 141 81 9 Error (Act-Exp) 6.00 -3.40 2.76 -1.94 0.8540 I9: =AVERAGE(D9:G9) 4.32 K9: =STDEV(D9:G9) 10 Z-Score 1.19 -0.98 0.44 -0.65 D10: =(D9-\$I9)/\$K9 11 Error % 1.54% -1.50% 1.96% -2.40% -0.1005% I11: =AVERAGE(D11:G11) 2.17% K11: =STDEV(D11:G11) 12 Z-Score 0.75 -0.64 0.95 -1.06 D12: =(D11-\$I11)/\$K11

<tbody>
</tbody>

Is any of this meaningful?

#### BrianJN1

##### Well-known Member
Jennifer,
I recognise two things in which I was probably was in error. I made the assumption that the "mean" would be zero, ie, no error. Secondly my divisor should have been 1 less that the values, ie, 3.

What are you doubting about "same population"? Are not the 4 categories of errors coming from the same 4000 game population?
That said, and you'd be more conversant than I, the most accurate means to report deviations would be based upon STDs and Z-scores are a more meaningful way to interpret them.

Yes, your table is meaningful to me and would seem to be the way to go.

#### JenniferMurphy

##### Well-known Member
Jennifer,
I recognise two things in which I was probably was in error. I made the assumption that the "mean" would be zero, ie, no error. Secondly my divisor should have been 1 less that the values, ie, 3.
I think the mean should be zero, if this is a true random variable, but since it's about human behavior, it's almost certainly not perfectly random. Does a human play the same after a loss as after 10 wins in a row? I think not.

What are you doubting about "same population"? Are not the 4 categories of errors coming from the same 4000 game population?
Yes, all of the results come from the same 4000 games. But it seems to me that by tallying the number the different streaks separately, I may have created different (sub)populations. This is over my skill level.

That said, and you'd be more conversant than I, the most accurate means to report deviations would be based upon STDs and Z-scores are a more meaningful way to interpret them.

Yes, your table is meaningful to me and would seem to be the way to go.
I think I'll try this on one of the stats forums.

Thanks

#### BrianJN1

##### Well-known Member
Ah well, if we didn't get to "the prize" then hopefully our exchange has cleared some thoughts or at least offered some new directions.
All the best.

Replies
1
Views
314
Replies
3
Views
2K