formula to compare 2 numbers each with different margin of error

adamelston

New Member
Joined
Jul 22, 2016
Messages
31
Hello, if possible please could someone let me know a good formula for comparing two numbers each with a different margin of error.

E.g. say I have 10(+-3) and 15(+-2). What would be correct formula to check whether difference between the two is outside margin of error? I guess this is a bit of a statistics question - I am not sure how to factor in the 3 and the 2 - do I add them?

Many thanks

A
 

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
A,

I'm not sure I understand what the result should look like. What would be the output of the formula in your example?


All the best,
Matt
 
Upvote 0
Is there any context behind these numbers...what they represent and what the ranges mean? I've seen results reported like this where the range represents 1 standard deviation...in other cases, some other number of standard deviations. Still in other cases, the results might be a mean with upper and lower confidence limits that result from a formal uncertainty analysis. In other cases, one might see ranges expressed like this where one set represents acceptance criteria where 90 % or 95 % of some product's measured characteristic must lie within the range...and the other set of values might be the results of evaluating the product. Context is important to advise on how to proceed.
 
Upvote 0
Sorry if this wasn't clear (and in advance if I make it even less clear).

Basically I want to compare e.g. two years of survey responses and assert whether or not year on year changes are significant. In my original example 15(+-2) is greater than 10(+-3) and the difference is statistically significant in that factoring known margins of error gives us 15 (lower bound 13 upper bound 17), and 10 (lower 7, upper 13). I.e. the 'error bars' do not overlap, though they are close for lower bound 15 and upper bound 10 (13 and 13).

I found these formulas online (here: How to compare numbers but allowing a tolerance in Excel?) based on a margin of error of 5 but I am not sure they are doing the job - e.g. the 5 in the first example only seems to be applying to B1, unless I am reading it wrong?

=AND(A1<=B1+5,A1>=B1-5)

=ABS(A1-B1)<=5
 
Upvote 0
Thanks for offering some clarification. Did you get this resolved? If not, please see the following for some additional thoughts and questions?

Since the values emerge from a survey, this provides some context about the term "margin of error" and I'm assuming that term is used in a traditional sense to mean the product of the standard error and some critical value associated with the level of confidence. This context hints at the use of the normal distribution, but that should not necessarily be assumed. Often, survey questions are asked in such a way that the answer is one thing or another, suggesting that a binomial distribution is more appropriate; however, if the sample size is large enough, the normal distribution is often used as an approximation to the binomial distribution. Do you know if the distribution is considered normal; or if not known, do you know anything about the sample size so that the use of the normal distribution can be justified? I ask about the sample size for another reason too, since the margin of error depends on the sample size as well as the level of confidence chosen when the margin of error was determined. Do you know what level of confidence is associated with the two sets of ranges?

Knowing both of these is important to ensure that an "apples-to-apples" comparison is being made? The margin of error together with the sample mean describes the range (i.e., the confidence interval) that would cover (include) the true value of the population mean were the survey to be repeated many times, and this coverage would be correct at a frequency consistent with the level of confidence (e.g., 90 % or 95 % of the time, although this level of confidence hasn't been disclosed). This is a nuance that is frequently misunderstood. The sample means and margins of error that you've described apply only to the survey samples reported. Were the process to be repeated many times over, and multiple samples of the same size drawn from the same population, it is conceivable that different sample means and different margins of error would be found for each of those surveys. The confidence level applies to the success rate of this process correctly bounding the true value of the population mean.

I think that rather than asking whether the "difference between the two is outside margin of error", you really want to ask whether the "difference between the two is 0, taking into account the margins of error". This question gets at whether the results of these two surveys taken a year or so apart are different enough to represent a shift in preferences or attitudes of the respondents. And if not, then the results would represent random sampling variations from the same population whose preferences or attitudes had not changed. To do that, it is better to not rely only on whether the two confidence intervals overlap (or touch each other). The formulas you've described do two things: One formula checks whether the mean result A lies within some range around B. In that case, only the interval around B is investigated, and A is assumed to be absolute, with no interval around it. The other formula compares the difference between sample means, A-B, to determine whether the difference is smaller than some value (5), but that value of 5 is not necessarily related to the margin of error. It appears to be an arbitrary value chosen for the example. Neither of these really answers the bigger question about whether the survey results (sample means) might reflect a significant change in preferences or attitudes.

I would recommend formal hypothesis testing (or test of significance)...that is what would typically be done for this type of question. In this case, we have A = 10 +/- 3 and B = 15 +/- 2, where the +/- establishes a confidence interval based on the margin of error. Do you know if the surveys are reporting absolute quantities, or are they percentages? For example, does 10 +/- 3 mean the interval is 7 to 13, or is it 7 % to 13 %? These details are needed (confidence level, sample size, and whether reported results are absolute or whether they represent some proportion of the sample) to take the next steps, which involve backing out a standard deviation from the margin of error so that hypothesis testing can be done.
 
Upvote 0
Hi @KRice, thank you for your comprehensive response.

Re some of the questions. I am working to a 95% CI and the results I am looking at are percentages and assume a normal distribution. Often I would use t-tests, and I have this set up, but sometimes I want to use confidence intervals and these are calculated at a prior stage (either manually, using the link I included above, or using more complex measures in dax or power bi, working out standard deviation etc). All I am now really looking for is a nice formula to quickly determine whether the difference between my two results, eg. year 1 and year 2 are statistically significant, taking into account the CI for both years, ie. that the difference between the two can't be explained by random sampling - all the other work has been done. Hope that makes sense.
Cheers,
Adam
PS - if i simply replaced the '5' in the following formula with CI1 + CI2 for each of the confidence intervals - might that do it? =ABS(A1-B1)<=5
 
Upvote 0
Adam, thanks for the additional detail. Regarding your last question in the P.S., no, I do not believe that will provide you with a defensible conclusion. That approach would tell you whether the intervals described by the margins of error of the two samples overlap or touch, but that doesn't really answer the question about whether the means could have resulted from two survey groups drawn from the same population. The more robust way to draw this type of conclusion is to perform a hypothesis test where P1 and P2 are the mean population proportions (intended to be inferred from the sample means p1 (or A) and p2 (or B) in your sheet):

Describe the sample statistic, create the hypotheses, and establish the level of significance
In this case, I think the question being asked involves whether the two samples could have been drawn from the same population, a population whose mean characteristic (whatever the survey is trying to assess) remained about the same over the years covered by the surveys. The sample statistic would represent the difference of the population means, where we treat each sample as though it was drawn from a population associated only with it...so a sample mean of p1 from the first survey is intended to represent a random selection of respondents drawn from a population whose mean is P1, and similarly, p2 from the second survey would be representative of a population whose mean is P2. Then the hypotheses focused on the population means would be stated like this:

Null hypothesis: H0: P1-P2=0 or put differently, P1 = P2, that is, we will accept or reject this depending on the outcome of the test statistic below.
Alternate hypothesis: H1: P1-P2 <> 0 or put differently P1 <> P2

Other assumptions that need to be reviewed:
  • Are the survey samples considered large (perhaps >= 30), which goes toward the type of test statistic to be used (probably either Z or t).
  • Are the two survey samples independent and unpaired (also said to be "unmatched", meaning that the same people were not used in both samples, except for the occasional one who may have been chosen randomly for both surveys).
  • What is the nature of the survey response variable? Is the response variable dichotomous or continuous? An example of the former: "Do you favor term limits for federal judges? Yes or No", and for the latter, "How many miles do you drive annually?" The answer to this question changes the form of the equation used to estimate the standard deviation of the sampling distribution of the difference. This estimate is often needed because the standard deviation of the populations is typically not known.
  • Are the population variances believed to be equal (i.e., the variance of the population represented by the first survey is approximately the same as the variance of the population represented by the second survey). This assumption is typically assessed by examining the ratio of the two sample standard deviations to confirm that it falls somewhere between 0.5 and 2. If so, then the assumption of approximately equal population variances is typically accepted as reasonable.
  • Evaluate at a level of significance of alpha = 0.05, a typical value associated with statements that carry a fairly high degree of confidence.
A quick back-of-the envelope assessment, making some assumptions (so please assess whether these are valid):
  • The sample sizes are large (to be confirmed next), so the assumption of normality is reasonable and the Z test is an appropriate test statistic. The response variable is dichotomous, so the margin of error (MOE) for some confidence level is a function of the mean and the sample size only. Let's say the hypothesis is to be tested at the level of significance of alpha =0.05 (i.e., the test rejects H0 at alpha=0.05) and the distribution is two-tailed, so the critical Z value, Z_c=1.96 (two-tailed at 95 % confidence). Then the form of the MOE equation is: MOE = Z_c * sqrt( p * (1-p) ) / sqrt ( n )
The MOE equation is solved twice...one time for each survey. We know the sample proportion means p1=0.10 and p2=0.15. This suggests that survey 1 had a sample size n1=384, and survey 2 had a sample size n2=1225. Both of these are considered large, so the assumption of normality is also justified.

Selection of the test statistic:
Since the sample sizes are large, we look at the mean proportions (p1=0.10 and p2=0.15). These are toward the edges of the distribution, but still probably central enough for a valid test. Then the form of the Z test is given by:
z = (p1 - p2) / sqrt ( p * (1-p) * ( 1/n1 + 1/n2 ) )

where we've already described p1 and p2, but we see that we have a new variable, p, representing the overall proportion of responses that are consistent with p1 and p2, and which is essentially a weighted average. First we consider that the proportion of respondents in the first survey responding one way is p1, which is given by p1 = x1 / n1, where x1 is the number of respondents giving that particular response and n1 is the total number of respondents in the first survey. So x1 = p1 * n1 = 0.10 * 384 = 38. Similarly, x2 = p2 * n2 = 0.15 * 1225 = 184. Turning now to the computation of p:
p = ( x1 + x2 ) / ( n1 + n2 ) = ( 38 + 184 ) / (384 + 1225 ) = 0.1380

State the Decision Rule:
Reject H0 if Z<=-1.960 or if Z>=1.960.

Compute the test statistic:
Substituting into the equation for z...
z = (p1 - p2) / sqrt ( p * (1-p) * ( 1/n1 + 1/n2 ) ) = z = (0.10 - 0.15) / sqrt ( 0.138 * (1-0.138) * ( 1/384 + 1/1225 ) ) = -2.479

Conclusion:
We reject the null hypothesis H0 since -2.479 <= -1.96. There is statistically significant evidence at alpha = 0.05 to show that there is a difference in the means of the two surveys, even though the sample margins of error result in the two intervals touching each other.

You might want to investigate some online resources...all of these are good:

If you have a number of these results to evaluate, you could set this up in Excel following this or a similar approach for each evaluation.
 
Upvote 0
Hi @KRice, thank you for that very comprehensive response and for the links which i will review. I do have t-tests set up already. I was looking for was a quick way of comparing two results (e.g. year 1 vs year 2) and their associated confidence intervals, to assess whether any difference between them is outside of MOE and thus significant. I don't think I am asking what you state in your first para i.e. 'whether the means could have resulted from two survey groups drawn from the same population', or perhaps I am - stats is not my strong point!
All the best,
Adam
 
Upvote 0
Hi Adam...I'm glad to help. My point about the two survey groups reflects a common approach in hypothesis testing. Often, the population is assumed to have some characteristic mean and distribution....and those characteristics do not change. Then suppose that multiple samples are drawn from that population and the sample characteristics are determined (e.g., surveys yield sample means and margins of error). A hypothesis test that compares two means will often establish the null hypothesis that the two means are equal, which would be expected because the two sample groups were drawn from the same population...and if there were no substantial shift in the means as determined by the test statistic, then we would accept the null hypothesis. The conclusion in that case would be not only were the samples similar, but that the population from which they were drawn reflects the same distribution of characteristics (whatever the survey is measuring).

On the other hand, if the hypothesis test leads to a rejection of the null hypothesis, indicating that the means are substantially different, then we would conclude that the difference is most likely due to some other factor other than chance alone. The "chance alone" bit covers the possibility that random selection of the two sample groups could have drawn enough specimens from the tails of the population distribution to skew the sample results enough to produce this result, which suggests that there is a substantial difference even though there is not truly a change in the distribution of characteristics in the population (this is a type of sampling error called a Type I error). This is why the typical statement includes some expression of the significance or confidence level to acknowledge the relatively small chance that sampling alone could have produced the same outcome. In the end, the samples are intended to draw some conclusion about the population from which they were selected, so a comparison of means is fundamentally asking whether the underlying populations from which the samples were drawn should be considered the same.

In your case, you have two surveys taken a year apart, so it is fair to consider that each sample is intended to represent its population at that snapshot in time, and then the survey aims to infer whether the two population snapshots have the same distributions of characteristics.

What is the nature of the survey question(s)? Are they measuring a response variable that is one of two possible answers (e.g., yes/no, true/false, prefer choice A over choice B, etc.)(my example above assumed this, but that may not be correct), or are the response variables continuous in nature (e.g, household income, miles driven, etc.)?
 
Upvote 0

Forum statistics

Threads
1,214,377
Messages
6,119,182
Members
448,872
Latest member
lcaw

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top