Using mathematical methods to spot anomalies

theta

Well-known Member
Joined
Jun 9, 2009
Messages
960
Hi all. If I had a list of numbers (1,000 rows) and wanted to find those which would appear to be anomalies, given the rest of population...what methods could I use?

15
19
65
87
4
23
100
53
14
7

Distance from the mean?
Normal (Gaussian) distribution?
Bi-nomial distribution?
Quantitative methods...

What would be the easist way to 'mark' these items with some sort of index (standard z-score) and perhaps graph it?

Would there be a way to 'tune' this to make it more/less senstitive...
 

Excel Facts

When they said...
When they said you are going to "Excel at life", they meant you "will be doing Excel your whole life".
I would use the standard deviation of the population and flag anything that is outside say 2 or 3 SD from the mean. You could then tune this by adjusting the number of SDs to include or exclude more as you want - you can use the normal distribution to give you how many this will exclude if the population is normally distributed to give you an idea of how many "should" get thrown up (assuming you are sampling from a normal population of course;)).
 
Upvote 0
Thanks for the quick reply!

How would I acheive this using formula if I had a column of 1,000 numbers (say column A1:A1000). No problem using helper columns etc.

New to stats! Calc the standard deviation and then the normal distribution

Would like to be able to visualise this in a graph (but that can wait until I resolve the formulas).
 
Upvote 0
STDEV(A1:A100) calculates teh standard deviation but it can only cope with 254 values. If your data is random that should not be too much of a problem for you - simply take an arbitrary 254 values and use the SD of that.

If its not random or you want a better value you can calculate the SD manually easily enough.

The SD is the square root of the variance and variance is:

((x-mean of x)^2 Summed for all values of x)/(n-1)

Cant remeber why its n-1 (it was a long time ago)

If you are coding it you can use a rearrangement of this such that teh variance is

1/(n-1)*(sum of x^2 for all values of x - (sum of all values of x)^2/n)

So the SD can be coded as:

Code:
=SQRT((1/(COUNT(A1:A1000)-1))*(SUMPRODUCT(A1:A1000,A1:A1000)-((SUM(A1:A1000)^2)/(COUNT(A1:A1000)))))


Hope that helps
 
Upvote 0
Hi

To calculate the standard deviation you can also use directly:

=STDEV(A1:A1000)
 
Upvote 0
pgc01

STDEV is limited to a maximum sample of 254 according to the help unless a single array only counts as one.
 
Upvote 0
pgc01

STDEV is limited to a maximum sample of 254 according to the help unless a single array only counts as one.

Hi energman58

I guess you are referring to the maximum number of arguments of the function in the latest excel versions, not the number of values.

For ex., in this case I'm using the function with 2 arguments but with 20 values:

=STDEV(E1:E10,F1:F10)
 
Upvote 0
If I had a list of numbers (1,000 rows) and wanted to find those which would appear to be anomalies, given the rest of population...what methods could I use?
There's lots of interesting possibilities. But your results must depend on the criteria you want to use.

Years ago Emil Gumbel wrote a significant book on "The Statistics of Extremes" after many years of earlier research, http://en.wikipedia.org/wiki/Extreme_value_theory which has a bearing on this sort of problem.

Sometime later Paul Samuelson (who won a Nobel prize, but not for that) considered the interesting and related question of "How Deviant Can You Be?" http://en.wikipedia.org/wiki/Samuelson's_inequality, that also generated a significant literature.

There's lots of other background available on this sort of topic/problem. Some of it I guess could be coded into Excel if there proved to be sufficient interest.

Maybe also worth pointing out that some distributions don't even have standard distributions, so while Excel's StDev may well produce a number, it doesn't necessarily mean anything unless you have adequate confirming additional information.

And even for distributions that do have standard deviations, most of these are unlikely to be normal ... although you can of course assume whatever you like.
 
Upvote 0
Thanks guys this is helping alot.

Couldn't understand why things were being squared then rooted. Makes sense now...converts all differences into positive number, or 'true' difference...then mean and root reduces it back to the correct scale.

Keep it coming, this info is great

Hoping to look at a list of transactions and find those that appear unusual when compared to the rest of the population
 
Upvote 0
Hi guys, produced my first model using standard deviation. I can't help but notice that if you have an erroneous item, it is still contributing to the mean so may decrease the accuracy.

You could have 3 erroneous items, one of these being exceptionally large...enough to push the mean past the point where the other 2 aren't highlighted?
 
Upvote 0

Forum statistics

Threads
1,214,585
Messages
6,120,394
Members
448,957
Latest member
Hat4Life

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top