Interquartile Outliers - Removing anomalies from data

lewisgmorris

Board Regular
Joined
Oct 23, 2014
Messages
119
Hi All,

I have wrote the following function from what i have read online to mark all the data that is outside the "normal" values of the range.

Data source on google docs sheet here - https://drive.google.com/file/d/0B6BOCseXW1yZMEdELXpIazd5OFk/view

Only problem is that it doesn't seem to take into account times a value is found in the range. I would have expected at least the 500's to not be an outlier

Any one have a better solution to attack this issue? Or see where I may have gone wrong with my code?

p.s this function does work on smaller data sets quite well.



Code:
Sub changeOutlier1()


Dim dblAverage As Double, dblStdDev As Double
Dim ws As Worksheet
Set ws = Worksheets("Stock Detail Use")
Dim lr, x As Integer
lr = ws.Cells(Rows.Count, 1).End(xlUp).Row
Dim rng As Range
Set rng = ws.Range(ws.Cells(2, 2), ws.Cells(lr, 2))
ws.Range("C:C").ClearContents


dblAverage = WorksheetFunction.Average(rng)
dblStdDev = WorksheetFunction.StDev_P(rng)
Dim q1, iqr, q3 As Double


q1 = WorksheetFunction.Quartile(rng, 1)
q3 = WorksheetFunction.Quartile(rng, 3)
iqr = q3 - q1




For x = 2 To lr
    If ws.Cells(x, 2) < q1 - (1.5 * iqr) Or ws.Cells(x, 2) > q1 + (1.5 * iqr) Then
        ws.Cells(x, 3) = True
    End If
Next x


End Sub
 

Excel Facts

Create a chart in one keystroke
Select the data and press Alt+F1 to insert a default chart. You can change the default chart to any chart type
It is not a VBA issue. It is the question what you define to be an outlier in your case. Have you consulted any statistics books? You might even start here: https://en.wikipedia.org/wiki/Outlier.

J.Ty.
 
Upvote 0
yeah, thanks for that. This solution does not give me the result i am after. It doesn't include enough data for me.
 
Upvote 0
I have wrote the following function from what i have read online to mark all the data that is outside the "normal" values of the range.
[....]
Only problem is that it doesn't seem to take into account times a value is found in the range. I would have expected at least the 500's to not be an outlier[.] Any one have a better solution to attack this issue? Or see where I may have gone wrong with my code?

There is nothing wrong with your definition of "outlier". And off-hand, I see nothing wrong your VBA implementation.

If we define the named range "data" as $B$2:$B$6765, we can verify the calculations in Excel as follows:

D2, q1: =QUARTILE(data,1)
D3, q3: =QUARTILE(data,3)
D4, iqr: =D3-D2
D5, 1.5*iqr: =1.5*D4
D6, out1: =D2-D5
D7, out3: =D3+D5
D8, #out1: =COUNTIF(data,"<"&D6)
D9, #out3: =COUNTIF(data,">"&D7)
D10, #data: =COUNT(data)
D11, %out: =(D8+D9)/D10

D3 shows that 75% is less than or equal to 100. So your data is heavily skewed to the low end.

There are no low outliers (D8=0) because out1 is negative, and your smallest data value is 1.

D9 shows there are 329 values that exceed 220 (out3 in D7), which comprise less than 5% of the data (D11).

Note that just because values are "outliers", that does not mean you should necessarily exclude them, unless your objective is to focus on the "central" values for whatever purpose you have in mind.

Otherwise, "outliers" are excluded only if they represent errors in the data.
 
Upvote 0
off-hand, I see nothing wrong [with] your VBA implementation.

Actually, there is one mistake. You wrote:
Rich (BB code):
If ws.Cells(x, 2) < q1 - (1.5 * iqr) Or ws.Cells(x, 2) > q1 + (1.5 * iqr) Then
That should be:
Rich (BB code):
If ws.Cells(x, 2) < q1 - (1.5 * iqr) Or ws.Cells(x, 2) > q3 + (1.5 * iqr) Then

It still treats all of the 500s as outliers.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,215,544
Messages
6,125,438
Members
449,225
Latest member
mparcado

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top