Removing Outliers with standard deviation

ineedhelpp

New Member
Joined
Oct 20, 2019
Messages
4
Hello,

I am working on a project and told to remove all outliers by using the ? +- 3? approach. I am wondering how i should go about this? do i need to calculate the mean and all quartiles to remove the outliers or is there a single excel function that can remove the data outside 3 standard deviations?

Thank you for any help!
 

Kenneth Hobson

Well-known Member
Joined
Feb 6, 2007
Messages
3,092
I guess you could run a macro to delete/remove data. If there are less than 30 data points, I normally use sample standard deviation and average. Do that first in two cells and then do a simple =IF().

I normally set extreme outliers if 3 or more standard deviations which is a z rating of 0. e.g.
Code:
Function f3SDOutlier(x As Variant, xRange As Variant) As Boolean
  Dim sStdDev As Double, xBar As Double
  xBar = WorksheetFunction.Average(xRange)
  sStdDev = WorksheetFunction.StDev(xRange)
  f3SDOutlier = fRating(x, xBar, sStdDev) = 0
End Function

Function fRating(x As Variant, mean As Variant, sd As Variant) As Integer
  Dim z As Double, s As Integer, r As Integer
  Application.Volatile False
  On Error GoTo ExitFunction
  z = (x - mean) / sd
  s = 1
  If z < 0 Then s = -1
  If x = "" Or IsEmpty(x) Then z = 4
  Select Case True
    Case z < 1
      r = s * 5
    Case z >= 1 And z < 1.5
      r = s * 4
    Case z >= 1.5 And z < 2
      r = s * 3
    Case z >= 2 And z < 2.5
      r = s * 2
    Case z >= 2.5 And z < 3
      r = s
    Case Else
      r = 0
  End Select
  fRating = r
ExitFunction:
End Function
It is easy enough to write a formula array function from there. I normally mark the data in red and then remove them manually if appropriate.

The quartile method is more elaborate.
 
Last edited:

joeu2004

Well-known Member
Joined
Mar 2, 2014
Messages
2,580
Office Version
2010
Platform
Windows
I am working on a project and told to remove all outliers by using the u +- 3s approach
Of course, you should do what you are told to do. But unless your data is (expected to be) normally distributed, identifying outliers by +/-sd might not be the right thing to do.

More generally, I prefer to use the IQR method. But I choose +/-1.7IQR in order to approximate +/-3sd for normal distributions.


do i need to calculate the mean and all quartiles to remove the outliers or is there a single excel function that can remove the data outside 3 standard deviations?
Ironically, you do not need to calculate "all quartiles". That would be for the IQR method (and only the 25%ile and 75%ile, and not the mean).

But you do need to calculate the mean (AVERAGE) and standard deviation (STDEVP or STDEV.P) for the +/-3sd method.

Although you could "remove" outliers, it might be sufficient to ignore them in your calculations.

For example, if U1 is =AVERAGE(A1:A1000) and S1 is =STDEVP(A1:A1000), where A1:A1000 is all of your data, the mean and standard deviation of the data "without" (ignoring) outliers are the following array-entered formulas (press ctrl+shift+Enter instead of just Enter):

=AVERAGE(IF(ABS(STANDARDIZE(A1:A1000,U1,S1))<=3, A1:A1000))

=STDEVP(IF(ABS(STANDARDIZE(A1:A1000,U1,S1))<=3, A1:A1000))

You might choose to use STDEV instead of STDEVP in the last formula.

PS.... Again, you should do whatever you are told to do. But in actual practice, most statisticians suggest that outliers should be identified, but only removed or ignored after determining that they represent errors in the data, not merely extremes.
 

Kenneth Hobson

Well-known Member
Joined
Feb 6, 2007
Messages
3,092
In a blank sheet, you can run the first 3 subs. They create the data in column A, show non-outliers in column B, and mark column A outliers in red.
Code:
Sub ColA()  'Random data in column A
  Dim a, i As Long
  i = 10
  a = RndIntPick(1, 100, i)
  ReDim Preserve a(1 To i + 2)
  a(i + 1) = -1000
  a(i + 2) = 120
  Range("A2", Range("A2").End(xlDown)).ClearContents
  Range("A2").Resize(UBound(a)) = WorksheetFunction.Transpose(a)
End Sub

Sub ColB()  'Column A data with outliers removed
  Dim a
   Range("B2", Range("B2").End(xlDown)).ClearContents
  a = NotOutlier3sd(Range("A2", Range("A2").End(xlDown)))
  Range("B2").Resize(UBound(a)) = WorksheetFunction.Transpose(a)
End Sub

Sub Test_Mark3sdoutliersred()
  Mark3sdOutliersRed Range("A2", Range("A2").End(xlDown))
End Sub

Sub Mark3sdOutliersRed(xRange As Range)
  Dim xSStdDev As Double, xMean As Double, v, i As Long
  Dim r As Range, c As Range
  
  xRange.Interior.Color = xlNone
  
  xMean = WorksheetFunction.Average(xRange)
  xSStdDev = WorksheetFunction.StDev(xRange)
  ReDim a(1 To xRange.Count)
  For i = 1 To UBound(a)
    If xRange(i) > xMean + 3 * xSStdDev Or xRange(i) < xMean - 3 * xSStdDev Then
      xRange(i).Interior.Color = vbRed
    End If
  Next i
End Sub

Function NotOutlier3sd(xRange As Range)
  Dim xSStdDev As Double, xMean As Double, v, i As Long, j As Long, a
  
  xMean = WorksheetFunction.Average(xRange)
  xSStdDev = WorksheetFunction.StDev(xRange)
  ReDim a(1 To xRange.Count)
  For i = 1 To UBound(a)
    If xRange(i) <= xMean + 3 * xSStdDev And xRange(i) >= xMean - 3 * xSStdDev Then
      j = j + 1
      a(j) = xRange(i)
    End If
  Next i
  ReDim Preserve a(1 To j)
  NotOutlier3sd = a
End Function

Function RndIntPick(first As Long, last As Long, _
    noPick As Long, Optional bSort As Boolean = False) As Variant
    Dim i As Long, r As Long, Temp As Long, k As Long
    ReDim iArr(first To last) As Variant
    Dim a() As Variant
     
    For i = first To last
        iArr(i) = i
    Next i
     
    Randomize
    For i = 1 To noPick
        r = Int(Rnd() * (last - first + 1 - (i - 1))) + (first + (i - 1))
        Temp = iArr(r)
        iArr(r) = iArr(first + i - 1)
        iArr(first + i - 1) = Temp
    Next i
     
    ReDim Preserve iArr(first To first + noPick - 1)
    ReDim a(1 To noPick)
    For r = 1 To noPick
        a(r) = iArr(LBound(iArr) + r - 1)
    Next r
     
    If bSort = True Then
        RndIntPick = ArrayListSort(a())
    Else
        RndIntPick = a()
    End If
End Function
 
Function ArrayListSort(sn As Variant, Optional bAscending As Boolean = True) As Variant
    With CreateObject("System.Collections.ArrayList")
        Dim cl As Variant
        For Each cl In sn
            .Add cl
        Next
         
        .Sort 'Sort ascendending
        If bAscending = False Then .Reverse 'Sort and then Reverse to sort descending
        ArrayListSort = .toarray()
    End With
End Function
 
Last edited:

Forum statistics

Threads
1,081,981
Messages
5,362,538
Members
400,679
Latest member
alecalec202

Some videos you may like

This Week's Hot Topics

  • populate from drop list with multiple tables
    Hi All, i have a drop list that displays data, what i want is when i select one of those from the list to populate text from different tables on...
  • Find list of words from sheet2 in sheet1 before a comma and extract text vba
    Hi Friends, Trying to find the solution on my task. But did not find suitable one to the need. Here is my query and sample file with details...
  • Dynamic Formula entry - VBA code sought
    Hello, really hope one of you experts can help with this - i've spent hours on this and getting no-where. .I have a set of data (more rows than...
  • Listbox Header
    Have a named range called "AccidentsHeader" Within my code I have: [CODE]Private Sub CommandButton1_Click() ListBox1.RowSource =...
  • Complex Heat Map using conditional formatting
    Good day excel world. I have a concern. Below link have a list of countries that carries each country unique data. [URL...
  • Conditional formatting
    Hi good morning, hope you can help me please, I have cells P4:P54 and if this cell is equal to 1 then i want row O to say "Fully Utilised" and to...
Top