Outlier detection and removal from ranges

RawlinsCross

Active Member
Joined
Sep 9, 2016
Messages
288
Good day,

I have a two ranges (for X and Y data) that I graph on a userform using the traditional method of graphing, exporting to image, and loading the image into a picture control. Oftentimes, the data in the Y data has a few errors in terms of outliers (X data is just dates). I'd like to have a button that reruns the graphing but takes the existing range variables (rXValues and rYValues) and removes the outliers in rYValues and removes the corresponding data in the rXValues. So I'm thinking a function that takes in the two range variables, performs the outlier detection and data removal, and returns the new corrected range variables? I already have the graphing code but am struggling to start the function.

For outlier criteria, I would think the IQR (interquartile range) should be sufficient:
1. Calculate the IQR for the data
2. Multiply the IQR by 1.5
3. Add this value to the value of the third quartile (Q3)
4. Subtract this value to the value of the first quartile (Q1)

Just not sure on how to manipulate the ranged variables where they are paired. Any thoughts?
 

Some videos you may like

Excel Facts

Who is Mr Spreadsheet?
Author John Walkenbach was Mr Spreadsheet until his retirement in June 2019.

RawlinsCross

Active Member
Joined
Sep 9, 2016
Messages
288
Just an update. I have decided to just take the rYValues and treat the data in that one column of data. I calculate the threshold where data is considered an outlier (as above) and replace that value with #n/a. The code works but I can't convert the cleaned array back to a range which can be used downstream of the function. Any thoughts?

Code:
Public Function RemoveOutliers(rDirtyRange As Range) As Range
Dim dQuart1, dQuart3, dIQR, dScaler, dOutlierMax As Double
Dim vCleanArray() As Variant
Dim i As Integer
Dim TempRange As Range

dQuart1 = Application.WorksheetFunction.Quartile(rDirtyRange, 1)
dQuart3 = Application.WorksheetFunction.Quartile(rDirtyRange, 3)
dIQR = dQuart3 - dQuart1
dScaler = 1.5 * dIQR
If dScaler = 0 Then
    dOutlierMax = dQuart3
Else
    dOutlierMax = dQuart3 + dScaler
End If
vCleanArray = rDirtyRange

'Index the array and replace values above dOutlierMax with NA()

For i = LBound(vCleanArray) To UBound(vCleanArray)
    If vCleanArray(i, 1) > dOutlierMax Then vCleanArray(i, 1) = CVErr(xlErrNA)
Next i
For i = LBound(vCleanArray) To UBound(vCleanArray)
    Debug.Print vCleanArray(i, 1)
Next i
  
'vClean array is correct so the code works up to here.
'Now all I need to do is to convert the vCleanArray back into a Range that I can return as the "RemoveOutliers" Range
'Clear any data that may be in the range D116:E145

Range("D116:E1450").Clear

'Copy the non-duplicate coordinates to D116:E116 & below

Set TempRange = Range("D116").Resize(UBound(vCleanArray), 1)
TempRange = Application.WorksheetFunction.Transpose(vCleanArray)
RemoveOutliers = TempRange

End Function
 

RawlinsCross

Active Member
Joined
Sep 9, 2016
Messages
288
Finally got it to work. For variant arrays, I had to physically cycle through the area and assign each value to a range cell. Not sure if this is the most efficient approach so would love feedback. So basically the steps in this function:

1. Function accepts a range which may or may not have extreme outliers
2. First and third quartiles are calculated
3. Interquartile range is calculated (Q3-Q1)
4. Threshold value (dScaler) is calculated
5. Minor if statement if data is uniform - effects the threshold value
6. Pass the range into a variant array
7. Index the array and replace values above the threshold with blank values
8. Copy the new cleaned array to a range on worksheet (numbers physically get copied into cells)
9. Set the function return to this range

Code:
Public Function RemoveOutliers(rDirtyRange As Range) As Range

Dim dQuart1, dQuart3, dIQR, dScaler, dOutlierMax As Double
Dim vCleanArray() As Variant
Dim i As Integer
Dim TempRange As Range
Dim myCell As Range

'Calculate the Quartiles
dQuart1 = Application.WorksheetFunction.Quartile(rDirtyRange, 1)
dQuart3 = Application.WorksheetFunction.Quartile(rDirtyRange, 3)
dIQR = dQuart3 - dQuart1
dScaler = 6 * dIQR

'Somtimes dScaler would return 0 and needed adjustment
If dScaler = 0 Then
    dOutlierMax = dQuart3
Else
    dOutlierMax = dQuart3 + dScaler
End If

'Pass the range into a variant array
vCleanArray = rDirtyRange

'Index the array and replace values above dOutlierMax with blank values
For i = LBound(vCleanArray) To UBound(vCleanArray)
    If vCleanArray(i, 1) > dOutlierMax Then vCleanArray(i, 1) = ""
Next i
  
'Clear any data that may be in the range D116:E1450
Range("D116:E1450").Clear

'Copy the non-duplicate coordinates to D116 & below
Set TempRange = Range("D116").Resize(UBound(vCleanArray), 1)
i = 1
For Each myCell In TempRange
    myCell = vCleanArray(i, 1)
    i = i + 1
Next myCell

Set RemoveOutliers = TempRange
End Function
 

Watch MrExcel Video

Forum statistics

Threads
1,099,004
Messages
5,465,955
Members
406,455
Latest member
mkadam

This Week's Hot Topics

Top