values = [8160,8160,6160,22684,0,0,60720,1380,1380,57128]

How can I remove outliers like 0, 57128, 60720 and 22684?

Is there a library which can do this?

javastudent
  • You may want to check this answer: http://stackoverflow.com/a/5767357/2563028 . If you'd like to check out a library, there's underscore. See this answer for an example: http://stackoverflow.com/a/14954540/2563028 – EfrainReyes Dec 28 '13 at 04:54
  • You can also take a look at this library: http://lodash.com/ – Mozak Dec 28 '13 at 05:01

5 Answers


This all depends on your interpretation of what an "outlier" is. A common approach:

  • High outliers are anything beyond the 3rd quartile + 1.5 * the inter-quartile range (IQR)
  • Low outliers are anything beneath the 1st quartile - 1.5 * IQR

This is also the approach described by Wolfram MathWorld.
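The two rules above amount to computing a lower and an upper fence from the quartiles. As a minimal sketch, assuming q1 and q3 are already known, the hypothetical helper `tukeyFences` below returns both fences; its multiplier `k` is my own parameter and defaults to the 1.5 used in the rules.

```javascript
// Hypothetical helper (not part of this answer): computes the 1.5 * IQR
// fences from quartiles that have already been determined.
// `k` is the multiplier; the rule above uses 1.5.
function tukeyFences(q1, q3, k = 1.5) {
  const iqr = q3 - q1; // inter-quartile range
  return {
    lower: q1 - k * iqr, // values below this are low outliers
    upper: q3 + k * iqr, // values above this are high outliers
  };
}

// Quartiles that the function further down picks for the question's sample
const { lower, upper } = tukeyFences(1380, 57128);
console.log(lower, upper);
```

For the question's data, the function further down picks q1 = 1380 and q3 = 57128, giving fences of -82242 and 140750; every value in the sample lies between them.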

This is easily wrapped up in a function :) I've tried to write the code below clearly; obvious refactoring opportunities do exist. Note that, with the quartiles this function picks, your sample contains no outlying values.

function filterOutliers(someArray) {

    // For fewer than 4 values the q3 index below falls outside the array
    // (q3 becomes undefined), so return small inputs unchanged
    if (someArray.length < 4) {
        return someArray;
    }

    // Copy the array so that sorting does not mutate the caller's array
    var values = someArray.concat();

    // Then sort numerically (the default sort compares values as strings)
    values.sort(function (a, b) {
        return a - b;
    });

    /* Then find a generous IQR. This is generous because if (values.length / 4)
     * is not an int, then really you should average the two elements on either
     * side to find q1.
     */
    var q1 = values[Math.floor(values.length / 4)];
    // Likewise for q3.
    var q3 = values[Math.ceil(values.length * (3 / 4))];
    var iqr = q3 - q1;

    // Then find min and max values
    var maxValue = q3 + iqr * 1.5;
    var minValue = q1 - iqr * 1.5;

    // Then filter anything beyond or beneath these values.
    // Inclusive comparisons keep values equal to the fences, which matters
    // when q1 === q3 (an IQR of 0).
    var filteredValues = values.filter(function (x) {
        return (x <= maxValue) && (x >= minValue);
    });

    // Then return
    return filteredValues;
}
rynop
James Peterson
  • Does it work at all? I tried `filterOutliers([8160,8160,6160,22684,0,0,60720,1380,1380,57128, 1000000000000])` and it returns exactly the same array. – Pablo Mar 27 '14 at 04:47
  • Slight logic mistake in the above code. The filter should return `(x < maxValue) && (x > minValue)`. – Algonomaly Jul 02 '14 at 22:06
  • This returns empty array if q1===q3. Should be return `(x <= maxValue) && (x >= minValue)`. – Timo Kähkönen Oct 12 '15 at 12:28
  • If the data is sorted then it will be more efficient to iterate up and down for min and max and splice off once the points are reached than filtering every item – Dominic Dec 18 '19 at 06:54
  • `[4421, 3512, 5126, 6012, 7581, 2023, 5012, 2320, 17, 2125]` doesn't remove `17`; how can this be? Surely `17` is an outlier here? – Frank Jun 22 '20 at 10:24
  • @Frank: 17 is not an outlier. The lower bound for your array, `(1st quartile - 1.5 * IQR)`, is much lower than 17. – epsilon91 Jul 14 '20 at 20:44
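The suggestion in Dominic's comment can be sketched as follows, assuming the array is already sorted in ascending order and the fences are already computed. Because low outliers can only sit at the front and high outliers only at the back, it is enough to walk inward from both ends and take a single slice. The name `sliceWithinFences` is hypothetical, and it uses the non-mutating `slice` rather than the `splice` the comment mentions.

```javascript
// Hypothetical helper: `sorted` must already be in ascending order.
// Walks inward from both ends and returns a new array containing only the
// values inside [minValue, maxValue]; `sorted` itself is not modified.
function sliceWithinFences(sorted, minValue, maxValue) {
  let start = 0;
  let end = sorted.length; // exclusive upper bound

  // Skip low outliers at the front.
  while (start < end && sorted[start] < minValue) {
    start++;
  }
  // Skip high outliers at the back.
  while (end > start && sorted[end - 1] > maxValue) {
    end--;
  }
  return sorted.slice(start, end);
}

const kept = sliceWithinFences([0, 0, 1380, 6160, 8160, 22684, 57128, 60720], -1000, 50000);
console.log(kept);
```

Sorting still costs O(n log n), so this only trims the final linear pass; the gain is small unless the data is already sorted for another reason.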

This is an improved version of @james-peterson's solution that updates the syntax to the current JavaScript standard and adds a more robust way of finding the two quartiles (implemented according to the formulas at https://de.wikipedia.org/wiki/Interquartilsabstand_(Deskriptive_Statistik) ). It uses a faster way of copying the array (see http://jsben.ch/wQ9RU for a performance comparison) and still works when q1 = q3.
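For reference, this is the sample-quantile definition the code is meant to follow, as I understand the one commonly given in German-language statistics texts such as the linked article (my own transcription; please check it against the source). Here x_(1) ≤ ... ≤ x_(n) are the sorted values, indexed from 1:

```latex
\tilde{x}_p =
\begin{cases}
  \frac{1}{2}\left(x_{(np)} + x_{(np+1)}\right) & \text{if } np \text{ is an integer,}\\[4pt]
  x_{(\lfloor np + 1 \rfloor)} & \text{otherwise,}
\end{cases}
\qquad q_1 = \tilde{x}_{0.25}, \quad q_3 = \tilde{x}_{0.75}.
```

Because JavaScript arrays are indexed from 0, x_(k) corresponds to values[k - 1]; for example, with n = 8 the lower quartile averages values[1] and values[2].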

function filterOutliers(someArray) {

  if(someArray.length < 4)
    return someArray;

  let values, q1, q3, iqr, maxValue, minValue;

  values = someArray.slice().sort( (a, b) => a - b);//copy array fast and sort

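  if ((values.length / 4) % 1 === 0) { // find quartiles
    // n/4 is a whole number: average the (n*p)-th and (n*p + 1)-th smallest
    // values, which sit at 0-based indices n*p - 1 and n*p (p = 1/4 or 3/4)
    q1 = 1/2 * (values[(values.length / 4) - 1] + values[(values.length / 4)]);
    q3 = 1/2 * (values[(values.length * (3 / 4)) - 1] + values[(values.length * (3 / 4))]);
  } else {
    // Otherwise take the ceil(n*p)-th smallest value, i.e. 0-based index floor(n*p)
    q1 = values[Math.floor(values.length / 4)];
    q3 = values[Math.floor(values.length * (3 / 4))];
  }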

  iqr = q3 - q1;
  maxValue = q3 + iqr * 1.5;
  minValue = q1 - iqr * 1.5;

  return values.filter((x) => (x >= minValue) && (x <= maxValue));
}

See this gist: https://gist.github.com/rmeissn/f5b42fb3e1386a46f60304a57b6d215a

Roy
  • The second conditional won't work for anything with an array length < 7 as q3 ends up out of bounds i.e. `Math.ceil(7 * (3/4) + 1) = 7`. A `Math.min` should fix it I guess – Dominic Feb 12 '20 at 03:34
  • Also `q3` in the first conditional will be `NaN` if the array length is 4, since `values[(values.length * (3 / 4)) + 1]` points at nothing. So it should probably exit if the length is <= 4 – Dominic Feb 12 '20 at 03:43

I had some problems with the other two solutions, such as q1 and q3 becoming NaN because of wrong indexes. Since arrays are 0-indexed, the position of a quantile has to be computed from the array length minus 1. The code then checks whether that position is a whole number; if it has a fractional part, the value is linearly interpolated between the two neighbouring elements.

function filterOutliers (someArray) {
    if (someArray.length < 4) {
        return someArray;
    }

    let values = someArray.slice().sort((a, b) => a - b); // copy array fast and sort

    let q1 = getQuantile(values, 25);
    let q3 = getQuantile(values, 75);

    let iqr, maxValue, minValue;
    iqr = q3 - q1;
    maxValue = q3 + iqr * 1.5;
    minValue = q1 - iqr * 1.5;

    return values.filter((x) => (x >= minValue) && (x <= maxValue));
}

function getQuantile (array, quantile) {
    // Expects `array` sorted in ascending order.
    // Position of the quantile (0-based, hence length - 1).
    let index = quantile / 100.0 * (array.length - 1);

    // A whole-number position points directly at an element.
    if (index % 1 === 0) {
        return array[index];
    } else {
        // Index of the element just below the position.
        let lowerIndex = Math.floor(index);
        // Fractional part of the position.
        let remainder = index - lowerIndex;
        // Linearly interpolate between the two neighbouring elements.
        return array[lowerIndex] + remainder * (array[lowerIndex + 1] - array[lowerIndex]);
    }
}
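As a worked example (my own arithmetic, not part of the original answer), here is what `getQuantile` and `filterOutliers` do with the question's ten values, sorted as 0, 0, 1380, 1380, 6160, 8160, 8160, 22684, 57128, 60720, where x_[k] denotes the element at 0-based index k:

```latex
\begin{aligned}
q_1 &= x_{[2]} + 0.25\,(x_{[3]} - x_{[2]}) = 1380 + 0.25 \cdot 0 = 1380 && (\text{position } 0.25 \cdot 9 = 2.25)\\
q_3 &= x_{[6]} + 0.75\,(x_{[7]} - x_{[6]}) = 8160 + 0.75 \cdot 14524 = 19053 && (\text{position } 0.75 \cdot 9 = 6.75)\\
\mathrm{IQR} &= 19053 - 1380 = 17673\\
\text{upper fence} &= 19053 + 1.5 \cdot 17673 = 45562.5\\
\text{lower fence} &= 1380 - 1.5 \cdot 17673 = -25129.5
\end{aligned}
```

So with this quartile definition 57128 and 60720 are removed, whereas the first answer's index choice (q1 = 1380, q3 = 57128) keeps every value: which points count as outliers depends on the quartile convention.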
A. van Hugten

Here is an implementation that detects upper outliers in a given collection: it returns the values lying beyond the upper fence (q3 + 1.5 * IQR) rather than the filtered data. It follows a methodology similar to the answers above.

The `if` branch checks whether the length of the collection is 4n or 4n + 1. In those cases each quartile is the average of two elements.

[Figure: the 4n and 4n + 1 cases]

Otherwise, when the length is 4n + 2 or 4n + 3, the lower and upper quartiles can each be read directly from a single element.

[Figure: the 4n + 2 and 4n + 3 cases]
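To make the four cases concrete, the following sketch lists which 0-based positions of the sorted collection the branches in `outlierDetector` read. `quartileIndices` is a hypothetical helper that copies the same conditions; like `outlierDetector`, it assumes a size of at least 2.

```javascript
// Hypothetical helper mirroring the branching in outlierDetector.
// Returns the 0-based indices of the sorted collection used for q1 and q3:
// two indices mean "average these elements", one index means "read it directly".
// Assumes size >= 2.
function quartileIndices(size) {
  if ((size - 1) / 4 % 1 === 0 || size / 4 % 1 === 0) {
    // size is 4n or 4n + 1: each quartile averages two neighbours.
    return {
      q1: [Math.floor(size / 4) - 1, Math.floor(size / 4)],
      q3: [Math.ceil(size * 3 / 4) - 1, Math.ceil(size * 3 / 4)],
    };
  }
  // size is 4n + 2 or 4n + 3: each quartile is a single element.
  return {
    q1: [Math.floor(size / 4)],
    q3: [Math.floor(size * 3 / 4)],
  };
}

for (const size of [8, 9, 10, 11]) {
  console.log(size, JSON.stringify(quartileIndices(size)));
}
```

For sizes 8 and 9 each quartile averages two elements (q1 from positions 1 and 2), while for 10 and 11 it is a single element. In every case this matches taking the median of the lower and upper halves of the sorted data, leaving out the overall median when the size is odd.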


const outlierDetector = collection => {
    const size = collection.length;

    let q1, q3;

    if (size < 2) {
        // Too few values to compute quartiles, so report no outliers
        return [];
    }

    const sortedCollection = collection.slice().sort((a, b) => a - b);

    if ((size - 1) / 4 % 1 === 0 || size / 4 % 1 === 0) {
        // size is 4n or 4n + 1: each quartile is the mean of two neighbours
        q1 = 1 / 2 * (sortedCollection[Math.floor(size / 4) - 1] + sortedCollection[Math.floor(size / 4)]);
        q3 = 1 / 2 * (sortedCollection[Math.ceil(size * 3 / 4) - 1] + sortedCollection[Math.ceil(size * 3 / 4)]);
    } else {
        // size is 4n + 2 or 4n + 3: each quartile is a single element
        q1 = sortedCollection[Math.floor(size / 4)];
        q3 = sortedCollection[Math.floor(size * 3 / 4)];
    }

    const iqr = q3 - q1;
    const maxValue = q3 + iqr * 1.5;

    // Return only the upper outliers; a strict > avoids flagging every
    // value when iqr is 0
    return sortedCollection.filter(value => value > maxValue);
};

Cea

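This IQR method can fail if your data contains many duplicated values, e.g. 1, 2, 2, 2, 2, 2, 3, 10: the quartiles bunch up on the repeated value, so the IQR becomes very small and values close to the bulk of the data, such as 1 or 3, can be flagged as outliers alongside 10.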

I struggled with it for a while, but then I discovered Grubbs' test. So far it seems reliable, at least in my case.

Here's a link to demo (and source): http://xcatliu.com/grubbs/
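Grubbs' test assumes the data are approximately normally distributed and checks one candidate outlier at a time: the statistic G is the largest absolute deviation from the mean divided by the sample standard deviation, and the candidate is rejected when G exceeds a critical value derived from Student's t distribution. Below is a minimal sketch of that computation; it is my own and not the code behind the linked demo. Plain JavaScript has no t-distribution quantile function, so the t critical value is an input, and the names `grubbsStatistic`, `grubbsCritical` and `tCrit` are hypothetical.

```javascript
// Hypothetical sketch of the two-sided Grubbs statistic for a single outlier.
// Returns G together with the index and value of the most extreme point.
function grubbsStatistic(values) {
  const n = values.length;
  if (n < 3) throw new RangeError("Grubbs' test needs at least 3 values");
  const mean = values.reduce((sum, x) => sum + x, 0) / n;
  // Sample standard deviation (n - 1 in the denominator).
  const sd = Math.sqrt(values.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1));
  if (sd === 0) return { g: 0, index: 0, value: values[0] }; // all values equal

  // Find the value farthest from the mean.
  let index = 0;
  for (let i = 1; i < n; i++) {
    if (Math.abs(values[i] - mean) > Math.abs(values[index] - mean)) index = i;
  }
  return { g: Math.abs(values[index] - mean) / sd, index, value: values[index] };
}

// Critical value for sample size n. `tCrit` must be the upper alpha / (2n)
// quantile of Student's t distribution with n - 2 degrees of freedom,
// looked up elsewhere (e.g. from a t table).
function grubbsCritical(n, tCrit) {
  return ((n - 1) / Math.sqrt(n)) * Math.sqrt(tCrit ** 2 / (n - 2 + tCrit ** 2));
}

const result = grubbsStatistic([1, 2, 2, 2, 2, 2, 3, 10]);
console.log(result.g, result.value);
```

For the example data, G is about 2.43 for the value 10; it is flagged at level alpha if G exceeds `grubbsCritical(8, tCrit)`. In practice the test is often repeated on the remaining data after removing a flagged value.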

xb1itz