Given an array which is supposed to be incrementing linearly, but

  • might be missing some numbers and
  • might have some unexpected numbers thrown in,

how would you build an algorithm to remove all the outliers from the array?

Examples of possible arrays:

1,2,3,4,1,1,1,100,5,6,7

1,2,4,100,5,6,7

1,2,4,100,101,5,6,7,300

2,3,4,5,6,7,300

In every example above, the algorithm should be able to recover the intended sequence: 1-7 for the first three arrays and 2-7 for the last one.

Some real-life example arrays:

1, 2, 295, 296, 297, 4, 5, 6, 8, 9, 10, 11, 12, 13, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 6, 6, 6, 6, 5, 5, 6, 6, 6, 6, 6, 4, 6, 6, 3, 4, 6, 6, 6, 5, 6, 6, 6, 4, 5, 6, 3, 6, 6, 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 4, 6, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 4, 6, 6, 6, 6, 6, 6, 5, 6, 6, 6, 3, 3, 6, 6, 6, 3, 6, 6, 4, 4, 6, 6, 6, 6, 6, 3, 6, 6, 6, 3, 6, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 6, 6, 3, 6, 6, 6, 6, 6, 6, 5, 6, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 15, 18, 20, 21, 22, 23, 24, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 26, 712, 383, 114, 118, 225, 304, 323, 349, 357, 550, 556, 590, 649, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51

One solution I came up with is to drop every value that is more than N (= 5?) greater than the previous valid value, along with every value that is less than the previous valid value.

const filterOutliers = (someArray, maxStep = 5) => {
  let previousValidValue = null;
  return someArray.filter((x) => {
    // Assume the first value is valid, although this assumption might not always be true.
    // Compare against null explicitly so that a leading 0 still counts as a valid value.
    if (previousValidValue === null) {
      previousValidValue = x;
      return true;
    }
    // If the number is less than the previous valid value, remove it.
    if (x < previousValidValue) {
      return false;
    }
    // If the number is more than maxStep greater than the last valid value, remove it.
    if (x > previousValidValue + maxStep) {
      return false;
    }
    previousValidValue = x;
    return true;
  });
};

Potentially relevant link: Javascript: remove outlier from an array?

Liron
  • The basic approach for this kind of job would be to fit a simple linear regression line (since you say the data is mostly linear) and then eliminate the values that deviate from that line by more than a delta of your choice (the outliers). [Here is a good tutorial on generating the linear regression line equation](http://onlinestatbook.com/2/regression/intro.html) – Redu Apr 22 '19 at 09:46
  • @Redu: standard linear regression by least squares is very sensitive to outliers. This won't work with the given data sets. – Yves Daoust Apr 22 '19 at 10:19
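
To illustrate the sensitivity mentioned in the second comment, here is a minimal sketch of an ordinary least-squares fit of value against array index. The function name `leastSquaresLine` is made up for this illustration and is not taken from either comment; it assumes an array of at least two numbers.

```javascript
// Hypothetical helper: ordinary least-squares fit of value (y) against index (x).
// Assumes values.length >= 2, otherwise the slope is undefined (division by zero).
function leastSquaresLine(values) {
  const n = values.length;
  const meanX = (n - 1) / 2; // mean of the indices 0..n-1
  const meanY = values.reduce((sum, y) => sum + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  values.forEach((y, x) => {
    sxy += (x - meanX) * (y - meanY);
    sxx += (x - meanX) ** 2;
  });
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Same data, without and with the single outlier from the question's last example.
console.log(leastSquaresLine([2, 3, 4, 5, 6, 7]).slope);
console.log(leastSquaresLine([2, 3, 4, 5, 6, 7, 300]).slope);
```

Without the outlier the fitted slope is 1. With the single value 300 appended it jumps to about 32, so the fitted line no longer describes the inliers and a fixed delta around it cannot separate them from the outlier.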

1 Answer

It seems that your inlier data values do have a constant increment. So compute the increments, take the mode and keep the sequences of values that follow this increment (to a suitable tolerance).
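
One possible reading of this idea, as a rough sketch rather than a definitive implementation. The names `modeOfIncrements` and `keepModalRuns` and the `tolerance` parameter are assumptions made for illustration. Two further assumptions go beyond the text above: only strictly positive increments are counted when taking the mode (otherwise the long runs of repeated 6s in the first real-life array would make the mode 0), and the tolerance is one-sided, so a gap is accepted when it lies between the modal step and the modal step plus `tolerance`.

```javascript
// Hypothetical helper: most frequent strictly positive difference between neighbours.
// Returns null when the array has no positive increments at all.
function modeOfIncrements(values) {
  const counts = new Map();
  for (let i = 1; i < values.length; i++) {
    const step = values[i] - values[i - 1];
    if (step > 0) {
      counts.set(step, (counts.get(step) || 0) + 1);
    }
  }
  let best = null;
  let bestCount = 0;
  for (const [step, count] of counts) {
    if (count > bestCount) { // on a tie, the step seen first wins
      best = step;
      bestCount = count;
    }
  }
  return best;
}

// Keep every value that belongs to a run advancing by the modal step.
// A gap "follows" the step when step <= gap <= step + tolerance (assumed rule).
function keepModalRuns(values, tolerance = 0) {
  const step = modeOfIncrements(values);
  if (step === null) return [];
  const follows = (i) => {
    if (i < 1 || i >= values.length) return false;
    const gap = values[i] - values[i - 1];
    return gap >= step && gap <= step + tolerance;
  };
  // A value is kept if it continues a run or starts one.
  return values.filter((_, i) => follows(i) || follows(i + 1));
}

console.log(keepModalRuns([1, 2, 3, 4, 1, 1, 1, 100, 5, 6, 7]));  // [1, 2, 3, 4, 5, 6, 7]
console.log(keepModalRuns([1, 2, 4, 100, 5, 6, 7], 1));           // [1, 2, 4, 5, 6, 7]
console.log(keepModalRuns([1, 2, 4, 100, 101, 5, 6, 7, 300], 1)); // [1, 2, 4, 100, 101, 5, 6, 7]
```

The last call shows a limitation of this sketch: a run of outliers that happens to advance by the same step (100, 101) survives. Removing it would need an extra pass, for example one that rejects runs which do not continue from the last kept value.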

Yves Daoust