When using Isolation Forest for anomaly detection, should the model be trained on normal data only, or on a mix of normal and outlier data? Also, what is the best algorithm for anomaly detection on multivariate data? I want to minimize false positives.

  1. I am looking at a contamination level below 5%.
  2. Which ML algorithm for anomaly detection on multivariate data gives the fewest false positives?

Note: I know that reducing false positives is a matter of tuning the model, but I want to know which algorithm is the most efficient. From blogs, I understand that IsolationForest is one of the newest and most efficient unsupervised anomaly detection algorithms.

Nir_AI

1 Answer

Currently, scikit-learn v0.20.3 implements isolation forests. IForests are fairly good at handling high-dimensional, multivariate data:

"the data is recursively partitioned with axis-parallel cuts at randomly chosen partition points in randomly selected attributes, so as to isolate the instances into nodes with fewer and fewer instances until the points are isolated into singleton nodes containing one instance." -- Charu C. Aggarwal (in Chapter 5 of Outlier Analysis)
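To make the quoted partitioning idea concrete, here is a minimal pure-Python sketch. It is not scikit-learn's implementation; `isolation_depth` and the toy data are hypothetical names and values chosen for illustration. It counts how many random axis-parallel cuts it takes to isolate a single point, and a point far from the rest should usually need fewer cuts.

```python
import random

def isolation_depth(points, target, rng, max_depth=50):
    """Count the random axis-parallel cuts needed to isolate points[target]."""
    node = list(range(len(points)))  # indices of the points in the current node
    depth = 0
    while len(node) > 1 and depth < max_depth:
        attr = rng.randrange(len(points[0]))            # randomly selected attribute
        values = [points[i][attr] for i in node]
        split = rng.uniform(min(values), max(values))   # randomly chosen partition point
        target_is_left = points[target][attr] < split
        # Keep only the side of the cut that still contains the target point.
        node = [i for i in node if (points[i][attr] < split) == target_is_left]
        depth += 1
    return depth

rng = random.Random(0)
# Hypothetical toy data: 200 points around the origin plus one far-away point.
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
points.append((8.0, 8.0))
outlier = len(points) - 1
# Use the point closest to the origin as a typical inlier.
inlier = min(range(200), key=lambda i: points[i][0] ** 2 + points[i][1] ** 2)

trials = 200
mean_outlier = sum(isolation_depth(points, outlier, rng) for _ in range(trials)) / trials
mean_inlier = sum(isolation_depth(points, inlier, rng) for _ in range(trials)) / trials
print(f"mean cuts to isolate the outlier: {mean_outlier:.1f}")
print(f"mean cuts to isolate the inlier:  {mean_inlier:.1f}")
```

An isolation forest averages this kind of path length over many random trees and treats short average paths as a sign of an anomaly.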

I can't say for certain that it gives the fewest false positives, because that depends on many factors, including your training data. As far as I can tell, it does a good job of identifying anomalies and outliers (even in discrete time series).
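If you have even a small labeled sample, you can measure the false positives directly instead of guessing. The sketch below uses synthetic data that I made up for illustration (960 normal points and 40 scattered anomalies); the 4% contamination value is an assumption that matches the "less than 5%" target in the question.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# Hypothetical labeled data: 960 normal points and 40 scattered anomalies (4%).
normal = rng.normal(0.0, 1.0, size=(960, 2))
radius = rng.uniform(5.0, 8.0, size=40)
angle = rng.uniform(0.0, 2.0 * np.pi, size=40)
anomalies = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])

X = np.vstack([normal, anomalies])
y_true = np.r_[np.ones(960, dtype=int), -np.ones(40, dtype=int)]  # 1 = normal, -1 = anomaly

clf = IsolationForest(contamination=0.04, random_state=0)
y_pred = clf.fit_predict(X)  # same convention: 1 = inlier, -1 = outlier

# Rows are true labels, columns are predictions, both ordered [-1, 1].
(tp, fn), (fp, tn) = confusion_matrix(y_true, y_pred, labels=[-1, 1])
print(f"true positives:  {tp}")
print(f"false positives: {fp}")
print(f"false positive rate: {fp / (fp + tn):.3f}")
```

The numbers you get on real data will depend on how well separated your anomalies are, so treat this only as a template for the evaluation.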

You can set the contamination parameter to any proportion you like, as long as it's a float in (0., 0.5).

"The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function."

The default in this version is 0.1 (10%); newer releases default to 'auto'. For a contamination level below 5%, you could set, for example, contamination=0.04 (4%).

```python
from sklearn.ensemble import IsolationForest

# Expect roughly 4% of the samples to be outliers.
clf = IsolationForest(contamination=0.04)
```
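As a rough end-to-end sketch, fitting and predicting could look like the following. The data here is random and purely hypothetical; `random_state` is only set to make the run repeatable.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))  # hypothetical unlabeled multivariate data

clf = IsolationForest(contamination=0.04, random_state=42)
labels = clf.fit_predict(X)        # 1 = inlier, -1 = flagged as outlier
scores = clf.decision_function(X)  # negative values fall below the threshold

flagged = float(np.mean(labels == -1))
print(f"fraction flagged: {flagged:.3f}")
print(f"lowest score: {scores.min():.3f}")
```

Note that the model flags about 4% of the training samples here even though the data contains no real anomalies. A contamination value that is higher than the true outlier rate therefore translates directly into false positives.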
PeterWhy
  • Thanks for your answer. Could you kindly have a look at the related post [here](https://stackoverflow.com/questions/66643736/incorrect-results-of-isolationforest)? – Mario Mar 16 '21 at 12:26