38

I'd like to ask everyone a question about how correlated features (variables) affect the classification accuracy of machine learning algorithms. By correlated features I mean features that are correlated with each other, not with the target class (e.g., the perimeter and the area of a geometric figure, or the level of education and the average income). In my opinion, correlated features negatively affect the accuracy of a classification algorithm, because the correlation makes one of them redundant. Is that really the case? Does the answer depend on the type of classification algorithm? Any suggestions for papers and lectures are very welcome! Thanks

Titus Pullo
  • 3,303
  • 10
  • 41
  • 60

2 Answers

29

Correlated features do not affect classification accuracy per se. The problem in realistic situations is that we have a finite number of training examples with which to train a classifier. For a fixed number of training examples, increasing the number of features typically increases classification accuracy up to a point, but as the number of features continues to increase, classification accuracy will eventually decrease because we are then undersampled relative to the large number of features. To learn more about the implications of this, look at the curse of dimensionality.
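As a quick illustration of that effect (my own toy simulation, not from any specific paper: a simple nearest-centroid classifier on made-up Gaussian data, where only the first feature carries class signal and the rest are noise), accuracy degrades sharply once the feature count dwarfs the fixed training set size:

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy_with_features(d, n_train=20, n_test=500):
    """Train on n_train samples with d features; only feature 0 is informative."""
    def sample(n):
        X = rng.normal(size=(n, d))
        y = rng.integers(0, 2, size=n)
        X[:, 0] += 2.0 * y          # class signal lives in feature 0 only
        return X, y

    Xtr, ytr = sample(n_train)
    Xte, yte = sample(n_test)

    # Nearest-centroid classifier: assign each test point to the closer class mean
    c0 = Xtr[ytr == 0].mean(axis=0)
    c1 = Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1)
            < np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return (pred == yte).mean()

# With only 20 training samples, adding uninformative features hurts:
for d in (1, 5, 50, 500):
    print(d, round(accuracy_with_features(d), 3))
```

With the class separation fixed, accuracy at d = 1 sits well above accuracy at d = 500, because the centroid estimates in the 499 noise dimensions swamp the one informative dimension.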

If two numerical features are perfectly correlated, then one adds no additional information (it is determined by the other). So if the number of features is too high (relative to the training sample size), then it is beneficial to reduce the number of features through a feature extraction technique (e.g., via principal components).
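To make that concrete (a hypothetical example of my own: the side length of a square determines its perimeter, so the two features are perfectly correlated), the correlation is exactly 1 and principal components show that a single component carries all the variance:

```python
import numpy as np

# Side length determines perimeter: the two features are perfectly correlated
side = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
perimeter = 4.0 * side

r = np.corrcoef(side, perimeter)[0, 1]
print(r)  # 1.0: the second feature carries no extra information

# Principal components: eigenvalues of the sample covariance matrix
X = np.column_stack([side, perimeter])
Xc = X - X.mean(axis=0)
evals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))
print(evals)  # smallest eigenvalue is (numerically) zero: one component suffices
```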

The effect of correlation does depend on the type of classifier. Some nonparametric classifiers are less sensitive to correlation of variables (although training time will likely increase with an increase in the number of features). For statistical methods such as Gaussian maximum likelihood, having too many correlated features relative to the training sample size will render the classifier unusable in the original feature space (the covariance matrix of the sample data becomes singular).
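The singular-covariance failure mode is easy to reproduce (a made-up setup of mine: fewer samples than features, plus one exactly duplicated feature). The sample covariance matrix then cannot have full rank, so the inverse that Gaussian maximum likelihood needs does not exist:

```python
import numpy as np

rng = np.random.default_rng(0)

# 5 training samples but 10 features, two of which are exact copies
n, d = 5, 10
X = rng.normal(size=(n, d))
X[:, 1] = X[:, 0]               # perfectly correlated pair

S = np.cov(X, rowvar=False)     # d x d sample covariance matrix
rank = np.linalg.matrix_rank(S)
print(rank, d)                  # rank is at most n - 1 = 4, far below d = 10
```

Because the rank is below d, S is singular and S^{-1} (required by the Gaussian discriminant function) is undefined in the original feature space.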

bogatron
  • 16,253
  • 4
  • 49
  • 45
  • 1
    It's also important to mention that machine learning algorithms are very computationally intensive, and reducing the features to independent components (or at least principal components) can greatly reduce the amount of resources required. – Srikant Krishna Feb 11 '13 at 19:19
  • My response focused only on the given question of classification accuracy but you make a good (and relevant) point. In addition to increased system requirements, training and classification times can grow exponentially with the number of features. – bogatron Feb 11 '13 at 19:55
  • Even features that are highly correlated can provide valuable additional information, for example in classification. – Nikolas Rieble Oct 25 '17 at 09:17
2

In general, I'd say the more uncorrelated the features are, the better the classifier performance is going to be. Given a set of highly correlated features, it may be possible to use PCA to transform them into components that are as orthogonal as possible and thereby improve classifier performance.
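A small sketch of that transformation (my own example on made-up data: two nearly identical features, decorrelated by projecting onto the eigenvectors of the covariance matrix, which is PCA without dimensionality reduction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two highly correlated features: the second is the first plus small noise
x = rng.normal(size=200)
X = np.column_stack([x, x + 0.1 * rng.normal(size=200)])
print(round(np.corrcoef(X.T)[0, 1], 3))   # close to 1

# PCA: project centered data onto eigenvectors of the covariance matrix
Xc = X - X.mean(axis=0)
_, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ vecs

print(round(np.corrcoef(Z.T)[0, 1], 6))   # ~0: the components are uncorrelated
```

Note the caveat in the comment below, though: because PCA ranks components by variance, dropping low-variance components can discard exactly the direction that separates the classes.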

Alptigin Jalayr
  • 679
  • 4
  • 12
  • That is not true. Since PCA tries to pick components with maximum variance, high correlation will cause PCA to inflate the effect of those components. – krthkskmr Dec 04 '16 at 07:21