I know that this is meant to explain variance, but the description on Wikipedia stinks and it is not clear how you can explain variance using this technique.
Can anyone explain it in a simple way?
Principal component analysis is a useful technique when dealing with large datasets. In some fields (bioinformatics, internet marketing, etc.), we end up collecting data that has many thousands or tens of thousands of dimensions. Manipulating the data in this form is not desirable, because of practical considerations like memory and CPU time. However, we can't just arbitrarily ignore dimensions either. We might lose some of the information we are trying to capture!
Principal component analysis is a common method used to manage this tradeoff. The idea is that we can somehow select the 'most important' directions, and keep those, while throwing away the ones that contribute mostly noise.
For example, this picture shows a 2D dataset being mapped to one dimension:
Note that the dimension chosen was not one of the original two: in general, it won't be, because that would mean your variables were uncorrelated to begin with.
We can also see that the direction of the principal component is the one that maximizes the variance of the projected data. This is what we mean by 'keeping as much information as possible.'
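To make the "maximizes the variance" claim concrete, here is a minimal sketch in plain Python. The dataset is made up for illustration (two correlated variables generated with a fixed seed), and the helper names `variance` and `covariance` are my own, not from any library. For two dimensions the first principal direction has a closed form, so no linear algebra package is needed.

```python
import math
import random

random.seed(0)  # made-up toy data, purely for illustration

# Two correlated variables: y is roughly 0.8 * x plus a little noise.
xs = [random.gauss(0, 2) for _ in range(500)]
ys = [0.8 * x + random.gauss(0, 0.5) for x in xs]

def variance(values):
    """Sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def covariance(a, b):
    """Sample covariance of two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)

var_x, var_y, cov_xy = variance(xs), variance(ys), covariance(xs, ys)

# For a 2x2 covariance matrix, the angle of the first principal
# direction satisfies tan(2 * theta) = 2 * cov_xy / (var_x - var_y).
theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
pc1 = (math.cos(theta), math.sin(theta))

# Project every point onto pc1: this is the 2D -> 1D mapping.
projected = [x * pc1[0] + y * pc1[1] for x, y in zip(xs, ys)]
var_pc1 = variance(projected)

print(f"variance along x:   {var_x:.2f}")
print(f"variance along y:   {var_y:.2f}")
print(f"variance along PC1: {var_pc1:.2f}")
print(f"share of total variance kept: {var_pc1 / (var_x + var_y):.1%}")
```

The variance along PC1 comes out larger than the variance along either original axis, and the direction `pc1` is a mix of x and y rather than one of the original axes, which is what you would expect for correlated variables.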
Spent the day learning PCA, hope my cartoon translates the intuition over to you!
I have also tried to briefly explain the utility of PCA and related it to an analogy (no maths) to help give that feeling of "learning closure".
Visual Intuition (zoom in)
Intuition via Utility
I think the main use of PCA is to categorise distinct "things", e.g. shiny cells vs. dark cells, in a way that leads to the least error (in terms of predicting the right cell colour). For example, imagine Sam was hiding behind me and I pinched a cell off the left side of his body, then asked you to guess the colour of the cell. By looking at the winning photo, or even the winning line, you can make a very good guess that it will be a "dark cell".
Intuition via Analogy
So my understanding is that PCA is like taking a "picture" in a lower dimension, but the various methods out there attempt to make the picture as informative as possible by deciding which "angle" to take the picture from (notice that for 1D the angle of the "squishing line" also varies).
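If it helps, the "which angle do I take the picture from" idea can be tried by brute force. This is only a sketch on made-up 2D points: it squishes the data onto a line at every whole-degree angle and measures how spread out the result is. The function name `projected_variance` is hypothetical, and real PCA finds the best angle directly instead of scanning.

```python
import math
import random

random.seed(1)  # made-up toy data, purely for illustration

# A stretched, tilted cloud of 2D points.
points = []
for _ in range(300):
    x = random.gauss(0, 1)
    points.append((x, 0.5 * x + random.gauss(0, 0.3)))

def projected_variance(points, angle_deg):
    """Variance of the 1D 'shadow' when points are squished onto a line at angle_deg."""
    angle = math.radians(angle_deg)
    ux, uy = math.cos(angle), math.sin(angle)
    shadow = [px * ux + py * uy for px, py in points]
    mean = sum(shadow) / len(shadow)
    return sum((s - mean) ** 2 for s in shadow) / (len(shadow) - 1)

# Try every whole-degree angle from 0 to 179 (a line at 180 is the same as at 0).
variances = {a: projected_variance(points, a) for a in range(180)}
best_angle = max(variances, key=variances.get)
worst_angle = min(variances, key=variances.get)

print(f"most informative angle:  {best_angle} degrees (variance {variances[best_angle]:.2f})")
print(f"least informative angle: {worst_angle} degrees (variance {variances[worst_angle]:.2f})")
```

The best and worst angles end up at right angles to each other, which matches the picture: the first principal component is the most informative "squishing line", and the direction perpendicular to it is the least informative one in 2D.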
Good video
PCA is basically a projection of a higher-dimensional space onto a lower-dimensional space while preserving as much information as possible.
I wrote a blog post where I explain PCA via the projection of a 3D-teapot...
...onto a 2D-plane while preserving as much information as possible:
Details and full R-code can be found in the post:
http://blog.ephorie.de/intuition-for-principal-component-analysis-pca
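The post itself uses R. For readers who prefer Python, the same 3D-to-2D idea can be sketched with NumPy. This is not the code from the post: the point cloud below is a made-up stand-in for the teapot (wide, fairly deep, and quite flat, then tilted), and "information" is measured here as the share of total variance that survives the projection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-in for the teapot: 1000 points, wide in one direction,
# medium in another, and very thin in the third.
cloud = rng.normal(size=(1000, 3)) * np.array([3.0, 1.5, 0.3])

# Tilt the cloud by 30 degrees so the thin direction is not simply the z axis.
angle = np.radians(30)
c, s = np.cos(angle), np.sin(angle)
rotation = np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])
cloud = cloud @ rotation.T

# PCA by hand: centre the data, then eigen-decompose the covariance matrix.
centered = cloud - cloud.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending order

# Sort so the direction with the largest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# The first two principal directions span the 2D plane we project onto.
plane = eigenvectors[:, :2]   # shape (3, 2)
flat = centered @ plane       # shape (1000, 2): the 2D "photo"

kept = eigenvalues[:2].sum() / eigenvalues.sum()
print(f"2D projection shape: {flat.shape}")
print(f"share of variance preserved: {kept:.1%}")
```

Because the cloud is nearly flat, throwing away the third principal direction loses very little, which is the same reason a well-chosen 2D view of a teapot is still recognisable as a teapot.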