
I want to use MATLAB's "princomp" function, but it returns the eigenvalues in a sorted array, so I can't find out which eigenvalue corresponds to which column. In MATLAB,

m = [1,2,3;4,5,6;7,8,9];
[pc,score,latent] = princomp(m);

is the same as

m = [2,1,3;5,4,6;8,7,9];
[pc,score,latent] = princomp(m);

That is, swapping the first two columns does not change anything. The result (eigenvalues) in latent will be (27, 0, 0) in both cases. The information about which eigenvalue corresponds to which original (input) column is lost. Is there a way to tell MATLAB not to sort the eigenvalues?

Amro
Sunny
  • This... isn't how PCA/eigendecomposition works. Not to be mean, but a textbook would probably help you a lot better than this community could. – btown Feb 14 '11 at 11:19
  • Let's say that the columns of the matrix are the name of the cities for instance. The PCA will give you the eigenvectors and eigenvalues. Now, what you usually do, is to take the eigenvectors with the biggest eigenvalues and this will be your new basis. But I want to know which city has the biggest eigenvalue (and the corresponding eigenvector) for example. – Sunny Feb 14 '11 at 11:27
  • @Sunny The eigenvalues correspond to eigenvectors, which give you a linear combination of the columns/cities. They show how the different cities correlate. – Michael J. Barber Feb 14 '11 at 11:32
  • @Michael J. Barber: Yes, this is right, but the order of the cities is lost, because the eigenvalues (and thus their eigenvectors) are sorted in a descending order by matlab. This means, I don't have a way to find out to which city the largest eigenvalue apply... – Sunny Feb 14 '11 at 11:33
  • @Sunny The eigenvalues do not correspond to specific rows/columns of the matrix. The components of the eigenvectors do that. – Michael J. Barber Feb 14 '11 at 11:46
  • @Michael J. Barber: I agree, but how can I now map this eigenvectors returned by matlab to my original (input) column? – Sunny Feb 14 '11 at 11:53
  • Let's say the question would be: give me the three cities with the three largest eigenvalues. In MATLAB it is impossible to find out, because the sorting destroys this information. – Sunny Feb 14 '11 at 11:54
  • No, you still don't understand eigenvalues. The eigenvalues do NOT correspond to columns of the matrix. There is NO correspondence in that respect. –  Feb 14 '11 at 11:57
  • @Sunny It's not a well-posed question. – Michael J. Barber Feb 14 '11 at 12:01
  • I think the question is well posed: principal component analysis should extract the principal components. But I would like to know "who" these components are... I understand that the aim is dimensionality reduction and a new basis composed of eigenvectors, but the components are the business here, and I need them. If the eigenvectors were not sorted by MATLAB, then it would be easy to find the components. – Sunny Feb 14 '11 at 12:22
  • @Sunny. I think you have a basic misconception about linear algebra. Perhaps more help can be found here http://math.stackexchange.com/. Thanks – eat Feb 14 '11 at 12:48
  • @Sunny - what you do not understand is that the order of the eigenvalues is ARBITRARY, and not even always completely sorted, for those eigenvalues which are close together in magnitude. It has NOTHING to do with the order of the columns. It is time for basic linear algebra here. –  Feb 14 '11 at 13:53
  • Excuse me, but it is you who did not understand the question. My question was clear: MATLAB's princomp sorts the eigenvalues its own way, not in the order of the original input matrix columns. The principal components cannot be traced back after the computation of princomp. So if nobody knows the answer about how I can retrieve the components, which in my example are cities, then do not tell me to use textbooks or explain to me what eigenvectors are. Just simply say "I do not know." – Sunny Feb 14 '11 at 13:54
  • @Sunny: -1, Please don't fight like this; many knowledgeable people have already suggested to you that you are lacking basic linear algebra. Straighten up your linear algebra skills and you'll notice how straightforward 'principal component analysis' actually is. Thanks – eat Feb 14 '11 at 20:04
  • Sigh.................. I do know. –  Feb 15 '11 at 04:23

2 Answers


With PCA, each principal component returned will be a linear combination of the original columns/dimensions. Perhaps an example will clear up any misunderstanding you have.

Let's consider the Fisher Iris dataset, comprising 150 instances and 4 dimensions, and apply PCA to the data. To make things easier to understand, I first zero-center the data before calling the PCA function:

load fisheriris
X = bsxfun(@minus, meas, mean(meas));    %# so that mean(X) is the zero vector

[PC score latent] = princomp(X);

Let's look at the first returned principal component (1st column of the PC matrix):

>> PC(:,1)
      0.36139
    -0.084523
      0.85667
      0.35829

This is expressed as a linear combination of the original dimensions, i.e.:

PC1 = 0.36139*dim1 - 0.084523*dim2 + 0.85667*dim3 + 0.35829*dim4

Therefore to express the same data in the new coordinates system formed by the principal components, the new first dimension should be a linear combination of the original ones according to the above formula.
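As a concrete sanity check (a sketch using only the variables already defined above), the new first coordinate of a single observation is just the dot product of that row with the first eigenvector:

```matlab
%# the first new coordinate of the first observation:
%# a dot product of that row of X with the first principal component
X(1,:) * PC(:,1)        %# matches score(1,1) returned by princomp
```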

We can compute this simply as X*PC, which is exactly what is returned in the second output of PRINCOMP (score). To confirm this, try:

>> all(all( abs(X*PC - score) < 1e-10 ))
    1

Finally the importance of each principal component can be determined by how much variance of the data it explains. This is returned by the third output of PRINCOMP (latent).
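For example, the proportion of variance explained by each component can be computed from latent (the variable name explained below is mine, not something PRINCOMP returns):

```matlab
%# fraction of the total variance captured by each principal component
explained = latent ./ sum(latent);
cumsum(explained)       %# cumulative proportion; the last entry is 1
```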


We can compute the PCA of the data ourselves without using PRINCOMP:

[V E] = eig( cov(X) );                   %# eigendecomposition of the covariance matrix
[E order] = sort(diag(E), 'descend');    %# sort eigenvalues from largest to smallest
V = V(:,order);                          %# reorder the eigenvectors to match

The eigenvectors of the covariance matrix V are the principal components (the same as PC above, although the signs may be inverted), and the corresponding eigenvalues E represent the amount of variance explained (the same as latent). Note that it is customary to sort the principal components by their eigenvalues. And as before, to express the data in the new coordinates, we simply compute X*V (this should be the same as score above, provided you match the signs).
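To verify the equivalence numerically, one way (a sketch, assuming the V and PC variables from the code above) is to flip the sign of each eigenvector to match PRINCOMP's arbitrary choice before comparing:

```matlab
%# align the (arbitrary) signs of V's columns with those of PC, then compare
s = sign(sum(V .* PC));                  %# +1 or -1 for each component
Vmatched = bsxfun(@times, V, s);         %# flip columns where signs disagree
all(all( abs(Vmatched - PC) < 1e-10 ))   %# displays 1 if they agree
```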

Amro
  • Why do you center the data w.r.t. the mean? I tried the following... took the pca of data using `pca` and then tried to reconstruct it using `data*PC` -> doesn't work. However, it works with mean-centering. My question is if MATLAB `pca` does mean centering of data internally (from their docs of `pca`), then why reconstruction of data doesn't work when we don't explicitly perform mean centering? – Autonomous Oct 28 '14 at 20:38
  • @ParagS.Chandakkar: I could show an example, but it's too long for a comment. Do you mind creating a new question? – Amro Oct 28 '14 at 21:25
  • I actually got the answer myself. I actually should have thought a little before posting a comment. Thanks. – Autonomous Oct 28 '14 at 21:48

"The information (which eigenvalue corresponds to which original (input) column) is lost."

Since each principal component is a linear function of all input variables, each principal component (eigenvector, eigenvalue) corresponds to all of the original input columns. Ignoring possible changes in sign, which are arbitrary in PCA, re-ordering the input variables will not change the PCA results.
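For instance, if you want to know which original columns ("cities") contribute most strongly to the first principal component, you can rank the absolute values of its coefficients. A sketch (PC as returned by princomp; the label names are placeholders of mine, not part of any MATLAB API):

```matlab
%# rank the original columns by the magnitude of their loading on PC1
labels = {'city1','city2','city3','city4'};    %# placeholder column names
[dummy, idx] = sort(abs(PC(:,1)), 'descend');  %# order of |coefficients|
labels(idx)                                    %# columns ordered by contribution to PC1
```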

"Is there a way to tell matlab to not to sort the eigenvalues?"

I doubt it: PCA (and eigenanalysis in general) conventionally sorts the results by variance, though I'd note that princomp() sorts from greatest to least variance, while eig() typically returns them in the opposite (ascending) order.

For more explanation of PCA using MATLAB illustrations, with or without princomp(), see:

Principal Components Analysis

Predictor