
I am using PCA on 100 images. My training data is a 442368x100 double matrix: 442368 features (rows) by 100 images (columns). Here is my code for finding the eigenvectors.

[ rows, cols] = size(training);
maxVec=rows;
maxVec=min(maxVec,rows);
train_mean=mean(training,2);
A=training-train_mean*ones(1,cols);
A=A'*A;
[evec,eval]=eig(A);
[eval ind]  =  sort(-1*diag(eval));
evec= evec(:, ind(1:100));

Now evec is a 100x100 double matrix of eigenvectors, and I have the 100 eigenvectors sorted by eigenvalue.

Questions:

Now, if I want to transform my testing data using the eigenvectors calculated above, how can I use them? My testing data is a 442368x50 double matrix, but my eigenvector matrix is 100x100, so the inner matrix dimensions don't agree. How can I find the dot product of my testing data and the eigenvector matrix?


1 Answer


What you are doing is essentially dimensionality reduction. You currently have the top 100 eigenvectors that determine the basis vectors retaining the largest variance in your data. What you want to do now is project your test data onto these same basis vectors. BTW, you do have an error in your covariance matrix calculation: the covariance is computed on a per-feature basis, but you are computing it on a per-image basis, so that's not correct. You have to swap the order of the transpose in your calculation. You also must divide by the total number of examples minus 1 to complete the calculation and produce an unbiased estimator:

A = (1/(cols-1))*(A*A.');

Transposing A first and then multiplying assumes that each column is a feature, but that is not the case for you. Recall from dimensionality reduction that we currently have a matrix of eigenvectors where each column is an eigenvector. To finally perform the reduction, you simply multiply the mean-subtracted data matrix with the eigenvector matrix. It is important to note that the eigenvectors in this matrix are ordered so that the basis vector capturing the largest variance explained by your data appears first; that is why the sorting is performed on the eigenvalues, as the eigenvector with the largest eigenvalue embodies this property.

However, this operation assumes that each column is a feature, while your data matrix has each row as a feature. If you want to perform reconstruction on your original training data, you would need to transpose the mean-subtracted data before doing this multiplication, but that would put each example in a row. Since in your code each column is an example, you can transpose the eigenvector matrix instead:

% Assuming you did (1/(cols-1))*(A*A.') to compute the eigenvectors
Atrain = training - train_mean*ones(1, cols);
Areconstruct = evec.' * Atrain;

Areconstruct will contain the reconstructed data where each column is the corresponding reprojected example. I also needed to store the mean-subtracted feature matrix, as your code overwrites it with the covariance matrix. If you want to perform this reprojection on your test data, you must mean-subtract using the mean computed from your training data, then apply the multiplication above. Assuming your data is stored in test_data, simply do the following:

cols_test = size(test_data, 2);
B = test_data - train_mean*ones(1, cols_test);
Breconstruct = evec.' * B;

Breconstruct contains the reprojected data onto the basis vectors which will now be a 100 x 50 matrix where each column is a reprojected example from your test data.
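
As a quick sanity check, here is a sketch of the shapes you should see at this point (assuming evec was computed from the corrected (1/(cols-1))*(A*A.') covariance, so it is 442368 x 100):

size(evec)          % 442368 x 100 -- one basis vector per column
size(Areconstruct)  % 100 x 100    -- one projected training example per column
size(Breconstruct)  % 100 x 50     -- one projected test example per column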


A word of warning

This code will probably run very slowly or, in the worst case, not run at all, as your covariance matrix is quite large. It is highly advisable that you reduce the total number of features as much as possible before trying dimensionality reduction. As you have stated in your comments, each example is simply an image unrolled into a long vector, so try resizing the images down to something manageable. In addition, it is usually customary to low-pass filter the images (with a Gaussian blur, for example) before resizing, as this prevents aliasing.
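
For example, here is a minimal sketch of that preprocessing; it assumes the Image Processing Toolbox is installed and that your raw images live in a hypothetical cell array called imgs (neither of which comes from your code):

% Sketch only: blur and downsample each image before unrolling it for PCA.
% Assumes the Image Processing Toolbox and a hypothetical cell array "imgs"
% of grayscale images.
scale = 0.25;                               % shrink each dimension to 25%
num_images = numel(imgs);
tmp = imresize(im2double(imgs{1}), scale);  % probe the reduced image size
training_small = zeros(numel(tmp), num_images);
for k = 1 : num_images
    I = im2double(imgs{k});
    I = imgaussfilt(I, 2);                  % Gaussian low-pass filter first
    I = imresize(I, scale);                 % then downsample
    training_small(:, k) = I(:);            % one unrolled column per image
end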

Also, check out the recommendation I have for using Singular Value Decomposition later on in this post. It should be faster than using the eigenvectors of the covariance matrix.


Can you make your code more efficient?

I would improve this code by using bsxfun, and you can also use sort with the 'descend' flag so you don't have to multiply your values by -1 before sorting to get the indices in descending order. bsxfun lets you mean-subtract your features efficiently without explicitly replicating the mean of each feature for as many examples as you have (i.e. without the ones(1, cols) duplication).

Specifically:

[ rows, cols] = size(training);
maxVec=rows;
maxVec=min(maxVec,rows);
train_mean=mean(training,2);
A = bsxfun(@minus, training, train_mean); % Change
%A=training-train_mean*ones(1,cols); 
Acov = (1/(cols-1))*(A*A.'); % Change - correct formula
[evec,eval]=eig(Acov);
%[eval ind]  =  sort(-1*diag(eval));
[eval, ind]  =  sort(diag(eval), 'descend'); % Change
evec= evec(:, ind(1:100));

Finally for your test data:

B = bsxfun(@minus, test_data, train_mean);
Breconstruct = evec.' * B;
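
As a side note, if you are running MATLAB R2016b or newer, implicit expansion lets you drop bsxfun entirely; a minimal equivalent sketch:

% Sketch: on R2016b+ implicit expansion broadcasts train_mean across the columns
A = training - train_mean;        % 442368 x 100 mean-subtracted training data
B = test_data - train_mean;       % 442368 x 50 mean-subtracted test data
Breconstruct = evec.' * B;        % 100 x 50 reprojected test data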

A word of advice - Use SVD

Using the eigenvectors of the covariance matrix for dimensionality reduction is known to be unstable - specifically when it comes to computing eigenvectors for high-dimensional data such as yours. It is advised that you use the Singular Value Decomposition (SVD) framework instead. You can view this Cross Validated post on the relationship between the eigenvectors of the covariance matrix and using SVD for performing PCA:

https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca

As such, compute the SVD of the covariance matrix, and the columns of V are the eigenvectors you need to perform the computation. The added benefit of SVD is that the eigenvectors are already ordered by their variance, so the first column of V is the basis vector pointing in the direction of largest variance. You therefore don't need to do any sorting as you did with the eigenvectors.

Therefore, you would use this with the SVD:

Acov = (1/(cols-1))*(A*A.'); 
[U,S,V] = svd(Acov);
Areconstruct = V(:, 1:100).' * A;

For your testing data:

B = bsxfun(@minus, test_data, train_mean);
Breconstruct = V(:, 1:100).' * B;
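
If building the full 442368 x 442368 covariance matrix is not feasible memory-wise, a related option (sketched here, not what the code above does) is to take the economy-size SVD of the mean-subtracted data matrix directly. Since A = U*S*V.', the columns of U are the eigenvectors of A*A.', so the covariance matrix is never formed:

% Sketch: economy-size SVD of the mean-subtracted data (A is 442368 x 100).
% The columns of U are the same basis vectors, already sorted by variance.
[U, S, ~] = svd(A, 'econ');          % U: 442368 x 100, S: 100 x 100
Areconstruct = U(:, 1:100).' * A;    % projected training data (100 x 100)
Breconstruct = U(:, 1:100).' * B;    % projected test data (100 x 50)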

Further Reading

You can take a look at my answer on dimensionality reduction using the eigenvalues and eigenvectors of the covariance matrix here: What does selecting the largest eigenvalues and eigenvectors in the covariance matrix mean in data analysis?

It also gives you a brief overview of why this operation is done to perform PCA or dimensionality reduction. However, I would highly advise you use SVD to do what you need. It is faster and more stable than using the eigenvectors of the covariance matrix.

  • Many thanks rayryeng for such a detailed answer. Also thanks for sparing time for this. I have tried this code. As "A" is 442368x100, Acov=A*A.'; is giving me the error, "Requested 442368x442368 (1458.0GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive." – Rafay Zia Mir Sep 26 '16 at 17:42
  • I think Acov=A*A.' should yield a 100 by 100 matrix. Am I correct? – Rafay Zia Mir Sep 26 '16 at 17:44
  • @RafayZiaMir The covariance matrix is done on a per feature basis. You are doing it on a per image basis, so that's why you're getting 100 x 100. If you did it with `A.'*A`, you are now assuming that each feature is an image while each feature is now an example so you've basically swapped their definitions. It should be 442368 x 442368, but no one has this much memory available on conventional computers. If you want this to work, try reducing the number of features first by perhaps resizing your images before attempting to use PCA. However, if `A.'*A` is your intent, then so be it. – rayryeng Sep 26 '16 at 18:00
  • 2
    amazing post +1 – NKN Sep 26 '16 at 18:09
  • @NKN :D Wow I was not expecting to get 6 votes on this already lol. Thank you. It means a lot coming from a machine learning expert such as you. – rayryeng Sep 26 '16 at 18:12
  • @rayryeng thanks for your detailed answer and detailed comment. Now it has started making sense to me what you are saying. It's an amazing post. – Rafay Zia Mir Sep 26 '16 at 18:43
  • @RafayZiaMir Ah, thank you :) BTW, I've also added one more addition. The covariance matrix needs to be scaled by the number of examples, which I've now done. I didn't notice this until now. BTW, is there a way to reduce the total number of features beforehand? Are these simply images where the pixels are unrolled into long vectors? If they are, try scaling your images down by 75% or something manageable first. – rayryeng Sep 26 '16 at 18:45
  • Yup, these are just images where the pixel matrix is reshaped into long vectors. I also have scaled them down and started training an SVM classifier. Before, I used HOG and LBP to compute features, but the ROC and PRC results (Area Under Curve) were not so good, so I thought of reducing the dimensions first. – Rafay Zia Mir Sep 26 '16 at 18:53
  • 1
    @RafayZiaMir OK, try scaling down the images first to make it manageable, then use whatever feature extractor you want on the reduced dimensions. It is also common practice to low-pass filter (Gaussian blur for example) the images after you downsample to avoid aliasing. BTW, if I may add something, consider using Convolutional Neural Networks to perform classification instead, as these simply take in raw pixels to perform the training. Extracting out features for a machine learning algorithm is what has been done in the past, but there are much better tools that provide better performance. – rayryeng Sep 26 '16 at 18:56