
I have a matrix M of size K x N, where K is 49152 (the dimension of the problem) and N is 52 (the number of observations).

I have tried to use [U,S,V] = svd(M), but doing this I run out of memory.

I found other code which uses [U,S,V] = svd(cov(M)), and it works well. My questions are: what is the meaning of applying cov(M) inside the SVD, and what is the meaning of the resulting [U,S,V]?


2 Answers


Finding the SVD of the covariance matrix is a method to perform Principal Components Analysis, or PCA for short. I won't get into the mathematical details here, but PCA performs what is known as dimensionality reduction. If you would like a more formal treatment of the subject, you can read my post about it here: What does selecting the largest eigenvalues and eigenvectors in the covariance matrix mean in data analysis?. Simply put, dimensionality reduction projects the data stored in the matrix M onto a lower-dimensional surface with the least amount of projection error. In this matrix, we are assuming that each column is a feature or dimension and each row is a data point.

I suspect the reason you run out of memory when applying the SVD to the actual data matrix M itself rather than to the covariance matrix is that you have a large number of data points with a small number of features. The covariance matrix finds the covariance between pairs of features. If M is an m x n matrix, where m is the total number of data points and n is the total number of features, cov(M) gives you an n x n matrix, so you are applying the SVD to something that occupies far less memory than M.
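As a rough illustration of the size difference, here is a minimal sketch using random placeholder data with the question's dimensions (49152 data points, 52 features):

M = randn(49152, 52);   % placeholder data: 49152 rows (data points), 52 columns (features)
C = cov(M);             % 52 x 52 covariance between pairs of features
size(C)                 % [52 52], far smaller than the 49152 x 49152 U a full svd(M) would allocate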

As for the meaning of U, S and V: for dimensionality reduction specifically, the columns of V are what are known as the principal components. V is ordered so that the first column is the axis of your data that describes the greatest amount of variability. As you move from the second column up to the nth column, you introduce more axes into your data and the amount of variability each one describes decreases. Once you reach the nth column, you are essentially describing your data in its entirety without reducing any dimensions. The diagonal values of S denote what is called the variance explained, and they follow the same ordering as V. As you progress through the singular values, they tell you how much of the variability in your data is described by each corresponding principal component.
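For instance, a minimal sketch (assuming M is arranged as described above, with features in columns) of extracting the principal components and the variance explained:

C = cov(M);                    % n x n covariance of the features
[U, S, V] = svd(C);            % columns of V are the principal components
vars = diag(S);                % variance explained by each component
explained = vars / sum(vars);  % fraction of the total variance per component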

To perform the dimensionality reduction, you can either take U and multiply by S, or take your mean-subtracted data and multiply by V. In other words, supposing X is the matrix M where each column's mean has been computed and then subtracted from that column, the following relationship holds:

US = XV

To actually perform the final dimensionality reduction, you take either US or XV and retain the first k columns, where k is the total number of dimensions you want to keep. The value of k depends on your application, but many people choose k to be the number of principal components that explain a certain percentage of the variability in your data.
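Here is a minimal sketch of that reduction, assuming the SVD is taken of the mean-subtracted data X itself (so that US = XV holds) and using an example threshold of 95% explained variance:

X = bsxfun(@minus, M, mean(M, 1));              % subtract each column's mean from M
[U, S, V] = svd(X, 'econ');                     % economy SVD: U is m x n rather than m x m
scores = X * V;                                 % same as U * S, up to floating-point error
vars = diag(S).^2;                              % proportional to the variance of each component
k = find(cumsum(vars) / sum(vars) >= 0.95, 1);  % smallest k explaining at least 95% of the variance
Xreduced = scores(:, 1:k);                      % m x k reduced-dimension data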

For more information about the link between SVD and PCA, please see this post on Cross Validated: https://stats.stackexchange.com/q/134282/86678

  • @kaleem You're welcome. If you require no more help, please consider accepting my answer. That can be done by clicking on the checkmark icon at the top of my post, to the left below the up and down arrow buttons. Thanks and good luck! – rayryeng Jul 18 '16 at 21:35
  • Please answer this question: `f=sin(2*pi*1000/fs*n)+sin(2*pi*2000/fs*n)+sin(2*pi*3000/fs*n)+sin(2*pi*4000/fs*n);` where `fs=9000` and `n=[1:9000]`. My first question is, in this particular case, how do I make the signal bandlimited, and second, what is the significance of the length of `n`? – kaleem Jul 22 '16 at 03:00
  • I don't understand your question. The signal is theoretically band-limited; the bandwidth is set by the sinusoid with the largest frequency. The significance of n is that it allows you to generate the output signal. I suggest you read a MATLAB tutorial before asking more questions. – rayryeng Jul 22 '16 at 04:14

Instead of [U, S, V] = svd(M), which tries to build a matrix U that is 49152 by 49152 (= 18 GB!), do svd(M, 'econ'). That returns the “economy-size” SVD, where U will be 49152 by 52, S is 52 by 52, and V is also 52 by 52.
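A quick sketch of the resulting sizes, using random placeholder data with the question's shape:

M = randn(49152, 52);          % placeholder data with the question's dimensions
[U, S, V] = svd(M, 'econ');    % economy-size SVD
size(U)                        % [49152 52] instead of [49152 49152]
size(S)                        % [52 52]
size(V)                        % [52 52]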

cov(M) will remove each dimension’s mean and evaluate the inner product, giving you a 52 by 52 covariance matrix. You can implement your own version of cov, called mycov, as

function C = mycov(M)
  M = bsxfun(@minus, M, mean(M, 1)); % subtract each column's mean over all observations
  C = M' * M / (size(M, 1) - 1);     % normalize by N - 1 to match MATLAB's default cov
end

(You can verify this works by looking at mycov(randn(49152, 52)), which should be close to eye(52), since each element of that array is IID-Gaussian.)
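For example, a quick check along those lines (with random data, so the exact numbers will vary from run to run):

M = randn(49152, 52);          % IID standard-normal test data
C = mycov(M);                  % should be close to eye(52)
max(max(abs(C - eye(52))))     % small, roughly on the order of 1e-2
norm(C - cov(M), 'fro')        % essentially zero: matches the built-in cov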

There are a lot of magical linear-algebraic properties and relationships between the SVD and EVD (i.e., the singular value and eigenvalue decompositions): because the covariance matrix cov(M) is a Hermitian matrix, its left- and right-singular vectors are the same, and they are in fact also cov(M)’s eigenvectors. Furthermore, cov(M)’s singular values are also its eigenvalues, so svd(cov(M)) is just an expensive way to get eig(cov(M)), up to ±1 signs and reordering.
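A small sketch of that equivalence (sorting the eigenvalues, since eig does not guarantee any particular order):

C = cov(M);                      % symmetric positive semi-definite
s = svd(C);                      % singular values, in descending order
e = sort(eig(C), 'descend');     % eigenvalues, sorted to match
max(abs(s - e))                  % essentially zero for a covariance matrix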

As @rayryeng explains at length, usually people use svd(M, 'econ') because they want eig(cov(M)) without having to form cov(M), because you never want to compute cov(M) explicitly: it is numerically unstable (forming the covariance squares the condition number of the data). I recently wrote an answer that shows, in Python, how to compute eig(cov(M)) using svd(M2, 'econ'), where M2 is the zero-mean version of M, in the practical application of color-to-grayscale mapping; it might help give you more context.
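The linked answer is in Python, but a MATLAB sketch of the same idea looks like this; the squared singular values of the zero-mean data, divided by the number of observations minus one, recover the eigenvalues of cov(M):

M2 = bsxfun(@minus, M, mean(M, 1));        % zero-mean version of M
[~, S, V] = svd(M2, 'econ');               % never forms cov(M) explicitly
eigvals = diag(S).^2 / (size(M, 1) - 1);   % equals sort(eig(cov(M)), 'descend')
eigvecs = V;                               % columns are cov(M)'s eigenvectors (up to sign)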

  • You can change your `bsxfun` call to use `@minus` instead of `@plus`. It's not essential, but to me using `@plus` with the negative of the mean seems rather odd. – rayryeng Jul 18 '16 at 12:18
  • @rayryeng I know what you mean; I started using `@plus` and `@times` all the time because I didn't want to keep track of which direction `@minus` and `@rdivide` and `@ldivide` worked. I will amend the answer to `minus` to encourage readers to be less lazy than me. It's also faster since it avoids creating yet another intermediate array. – Ahmed Fasih Jul 18 '16 at 12:28