Dimension Reduction

Question

I'm trying to reduce a high-dimensional dataset to 2-D. However, I don't have access to the whole dataset upfront. So, I'd like to generate a function that takes an N-dimensional vector and returns a 2-dimensional vector, such that if I give it two vectors that are close in N-dimensional space, the results are close in 2-dimensional space.

I thought SVD was the answer I needed, but I can't make it work.

For simplicity, let N=3 and suppose I have 15 datapoints. If I have all the data upfront in a 15x3 matrix X, then:

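[U, S, V] = svd(X);
s = S; % s (lowercase) is the reduced version of S; MATLAB is case-sensitive, so these are different variables
s(3:end,3:end)=0; % zero out every singular value after the second
Y=U*s;
Y=Y(:,1:2); % keep the first two columns: one 1x2 row per datapoint (15x2)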

does what I want. But suppose I get a new datapoint, A, a 1x3 vector. Is there a way to use U, S, or V to turn A into the appropriate 1x2 vector?

If SVD is a lost cause, can someone tell me what I should be doing instead?

Note: This is Matlab code, but I don't care if the answer is C, Java, or just math. If you can't read Matlab, ask and I'll clarify.

Ugh, s vs. S really tricks the eye. ;) – Alex Feinman Oct 08 '09 at 17:02

score 3 · Accepted Answer · edited May 23 '17 at 12:29


SVD is a fine approach (probably). LSA (Latent Semantic Analysis) is built around it and uses essentially the same dimensionality-reduction approach. I've talked about that (at length) at: lsa-latent-semantic-analysis-how-to-code-it-in-php, or check out the LSA tag here on SO.

I realize it's an incomplete answer. Holler if you want more help!
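In LSA terms, what the question asks for is usually called "folding in" a new row. The sketch below is minimal and assumes that rows of X are datapoints. It uses made-up random data, and the names Uk, Sk, Vk and A are only illustrative. The idea is to project the new row onto the top-k right singular vectors. A*Vk gives coordinates comparable to the rows of U*S (the question's Y). A*Vk/Sk gives coordinates comparable to the rows of U, which is the usual LSA convention.

```matlab
% Sketch of LSA-style "folding in" of a new row; rows of X are datapoints.
X = rand(15, 3);                 % hypothetical 15x3 data, as in the question
k = 2;                           % target dimension
[U, S, V] = svd(X, 'econ');      % X = U*S*V'
Uk = U(:, 1:k);                  % truncated factors
Sk = S(1:k, 1:k);
Vk = V(:, 1:k);

% Fold in a new 1x3 row A without recomputing the SVD
A = rand(1, 3);
uA = A * Vk / Sk;                % coordinates in the Uk space (LSA convention)
yA = A * Vk;                     % coordinates in the Uk*Sk space (the question's Y)

% Sanity check: folding in an existing row reproduces its row of Uk and Uk*Sk
u1 = X(1, :) * Vk / Sk;
y1 = X(1, :) * Vk;
fprintf('max |u1 - Uk(1,:)| = %g\n', max(abs(u1 - Uk(1, :))));
```

Folding in leaves U, S and V unchanged, so it is cheap. The trade-off is that the basis stays fixed at whatever the original data supported.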


answered Oct 08 '09 at 15:53

Gregg Lind


Thanks, that was helpful. In order to turn U into U', do I simply truncate everything after the second column, or is it fancier than that? – PlexLuthor Oct 08 '09 at 15:57
I'm pretty sure it's exactly that simple (assuming matlab orders the columns such that the cols and eigenvals correspond) – Gregg Lind Oct 08 '09 at 16:07
Ok. I just played around with it in the way I thought you said it would work, but I still can't take new 3-d data and get the 2-d projection without recalculating the whole UxSxV set. Did I miss something in LSA? That is, I have X (15x3), U, S, V, U', S', V', and now I get A (1x3). What should I do to get a 1x2 version of A? – PlexLuthor Oct 08 '09 at 16:40
Duh, divide by V* is what I was looking for. I don't know why I missed that earlier. – PlexLuthor Oct 08 '09 at 17:00
It sounds like you have it quite well in hand :) I can never remember the exact formulae, so I just noodle around until I get the right size end matrix, just as you are! – Gregg Lind Oct 08 '09 at 18:03
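The resolution in these comments can be written out in the question's notation. Because V is orthogonal (V'V = I), multiplying X = U S V' on the right by V gives X V = U S. The first two columns of U*s are therefore X*V(:,1:2), and a new row A maps to A*V(:,1:2). In MATLAB, A/V' computes A*inv(V'), which equals A*V for the full square V. That is why "divide by V*" works. With an already-truncated V(:,1:2), which is not square, multiply instead.

```latex
\begin{aligned}
X &= U S V^{\top}, \qquad V^{\top} V = I \;\Longrightarrow\; X V = U S,\\
(U s)_{:,1:2} &= U_{:,1:2}\, S_{1:2,1:2} = X\, V_{:,1:2},\\
y_A &= A\, V_{:,1:2} = \bigl(A\, (V^{\top})^{-1}\bigr)_{:,1:2}.
\end{aligned}
```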

score 2 · Answer 2 · answered Oct 09 '09 at 01:08

% generate some random data (each row is a d-dimensional datapoint)
%data = rand(200, 4);
load fisheriris
data = meas;        % 150 instances of 4-dim

% center data
X = bsxfun(@minus, data, mean(data));

% SVD
[U S V] = svd(X, 'econ');       % X = U*S*V'

% keep the smallest number of components k whose cumulative explained variance exceeds 95%
variances = diag(S).^2 / (size(X,1)-1);
varExplained = 100 * variances./sum(variances);
index = 1+sum(~(cumsum(varExplained)>95));

% projected data = X*V = U*S
newX = X * V(:,1:index);
biplot(V(:,1:index), 'scores',newX, 'varlabels',{'d1' 'd2' 'd3' 'd4'});

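% mapping function (x is a row vector, or a matrix with multiple row vectors);
% new points must be centered with the training mean before projecting
mu = mean(data);
mapFunc = @(x) bsxfun(@minus, x, mu) * V(:,1:index);
mapFunc([1 2 3 4])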
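The question requires that nearby inputs map to nearby outputs, and this kind of linear projection satisfies that. With mapFunc(x) = (x - mu)*Vk, the difference of two images is (a - b)*Vk. Because the columns of Vk are orthonormal, the length of that difference can never exceed norm(a - b). The small sketch below checks both that property and the centering. It uses random stand-in data rather than fisheriris, fixes k at 2 to match the question, and all of its names are illustrative.

```matlab
% Sketch: centered projection onto the top-2 principal directions.
data = rand(200, 4);                      % hypothetical stand-in dataset
mu = mean(data);                          % training mean, reused for new points
X = bsxfun(@minus, data, mu);             % center the training data
[~, ~, V] = svd(X, 'econ');               % X = U*S*V'
k = 2;                                    % fixed 2-D target, as in the question
Vk = V(:, 1:k);
mapFunc = @(x) bsxfun(@minus, x, mu) * Vk;

% 1) Training rows map to the same coordinates as X*Vk
newX = X * Vk;
err_train = max(max(abs(mapFunc(data) - newX)));

% 2) Vk has orthonormal columns, so the map never increases distances
a = rand(1, 4);
b = a + 0.01 * randn(1, 4);               % a nearby point
d_high = norm(a - b);
d_low  = norm(mapFunc(a) - mapFunc(b));
fprintf('training error %g, distance %g -> %g\n', err_train, d_high, d_low);
```

The converse does not hold. Points that are far apart in 4-D can land close together in 2-D if they differ mainly along the discarded directions.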

score 0 · Answer 3 · answered Oct 09 '09 at 01:23


I don't think there's a built-in way to update an existing SVD within Matlab. I googled for "SVD update" and found this paper among the many results.
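To give a rough idea of what such an update involves, here is one standard way to append a single row to an economy SVD. It is not necessarily the method in that paper. The new row a is split into a part inside span(V) and a residual orthogonal to it. Only a small (k+1)-sized core matrix is then re-factored. Like the question's code, this sketch ignores centering. The data and variable names are hypothetical.

```matlab
% Sketch: update an economy SVD when one new row arrives, without refactoring X.
X = rand(15, 3);                      % hypothetical existing data
[U, S, V] = svd(X, 'econ');           % X = U*S*V'
a = rand(1, 3);                       % new datapoint

m = a * V;                            % coordinates of a in span(V)
p = a - m * V';                       % part of a orthogonal to span(V)
r = norm(p);
k = size(S, 1);
if r > 1e-10 * norm(a)
  % a adds a new direction: grow the basis by one vector
  K = [S, zeros(k, 1); m, r];         % small (k+1)x(k+1) core matrix
  [Uk, Sk, Vk] = svd(K);
  Vbasis = [V, p' / r];
else
  % a lies in span(V) (always the case here, since V is 3x3 orthogonal)
  K = [S; m];                         % small (k+1)xk core matrix
  [Uk, Sk, Vk] = svd(K, 'econ');
  Vbasis = V;
end
U2 = blkdiag(U, 1) * Uk;              % one extra row for the new datapoint
S2 = Sk;
V2 = Vbasis * Vk;

% Compare with refactoring [X; a] from scratch
Xa = [X; a];
recon_err = norm(Xa - U2 * S2 * V2');
fprintf('reconstruction error %g\n', recon_err);
```

When the first branch runs, the rank grows by one. To keep a fixed 2-D map, truncate U2, S2 and V2 back to their first two components.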


Victor Liu

