I'm trying to reduce a high-dimensional dataset to 2-D. However, I don't have access to the whole dataset upfront. So I'd like to generate a function that takes an N-dimensional vector and returns a 2-dimensional vector, such that if I give it two vectors that are close in N-dimensional space, the results are close in 2-dimensional space.

I thought SVD was the answer I needed, but I can't make it work.

For simplicity, let N=3 and suppose I have 15 datapoints. If I have all the data upfront in a 15x3 matrix X, then:

[U, S, V] = svd(X);
s = S; % s is the reduced version of S (Matlab is case-sensitive, so s and S differ)
s(3:end,3:end) = 0;
Y = U*s;
Y = Y(:,1:2); % keep the first two columns, giving a 15x2 result

does what I want. But suppose I get a new datapoint, A, a 1x3 vector. Is there a way to use U, S, or V to turn A into the appropriate 1x2 vector?

If SVD is a lost cause, can someone tell me what I should be doing instead?

Note: This is Matlab code, but I don't care if the answer is C, Java, or just math. If you can't read Matlab, ask and I'll clarify.

PlexLuthor

3 Answers

3

SVD is a fine approach (probably). LSA (Latent Semantic Analysis) is built on it and takes basically the same approach to dimensionality reduction. I've talked about that (at length) at: lsa-latent-semantic-analysis-how-to-code-it-in-php or check out the LSA tag here on SO.

I realize it's an incomplete answer. Holler if you want more help!

Gregg Lind
  • Thanks, that was helpful. In order to turn U into U', do I simply truncate everything after the second column, or is it fancier than that? – PlexLuthor Oct 08 '09 at 15:57
  • I'm pretty sure it's exactly that simple (assuming matlab orders the columns such that the cols and eigenvals correspond) – Gregg Lind Oct 08 '09 at 16:07
  • Ok. I just played around with it in the way I thought you said it would work, but I still can't take new 3-d data and get the 2-d projection without recalculating the whole UxSxV set. Did I miss something in LSA? That is, I have X (15x3), U, S, V, U', S', V', and now I get A (1x3). What should I do to get a 1x2 version of A? – PlexLuthor Oct 08 '09 at 16:40
  • Duh, divide by V* is what I was looking for. I don't know why I missed that earlier. – PlexLuthor Oct 08 '09 at 17:00
  • It sounds like you have it quite well in hand :) I can never remember the exact formulae, so I just noodle around until I get the right size end matrix, just as you are! – Gregg Lind Oct 08 '09 at 18:03
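The resolution reached in the comments can be sketched as follows. This is a minimal illustration, not the exact code either commenter ran: it assumes the new point A is on the same footing as the rows of X (no centering, as in the question), and the names k, Vk, US, errTrain and a are made up for the example. Since X = U*S*V' and V is orthogonal, X*V = U*S, so multiplying by the first two columns of V gives the same 2-D coordinates as truncating U*S.

```matlab
% Sketch: project a new point using V from the SVD of the existing data.
X = rand(15, 3);                 % stand-in for the 15x3 data matrix
[U, S, V] = svd(X);              % X = U*S*V'

k  = 2;                          % target dimension
Vk = V(:, 1:k);                  % first k right singular vectors (3x2)

Y  = X * Vk;                     % 15x2, equals the first k columns of U*S
US = U * S;
errTrain = norm(Y - US(:, 1:k)); % should be round-off only

A = rand(1, 3);                  % a new 1x3 datapoint
a = A * Vk;                      % its 1x2 projection, no new SVD required
```

Only V has to be kept around to map later points; U and S are not needed for the projection itself.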
2
% generate some random data (each row is a d-dimensional datapoint)
%data = rand(200, 4);
load fisheriris
data = meas;        % 150 instances of 4-dim

% center data
X = bsxfun(@minus, data, mean(data));

% SVD
[U, S, V] = svd(X, 'econ');     % X = U*S*V'

% let's keep k components so that 95% of the data variance is explained
variances = diag(S).^2 / (size(X,1)-1);
varExplained = 100 * variances./sum(variances);
index = 1+sum(~(cumsum(varExplained)>95));

% projected data = X*V = U*S
newX = X * V(:,1:index);
biplot(V(:,1:index), 'scores',newX, 'varlabels',{'d1' 'd2' 'd3' 'd4'});

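% mapping function (x is a row vector, or a matrix with multiple row vectors);
% new points are centered with the training mean, just as the data were above
mu = mean(data);
mapFunc = @(x) bsxfun(@minus, x, mu) * V(:,1:index);
mapFunc([1 2 3 4])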
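Because the SVD above is computed on centered data, a raw new point needs the training mean subtracted before it is multiplied by V. One way to check this is to push the original rows through the mapping and compare against U*S. The sketch below is self-contained and uses synthetic data so it does not depend on the iris dataset; the names project, mu, k and maxErr are assumptions for illustration, and k is fixed at 2 rather than chosen by the 95% rule.

```matlab
% Sketch: a centered mapping should reproduce the training projection.
data = rand(200, 4);                     % synthetic stand-in data
mu = mean(data);                         % training mean (1x4)
X = bsxfun(@minus, data, mu);            % centered data
[U, S, V] = svd(X, 'econ');              % X = U*S*V'

k = 2;                                   % hypothetical number of components kept
project = @(x) bsxfun(@minus, x, mu) * V(:, 1:k);   % center, then project

US = U * S;
maxErr = max(max(abs(project(data) - US(:, 1:k)))); % expected: round-off only
```

If maxErr is not close to zero, the mapping and the training projection disagree, which usually means the centering step was skipped or a different mean was used.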
Amro
0

I don't think there's a built-in way to update an existing SVD within Matlab. I searched for "SVD update" and found this paper among the many results.

Victor Liu