I have N sensors, each sampled M times, so I have an N by M readout matrix. If I want to know the relations and dependencies between these sensors, the simplest thing is to compute Pearson's correlation, which gives me an N by N correlation matrix. Now suppose I have the correlation matrix and I want to explore the possible readout space that could lead to such a correlation matrix. What can I do? So the question is: given an N by N correlation matrix, how can you get to a matrix that would have such a correlation matrix? Any comment is appreciated.

Judging from the discussion below, I suspect that you might get additional help from asking your question on stats.stackexchange.com. Just state your problem a bit more clearly: for what purpose do you want to explore the readout space? – mpiktas Jan 16 '11 at 07:20

Thanks for the suggestion, but I am getting great help here. In fact, I want to know whether certain structures in the correlation matrix are possible with real-world values in the readings! This is a hypothesis-driven kind of problem. The sensors are actual biological neurons in a real brain, and the readouts are their spike generation rates (firing rate, number of spikes/second). There are theories, with partial empirical support, that claim certain observations result from specific correlation structures between neural responses. Correlation > neural responses > biological plausibility – Ali Jan 16 '11 at 18:30

What about just sampling from a Gaussian with the given covariance matrix? The covariance matrix of the resulting dataset will be close to what you want. – Yaroslav Bulatov Jan 16 '11 at 21:09

This could be very interesting! What is a "Gaussian with a given covariance matrix"? How can I generate one? Is it the same as drawing samples from a 2D Gaussian pdf and multiplying them by the target covariance matrix? [Hey, I can hear some of you laughing, but don't! This happens when you don't take math after high school.] – Ali Jan 17 '11 at 11:50
4 Answers
You can do a Cholesky decomposition: if your correlation matrix is R, then
$$ L \, L^T = R $$
where $ L = \operatorname{cholesky}(R) $.
Here L is lower triangular of size N by N. You can also understand L as a simple rotation of your original readout matrix, where the M columns are rotated to fit into an N (≤ M) column vector space.
So you may apply any arbitrary rotation to the columns of L, which may be extended to M columns by appending null columns.
You may also apply a "stretching" of the rows of the reconstructed "readout" matrix by multiplying each row by some standard deviation, since the rows of L have norm 1.
(If you want, you can download my (free) MatMate program for matrix visualization and see with a simple script what I mean here and how this works. Here is the download page.)
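For readers who prefer NumPy over MatMate, here is a minimal sketch of this construction (my own translation, not the answer's code; R is the example correlation matrix from the update below):

import numpy as np

rng = np.random.default_rng(0)

# Target correlation matrix (the 4x4 example used in the update below).
R = np.array([[1.00, 0.38, 0.22, 0.85],
              [0.38, 1.00, 0.30, 0.57],
              [0.22, 0.30, 1.00, 0.27],
              [0.85, 0.57, 0.27, 1.00]])
N, M = R.shape[0], 200

L = np.linalg.cholesky(R)            # lower triangular, L @ L.T == R

# Any M x N matrix Q with orthonormal columns (Q.T @ Q == I) works here;
# it plays the role of the arbitrary rotation described in the answer.
Q, _ = np.linalg.qr(rng.standard_normal((M, N)))
data = np.sqrt(M) * L @ Q.T          # N x M "readout" matrix

# Under the answer's convention R = data @ data.T / M, this is exact:
print(np.round(data @ data.T / M, 2))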
[update] Here I give an example. After reading more comments, I understand that the goal was to construct a data matrix given a correlation matrix. In the example I construct a random 4 by 4 correlation matrix, then produce data, and then check whether the data, when correlated, reproduce the correlation matrix. They do, and because the matrix algebra is simple we can also see that this is exact.
;===========================================
;MatMate listing from 16.01.2011 18:33:13
;===========================================
[1] cor = covtocorr(cov) // the correlation matrix from a given covariance matrix
cor :
1.00 0.38 0.22 0.85
0.38 1.00 0.30 0.57
0.22 0.30 1.00 0.27
0.85 0.57 0.27 1.00
[2] lad = cholesky(cor) // this provides a "loadings" matrix by Cholesky
                        // decomposition of the matrix cor
lad :
1.00 0.00 0.00 0.00
0.38 0.93 0.00 0.00
0.22 0.42 0.88 0.00
0.85 0.27 0.39 0.25
Now we want to construct 200 columns of data. As I wrote earlier, the Cholesky matrix L of a given correlation matrix R is just a rotation of the data matrix. One should add that the data must be centered (mean = 0) and normed (stddev = 1).
Now I do the inverse: I construct a valid rotation matrix t by rotating a dummy random matrix of 200 columns to the first column (rotation method: "triangular").
[3] n=200
[4] dummy = randomu(1,n)
[5] t = gettrans(dummy,"tri")
[6] hide t,dummy
So I've got a valid rotation matrix t.
Now I provide columns within which to rotate the Cholesky/loadings matrix, inserting the Cholesky matrix in the first 4 columns (v = 4 variables):
[7] data = (lad  null(v,n-v)) * sqrt(n)  // append n-v null columns to lad
Then I rotate that data matrix with the random rotation:
[8] data = data * t
[9] disp = data[*,1..10] // show the first 10 columns of the created dataset
disp :
1.71 0.11 0.20 0.05 0.06 0.21 0.16 0.14 0.11 0.21
0.23 13.12 0.08 0.02 0.02 0.08 0.06 0.05 0.04 0.08
2.22 5.77 12.34 0.01 0.01 0.05 0.03 0.03 0.02 0.05
0.46 3.83 5.63 3.52 0.05 0.18 0.13 0.12 0.09 0.17
Check whether these data really reproduce the correlation matrix:
[10] chk_cor = data * data' /n
chk_cor :
1.00 0.38 0.22 0.85
0.38 1.00 0.30 0.57
0.22 0.30 1.00 0.27
0.85 0.57 0.27 1.00
We see that the correlation matrix is reproduced.
The "mystery" here is simply that, by the definition of correlation (of normed data, to keep the example short),
R = data * data' / n
where data may have arbitrarily many columns. Since t*t' is the identity matrix,
R = data * t * t' * data' / n
R = (data*t) * (data*t)' / n
and the Cholesky factorization is just the operation to get L and L' from a correlation matrix:
L = cholesky(R) ==> L*L' = R
(Thus the result is also exact.)
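A quick NumPy check of this rotation-invariance argument (my own sketch, not part of the original answer):

import numpy as np

rng = np.random.default_rng(1)
n = 500
data = rng.standard_normal((4, n))
data -= data.mean(axis=1, keepdims=True)   # center rows (mean = 0)
data /= data.std(axis=1, keepdims=True)    # norm rows (stddev = 1)

T, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal matrix, T @ T.T == I
R_before = data @ data.T / n
R_after = (data @ T) @ (data @ T).T / n
print(np.allclose(R_before, R_after))      # True: the rotation leaves R unchanged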
[end update]
[update 2] In response to the remark of mpiktas, I'll add a variant which allows setting the standard deviations and the means of the generated data explicitly.
The argument in the first update was to show how the data form an M-dimensional vector space, and how the Cholesky decomposition of the corresponding correlation matrix can simply be understood as a special rotation of the data in that vector space.
However, the means of data generated the way I described above are not guaranteed to be zero. So here is a variant which produces means = 0 and stddevs = 1, after which the data can be scaled to other stddevs and translated to predefined means.
The idea is to use the L matrix as a "recipe for composition" of uncorrelated "factors", as understood in factor analysis/PCA. Given a centered and normed data matrix D, it is assumed that the Cholesky matrix L describes a linear combination of uncorrelated, unit-variance data ("factors") in a matrix F with the same number of columns as D, such that
L * F = D
Now, some random F can be created by a random generator providing a normal distribution with mean = 0 and stddev = 1 per row. If the rows in F were truly uncorrelated, we would have, by the composition L * F = D, a simple solution. After that we could scale the rows of D by standard deviations and translate them to predefined means.
Unfortunately, random generators do not give exactly uncorrelated data, so the correlations in F must be removed first. This is possible with the inverse of the Cholesky matrix of the correlations of F. Here is a script in pseudocode:
v = 4                     // number of variables / rows; a small example
n = 200                   // number of observations / columns
F = randomn(v,n,0,1)      // generate random data in F where rows have mean=0, stddev=1
covF = F * F' / n
cholF = cholesky(covF)
L = cholesky(R)           // R is the targeted correlation matrix
data = L * inv(cholF) * F
sdevs = diag([1,2,3,4])           // provide four stddevs
means = colvector([20,10,40,30])  // provide four means
simuldata = sdevs*data + means    // the means vector must be expanded to
                                  // be added to all columns of simuldata
The simulated data are in the matrix simuldata.
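A NumPy sketch of this variant (my translation of the pseudocode above; I center F explicitly, since a real random generator only gives mean ≈ 0):

import numpy as np

rng = np.random.default_rng(2)
v, n = 4, 200

# Target correlation matrix (the example from the first update).
R = np.array([[1.00, 0.38, 0.22, 0.85],
              [0.38, 1.00, 0.30, 0.57],
              [0.22, 0.30, 1.00, 0.27],
              [0.85, 0.57, 0.27, 1.00]])

F = rng.standard_normal((v, n))
F -= F.mean(axis=1, keepdims=True)   # make the row means exactly zero
covF = F @ F.T / n
cholF = np.linalg.cholesky(covF)
F_white = np.linalg.solve(cholF, F)  # inv(cholF) @ F: rows now exactly uncorrelated, unit variance

L = np.linalg.cholesky(R)
data = L @ F_white                   # rows now have correlation matrix R exactly

sdevs = np.array([1.0, 2.0, 3.0, 4.0])      # the four stddevs from the script
means = np.array([20.0, 10.0, 40.0, 30.0])  # the four means from the script
simuldata = sdevs[:, None] * data + means[:, None]  # scale each row, then shift

print(np.round(np.corrcoef(simuldata), 2))  # reproduces R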
While we can set the stddevs and means explicitly, one problem remains: through the left-multiplication by inv(cholF) and L, the normal distribution in simuldata, as originally provided by the random generator in F, may be affected/distorted a bit. I don't have an idea yet how to solve this...

Thank you very much for the explanation. I didn't know about the Cholesky decomposition. Two questions: 1. Is it the same as getting the square root of the correlation matrix and multiplying it by a random N by M matrix? 2. Are there other classes of answers to this question? I mean, at the end I want to explore the readout space given the correlation matrix (as a constraint); does this method cover all the possible locations in that space that satisfy this constraint? [Sorry, I don't have a good math background and I might be using the wrong terminology.] – Ali Jan 15 '11 at 14:21

No to 1. Cholesky finds **L** such that **L * L' = R**, where **L'** is the transpose. The square root of **R** would be **M** with **M * M = R**, so in general **L ≠ M**. – Gottfried Helms Jan 15 '11 at 15:14
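As a quick numerical check of this distinction (my own NumPy/SciPy sketch, not from the thread):

import numpy as np
from scipy.linalg import sqrtm

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
L = np.linalg.cholesky(R)   # triangular factor: L @ L.T == R
M = sqrtm(R)                # symmetric square root: M @ M == R
print(np.allclose(L, M))                               # False: the factors differ
print(np.allclose(L @ L.T, R), np.allclose(M @ M, R))  # True True: both reproduce R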

It should be noted that if you have only the correlation matrix, you lose information about the means and standard deviations of your original variables. This might be important. – mpiktas Jan 15 '11 at 18:16

Gottfried, but a correlation matrix is the same as its transpose, isn't it? So here these are the same. You are right that in general these two are different. – Ali Jan 16 '11 at 00:01

mpiktas, you are right. But this space has other constraints that I didn't mention here, so I do know the means and variances (also, each sensor reading comes from a Poisson distribution or a normal distribution, depending on some other factors). – Ali Jan 16 '11 at 00:04

It should be noted that having the matrix $L$ only solves the problem as stated in the OP's question partially. The rotated original data can be useless if we want to calculate some statistic from the original data which is not rotation invariant. – mpiktas Jan 16 '11 at 19:02

@mpiktas: true. But there was no additional restriction yet, so I think a general solution/a general hint would be appropriate at this stage of the analysis of the OP's problem ... – Gottfried Helms Jan 16 '11 at 19:33

@Gottfried, yes it is appropriate, but since it is not a complete solution, it should be mentioned as such. – mpiktas Jan 16 '11 at 19:45

@mpiktas: true, I was a bit sloppy there. I'll edit the answer later and supply a similar solution where the sdev and mean are adjustable. – Gottfried Helms Jan 16 '11 at 21:44
What you want to do is called sampling from a multivariate distribution. It's not hard to look up how to do this, although in your case, if some of your variables are assumed normal and some Poisson, you're going to have to work out the joint PDF of your model yourself. Rather than trying to explain the whole process here, I'll give you some things to read:
http://en.wikipedia.org/wiki/Multivariate_normal_distribution
EDIT: Had to remove some dead links that aren't archived. Anyway, just google "sampling from a multivariate normal distribution" or something to that effect.
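In the meantime, a minimal NumPy sketch of sampling from a multivariate normal with a prescribed correlation structure (my example; the means and standard deviations are placeholder values, since the OP says he knows his):

import numpy as np

rng = np.random.default_rng(3)

R = np.array([[1.0, 0.5],        # target correlation matrix (toy example)
              [0.5, 1.0]])
stds = np.array([2.0, 0.5])      # known standard deviations (placeholders)
means = np.array([10.0, 3.0])    # known means (placeholders)

C = R * np.outer(stds, stds)     # convert correlation to covariance
samples = rng.multivariate_normal(means, C, size=1000)  # M x N readout matrix

print(np.round(np.corrcoef(samples.T), 2))  # close to R for large M, not exact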

I don't see how this helps get an observation matrix matching a given covariance matrix. – Yaroslav Bulatov Jan 16 '11 at 08:13

He can just take M sample vectors. Of course the sample covariance won't match the true covariance precisely, but it's clear from context that what he actually wants to do is sample his model, not compute the fiber over a given covariance matrix as he says. (And for that matter, none of the solutions proposed here does that, either.) – Daniel McLaury Jan 16 '11 at 08:19

ok, I see, it should be close to target covariance for high enough $n$ – Yaroslav Bulatov Jan 16 '11 at 08:29

Correct, but more to the point I think sampling is actually what he wants, and talking about the fiber over a given covariance matrix is his first attempt at describing this. – Daniel McLaury Jan 16 '11 at 17:43
You could make your "10%" implementation faster by using gradient descent.
Here's an example of doing it for the covariance matrix, because it's easier.
You have a $k\times k$ covariance matrix $C$ and you want to get a $k \times n$ observation matrix $A$ that produces it.
The task is to find $A$ such that
$$A A' = n^2 C$$
You could start with a random guess for $A$ and try to minimize the sum of squared errors. Using the Frobenius norm, we can write this objective as
$$J=\|A A' - n^2 C\|^2_F$$
Let $D$ be our matrix of errors, i.e. $D=A A' - n^2 C$. Checking with the Matrix Cookbook, I get the following for the gradient:
$$\frac{\partial J}{\partial a_{iy}} \propto \sum_j D_{ij}\, a_{jy}$$
In other words, the update to the data matrix for sensor $i$, observation $y$ should be proportional to the dot product of the $i$th row of the error matrix with the $y$th observation (the $y$th column of $A$).
A gradient descent step would update all entries at once. It might be more robust to update just the worst rows: calculate the sum of errors for every row, update the entries corresponding to the worst rows, then recalculate the errors.
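A minimal NumPy sketch of this descent (my own implementation; the step size and iteration count are arbitrary choices, and I build an achievable target from a random matrix rather than from real data):

import numpy as np

rng = np.random.default_rng(4)
k, n = 4, 50
B = rng.standard_normal((k, n))
target = B @ B.T                   # an achievable target, playing the role of n^2 C

A = rng.standard_normal((k, n))    # random initial guess
lr = 1e-4                          # step size, chosen by hand
for step in range(20000):
    D = A @ A.T - target           # matrix of errors
    A -= lr * 4 * D @ A            # gradient of ||A A' - target||_F^2

print(np.linalg.norm(A @ A.T - target))  # near zero after convergence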

But $A$ is not uniquely defined, meaning there are infinitely many matrices $A$ which satisfy the equation $AA'=n^2C$. – mpiktas Jan 16 '11 at 18:57

May I tell you guys my silly solution to this problem? Only if you won't laugh at me! I thought I needed to explore this space, so I wanted to sample it randomly (the space of sensor readings). I know what the correlation matrix should look like, and I know that the sensor readings come from a Gaussian distribution. So I generated a random N by M matrix and started tweaking the values in small steps, checking whether each change moved the matrix toward the target or away from it, and keeping the changes that moved it toward the target. Concretely: I choose a random cell in the matrix and increase it by 10%, then calculate the correlation matrix and compare it to the target correlation matrix. If the difference is smaller than it was before the 10% increase, I keep the change; then I move to the next randomly selected cell and continue until I get close enough to the target correlation matrix. This method, although silly, works well, and I can get different samples of the sensor-reading space. What do you guys think?! In practice I am working with rather large matrices, like N = 8000, M = 1000.
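For reference, a NumPy sketch of this procedure as I understand it (my reconstruction of the description; I tweak cells by ±10% rather than only +10%, and the iteration count is arbitrary):

import numpy as np

rng = np.random.default_rng(5)
N, M = 4, 100
R_target = np.array([[1.00, 0.38, 0.22, 0.85],   # the example matrix from above
                     [0.38, 1.00, 0.30, 0.57],
                     [0.22, 0.30, 1.00, 0.27],
                     [0.85, 0.57, 0.27, 1.00]])

X = rng.standard_normal((N, M))    # random starting readouts
err = np.linalg.norm(np.corrcoef(X) - R_target)

for step in range(50000):
    i, j = rng.integers(N), rng.integers(M)
    old = X[i, j]
    X[i, j] *= 1.0 + rng.choice([-0.1, 0.1])   # tweak one random cell by +/-10%
    new_err = np.linalg.norm(np.corrcoef(X) - R_target)
    if new_err < err:
        err = new_err              # keep changes that move toward the target
    else:
        X[i, j] = old              # revert changes that move away

print(err)                         # shrinks toward zero, though slowly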

That's essentially how statistical models are fit, with the exception that they are smarter about the "10%" increase. – Yaroslav Bulatov Jan 16 '11 at 06:54

I didn't know that; that's good to know. By the way, I was a bit smarter than just going 10%, but not that smart! – Ali Jan 16 '11 at 18:17