46

I know that for the $2$-dimensional case: given a correlation $\rho$ you can generate the first and second values, $ X_1 $ and $X_2$, from the standard normal distribution. Then from there make $X_3$ a linear combination of the two $X_3 = \rho X_1 + \sqrt{1-\rho^2}\,X_2$ then take $$ Y_1 = \mu_1 + \sigma_1 X_1, \quad Y_2 = \mu_2 + \sigma_2 X_3$$

So that now $Y_1$ and $Y_2$ have correlation $\rho$.

How would this be scaled to $n$ variables? With the condition that the end variables satisfy a given correlation matrix? I'm guessing at least n variables will need to be generated then a reassignment through a linear combination of them all will be required... but I'm not sure how to approach it.

jameselmore
  • 5,038
  • 3
  • 24
  • 36
  • If you happen to be using Matlab, see [my function here](https://github.com/horchler/SDETools/blob/master/SDETools/sde_correlate.m) and/or if you have the Statistics Toolbox, see the [mvnrnd](http://www.mathworks.com/help/stats/mvnrnd.html) function. – horchler Jul 17 '13 at 21:09
  • @horchler Actually using R, almost done writing the script... but I'll check yours out seeing as I'm getting a close but not correct result. – jameselmore Jul 17 '13 at 21:24
  • R doesn't appear to have a builtin [`cholcov`](http://www.mathworks.com/help/stats/cholcov.html) function (just `chol`) so you'll just need to make sure that you actually use correlation matrices (ones on the diagonal) rather than covariance matrices to meet the positive semi-definite criterion required for Cholesky decomposition. You can use R's [`cov2cor`](http://rfunction.com/archives/851) to convert if needed. – horchler Jul 17 '13 at 21:38
  • I couldn't find the appropriate library so I started making my own algorithm based on the link provided by @JosephK in his answer – jameselmore Jul 17 '13 at 21:40

2 Answers2

50

If you need to generate $n$ correlated Gaussian distributed random variables $$ \bf Y \sim \mathcal N(\bf \mu, \Sigma) $$ where $\textbf{Y} = (Y_1,\dots,Y_n)$ is the vector you want to simulate, $\mu =(\mu_1,\dots, \mu_n)$ the vector of means and $\Sigma$ the given covariance matrix,

  1. you first need to simulate a vector of uncorrelated Gaussian random variables, $\bf Z $
  2. then find a square root of $\Sigma$, i.e. a matrix $\bf C$ such that $\bf C \bf C^\intercal = \Sigma$.

Your target vector is given by $$ \bf Y = \bf \mu + \bf C \bf Z. $$

A popular choice to calculate $\bf C$ is the Cholesky decomposition.

JosephK
  • 743
  • 6
  • 9
  • 2
    Exactly what I was looking for. Thanks a lot – jameselmore Jul 17 '13 at 20:39
  • 1
    It's not altogether clear that this is correct: The question says "the correlation between the observed outcomes will be the same as in the matrix". If you choose from a multivariate normal with a certain correlation, generally the sample correlation will not equal the population correlation. If the idea is to make the _sample_ correlation equal to the specified value, then one is sampling from the _conditional_ distribution given that value. That can be done; I've written an algorithm for it in case $n=2$. – Michael Hardy Jul 17 '13 at 21:45
  • 2
    @MichaelHardy Where is this algorithm? Also, I'm primarily interested in cases for larger $n$, but most specifically I care about the case where $n=3$. – jameselmore Jul 18 '13 at 14:54
  • 2
    All clear, but the statement "then find a square root of Σ, i.e. a matrix C such that CC⊺=Σ" may be confusing as C is not strictly the square root of Σ – dayum Sep 10 '18 at 02:05
3

$\newcommand{\Cov}{\mathrm{Cov}}$

I wanted to add that comment to the previous answer :

Why is it true ?

Define $Y = \mu + C Z$, and recall that each component of $Z$ is i.i.d. normal :

$$ \Cov( Y ) = \Cov(CZ + \mu ) = \Cov(CZ ) = C \cdot \Cov (Z) \cdot C^T = C I C^T = \Sigma $$ which is the desired result : $$ \implies Y \sim \mathcal N (\mu, \Sigma) $$

Sam
  • 877
  • 3
  • 14
Marine Galantin
  • 2,775
  • 1
  • 8
  • 27