7

I want to calculate Pearson's correlation coefficient in Matlab (without using Matlab's corr function).

Simply, I have two vectors A and B (each of them is 1x100) and I am trying to calculate the Pearson's coefficient like this:

P = cov(x, y)/std(x, 1)std(y,1)

I am using Matlab's cov and std functions. What I don't get is that the cov function returns a square matrix like this:

corrAB =
    0.8000    0.2000
    0.2000    4.8000

But I expect a single number as the covariance, so that I end up with a single number P (Pearson's coefficient). What am I missing?

Ramala
  • Do you mean `P = cov(x,y)/sqrt(var(x)*var(y));`? The diagonal should be 1. The off diagonal is what you want. – Rich C Apr 13 '11 at 12:16
  • You are right, I updated the question. Are the "off-diagonal" elements in the example above 0.2000 and 0.2000? Should I do another calculation with them, or just go with 0.2? – Ramala Apr 13 '11 at 13:17
  • In your example, 0.2 is the off-diagonal. However, the 0.8 and 4.8 should both be 1. So something is wrong with your calc. Just do corr(x,y) to check. Read the help to understand why it returns a matrix. It was unexpected to me the first time also. – Rich C Apr 13 '11 at 16:16
  • My arrays are like: x =[4 5 5 3 5], y = [4 4 0 0 0]. Maybe because of that, there are values like 4.8. I'll read the docs, thanks. – Ramala Apr 13 '11 at 16:28
  • @RichC: the diagonals need not be 1. They will be 1 only if the variances of both samples are exactly the same. – abcd Apr 13 '11 at 17:48
  • @yoda: you're right. I was thinking P was the correlation matrix, but only the off diagonal elements are correct. The diagonal elements are nonsense. – Rich C Apr 13 '11 at 21:49
  • @RichC: the diagonal elements are not nonsense... they are the variances of `x` and `y` :) – abcd Apr 13 '11 at 23:38
  • @yoda: the diagonals of P as defined above are nonsense. – Rich C Apr 14 '11 at 19:24
  • @RichC: There's some confusion here. The matrix output `corrAB` that Ramala gave in the question is correct, and the diagonals are the variances. As for the matrix `P` that he defined (the denominator needs to be enclosed in parentheses), the diagonals are `sigma_x/sigma_y` and `sigma_y/sigma_x` respectively. Still not nonsense, as it's a direct measure of how large the deviation in one sample is compared to the other. – abcd Apr 15 '11 at 00:05

2 Answers

10

I think you're confusing the covariance with the covariance matrix; the mathematical notation and MATLAB's function call look alike. In math, cov(x,y) means the covariance of the two variables x and y, which is a single number. In MATLAB, cov(x,y) computes the covariance matrix of x and y. Here cov is a function and x and y are its inputs.

Just to make it clearer, let me denote the covariance by C. MATLAB's cov(x,y) returns a matrix of the form

C_xx    C_xy
C_yx    C_yy

As RichC pointed out, you need the off-diagonal element C_xy (note that C_xy=C_yx for real variables x and y). A MATLAB script that gives you Pearson's coefficient for two variables x and y is:

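C = cov(x,y);                  % 2x2 covariance matrix [C_xx C_xy; C_yx C_yy]
p = C(1,2)/(std(x)*std(y));    % off-diagonal covariance divided by the product of the standard deviations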
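
To make this concrete, here is a minimal sketch using the sample vectors Ramala gave in the comments (x = [4 5 5 3 5], y = [4 4 0 0 0]). The variable names are illustrative, and the `corrcoef` call is only an optional sanity check that I am assuming is acceptable; it is not part of the calculation itself.

```matlab
% Sample data taken from the comments on the question
x = [4 5 5 3 5];
y = [4 4 0 0 0];

% cov returns the 2x2 covariance matrix [C_xx C_xy; C_yx C_yy]
C = cov(x, y);                    % [0.8 0.2; 0.2 4.8], as shown in the question

% Pearson's coefficient: off-diagonal covariance over the product of the standard deviations
p = C(1,2) / (std(x) * std(y));   % approximately 0.1021

% Optional check against the off-diagonal of MATLAB's correlation matrix
R = corrcoef(x, y);
fprintf('p = %.4f, corrcoef = %.4f\n', p, R(1,2));
```

Both cov and std normalize by N-1 by default, so the numerator and denominator are consistent here.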
abcd
2

From the docs:

cov(X,Y), where X and Y are matrices with the same number of elements, is equivalent to cov([X(:) Y(:)]).

Use:

C = cov(X,Y);                             % 2x2 covariance matrix
coeff = C(1,2) / sqrt(C(1,1) * C(2,2))    % C(1,1) and C(2,2) are the variances of X and Y
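
Since C(1,1) and C(2,2) are the variances, their square roots are the standard deviations, so this form is Pearson's coefficient. One detail worth checking is the normalization: the formula in the question uses std(x,1), which divides by N, while cov(x,y) divides by N-1 by default. The sketch below assumes the sample vectors from the comments; all variable names are illustrative.

```matlab
% Sample data taken from the comments on the question
x = [4 5 5 3 5];
y = [4 4 0 0 0];

% Matrix-only form with the default N-1 normalization
C = cov(x, y);
coeff = C(1,2) / sqrt(C(1,1) * C(2,2));

% Same form with N normalization (third argument 1): the factor cancels out
C1 = cov(x, y, 1);
coeffN = C1(1,2) / sqrt(C1(1,1) * C1(2,2));

% Mixed normalization (N-1 covariance, N standard deviations): inflated by N/(N-1)
mixed = C(1,2) / (std(x, 1) * std(y, 1));

fprintf('coeff = %.4f, coeffN = %.4f, mixed = %.4f\n', coeff, coeffN, mixed);
```

The first two values agree, while the mixed version comes out larger by a factor of N/(N-1), which is 1.25 for five samples. Taking everything from the same covariance matrix avoids that mismatch.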
  • Is the "coeff" variable the Pearson coefficient, or did you mean the covariance? In the coefficient formula, I need to divide the covariance by the standard deviations of X and Y. – Ramala Apr 13 '11 at 13:19