Questions tagged [pearson]

In statistics, Pearson's r (the Pearson product-moment correlation coefficient) measures the strength and direction of the linear relationship between two data sets on a scale from -1 to 1.

Overview

The Pearson product-moment correlation coefficient is given by the following equation:

$$\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}$$

where

ρ_XY = Pearson's correlation coefficient;
Cov(X,Y) = covariance of random variables X and Y;
Var(X) = variance of random variable X;
Var(Y) = variance of random variable Y.
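Here is a minimal plain-Python sketch of the definition above: the covariance divided by the square root of the product of the two variances. The function name `pearson_r` is only illustrative. In real code, library routines such as `scipy.stats.pearsonr` or `numpy.corrcoef` are the usual choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r = Cov(X, Y) / sqrt(Var(X) * Var(Y)) for two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population (1/n) moments; any common normalization cancels in the ratio.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    # Raises ZeroDivisionError if either variable is constant (r is undefined).
    return cov_xy / math.sqrt(var_x * var_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (exact positive linear relation)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (exact negative linear relation)
```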


Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning, and data analysis.

145 questions
38 votes, 1 answer

ValueError: shape mismatch: objects cannot be broadcast to a single shape

I am using SciPy's pearsonr(x, y) function and I cannot figure out why the following error is happening: ValueError: shape mismatch: objects cannot be broadcast to a single shape. It computes the first two (I am running several thousand of these…
Alex Brashear
11 votes, 2 answers

Scipy: Pearson's correlation always returning 1

I am using the Python library SciPy to calculate Pearson's correlation for two float arrays. The returned value of the coefficient is always 1.0, even if the arrays are different. For example: [-0.65499887 2.34644428] [-1.46049758 3.86537321] I am…
user2291379
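The printed example suggests each array has two elements. If so, a result of 1.0 is expected rather than a bug. Any two points with distinct x values and distinct y values lie exactly on a straight line, so Pearson's r is always +1 or -1. It is undefined if either pair of values is equal. Here is a plain-Python check using a hypothetical helper `pearson_r`:

```python
import math

def pearson_r(xs, ys):
    # Hypothetical helper: textbook Pearson's r from mean-centred sums.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# The two-element arrays from the question: r is 1 up to floating-point rounding.
r = pearson_r([-0.65499887, 2.34644428], [-1.46049758, 3.86537321])
print(round(r, 12))

# Any two points with distinct x and y values give exactly +1 or -1.
print(pearson_r([0.0, 5.0], [3.0, -7.0]))
```

With so few points the coefficient carries no information. You need at least three observations before r can take values strictly between -1 and 1.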
8 votes, 2 answers

Dropping 'nan' with Pearson's r in scipy/pandas

Quick question: Is there a way to use 'dropna' with the Pearson's r function in SciPy? I'm using it in conjunction with pandas, and some of my data has holes in it. I know you used to be able to suppress 'nan' with Spearman's r in older versions of…
Lodore66
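One way to handle the holes is pairwise deletion: drop every position where either value is missing, then compute r on what remains. In pandas, `Series.corr` already excludes missing values. With `scipy.stats.pearsonr` you have to apply the mask yourself, for example with `~np.isnan(x) & ~np.isnan(y)`. Below is a plain-Python sketch of the masking step. The data and the helper name are made up for illustration.

```python
import math

def pearson_r_dropna(xs, ys):
    """Pearson's r using only positions where neither value is NaN."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        raise ValueError("fewer than 2 complete pairs; r is undefined")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

nan = float("nan")
x = [1.0, 2.0, nan, 4.0, 5.0]
y = [2.0, nan, 3.0, 8.0, 10.0]
print(pearson_r_dropna(x, y))  # computed from the pairs (1, 2), (4, 8), (5, 10) only
```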
7 votes, 2 answers

Pearson's Coefficient and Covariance calculation in Matlab

I want to calculate Pearson's correlation coefficient in Matlab (without using Matlab's corr function). Simply, I have two vectors A and B (each of them is 1x100) and I am trying to calculate Pearson's coefficient like this: P = cov(x, y)/std(x,…
Ramala
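The excerpt is truncated, so the exact problem isn't visible. Two pitfalls are common with `cov(x, y)/(std(x)*std(y))`-style code:

- In MATLAB, `cov(x, y)` for two vectors returns a 2-by-2 covariance matrix, so the covariance you want is an off-diagonal element.
- The covariance and the standard deviations must use the same normalization: n - 1 for all of them, or n for all of them.

The Python sketch below illustrates only the second point; the helper names are hypothetical. Matching normalizations give the same r, while mixing them scales r by (n - 1)/n.

```python
import math

def cov(xs, ys, ddof):
    # ddof=1 gives the sample (n - 1) estimate, ddof=0 the population (n) one.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - ddof)

def std(xs, ddof):
    return math.sqrt(cov(xs, xs, ddof))

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 1.0, 4.0, 3.0, 5.0]

r_sample = cov(a, b, 1) / (std(a, 1) * std(b, 1))  # n - 1 everywhere
r_pop = cov(a, b, 0) / (std(a, 0) * std(b, 0))     # n everywhere
r_mixed = cov(a, b, 0) / (std(a, 1) * std(b, 1))   # mismatched: scaled by (n - 1)/n

print(r_sample, r_pop, r_mixed)
```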
7 votes, 0 answers

Minimal p-value for scipy.stats.pearsonr

I am running scipy.stats.pearsonr on my data, and I get (0.9672434106763087, 0.0). It is reasonable that the r-value is high and the p-value is very low. However, p is obviously not 0, so I would like to know what p=0.0 means. Is it p<10^-10,…
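A p-value printed as 0.0 most likely means the true value is smaller than the smallest positive double-precision number (about 4.9e-324), so it underflows to zero. It does not mean p is exactly zero. The sketch below shows that underflow using a standard-normal tail probability computed with `math.erfc`. It illustrates floating-point limits only; it is not how `scipy.stats.pearsonr` computes its p-value.

```python
import math
import sys

def two_sided_normal_p(z):
    """P(|Z| >= z) for a standard normal Z, computed as erfc(z / sqrt(2))."""
    return math.erfc(z / math.sqrt(2))

print(sys.float_info.min)  # smallest positive *normal* double, about 2.2e-308
for z in (5, 10, 30, 40):
    # By z = 40 the true tail probability (~1e-349) is below even the smallest
    # subnormal double (~4.9e-324), so the result underflows to 0.0.
    print(z, two_sided_normal_p(z))
```

In practice such results are reported as being below some threshold rather than as exactly zero.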
7 votes, 4 answers

How do you compute the confidence interval for Pearson's r in Python?

In Python, I know how to calculate r and the associated p-value using scipy.stats.pearsonr, but I'm unable to find a way to calculate the confidence interval of r. How is this done? Thanks for any help :)
pixelphantom
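A common approach is the Fisher z-transformation. The transformed value z = atanh(r) is approximately normal with standard error 1/sqrt(n - 3). You build the interval on the z scale and map it back with tanh. Below is a minimal standard-library sketch. The function name is illustrative, and `statistics.NormalDist` needs Python 3.8 or later. Recent SciPy releases also provide a `confidence_interval()` method on the result returned by `scipy.stats.pearsonr`; check the documentation for your version.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, confidence=0.95):
    """Approximate confidence interval for Pearson's r via the Fisher z-transformation.

    Requires -1 < r < 1 (atanh is infinite at +/-1) and n > 3.
    """
    if n <= 3:
        raise ValueError("need n > 3")
    z = math.atanh(r)                                     # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)                           # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

low, high = pearson_ci(0.5, 50)
print(low, high)
```

The interval is not symmetric around r, because tanh compresses values near +/-1.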
6 votes, 1 answer

Why Pearson correlation output is NaN?

I'm trying to get the Pearson correlation coefficient between two variables in R. This is the scatterplot of the variables: ggplot(results_summary, aes(x = D_in, y = D_ex)) + geom_point(col=ifelse(results_summary$FDR < 0.05,…
Geparada
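The excerpt is cut off, so the actual cause isn't visible. Two frequent reasons for a missing or NaN correlation are missing values in either vector and a variable with zero variance. Zero variance means all values are identical, perhaps after filtering, which turns the formula into 0/0. In R, checks such as `sd(x) == 0` or `anyNA(x)` help locate the problem. The Python sketch below (hypothetical helper name) makes the zero-variance case explicit by returning NaN.

```python
import math

def pearson_or_nan(xs, ys):
    """Pearson's r, or NaN when either variable is constant (0/0 in the formula)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined: at least one variance is zero
    return sxy / math.sqrt(sxx * syy)

print(pearson_or_nan([1, 2, 3, 4], [7, 7, 7, 7]))  # constant y -> nan
```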
6 votes, 4 answers

Collaborative Filtering Program: What to do for a Pearson Score When There Isn't Enough Data

I'm building a recommendation engine using collaborative filtering. For similarity scores, I use a Pearson correlation. This is great most of the time, but sometimes I have users that share only 1 or 2 fields. For example: User 1{ a: 4 b:…
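With one co-rated item, r is undefined (zero variance). With two, it is always +1 or -1, so raw Pearson scores from tiny overlaps are unreliable. One common heuristic in collaborative filtering is to shrink the score toward zero when the overlap is small, for example by multiplying it by min(n, cap) / cap. The sketch below computes r over co-rated items only and applies that weighting. The dict-of-ratings layout, the `cap` value and the function name are assumptions for illustration. Returning 0.0 for undefined scores is one design choice; skipping the neighbour entirely is another.

```python
import math

def weighted_pearson(ratings_a, ratings_b, cap=50):
    """Pearson's r over co-rated items, shrunk by min(n, cap) / cap.

    ratings_a, ratings_b: dicts mapping item -> rating (hypothetical layout).
    Returns 0.0 when r is undefined (fewer than 2 common items or zero variance).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    r = sxy / math.sqrt(sxx * syy)
    return r * min(n, cap) / cap  # small overlaps pull the score toward 0

user1 = {"a": 4, "b": 2, "c": 5}
user2 = {"a": 5, "b": 1, "d": 3}
print(weighted_pearson(user1, user2, cap=10))  # 2 common items: r = 1.0, shrunk to 0.2
```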
6 votes, 1 answer

How is NaN handled in Pearson correlation user-user similarity matrix in a recommender system?

I am generating a user-user similarity matrix from user-rating data (specifically, the MovieLens100K data). Computing the correlation leads to some NaN values. I have tested on a smaller dataset: User-Item rating matrix I1 I2 I3 I4 U1 4 0 5 5 U2 4 …
phoxis
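In a user-item matrix like the one in the question, 0 usually means "not rated", so treating it as a rating distorts r. NaN appears when two users have fewer than two co-rated items. It also appears when one user's co-rated ratings are all equal, because the denominator is then zero. The sketch below treats 0 as missing and computes r over co-rated items only. It leaves NaN where the score is undefined, so you can handle those cases explicitly, for example by skipping them as neighbours. The small matrix is made up for illustration and is not MovieLens data.

```python
import math

def user_similarity_matrix(matrix):
    """User-user Pearson similarities; 0 entries are treated as 'not rated'.

    Undefined scores (fewer than 2 co-rated items, or zero variance) stay NaN.
    """
    n_users = len(matrix)
    sim = [[float("nan")] * n_users for _ in range(n_users)]
    for u in range(n_users):
        for v in range(n_users):
            # Keep only items that both users actually rated.
            pairs = [(a, b) for a, b in zip(matrix[u], matrix[v]) if a != 0 and b != 0]
            if len(pairs) < 2:
                continue
            xs = [a for a, _ in pairs]
            ys = [b for _, b in pairs]
            mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
            sxy = sum((x - mx) * (y - my) for x, y in pairs)
            sxx = sum((x - mx) ** 2 for x in xs)
            syy = sum((y - my) ** 2 for y in ys)
            if sxx > 0 and syy > 0:
                sim[u][v] = sxy / math.sqrt(sxx * syy)
    return sim

ratings = [
    [4, 0, 5, 5],  # U1
    [4, 2, 1, 0],  # U2
    [3, 3, 3, 3],  # U3: constant ratings, so every similarity involving U3 is NaN
]
for row in user_similarity_matrix(ratings):
    print(row)
```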
5 votes, 1 answer

cor() behavior in R different between individual vectors and data.frame

I'm trying to get the Pearson correlation coefficient for all rows in a data frame relative to each other. There are values that are empty (NA), and this seems to be presenting a problem that I don't encounter when running cor() on 2 vectors with…
hawkhandler
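When a matrix or data frame contains NA, R's `cor()` treats missing values according to its `use` argument, and the options can give different numbers for the same pair:

- `use = "complete.obs"` drops every row that has an NA in any column before computing all correlations (listwise deletion).
- `use = "pairwise.complete.obs"` drops missing values separately for each pair of columns.

This is one way results can differ between calling `cor()` on two vectors and on a whole data frame. Note also that `cor()` on a data frame correlates columns, not rows. A plain-Python sketch of the listwise/pairwise difference, on made-up data with a hypothetical helper:

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Three variables (columns) observed on five cases (rows); None marks a missing value.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 1.0, 4.0, 3.0, 5.0]
c = [None, 1.0, 2.0, 2.5, 4.0]

# Pairwise deletion: for a-b, use every row where a and b are both present (all five).
pairwise_ab = pearson(a, b)

# Listwise deletion: first drop any row with a missing value in *any* column.
rows = [(x, y, z) for x, y, z in zip(a, b, c) if None not in (x, y, z)]
listwise_ab = pearson([row[0] for row in rows], [row[1] for row in rows])

print(pairwise_ab, listwise_ab)  # same pair of variables, different estimates
```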
5 votes, 4 answers

What is wrong with the Pearson algorithm from “Programming Collective Intelligence”?

This function is from the book “Programming Collective Intelligence”, and is supposed to calculate the Pearson correlation coefficient for p1 and p2, which is supposed to be a number between -1 and 1. If two critics rate items very similarly, the…
Hobhouse
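Without the rest of the question the specific bug isn't visible here. For reference, here is a sketch of the one-pass ("sums of products") form of Pearson's r, computed over the items both critics rated. This form is common in recommendation code. Two details matter in Python:

- Use true (floating-point) division, because `/` on integers in Python 2 truncates, which corrupts the result with integer ratings.
- Guard against a zero or rounding-negative variance term.

The dict layout and the function name are assumptions.

```python
import math

def sim_pearson_one_pass(prefs_a, prefs_b):
    """One-pass Pearson's r over items rated by both (dicts mapping item -> rating)."""
    common = [item for item in prefs_a if item in prefs_b]
    n = len(common)
    if n == 0:
        return 0.0
    sum_a = sum(prefs_a[i] for i in common)
    sum_b = sum(prefs_b[i] for i in common)
    sum_a_sq = sum(prefs_a[i] ** 2 for i in common)
    sum_b_sq = sum(prefs_b[i] ** 2 for i in common)
    sum_ab = sum(prefs_a[i] * prefs_b[i] for i in common)
    # These are n*Cov and n*Var terms; '/' must be true division for integer ratings.
    num = sum_ab - sum_a * sum_b / n
    var_a = sum_a_sq - sum_a ** 2 / n
    var_b = sum_b_sq - sum_b ** 2 / n
    if var_a <= 0 or var_b <= 0:
        return 0.0  # undefined (constant ratings); rounding can also give tiny negatives
    return num / math.sqrt(var_a * var_b)

prefs_a = {"i1": 1, "i2": 2, "i3": 3, "i4": 4, "i5": 5}
prefs_b = {"i1": 2, "i2": 1, "i3": 4, "i4": 3, "i5": 5, "i6": 1}
print(sim_pearson_one_pass(prefs_a, prefs_b))  # 0.8 (item i6 is ignored)
```

The one-pass form subtracts large, nearly equal sums, so it can lose precision for large rating values. The two-pass (mean-centred) form is numerically safer.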
5 votes, 1 answer

Approximate the distribution of a sum of binomial random variables in R

My goal is to approximate the distribution of a sum of binomial variables. I am using the paper "The Distribution of a Sum of Binomial Random Variables" by Ken Butler and Michael Stephens. I want to write an R script to find the Pearson approximation to…
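Fitting a distribution from the Pearson family is a moment-matching step: you need the mean, variance, skewness and kurtosis of the sum. For independent binomials these follow from two facts:

- Cumulants add under independent summation.
- A Binomial(n, p) variable with q = 1 - p has κ1 = np, κ2 = npq, κ3 = npq(1 - 2p) and κ4 = npq(1 - 6pq).

Below is a Python sketch of that first step only. It is not the full Pearson-system fit and not code from the paper. The function name and example parameters are illustrative.

```python
def binomial_sum_moments(terms):
    """Mean, variance, beta1 (squared skewness) and beta2 (kurtosis) of a sum of
    independent Binomial(n_i, p_i) variables, using additive cumulants.

    terms: iterable of (n_i, p_i) pairs.
    """
    k1 = k2 = k3 = k4 = 0.0
    for n, p in terms:
        q = 1.0 - p
        k1 += n * p
        k2 += n * p * q
        k3 += n * p * q * (1 - 2 * p)
        k4 += n * p * q * (1 - 6 * p * q)
    beta1 = k3 ** 2 / k2 ** 3
    beta2 = 3.0 + k4 / k2 ** 2
    return k1, k2, beta1, beta2

print(binomial_sum_moments([(10, 0.2), (5, 0.5), (20, 0.9)]))
```

The pair (beta1, beta2) is what selects the member of the Pearson family to fit.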
4 votes, 1 answer

Pearson correlation on big numpy matrices

I have a 24000 * 316 numpy matrix; each row represents a time series with 316 time points, and I am computing the Pearson correlation between each pair of these time series. As a result I would have a 24000 * 24000 numpy matrix having pearson…
mersa
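Pearson's r between two rows equals the dot product of the z-scored rows divided by the number of columns. That means all pairs can be computed with one matrix product, giving the same quantity `numpy.corrcoef` returns. The catch at this size is memory. A 24000 x 24000 float64 result needs 24000² × 8 bytes ≈ 4.6 GB, so the sketch below produces it in row blocks; float32 would halve that. The block size and the function name are assumptions. In practice you would write each block to disk (for example to a `numpy.memmap`) or reduce it right away instead of keeping all blocks. Constant rows would cause division by zero in the z-scoring.

```python
import numpy as np

def row_corr_blocks(data, block=1000):
    """Yield (start, block_of_correlations) covering all row pairs of a 2-D array."""
    # z-score each row (population std, ddof=0); dividing by n_cols then gives exact r.
    centered = data - data.mean(axis=1, keepdims=True)
    z = centered / centered.std(axis=1, keepdims=True)
    n_cols = data.shape[1]
    for start in range(0, data.shape[0], block):
        # Rows start..start+block against all rows: shape (block, n_rows).
        yield start, z[start:start + block] @ z.T / n_cols

rng = np.random.default_rng(0)
data = rng.standard_normal((50, 316))  # small stand-in for the 24000 x 316 matrix
full = np.vstack([b for _, b in row_corr_blocks(data, block=16)])
print(np.allclose(full, np.corrcoef(data)))
```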
4 votes, 1 answer

What's the difference between Pearson correlation similarity and adjusted cosine similarity?

While they are very similar, I am sure there is some difference between Pearson correlation similarity and adjusted cosine similarity, because all the papers and web pages treat them as two different kinds. However, none of them provides a clear…
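In item-based collaborative filtering, the difference is which mean gets subtracted before taking the cosine. Pearson similarity between two items centres each item's ratings on that item's own mean. Adjusted cosine instead centres each rating on the mean rating of the user who gave it. Below is a sketch on a small, made-up, fully observed user-item matrix, with users as rows and items as columns. The function names are illustrative. With sparse data you would also restrict the sums to users who rated both items and guard against zero norms.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def pearson_items(R, i, j):
    # Centre each item column on its own mean (over users).
    ci = [row[i] for row in R]
    cj = [row[j] for row in R]
    mi, mj = sum(ci) / len(ci), sum(cj) / len(cj)
    return cosine([x - mi for x in ci], [y - mj for y in cj])

def adjusted_cosine_items(R, i, j):
    # Centre each rating on the mean rating of the user who gave it.
    user_means = [sum(row) / len(row) for row in R]
    return cosine([row[i] - m for row, m in zip(R, user_means)],
                  [row[j] - m for row, m in zip(R, user_means)])

R = [
    [5, 3, 4],  # user 1
    [4, 2, 5],  # user 2
    [1, 1, 2],  # user 3
]
print(pearson_items(R, 0, 1), adjusted_cosine_items(R, 0, 1))
```

Here the two measures even disagree in sign. Items 0 and 1 rise and fall together across users (Pearson ≈ 0.96). Relative to each user's own average, however, item 0 tends to be rated above it and item 1 below it (adjusted cosine ≈ -0.66).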
4 votes, 1 answer

How to normalize Pearson Correlation between 0 and 1?

I came across the formula for the Pearson correlation, but it gives values between -1 and 1. How would I modify the formula so that it gives values between 0 and 1?
covfefe
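If the goal is only to rescale, the linear map (r + 1) / 2 preserves order and sends -1 to 0, 0 to 0.5 and 1 to 1. If the sign doesn't matter and you only want strength, |r| and r² are alternatives; r² equals the coefficient of determination of a simple linear regression. Both, however, treat strong negative and strong positive correlations alike. A small sketch with an illustrative function name:

```python
def rescale_01(r):
    """Map Pearson's r linearly from [-1, 1] onto [0, 1]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return (r + 1.0) / 2.0

for r in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(r, rescale_01(r), abs(r), r * r)
```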