Correlate two dataframes while having lots of NAs

Question

Suppose I have two data frames that contain a number of NAs:

DF1=data.frame(a=c(20,70,50,NA),b=c(40,90,30,20),c=c(60,110,NA,40))
DF2=data.frame(e=c(200,700,NA,400),f=c(400,900,500,200),g=c(600,1100,NA,700))

I'd still like to correlate corresponding rows of the two data frames, returning NA if one of the values in a correlated pair is already NA. I tried a for loop:

for (M in 1:nrow(DF1)) {
  Test=cor(DF1[M,],DF2[M,],use="na.or.complete") 
  print(Test)
}

... which gave me this:

   e  f  g
a NA NA NA
b NA NA NA
c NA NA NA
   e  f  g
a NA NA NA
b NA NA NA
c NA NA NA
   e  f  g
a NA NA NA
b NA NA NA
c NA NA NA
   e  f  g
a NA NA NA
b NA NA NA
c NA NA NA

What am I doing wrong?
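A quick way to see what goes wrong, assuming the `DF1`/`DF2` defined above: `DF1[M, ]` is still a one-row data frame, so `cor()` correlates its *columns* against the columns of `DF2[M, ]`. Each column holds a single value, which is not enough for a correlation, so every entry of the resulting 3x3 matrix is NA. Turning each row into a plain numeric vector first (here with `unlist()`, one of several ways to do it) gives a single correlation per row.

```r
DF1 <- data.frame(a = c(20, 70, 50, NA), b = c(40, 90, 30, 20), c = c(60, 110, NA, 40))
DF2 <- data.frame(e = c(200, 700, NA, 400), f = c(400, 900, 500, 200), g = c(600, 1100, NA, 700))

# A single row extracted with [M, ] keeps the data.frame class
print(class(DF1[1, ]))

# cor() on two one-row data frames correlates column against column;
# one observation per column cannot give a correlation, hence a 3x3 matrix of NA
# (R may also warn that the standard deviation is zero)
bad <- suppressWarnings(cor(DF1[1, ], DF2[1, ], use = "na.or.complete"))
print(dim(bad))

# unlist() turns each row into a named numeric vector of length 3
x <- unlist(DF1[1, ])
y <- unlist(DF2[1, ])
good <- cor(x, y, use = "na.or.complete")
print(good)
```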

try every option of `use =` in `?cor`. Take the one you're happy with. — Andre Elrico, Oct 26 '18 at 10:12
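Following the comment's suggestion, here is a sketch that tries each documented `use` value on row 4, the row where the two answers below disagree, with both rows converted to vectors via `unlist()`. Because `"all.obs"` raises an error whenever a value is missing, each call is wrapped in `tryCatch()` and errors are reported as NA; that wrapper is an illustration choice, not something `cor()` does on its own.

```r
DF1 <- data.frame(a = c(20, 70, 50, NA), b = c(40, 90, 30, 20), c = c(60, 110, NA, 40))
DF2 <- data.frame(e = c(200, 700, NA, 400), f = c(400, 900, 500, 200), g = c(600, 1100, NA, 700))

x <- unlist(DF1[4, ])  # NA, 20, 40
y <- unlist(DF2[4, ])  # 400, 200, 700

uses <- c("everything", "all.obs", "complete.obs",
          "na.or.complete", "pairwise.complete.obs")

# Try each option; "all.obs" stops with an error when NAs are present,
# so errors are caught and reported as NA
results <- sapply(uses, function(u) {
  tryCatch(cor(x, y, use = u), error = function(e) NA_real_)
})
print(results)
```

Only `"everything"` (the default) and `"all.obs"` give NA here; the other three drop the incomplete first pair and correlate the remaining two.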

AkselA · Accepted Answer · score 1 · 2018-10-26T16:59:34.953

It's usually more straightforward to do operations like this on columns, so we'll transpose the data frames (which also turns them into numeric matrices) and switch the dimensions in the loop.

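# Transpose: each original row becomes a column of a numeric matrix
DF1 <- t(DF1)
DF2 <- t(DF2)

# Correlate column M of one matrix with column M of the other
for (M in 1:ncol(DF1)) {
  Test <- cor(DF1[, M], DF2[, M], use = "na.or.complete")
  print(Test)
}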

Or, using `sapply()`:

sapply(1:ncol(DF1), function(x) cor(DF1[,x], DF2[,x], use="na.or.complete"))
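A slightly more defensive variant of the `sapply()` call, assuming the same transposed layout: `vapply()` with `numeric(1)` guarantees exactly one number per original row. The matrix names `m1`/`m2` are just illustrative. On the example data this should give 1, 1, NA, 1: row 3 has only one complete pair, which is too few for a correlation (R may warn for that row), while row 4 keeps its two complete pairs.

```r
DF1 <- data.frame(a = c(20, 70, 50, NA), b = c(40, 90, 30, 20), c = c(60, 110, NA, 40))
DF2 <- data.frame(e = c(200, 700, NA, 400), f = c(400, 900, 500, 200), g = c(600, 1100, NA, 700))

# Transpose so each original row becomes a column of a numeric matrix
m1 <- t(DF1)
m2 <- t(DF2)

# One correlation per original row; vapply() checks each result is a single number
row_cor <- vapply(seq_len(ncol(m1)), function(j) {
  cor(m1[, j], m2[, j], use = "na.or.complete")
}, numeric(1))
print(row_cor)
```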

edited Oct 26 '18 at 16:59

answered Oct 26 '18 at 10:40


score 0 · Answer 2 · answered Oct 26 '18 at 10:46

If you're trying to calculate the correlations between corresponding rows of the two data frames (4 correlations in total, one per row), you can try this:

for (M in 1:nrow(DF1)) {
  # as.numeric() turns each one-row data frame into a plain numeric vector
  Test <- cor(as.numeric(DF1[M, ]), as.numeric(DF2[M, ]))
  print(Test)
}

[1] 1
[1] 1
[1] NA
[1] NA
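This answer leaves `use` at its default, `"everything"`, which is why row 4 prints NA even though it has two complete pairs. If the aim is to drop incomplete pairs but still return NA when too few remain, one option is to count complete pairs explicitly. The helper `row_cor()` and its `min_pairs` argument below are hypothetical, not part of either answer. A default of 3 is used because two points always lie on a line, so a two-pair correlation is ±1 whenever it is defined.

```r
DF1 <- data.frame(a = c(20, 70, 50, NA), b = c(40, 90, 30, 20), c = c(60, 110, NA, 40))
DF2 <- data.frame(e = c(200, 700, NA, 400), f = c(400, 900, 500, 200), g = c(600, 1100, NA, 700))

# Hypothetical helper: correlate two numeric vectors after dropping incomplete
# pairs, returning NA when fewer than `min_pairs` complete pairs remain
row_cor <- function(x, y, min_pairs = 3) {
  ok <- complete.cases(x, y)
  if (sum(ok) < min_pairs) return(NA_real_)
  cor(x[ok], y[ok])
}

# Default threshold: rows 3 and 4 have too few complete pairs
res <- vapply(seq_len(nrow(DF1)), function(i) {
  row_cor(as.numeric(DF1[i, ]), as.numeric(DF2[i, ]))
}, numeric(1))
print(res)

# With min_pairs = 2, row 4 (two complete pairs) gets a value again
res2 <- vapply(seq_len(nrow(DF1)), function(i) {
  row_cor(as.numeric(DF1[i, ]), as.numeric(DF2[i, ]), min_pairs = 2)
}, numeric(1))
print(res2)
```

With the default threshold the result happens to match the output above; lowering `min_pairs` to 2 recovers the same value for row 4 that `use = "na.or.complete"` gives in the accepted answer.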
