41

I want to find all the names of columns with NA or missing data and store these column names in a vector.

# create matrix
a <- c(1,2,3,4,5,NA,7,8,9,10,NA,12,13,14,NA,16,17,18,19,20)
cnames <- c("aa", "bb", "cc", "dd", "ee")
mymatrix <- matrix(a, nrow = 4, ncol = 5, byrow = TRUE)
colnames(mymatrix) <- cnames
mymatrix
#      aa bb cc dd ee
# [1,]  1  2  3  4  5
# [2,] NA  7  8  9 10
# [3,] NA 12 13 14 NA
# [4,] 16 17 18 19 20

The desired result: columns "aa" and "ee".

My attempt:

bad <- character()
for (j in 1:4){     
  tmp <- which(colnames(mymatrix[j, ]) %in% c("", "NA"))
  bad <- tmp
}

However, I keep getting integer(0) as my output. Any help is appreciated.

Henrik
lever

3 Answers

88

Like this?

colnames(mymatrix)[colSums(is.na(mymatrix)) > 0]
# [1] "aa" "ee"

Or as suggested by @thelatemail:

names(which(colSums(is.na(mymatrix)) > 0))
# [1] "aa" "ee"
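
To see why this works, the one-liner can be split into its intermediate steps. This is only an illustrative sketch; the names `na_mask`, `na_per_col`, and `bad` are made up here and do not appear in the original answer.

```r
# Rebuild the example matrix from the question
a <- c(1,2,3,4,5,NA,7,8,9,10,NA,12,13,14,NA,16,17,18,19,20)
mymatrix <- matrix(a, nrow = 4, ncol = 5, byrow = TRUE)
colnames(mymatrix) <- c("aa", "bb", "cc", "dd", "ee")

# Step 1: logical matrix, TRUE wherever a value is missing
na_mask <- is.na(mymatrix)

# Step 2: TRUE counts as 1, so colSums() gives the number of NAs per column
na_per_col <- colSums(na_mask)
na_per_col
# aa bb cc dd ee 
#  2  0  0  0  1 

# Step 3: keep the names of the columns with at least one NA
bad <- names(na_per_col)[na_per_col > 0]
bad
# [1] "aa" "ee"
```

The intermediate `na_per_col` is also handy when you want to know how many values are missing per column, not just which columns are affected.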
Henrik
  • Exactly! Thank you. I still don't understand why I wasn't able to solve it using which(colnames) – lever Dec 04 '13 at 00:44
  • An alternative that refers to `mymatrix` only once is `names(which(colSums(is.na(mymatrix))>0))` – thelatemail Dec 04 '13 at 01:02
  • @lever - because the colnames were never NA - the NA's are the values in each column of actual data, not the names. Try `colnames(mymatrix)` to see that there's no sight of `NA` – thelatemail Dec 04 '13 at 01:06
  • @thelatemail - thanks for the explanation. it's just as valuable as the solution for a beginning programmer – lever Dec 04 '13 at 01:12
  • and also, "NA" is not the same as NA - the first is a text string containing two letters, the second a representation of no value. – Jon Jun 19 '18 at 14:32
  • WORKS on R 3.4/RStudio 1.3 – oaxacamatt Nov 19 '20 at 16:10
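
The points made in the comments above can be checked directly in a small sketch. It also shows one detail that the comments do not mention but which is standard R behaviour: `mymatrix[j, ]` drops to a plain named vector, so `colnames()` on it returns `NULL`, which is why the original loop produced `integer(0)`. The variable names `row_colnames`, `row_names`, and `attempt` are illustrative only.

```r
# Rebuild the example matrix from the question
a <- c(1,2,3,4,5,NA,7,8,9,10,NA,12,13,14,NA,16,17,18,19,20)
mymatrix <- matrix(a, nrow = 4, ncol = 5, byrow = TRUE)
colnames(mymatrix) <- c("aa", "bb", "cc", "dd", "ee")

# A single row is a named vector, not a matrix, so it has no colnames
row_colnames <- colnames(mymatrix[1, ])
row_colnames
# NULL
row_names <- names(mymatrix[1, ])
row_names
# [1] "aa" "bb" "cc" "dd" "ee"

# The string "NA" is not a missing value; only NA itself is
is.na("NA")
# [1] FALSE
is.na(NA)
# [1] TRUE

# Even on the full matrix, the column names contain neither "" nor "NA"
attempt <- which(colnames(mymatrix) %in% c("", "NA"))
attempt
# integer(0)
```

Note also that the loop assigned `bad <- tmp` on every pass, so even a working test would have kept only the result of the last iteration.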
20

R 3.1 introduced an anyNA function, which is more convenient and faster:

colnames(mymatrix)[ apply(mymatrix, 2, anyNA) ]

Old answer:

If it's a very long matrix, apply + any can short circuit and run a bit faster.

apply(is.na(mymatrix), 2, any)
#   aa    bb    cc    dd    ee 
# TRUE FALSE FALSE FALSE  TRUE 
colnames(mymatrix)[apply(is.na(mymatrix), 2, any)]
# [1] "aa" "ee"
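
As a quick sanity check, the two variants in this answer can be compared on the example matrix. This is a sketch only; `new_way` and `old_way` are names invented for the comparison, and `anyNA` requires R 3.1 or later.

```r
# Rebuild the example matrix from the question
a <- c(1,2,3,4,5,NA,7,8,9,10,NA,12,13,14,NA,16,17,18,19,20)
mymatrix <- matrix(a, nrow = 4, ncol = 5, byrow = TRUE)
colnames(mymatrix) <- c("aa", "bb", "cc", "dd", "ee")

# anyNA() is applied to each column (MARGIN = 2)
new_way <- colnames(mymatrix)[apply(mymatrix, 2, anyNA)]

# any() is applied to each column of the logical is.na() matrix
old_way <- colnames(mymatrix)[apply(is.na(mymatrix), 2, any)]

identical(new_way, old_way)
# [1] TRUE
```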
Neal Fultz
13

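If you have a data frame with non-numeric columns, this solution is more general (building on the previous answers). Note that sapply iterates over columns only when the object is a data frame; on a matrix it visits each element instead, so convert with as.data.frame() first:

R 3.1+

names(which(sapply(as.data.frame(mymatrix), anyNA)))

or

names(which(sapply(as.data.frame(mymatrix), function(x) any(is.na(x)))))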
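
For illustration, here is the same idea applied to a data frame with mixed column types. The data frame `df` and its columns are hypothetical and made up for this sketch; they are not from the question.

```r
# Hypothetical data frame mixing integer, character, numeric and logical columns
df <- data.frame(
  id    = 1:4,
  name  = c("a", NA, "c", "d"),
  score = c(1.5, 2.5, NA, 4.0),
  flag  = c(TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# sapply() walks over the columns of a data frame, whatever their type
bad_cols <- names(which(sapply(df, anyNA)))
bad_cols
# [1] "name"  "score"
```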

verbamour