
I have a dataset that contains several kinds of missing values. The easiest kind is NA, which is.na() finds directly. However, some missing values are blank strings, and some are the literal string N/A.

For NA values, I used colnames(data)[colSums(is.na(data)) > 0] to get the names of the columns that contain NA. I want to do the same for the columns that contain blanks and N/A.

The data looks like this:

data = read.csv("file")

 id     description       hosts      zipcode   room available   no room
3432    It is good       Michael P.   10203          T            3
3433                     Sam E.       12030          T            9
1023    It is not bad                  NA            F            NA
2020                       N/A         NA            F            NA

id: numeric, unique
description: text
hosts: text
zipcode: numeric, unique
room available: factor
no room: numeric

I can find the N/A values with data[data=="N/A"], but this doesn't give me the column names.

NelsonGon
tehehe

1 Answer

If those are the only cases, something like this could work:

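# Count, per column, the entries that are NA, blank, or the string "N/A".
# is.na(x) goes first so that NA entries evaluate to TRUE rather than NA.
na_cols <- sapply(data, function(x) sum(is.na(x) | x == "" | x == "N/A"))
names(na_cols[na_cols > 0])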

If there were more strings that should count as missing, you'd need to add them to the condition.
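
As the list of missing-value strings grows, one option is to keep them in a vector and test with %in%. The following is only a sketch: the toy data frame stands in for the question's data (its values are made up), and na_tokens, has_missing, missing_cols, csv_text and import_cols are hypothetical names. The second half shows an alternative that may be simpler when you control the import: pass na.strings to read.csv so that blanks and N/A become real NA values, after which the original colSums(is.na(data)) approach works unchanged.

```r
# Toy data frame standing in for the question's `data` (values are made up).
data <- data.frame(
  id          = c(3432, 3433, 1023, 2020),
  description = c("It is good", "", "It is not bad", ""),
  hosts       = c("Michael P.", "Sam E.", "", "N/A"),
  zipcode     = c(10203, 12030, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical list of strings to treat as missing; extend as needed.
na_tokens <- c("", "N/A")

# TRUE for each column holding at least one real NA or one missing token.
has_missing <- sapply(data, function(x) any(is.na(x) | x %in% na_tokens))
missing_cols <- names(has_missing)[has_missing]
print(missing_cols)

# Alternative: convert the tokens to NA while reading the file.
# `text =` is used here only to keep the example self-contained.
csv_text <- "id,hosts,zipcode\n3432,Michael P.,10203\n1023,,NA\n2020,N/A,NA"
data2 <- read.csv(text = csv_text, na.strings = c("", "NA", "N/A"))
import_cols <- colnames(data2)[colSums(is.na(data2)) > 0]
print(import_cols)
```

The %in% version never returns NA for a comparison, so any() and sum() behave predictably even in columns that mix real NA values with string tokens.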