
I've got a data frame that has a batch ID and the results of six tests performed on each batch. The data looks like this:

batch_id  test1  test2  test3  test4  test5  test6
001       0.121     NA  0.340  0.877  0.417  0.662
002       0.229  0.108     NA  0.638     NA  0.574

(there are a few hundred rows in this dataframe, only one row per batch_id)

I'm looking for a way to count how many NAs there are for each batch_id (that is, for each row). I feel like this should be doable with a few lines of R code at most, but I'm having trouble actually coding it. Any ideas?

zx8754
Shark7
  • @BenBolker Generally, I have the impression that answers to recent posts are often more appropriate, modern, or efficient than those in the alleged duplicates, especially if the duplicate post is several years old (not the case here). In this specific case, however, I'm not even sure that we're dealing with a duplicate, since the linked question specifically asked for a `dplyr` solution, unlike the OP of this post. – RHertel Jun 14 '16 at 05:10
  • OK, although this particular question isn't that old (Feb of this year) and the *answers* (esp. @windrunn3r.1990's answer) overlap a lot. Should I/we vote to reopen? – Ben Bolker Jun 14 '16 at 12:51
  • @BenBolker I did not see the question you linked to when I searched for a solution. The answer to that question by Justin is what I was looking for. Should I delete my question? – Shark7 Jun 14 '16 at 23:28
  • No, duplicates are fine as long as they're marked as such. – Ben Bolker Jun 14 '16 at 23:30
  • @BenBolker OK. Should I select one of the answers to the question I posted? Tim Biegeleisen posted a solution that works well, so I feel that he should get some credit. – Shark7 Jun 14 '16 at 23:33

2 Answers


You can count the NAs in each row with this command:

rowSums(is.na(dat))

where dat is the name of your data frame.
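
As a rough illustration, here is the command applied to the two sample rows from the question. The data frame below is retyped by hand from the question, and `na_per_row` is just a name chosen for this sketch. `is.na(dat)` gives a logical matrix with one cell per value, and `rowSums()` counts the `TRUE` cells in each row.

```r
# Sample data retyped from the question (only the two rows shown there)
dat <- data.frame(
  batch_id = c("001", "002"),
  test1 = c(0.121, 0.229),
  test2 = c(NA, 0.108),
  test3 = c(0.340, NA),
  test4 = c(0.877, 0.638),
  test5 = c(0.417, NA),
  test6 = c(0.662, 0.574)
)

# Logical matrix of missing values, summed across each row
na_per_row <- rowSums(is.na(dat))

# Label the counts with the batch IDs for readability (optional)
names(na_per_row) <- dat$batch_id
na_per_row  # batch 001 has 1 NA, batch 002 has 2
```

Note that this counts NAs in every column, including batch_id; that makes no difference here because batch_id is never missing.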

Sven Hohenstein

You could add a new column to your data frame containing the number of NA values per batch_id:

df$na_count <- apply(df, 1, function(x) sum(is.na(x)))
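
A small sketch of the same idea on the question's two sample rows, restricted to the test columns so that the ID column is not part of the count. The data frame is retyped from the question, and `test_cols` and the `"^test"` pattern are assumptions made for this example rather than anything from the original post.

```r
# Sample data retyped from the question (only the two rows shown there)
df <- data.frame(
  batch_id = c("001", "002"),
  test1 = c(0.121, 0.229),
  test2 = c(NA, 0.108),
  test3 = c(0.340, NA),
  test4 = c(0.877, 0.638),
  test5 = c(0.417, NA),
  test6 = c(0.662, 0.574)
)

# Assumed naming convention: every test column starts with "test"
test_cols <- grep("^test", names(df))

# Count the NAs in each row, looking only at the test columns
df$na_count <- apply(df[test_cols], 1, function(x) sum(is.na(x)))

df[c("batch_id", "na_count")]  # na_count is 1 for batch 001 and 2 for batch 002
```

For this data the result matches `rowSums(is.na(df[test_cols]))`, which avoids the row-by-row function call.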
Tim Biegeleisen