
I need to count the missing values in each row. I was able to do that using the apply function as follows:

x1 <- c(1:5, NA, 8)
x2 <- c(1:4, NA, NA, 8)
data_cmb <- data.frame(x1, x2)
data_cmb$sum_na <- apply(data_cmb, 1, function(x) sum(is.na(x)))

data_cmb
  x1 x2 sum_na
1  1  1      0
2  2  2      0
3  3  3      0
4  4  4      0
5  5 NA      1
6 NA NA      2
7  8  8      0

I am learning dplyr these days, so I was wondering whether I can do the same thing using the dplyr package in R. Is that possible?

I appreciate any comment.

Thank you

student_R123

2 Answers

1

In dplyr you can use rowwise to count NA values by row.

library(dplyr)

data_cmb %>%
  rowwise() %>%
  mutate(sum_na = sum(is.na(c_across(everything()))))

#     x1    x2 sum_na
#  <dbl> <dbl>  <int>
#1     1     1      0
#2     2     2      0
#3     3     3      0
#4     4     4      0
#5     5    NA      1
#6    NA    NA      2
#7     8     8      0

Another option is pmap_dbl from the purrr package:

data_cmb %>% mutate(sum_na = purrr::pmap_dbl(., ~sum(is.na(c(...)))))
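To unpack the one-liner above, here is a minimal sketch of the same idea written out in steps. It assumes the two-column data from the question; the helper name count_na is hypothetical and only used for illustration. pmap_dbl calls the function once per row, passing each column's value as a separate argument, and the `...` collects them.

```r
library(purrr)

# Same data as in the question
data_cmb <- data.frame(x1 = c(1:5, NA, 8), x2 = c(1:4, NA, NA, 8))

# Hypothetical helper: gather one row's values via `...` and count the NAs
count_na <- function(...) sum(is.na(c(...)))

# pmap_dbl() treats the data frame as a list of columns and walks them in parallel,
# so count_na() receives x1 and x2 for row 1, then for row 2, and so on
na_counts <- pmap_dbl(data_cmb, count_na)
na_counts
# 0 0 0 0 1 2 0
```

The formula version `~sum(is.na(c(...)))` is shorthand for the same kind of function.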

An efficient approach in base R is to use rowSums with is.na:

data_cmb$sum_na <- rowSums(is.na(data_cmb))

which can be written with dplyr pipes as:

data_cmb %>% mutate(sum_na = rowSums(is.na(.)))
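
If the data frame has other columns that should not be counted, the same rowSums idea can be limited to a chosen set. The following is only a sketch under that assumption; the vector name cols is hypothetical, and select with all_of is one of several ways to pick the columns.

```r
library(dplyr)

# Same data as in the question
data_cmb <- data.frame(x1 = c(1:5, NA, 8), x2 = c(1:4, NA, NA, 8))

# Hypothetical: names of the columns whose NAs should be counted
cols <- c("x1", "x2")

# Keep only those columns, build the logical NA matrix, then sum across each row
result <- data_cmb %>%
  mutate(sum_na = rowSums(is.na(select(., all_of(cols)))))

result$sum_na
# 0 0 0 0 1 2 0
```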
Ronak Shah
  • What's the advantage of the `pmap_dbl` option? The syntax is pretty hard to follow, especially compared to something as straightforward as `rowSums` – camille Feb 02 '21 at 06:08
0

We can use apply in base R:

apply(data_cmb, 1, function(x) sum(is.na(x)))
#[1] 0 0 0 0 1 2 0
akrun