1

I have a dataframe as shown below

    Id         Date         Col1       Col2     Col3        Col4
    30         2012-03-31              A42.2    20.46        NA  
    36         1996-11-15   NA                  V73          55
    96         2010-02-07   X48        Z16      13
    40         2010-03-18   AD14                20.12        36
    69         2012-02-21              22.45                     
    11         2013-07-03   81         V017                  TCG11         
    22         2001-06-01                       67
    83         2005-03-16   80.45      V22.15   46.52        X29.11 
    92         2012-02-12   
    34         2014-03-10   82.12      N72.22   V45.44

I am trying to count the number of NA or Empty cells across each row and the final expected output is as follows

    Id         Date         Col1       Col2     Col3        Col4       MissCount
    30         2012-03-31              A42.2    20.46        NA        2
    36         1996-11-15   NA                  V73          55        2
    96         2010-02-07   X48        Z16      13                     1
    40         2010-03-18   AD14                20.12        36        1
    69         2012-02-21              22.45                           3
    11         2013-07-03   81         V017                  TCG11     1    
    22         2001-06-01                       67                     3
    83         2005-03-16   80.45      V22.15   46.52        X29.11    0
    92         2012-02-12                                              4   
    34         2014-03-10   82.12      N72.22   V45.44                 1

The last column MissCount will store the number of NAs or empty cells for each row. Any help is much appreciated.

Emily Fassbender
  • 261
  • 3
  • 11
  • 7
    Probably something like `rowSums(is.na(df) | df == "")` will do – David Arenburg Nov 21 '16 at 08:21
  • 1
    it could be solved at input stage by having `read.table("filename",sep=",",na.strings=c("",,NA),stringsAsFactors=FALSE)`, this will result in only NA values and you can use @DavidArenburg solution to count all NA's – Silence Dogood Nov 21 '16 at 08:43

4 Answers4

4

The one-liner

rowSums(is.na(df) | df == "")

given by @DavidArenburg in his comment is definitely the way to go, assuming that you don't mind checking every column in the data frame. If you really only want to check Col1 through Col4, then using an apply function might make more sense.

apply(df, 1, function(x) {
                sum(is.na(x[c("Col1", "Col2", "Col3", "Col4")])) +
                sum(x[c("Col1", "Col2", "Col3", "Col4")] == "", na.rm=TRUE)
             })

Edit: Shortened code

apply(df[c("Col1", "Col2", "Col3", "Col4")], 1, function(x) {
                    sum(is.na(x)) +
                    sum(x == "", na.rm=TRUE)
                 })

or if data columns are exactly like the example data:

apply(df[3:6], 1, function(x) {
                        sum(is.na(x)) +
                        sum(x == "", na.rm=TRUE)
                     })
LAP
  • 6,330
  • 2
  • 11
  • 24
Tim Biegeleisen
  • 387,723
  • 20
  • 200
  • 263
  • I think you could shorten the code to `apply(df[c("Col1", "Col2", "Col3", "Col4")], 1, function(x)` and then within the function just call the `x`, right? Or, if the data is structured as in the example, just `apply(df[3:6], 1, function(x)` – LAP Nov 21 '16 at 08:35
  • 1
    @LeoP. Feel free to make an edit in this case. I just wanted to make the point that the columns would have to be specified _somewhere_ (though your suggestion might be a bit less ugly than mine ^ ^). – Tim Biegeleisen Nov 21 '16 at 08:37
1

This should do it.

yourframe$MissCount = rowSums(is.na(yourframe) | yourframe == "" | yourframe == " "))
burton030
  • 367
  • 3
  • 5
  • 21
1

You can use by_row from library purrr:

library(purrr)

#sample data frame
x <- data.frame(A1=c(1,NA,3,NA),
                A2=c("A","B"," ","C"),
                A3=c(" "," ",NA,"t"))

Here you apply a function on each row, you can edit it according to your condition. And you can use whatever function you want. In the following example, I counted empty or NA entries in each row by using sum(...):

by_row(x, function(y) sum(y==" "| (is.na(y))),
       .to="MissCount",
       .collate = "cols"
       )

You will get:

# A tibble: 4 x 4
     A1     A2     A3 MissCount
  <dbl> <fctr> <fctr>     <int>
1     1      A                1
2    NA      B                2
3     3            NA         2
4    NA      C      t         1
OmaymaS
  • 1,321
  • 9
  • 18
0

We can use

Reduce(`+`, lapply(df, function(x) is.na(x)|!nzchar(as.character(x))))
akrun
  • 674,427
  • 24
  • 381
  • 486