
I need to identify leavers in a survey dataset. To do this, I would like to add another column to my data that counts the consecutive NAs, starting at one specific column and counting backwards.

I have already counted the total number of NAs as explained here, and although a high NA count is a pretty good indicator, I'd like to make sure people didn't just skip parts of the questionnaire instead of leaving outright.

Here's some example data:

df <- structure(list(f1 = c(3, 3, 1, 2, 3, 2, 2, NA, 2, 3), f2num = c(170, 
NA, 182, 173, 169, NA, NA, NA, 153, 178), f3num = c(105, NA, 
77, 80, 58, NA, NA, NA, 45, 81), f4num = c(2, NA, 0, NA, NA, 
NA, 1, NA, 0, 0), f5num = c(9, NA, 1, NA, NA, NA, 2, NA, 0, 2
), f6num = c(NA, NA, NA, NA, NA, NA, 0, NA, NA, NA), f7 = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), f7num = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), f8num = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), f9 = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_)), .Names = c("f1", "f2num", "f3num", "f4num", 
"f5num", "f6num", "f7", "f7num", "f8num", "f9"), row.names = c(NA, 
10L), class = "data.frame")

> df
   f1 f2num f3num f4num f5num f6num f7 f7num f8num f9
1   3   170   105     2     9    NA NA    NA    NA NA
2   3    NA    NA    NA    NA    NA NA    NA    NA NA
3   1   182    77     0     1    NA NA    NA    NA NA
4   2   173    80    NA    NA    NA NA    NA    NA NA
5   3   169    58    NA    NA    NA NA    NA    NA NA
6   2    NA    NA    NA    NA    NA NA    NA    NA NA
7   2    NA    NA     1     2     0 NA    NA    NA NA
8  NA    NA    NA    NA    NA    NA NA    NA    NA NA
9   2   153    45     0     0    NA NA    NA    NA NA
10  3   178    81     0     2    NA NA    NA    NA NA

The expected output looks like this:

> df
   f1 f2num f3num f4num f5num f6num f7 f7num f8num f9 consNA
1   3   170   105     2     9    NA NA    NA    NA NA      5
2   3    NA    NA    NA    NA    NA NA    NA    NA NA      9
3   1   182    77     0     1    NA NA    NA    NA NA      5
4   2   173    80    NA    NA    NA NA    NA    NA NA      7
5   3   169    58    NA    NA    NA NA    NA    NA NA      7
6   2    NA    NA    NA    NA    NA NA    NA    NA NA      9
7   2    NA    NA     1     2     0 NA    NA    NA NA      4
8  NA    NA    NA    NA    NA    NA NA    NA    NA NA     10
9   2   153    45     0     0    NA NA    NA    NA NA      5
10  3   178    81     0     2    NA NA    NA    NA NA      5

Jthorpe's answer to this question got me as far as

t(apply(df,1,function(x)which.min(rev(is.na(x)))-1))

     1 2 3 4 5 6 7 8 9 10
[1,] 5 9 5 7 7 9 4 0 5  5

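which is almost what I need, but it fails when every value in a row is NA (see row 8): `which.min()` then finds no `FALSE`, returns 1 (the first of the tied values), and the result is 0 instead of 10.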
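A possible fix for the all-NA case (a sketch, not taken from the linked answer) is to replace `which.min()` with `match(FALSE, v, nomatch = length(v) + 1)`. `match()` returns the position of the first `FALSE` (the first non-NA in the reversed row), and `nomatch` lets you choose the fallback when there is none. The data frame `toy` is made up for illustration:

```r
# Made-up example: row 2 is entirely NA
toy <- data.frame(a = c(1, NA), b = c(NA, NA), c = c(NA, NA))

cons_na <- apply(toy, 1, function(x) {
  r <- rev(is.na(x))
  # Position of the first non-NA in the reversed row; if the row is all NA,
  # match() returns nomatch = length(r) + 1, so the result is length(r)
  match(FALSE, r, nomatch = length(r) + 1) - 1
})
cons_na
```

On every row that contains at least one non-NA this agrees with the `which.min()` version, so applied to the question's `df` it should reproduce the expected `consNA` column, with 10 for row 8.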

halfer
LAP
  • What if there are two sets of `NA`s? `NA NA 2 3 4 NA NA` or something like that. Which one would you like to count? – Ronak Shah Jun 22 '17 at 09:34
  • I want to count from a self-defined point (a column in the dataset) backwards until the first occurrence of a non-`NA`. See row 7 in the example data. For the example data, I start at the last column and go backwards through the whole dataset. – LAP Jun 22 '17 at 09:36
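To make the clarification concrete: with Ronak Shah's example row, counting backwards from the last column stops at the first non-NA, so only the final run of two NAs counts, not the leading pair. A minimal sketch using base R's `rle()` (a different technique from the ones used in this thread), assuming the row is a non-empty vector:

```r
# Ronak Shah's example row: two separate runs of NA
x <- c(NA, NA, 2, 3, 4, NA, NA)

# rle() splits is.na(x) into runs: TRUE x2, FALSE x3, TRUE x2
runs <- rle(is.na(x))

# Only the last run matters: if it is a run of NAs, its length is the count
n <- length(runs$lengths)
trailing_na <- if (runs$values[n]) runs$lengths[n] else 0L
trailing_na
```

To start from a column other than the last one, the same idea applies to `x[seq_len(from)]`, where `from` is the starting column index.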

1 Answer


This is a bit clumsy, but it works:

df$consNA <- apply(df, 1, function(x) sum(cumsum(!is.na(rev(x))) == 0))

df$consNA
#[1]  5  9  5  7  7  9  4 10  5  5

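For every row, we reverse its order, take the cumulative sum of `!is.na()`, and count the positions where that sum is still 0. Those are exactly the NAs at the end of the original row, before the first non-NA value is encountered. An all-NA row never leaves 0, so every column is counted (10 for row 8).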
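The answer always starts from the last column. For the requirement in the comments (count backwards from a self-defined column), one option is to drop the columns after the starting point before applying the same trick. `cons_na_from()` and `toy` are hypothetical names used only for this sketch:

```r
# Hypothetical helper: count consecutive NAs going backwards from column
# `from` (a column name or index), using the cumsum trick from the answer.
cons_na_from <- function(df, from) {
  # Resolve a column name to its position (assumes the name exists)
  if (is.character(from)) from <- match(from, names(df))
  # Keep only the columns up to and including the starting column
  kept <- df[, seq_len(from), drop = FALSE]
  apply(kept, 1, function(x) sum(cumsum(!is.na(rev(x))) == 0))
}

toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 4), c = c(5, NA, NA))
cons_na_from(toy, "b")
cons_na_from(toy, 3)
```

With `from` set to the last column (e.g. `cons_na_from(df, "f9")` on the question's data), this reduces to the answer's one-liner.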

Ronak Shah