Identify rows with complete data in R by adding details in additional column

Question

For a sample dataframe:

df1 <- structure(list(id = structure(1:5, .Label = c("a", "b", "c", 
"d", "e"), class = "factor"), cat = c(5L, 7L, 6L, 2L, 8L), dog = c(7L, 
NA, 6L, 13L, 2L), sheep = c(NA, 6L, 3L, 6L, 2L), cow = c(2L, 
10L, 8L, 9L, 1L), rabbit = c(5L, 3L, NA, 2L, 4L), pig = c(7L, 
NA, 12L, 5L, NA)), .Names = c("id", "cat", "dog", "sheep", "cow", 
"rabbit", "pig"), class = "data.frame", row.names = c(NA, -5L
))

I want to add an extra column, 'complete.farm', that identifies which rows have values in all of the columns 'sheep' AND 'cow' AND 'pig'. Rows with an NA in one or more of these columns should get a 0, and rows with values in all three should get a 1.

If anyone could give me some advice on this, I would really appreciate it. I usually use `complete.cases` to subset my dataframe, but this time I only want to record this information in a column.

LyzandeR · Accepted Answer · 2015-05-25T22:49:48.030


This seems to work:

> df1$complete.farm <- ifelse( !is.na(df1$pig) & !is.na(df1$sheep) & !is.na(df1$cow), 1,0)
> df1
  id cat dog sheep cow rabbit pig complete.farm
1  a   5   7    NA   2      5   7             0
2  b   7  NA     6  10      3  NA             0
3  c   6   6     3   8     NA  12             1
4  d   2  13     6   9      2   5             1
5  e   8   2     2   1      4  NA             0

`ifelse` is vectorised, so you pass the logical condition as the first argument, the value to use where it is TRUE (1) as the second, and the value to use where it is FALSE (0) as the third.
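Chaining `!is.na()` with `&` gets long once more columns are involved. A minimal sketch of a column-list variant, assuming the columns to check are held in a character vector (`farm_cols` is a hypothetical name, not from the answer): `is.na()` on a data frame returns a logical matrix, and a row is complete when its count of NAs is zero.

```r
# Rebuild the sample data from the question
df1 <- data.frame(
  id     = factor(c("a", "b", "c", "d", "e")),
  cat    = c(5L, 7L, 6L, 2L, 8L),
  dog    = c(7L, NA, 6L, 13L, 2L),
  sheep  = c(NA, 6L, 3L, 6L, 2L),
  cow    = c(2L, 10L, 8L, 9L, 1L),
  rabbit = c(5L, 3L, NA, 2L, 4L),
  pig    = c(7L, NA, 12L, 5L, NA)
)

# Hypothetical vector naming the columns that must be non-missing
farm_cols <- c("sheep", "cow", "pig")

# is.na() on the selected columns gives a logical matrix;
# rowSums() counts the NAs in each row, and 0 NAs means complete
df1$complete.farm <- ifelse(rowSums(is.na(df1[farm_cols])) == 0, 1, 0)

print(df1)
```

Adding another required column only means extending `farm_cols`; the condition itself does not change.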

Another (simpler) way, as per @thelatemail's comment below:

df1$col <- as.numeric(complete.cases(df1[c("sheep","cow","pig")]))

> df1
  id cat dog sheep cow rabbit pig complete.farm col
1  a   5   7    NA   2      5   7             0   0
2  b   7  NA     6  10      3  NA             0   0
3  c   6   6     3   8     NA  12             1   1
4  d   2  13     6   9      2   5             1   1
5  e   8   2     2   1      4  NA             0   0
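`complete.cases()` returns a logical vector with one element per row, so the same result can either be stored as a flag (as above) or used to subset, which is how the question usually uses it. A sketch contrasting the two; `ok` and `farm_only` are illustrative names, and `as.integer()` is used here instead of `as.numeric()` only as a choice (both give 0/1).

```r
# Rebuild the sample data from the question
df1 <- data.frame(
  id     = factor(c("a", "b", "c", "d", "e")),
  cat    = c(5L, 7L, 6L, 2L, 8L),
  dog    = c(7L, NA, 6L, 13L, 2L),
  sheep  = c(NA, 6L, 3L, 6L, 2L),
  cow    = c(2L, 10L, 8L, 9L, 1L),
  rabbit = c(5L, 3L, NA, 2L, 4L),
  pig    = c(7L, NA, 12L, 5L, NA)
)

# One TRUE/FALSE per row: TRUE when sheep, cow and pig are all non-NA
ok <- complete.cases(df1[c("sheep", "cow", "pig")])

# Keep every row and record the result as a 0/1 column ...
df1$complete.farm <- as.integer(ok)

# ... or drop the incomplete rows instead (the usual subsetting use)
farm_only <- df1[ok, ]

print(df1)
print(farm_only)
```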

edited May 25 '15 at 22:49

answered May 25 '15 at 22:43

LyzandeR


`complete.cases` would be simpler - `complete.cases(df1[c("sheep","cow","pig")])` – thelatemail May 25 '15 at 22:44
@thelatemail Thanks for this. This is indeed a better solution. I always forget about `complete.cases` because I haven't used it much. Please do post it as an answer. It is simpler than mine and more intuitive. – LyzandeR May 25 '15 at 22:45

No probs - just edit your answer if you like. I think this is a duplicate question anyway and I may close it if I can find a match. – thelatemail May 25 '15 at 22:47
Thanks @thelatemail. There is definitely a duplicate somewhere. I still think you should take credit for a good answer such as this, though (and a few duplicate questions are OK if you ask me; they make it easier for users to find the answer in the future). If you decide to post it as a separate answer, I will gladly remove it from mine. – LyzandeR May 25 '15 at 22:52
Great - I knew `complete.cases` would work, as per the duplicate question, but I specifically wanted advice on how to add this info to an extra column - so thanks! – KT_1 May 25 '15 at 22:55
Great then! Glad I could be of help :) (full credit to thelatemail for reminding me of `complete.cases`) – LyzandeR May 25 '15 at 22:56
