Select observations based on multiple conditions

Question

For each subject, one observation (or row) is kept based on the following:

If var2=C and var3=B, keep that observation. If not, check whether var2=C and var3=S and keep that observation. If not, check whether var2=L and var3=B. Finally, check whether var2=L and var3=S. If none of these match, put 0.

Any help would be appreciated.

My data looks like this:

id   var1   var2   var3
1    100    L      S
1    100    L      B
1    2      C      B
1    2      C      S
2    5      C      S
2    10     L      S
2    NA     L      B
2    NA     C      B

My desired result is:

id   var1   var2   var3
1    2      C      B
2    5      C      S

I don't get it... Can you explain the conditions better, please? — Sotos, Jul 13 '16 at 12:55
The first code block -- where you show the assignment of values -- doesn't make sense. For instance "var2=c and var3=B if not then ". What do you mean "if not then"? What is being tested here? Do you mean "`if(var3=="B") {var2 — Hack-R, Jul 13 '16 at 13:04
Try reading this answer: http://stackoverflow.com/questions/4935479/how-to-combine-multiple-conditions-to-subset-a-data-frame-using-or — Pete900, Jul 13 '16 at 13:16
But for `id = 2` you also have `C` and `B`. Why do you take `C` and `S`? — Sotos, Jul 13 '16 at 13:21
I fixed incorrect spelling and capitalization but someone keeps changing it back to the wrong way... — Hack-R, Jul 13 '16 at 14:58
@Hack-R That was probably me (the last time at least). I changed some `.` values to `NA` and I might have missed something :) — Sotos, Jul 13 '16 at 16:26

score 0 · Answer 1 · answered Jul 13 '16 at 13:42

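Here is an idea using dplyr:

library(dplyr)

# Priority order of the var2/var3 combinations: earlier entries win
x <- c('CB', 'CS', 'LB', 'LS')

df %>% 
  group_by(id) %>% 
  na.omit() %>%                                     # drop rows with missing values
  slice(order(match(paste0(var2, var3), x))[1])     # keep the highest-priority row per id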

#Source: local data frame [2 x 4]
#Groups: id [2]

#     id  var1   var2   var3
#  <int> <int> <fctr> <fctr>
#1     1     2      C      B
#2     2     5      C      S
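
To see why pasting `var2` and `var3` together helps, here is a minimal base R sketch of the ranking step on its own, using the four rows of id 1. `match()` returns each row's position in the priority vector, so a smaller number means a higher priority. The names `key`, `pos` and `keep` are only for illustration.

```r
# Priority order from the question: CB first, then CS, LB, LS
x <- c("CB", "CS", "LB", "LS")

# The four rows of id 1 from the example data
var2 <- c("L", "L", "C", "C")
var3 <- c("S", "B", "B", "S")

key <- paste0(var2, var3)   # "LS" "LB" "CB" "CS"
pos <- match(key, x)        # 4 3 1 2

# order() sorts the row indices by priority; the first one is the row to keep
keep <- order(pos)[1]       # 3
key[keep]                   # "CB"
```

One thing to keep in mind: `match()` gives `NA` for a combination that is not in `x`, and `order()` puts `NA` last, so a group made up only of non-listed combinations would still return its first row.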

DATA

dput(df)
structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), var1 = c(100L, 
100L, 2L, 2L, 5L, 10L, NA, NA), var2 = structure(c(2L, 2L, 1L, 
1L, 1L, 2L, 2L, 1L), .Label = c("C", "L"), class = "factor"), 
    var3 = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L), .Label = c("B", 
    "S"), class = "factor")), .Names = c("id", "var1", "var2", 
"var3"), class = "data.frame", row.names = c(NA, -8L))
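
The pipeline above silently drops an id that has no usable row, so it does not cover the "if not, put 0" part of the question. Below is a rough base R sketch of one way to handle that. It rests on assumptions that are not in the question: "put 0" is read as "return the id with var1 = 0 and NA in var2 and var3", the helper name `pick_one` is made up, and id 3 is an extra row added only to trigger the fallback.

```r
# Example data from the question plus a hypothetical id 3 with no usable row
df <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2, 2, 3),
  var1 = c(100, 100, 2, 2, 5, 10, NA, NA, NA),
  var2 = c("L", "L", "C", "C", "C", "L", "L", "C", "C"),
  var3 = c("S", "B", "B", "S", "S", "S", "B", "B", "B"),
  stringsAsFactors = FALSE
)

x <- c("CB", "CS", "LB", "LS")   # priority order, highest first

# Hypothetical helper: pick the highest-priority complete row of one id
pick_one <- function(d) {
  ok  <- d[complete.cases(d), , drop = FALSE]    # same role as na.omit()
  pos <- match(paste0(ok$var2, ok$var3), x)
  if (all(is.na(pos))) {
    # Assumed meaning of "put 0": keep the id, set var1 to 0
    return(data.frame(id = d$id[1], var1 = 0,
                      var2 = NA_character_, var3 = NA_character_,
                      stringsAsFactors = FALSE))
  }
  ok[which.min(pos), , drop = FALSE]             # which.min() ignores NA
}

res <- do.call(rbind, lapply(split(df, df$id), pick_one))
rownames(res) <- NULL
res
```

With this data, ids 1 and 2 give the same rows as the dplyr version, and id 3 falls through to the placeholder row.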

@sri It doesn't have to be combined, but it is easier to compare and `match` based on 'importance'. — Sotos, Jul 14 '16 at 08:13
