
I have a data.frame in R. I want to test two different conditions on two different columns, but I want these conditions to be inclusive. Therefore, I would like to use "OR" to combine the conditions. I have used the following syntax before with a lot of success when I wanted to use the "AND" condition.

my.data.frame <- data[(data$V1 > 2) & (data$V2 < 4), ]

But I don't know how to use an 'OR' in the above.

d-cubed
Sam

3 Answers

my.data.frame <- subset(data , V1 > 2 | V2 < 4)

An alternative solution that mimics the behavior of subset() and is more appropriate for inclusion within a function body:

new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]
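For instance, a minimal sketch of wrapping that form in a function (the wrapper name, arguments, and default thresholds are hypothetical, not part of the question):

keep_rows <- function(d, v1_min = 2, v2_max = 4) {
  # which() drops rows where the condition evaluates to NA, and avoids
  # the non-standard evaluation that makes subset() awkward inside functions
  d[which(d$V1 > v1_min | d$V2 < v2_max), ]
}

keep_rows(data) then returns the same rows as the subset() call above.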

Some people criticize the use of which as not needed, but it does prevent NA values from producing unwanted rows. The equivalent of the two options demonstrated above without the which (i.e. not returning NA rows for any NAs in V1 or V2) would be:

 new.data <- data[ !is.na(data$V1 | data$V2) & ( data$V1 > 2 | data$V2 < 4)  , ]

Note: I want to thank the anonymous contributor who attempted to fix the error in the code immediately above, a fix that was rejected by the moderators. There was actually an additional error that I noticed while correcting the first one. The conditional clause that checks for NA values needs to come first if it is to be handled as I intended, since ...

> NA & 1
[1] NA
> 0 & NA
[1] FALSE

Order of arguments may matter when using `&`.
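To make the which behavior concrete, a small sketch with a toy data frame (the column values are made up):

dat <- data.frame(V1 = c(3, NA, 1), V2 = c(5, 5, 3))

dat[dat$V1 > 2 | dat$V2 < 4, ]          # the NA in V1 yields an all-NA row
dat[which(dat$V1 > 2 | dat$V2 < 4), ]   # which() silently drops that row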

IRTFM
    This is the highest voted question and then one finds: http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset – PatrickT Dec 09 '14 at 13:15
The advantage is compactness and ease of comprehension. The disadvantage is lack of utility in function building tasks. If one wants to replicate this with `[` one needs to wrap in `which` or use additional `!is.na` constraints. – IRTFM Dec 09 '14 at 16:47
  • Is the 'which' required and if not why do you use it? – Cleb Jul 28 '15 at 22:25
    It's not "required", but you may get a different result if you leave out the `which`. If both V1 and V2 are NA you would get a row of NA's at that position if you left out the `which`. I work with large datasets and even a relatively small percentage of NA's will really fill up my screen with junk output. Some people think this is a feature. I don't. – IRTFM Jul 29 '15 at 00:06
  • how do you include a call to `grepl` or `grep` with this to also do pattern matching for desired rows, in addition to these conditionals? – user5359531 Jul 07 '17 at 22:17
  • `grepl` should work. If you have a counter-example, it would make a good new question. – IRTFM Jul 08 '17 at 00:29
  • `subset` IS robust to the existence of `NA`s in a dataframe: `vc – Erdogan CEVHER Jul 31 '18 at 10:31
  • @42-, Many thanks for `NA & 1; 0 & NA # NA; FALSE`. I did not notice this nuance ever before. In the latter (`0 & NA`), why we get `FALSE` instead of `NA` as in `NA & 1 # NA`? – Erdogan CEVHER Jul 31 '18 at 10:39
  • Subtle coercing: `class(0); class(1) # numeric; numeric`. However, different coercing works for `1& NA; 0 & NA # NA; FALSE`. Interesting ! – Erdogan CEVHER Jul 31 '18 at 10:57
  • `!is.na` solution seems to be more prone to warnings in factors upon comparison: `df 4 | df$V2 == "H"), ]; df[!is.na(df$V1 | df$V2) & (df$V1 > 4 | df$V2 == "H"), ]` – Erdogan CEVHER Jul 31 '18 at 11:17
  • @42-, Thank you for the WHICH clarification. I was getting undesired results when using the "OR" or even the "AND" operator. Using "ifelse" for example, I think just one NA on one of the conditioning variables would yield NA result and not apply the "ELSE" condition... – Juan C Aug 13 '18 at 13:11
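Picking up the grepl question from the comments: a hedged sketch, assuming a character column V3 and a pattern "foo" that are not part of the original data. grepl() returns a plain logical vector (FALSE for NA elements), so it combines directly with the other conditions:

new.data <- data[which(data$V1 > 2 | data$V2 < 4 | grepl("foo", data$V3)), ]

# or equivalently with subset()
new.data <- subset(data, V1 > 2 | V2 < 4 | grepl("foo", V3))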

You are looking for `|`. See http://cran.r-project.org/doc/manuals/R-intro.html#Logical-vectors

my.data.frame <- data[(data$V1 > 2) | (data$V2 < 4), ]
ncray

Just for the sake of completeness, we can use the operators `[` and `[[`:

set.seed(1)
df <- data.frame(v1 = runif(10), v2 = letters[1:10])

Several options

df[df[1] < 0.5 | df[2] == "g", ] 
df[df[[1]] < 0.5 | df[[2]] == "g", ] 
df[df["v1"] < 0.5 | df["v2"] == "g", ]

`df$name` is equivalent to `df[["name", exact = FALSE]]`

Using dplyr:

library(dplyr)
filter(df, v1 < 0.5 | v2 == "g")

Using sqldf:

library(sqldf)
sqldf('SELECT *
      FROM df 
      WHERE v1 < 0.5 OR v2 = "g"')

Output for the above options:

          v1 v2
1 0.26550866  a
2 0.37212390  b
3 0.20168193  e
4 0.94467527  g
5 0.06178627  j
mpalanco
    how would you do this for 1 AND condition and 3 OR conditions contingent so for example: my.data.frame 10 & ((data$V1 > 2) | (data$V2 < 4) | (data$V4 <5), ]. When I do this it doesn't work – R Guru Jan 21 '16 at 15:35
    Wow! The `sqldf` package is too good. Very handy especially when `subset()` gets a bit painful :) – Dawny33 Jun 22 '16 at 12:05
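Regarding the comment above about combining one AND condition with three OR conditions: a hedged sketch, where the column names and thresholds follow that comment but the column compared to 10 (written V3 here) is a guess. Since `&` binds more tightly than `|`, the OR group needs its own parentheses so that the AND applies to all of it:

# base R, with which() to drop NA rows
my.data.frame <- data[which(data$V3 > 10 &
                            (data$V1 > 2 | data$V2 < 4 | data$V4 < 5)), ]

# dplyr equivalent (dplyr loaded above)
filter(data, V3 > 10 & (V1 > 2 | V2 < 4 | V4 < 5))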