
Problem

This question extends the topic of subsetting a data frame in R using multiple logical conditions, particularly strict inequalities (see here and here).

Say my variable ranges from 1 to 100. I need to create a subset that returns values between 50 and 100, and also values less than 25.

# Data
df <- data.frame(var = 1:100)

# Desired subset (this attempt returns integer(0): no value is both > 50 and < 25)
df$var[df$var > 50 & df$var < 100 & df$var < 25]
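A quick way to see why the attempt above returns nothing is to evaluate the combined conditions directly. This is a minimal base R sketch, assuming the data frame is built as `data.frame(var = 1:100)`; the names `and_cond` and `or_cond` are illustrative only.

```r
# Rebuild the example data (integers 1 to 100)
df <- data.frame(var = 1:100)

# Chaining everything with & requires a value to be > 50 AND < 25 at once
and_cond <- df$var > 50 & df$var < 100 & df$var < 25
print(any(and_cond))     # FALSE: no element satisfies all three tests
print(df$var[and_cond])  # integer(0)

# Joining the two ranges with | keeps both groups
or_cond <- df$var < 25 | (df$var > 50 & df$var < 100)
print(sum(or_cond))      # 73: 24 values below 25 plus 49 values from 51 to 99
```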

Question

  • What is the best way to make a subset involving multiple inequalities using base R?
  • Are solutions using non-base R packages more elegant?
Danielle
  • Two good `dplyr` and `data.table` solutions. Could someone address how this would work using `[ ]` subsetting for comparison? Or, explain why it isn't possible? – Danielle May 30 '17 at 15:47
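On the `[ ]` question: base R can produce the same row subset. The sketch below assumes the same one-column `df`; `keep`, `rows_bracket` and `rows_subset` are illustrative names. It builds one logical index with `|`, uses it with `[ ]`, and then shows `subset()` for comparison. `drop = FALSE` matters here because row-indexing a one-column data frame would otherwise simplify the result to a plain vector.

```r
df <- data.frame(var = 1:100)

# Logical index: TRUE where var < 25, or where 50 < var < 100
keep <- df$var < 25 | (df$var > 50 & df$var < 100)

# Row subset with [ ]: rows in the first slot, empty column slot keeps all columns
rows_bracket <- df[keep, , drop = FALSE]

# Same rows with subset(), which evaluates the condition inside df
rows_subset <- subset(df, var < 25 | (var > 50 & var < 100))

print(nrow(rows_bracket))                          # 73
print(identical(rows_bracket$var, rows_subset$var)) # TRUE
```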

3 Answers


You can use dplyr's `filter()` for this. Use `|` for "or".

library(dplyr)
# Keep rows where var < 25, or where 50 < var < 100
df %>%
  filter(var < 25 | (var > 50 & var < 100))
neilfws
  • Is there a way to use this logic but have the result be a single column of the data set, as a vector? Starting with `df$var %>%` doesn't work. I have a similar problem trying to use: `df$var(df$var[which(df$var < 25 | (df$var > 50 & df$var< 100)),])`. – Danielle May 29 '17 at 22:53
    If you mean that the output should just be the `var` column, then add `%>% select(var)` to the end of the code in my answer. – neilfws May 29 '17 at 22:55
  • And if you mean that the output should be a vector, then add `%>% unlist(use.names = FALSE)` after the `select`. – neilfws May 29 '17 at 23:56
  • Great explanation for *how* to get just one column (i.e. `df$var`) and format it as a vector. I'm also interested to know the *why* starting with `df$var %in%` wouldn't work. – Danielle May 30 '17 at 20:43
    Basically, because dplyr verbs such as `filter()` work on data frames (operating on their columns or rows), whereas `df$var` is a plain vector. – neilfws May 30 '17 at 21:57
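In base R, the vector-versus-data-frame distinction from these comments comes down to indexing. `df$var` is already a plain vector, so it can be indexed directly; the `df$var(...)` attempt fails because it tries to call that vector as if it were a function. A sketch assuming the same `df`, with illustrative variable names:

```r
df <- data.frame(var = 1:100)
keep <- df$var < 25 | (df$var > 50 & df$var < 100)

# Extract the column first, then index it: returns a plain vector
v1 <- df$var[keep]

# Index rows and name one column: [ ] simplifies a single column to a vector
v2 <- df[keep, "var"]

# Keep a one-column data frame instead (like select(var) without unlist())
d1 <- df[keep, "var", drop = FALSE]

# with() lets the condition refer to var without repeating df$
v3 <- with(df, var[var < 25 | (var > 50 & var < 100)])

print(identical(v1, v2))  # TRUE
print(identical(v1, v3))  # TRUE
print(class(d1))          # "data.frame"
```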

We can use `data.table`:

library(data.table)
# setDT() converts df to a data.table by reference; the condition is evaluated inside [ ]
setDT(df)[var < 25 | (var > 50 & var < 100)]
akrun

Since the OP asks for a base R subsetting method, and judging by the other answers (which seem to give what the OP wants), the following should help:

df$var[(df$var > 50 & df$var < 100) | df$var < 25]

If you want values between 50 and 100 and also values less than 25, you need the `|` (OR) operator, as used in the other answers, rather than `&`. No single value can be both greater than 50 and less than 25, so joining all three conditions with `&` returns an empty vector. The output is:

#[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 51 52 53 54
#[29] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
#[57] 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
M--
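When several ranges with strict bounds are combined, a small helper can make the condition easier to read. `in_open()` below is a hypothetical helper, not part of base R; it is spelled out because the `between()` functions in dplyr and data.table include their endpoints by default. A sketch under those assumptions:

```r
# Hypothetical helper (not in base R): TRUE where lo < x < hi, strict on both ends
in_open <- function(x, lo, hi) x > lo & x < hi

df <- data.frame(var = 1:100)

# Values below 25, or strictly between 50 and 100; -Inf means "no lower bound"
keep <- in_open(df$var, -Inf, 25) | in_open(df$var, 50, 100)
result <- df$var[keep]

print(length(result))  # 73
print(range(result))   # 1 99
```

Using `-Inf` as the lower bound turns the first call into a plain `x < 25` test, so each clause reads as one open interval joined by `|`.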