1

I want to filter an entire group based on a value at a specified row.

In the data below, I'd like to remove all rows of group ID, according the value of Metric for Hour == '2'. (Note that I am not trying to filter based on two conditions here, I'm trying to filter based on one condition but at a specific row)

Sample data:

ID <- c('A','A','A','A','A','B','B','B','B','C','C')
Hour <- c('0','2','5','6','9','0','2','5','6','0','2')
Metric <- c(3,4,1,6,7,8,8,3,6,1,1)

x <- data.frame(ID, Hour, Metric)

   ID Hour Metric
1   A    0      3
2   A    2      4
3   A    5      1
4   A    6      6
5   A    9      7
6   B    0      8
7   B    2      8
8   B    5      3
9   B    6      6
10  C    0      1
11  C    2      1

I want to filter each ID based on whether Metric > 5 for Hour == '2'. The result should look like this (all rows of ID B are removed):

       ID Hour Metric
    1   A    0      3
    2   A    2      4
    3   A    5      1
    4   A    6      6
    5   A    9      7
    10  C    0      1
    11  C    2      1

A dplyr-based solution would be preferred, but any help is much appreciated.

holastello
  • 531
  • 5
  • 12
  • 1
    This question might be a duplicate but I don't see the answers in the linked questions. – Alex Oct 26 '17 at 23:41

2 Answers2

4

Adapting How to filter (with dplyr) for all values of a group if variable limit is reached?

we get:

x %>%
    group_by(ID) %>%
    filter(any(Metric[Hour == '2'] <= 5))

# # A tibble: 7 x 3
# # Groups:   ID [2]
# ID   Hour Metric
# <fctr> <fctr>  <dbl>
#     1      A      0      3
# 2      A      2      4
# 3      A      5      1
# 4      A      6      6
# 5      A      9      7
# 6      C      0      1
# 7      C      2      1

These type of problems can be also answered by first creating a by group intermediate variable, to flag whether rows should be removed.

Method 1:

x %>%
    group_by(ID) %>%
    mutate(keep_group = (any(Metric[Hour == '2'] <= 5))) %>%
    ungroup %>%
    filter(keep_group) %>%
    select(-keep_group)

Method 2:

groups_to_keep <-
    x %>%
    filter(Hour == '2', Metric <= 5) %>%
    select(ID) %>%
    distinct() # N.B. this sorts groups_to_keep by ID which may not be desired
#    ID
# 1  A
# 2  C

x %>%
    inner_join(groups_to_keep, by = 'ID')
#    ID Hour Metric
# 1  A    0      3
# 2  A    2      4
# 3  A    5      1
# 4  A    6      6
# 5  A    9      7
# 6  C    0      1
# 7  C    2      1

Method 3 - as suggested by @thelatemail (safe with respect to duplicates in ID):

groups_not_to_keep <-
    x %>% 
    filter(Hour == 2, Metric > 5) %>% 
    select(ID)

x %>%
    anti_join(groups_not_to_keep, by = 'ID')
Alex
  • 13,293
  • 9
  • 60
  • 113
2

Not in (!()) should be useful here. Try this

library(dplyr)
filter(x, Metric > 5 & Hour == '2')$ID # gives B
subset(x, !(ID  %in% filter(x, Metric > 5 & Hour == '2')$ID))
nghauran
  • 6,022
  • 2
  • 14
  • 23