How to remove all rows of a dataframe which have a particular value in one column except for the row with the largest value in another column?

Question

Okay, that title could probably be clearer, but I'm not sure how else to word it.

Here is an example of the dataframe that I'm working with.

index  run  time_step  users
    1    1          1     12
    2    1          2     11
    3    2          1     12
    4    2          2     10
    5    1          3      9
    6    2          3     10
    7    2          4      9
    8    2          5      8
    9    2          6      6
   10    1          4      5
   11    3          1     12
   12    3          2      8

What I want is to cut the dataframe so that the only rows left are indices 9, 10, and 12. That is trivial in this example, but the full dataset is significantly larger, with a couple of tens of thousands of runs.

How would you cut rows out in a way that finds the largest value of time_step for each run and keeps that row, but none of the other rows with the same run value?
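One base R way to do this, sketched on the example data above with no extra packages, is to compute each run's largest `time_step` with `ave()` and keep only the rows that match it. The `index` column is stored as an ordinary column here (an assumption about how the real data is laid out) so that it survives the filtering. Unlike `which.max()`, this keeps every tied row if a run ever has two rows sharing the largest `time_step`.

```r
# Rebuild the example data from the question.
df <- data.frame(
  index     = 1:12,
  run       = c(1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 3, 3),
  time_step = c(1, 2, 1, 2, 3, 3, 4, 5, 6, 4, 1, 2),
  users     = c(12, 11, 12, 10, 9, 10, 9, 8, 6, 5, 12, 8)
)

# For every row, look up the largest time_step within that row's run.
max_step <- ave(df$time_step, df$run, FUN = max)

# Keep only rows whose time_step equals their run's maximum.
# Rows stay in their original order; ties for the maximum are all kept.
last_rows <- df[df$time_step == max_step, ]
print(last_rows)
```

This leaves the rows with index 9, 10 and 12, in the same order as the expected result. Because `ave()` works on whole vectors, it should scale comfortably to tens of thousands of runs.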

Edit: for clarification, the result would look like this:

index  run  time_step  users
    9    2          6      6
   10    1          4      5
   12    3          2      8

With `dplyr`, you can do `df %>% group_by(run) %>% slice(which.max(time_step)) ` — Ronak Shah, Apr 04 '19 at 11:32
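A self-contained version of the dplyr pipeline from the comment is sketched below. It assumes the dplyr package is installed and rebuilds the example table as `df`. `which.max()` returns the position of the first maximum, so exactly one row per run is kept even when there are ties.

```r
library(dplyr)

# Rebuild the example data from the question.
df <- data.frame(
  index     = 1:12,
  run       = c(1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 3, 3),
  time_step = c(1, 2, 1, 2, 3, 3, 4, 5, 6, 4, 1, 2),
  users     = c(12, 11, 12, 10, 9, 10, 9, 8, 6, 5, 12, 8)
)

last_rows <- df %>%
  group_by(run) %>%
  slice(which.max(time_step)) %>%  # first row with the run's largest time_step
  ungroup()                        # drop the grouping for later steps

print(last_rows)
```

The result comes back ordered by run (index 10, 9, 12); add `arrange(index)` if the original order matters. In dplyr 1.0.0 and later, `slice_max(time_step, n = 1, with_ties = FALSE)` expresses the same selection.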
Thanks that did it! I wasn't familiar with `dplyr` so thanks for showing me. — Y Ahmed, Apr 04 '19 at 11:41
