How to get the top n% and bottom n% of a data frame in R

Question

Here is my data:

dat <- read.table(text = "id    val1    val2    vt
1   14  12  19
2   13  13  12
3   12  12  13
4   12  13  13
5   12  14  22
6   12  12  14
7   12  13  14
8   12  14  12
9   13  13  14
10  13  14  14
11  14  14  14
12  13  14  17
13  13  14  31
14  13  13  14
15  13  14  13
16  13  14  23
                
", header = TRUE)

I want to get the top 25% and the bottom 45% of rows according to vt.

Here is the expected output for the top 25%:

id  val1    val2    vt
13  13  14  31
16  13  14  23
5   12  14  22
1   14  12  19

and the bottom 45% is:

id  val1    val2    vt
7   12  13  14
9   13  13  14
10  13  14  14
11  14  14  14
14  13  13  14
3   12  12  13
4   12  13  13
15  13  14  13
2   13  13  12
8   12  14  12

I have tried subset() with quantile(), but it does not seem to work for the bottom n%. Is it possible to do this with dplyr? I have checked other questions, but none of them cover the bottom n%. In addition, I do not want to compute this by group.

Please check the `slice_max` and `slice_min` functions from the tidyverse. – deschen, Jan 22 '21 at 14:25
I have edited the question. Please reopen it if at all possible. – user330, Jan 22 '21 at 15:16

score 1 · Answer 1 · answered Jan 22 '21 at 14:24

Use dplyr::slice_min() and dplyr::slice_max().

library(dplyr)
library(magrittr)

df <- read.table(text = "id    val1    val2    vt
1   14  12  19
2   13  13  12
3   12  12  13
4   12  13  13
5   12  14  22
6   12  12  14
7   12  13  14
8   12  14  12
9   13  13  14
10  13  14  14
11  14  14  14
12  13  14  17
13  13  14  31
14  13  13  14
15  13  14  13
16  13  14  23
                
", header = TRUE)

df %>% slice_max(order_by = vt, prop = 0.25)
#   id val1 val2 vt
# 1 13   13   14 31
# 2 16   13   14 23
# 3  5   12   14 22
# 4  1   14   12 19

df %>% slice_min(order_by = vt, prop = 0.45)
#    id val1 val2 vt
# 1   2   13   13 12
# 2   8   12   14 12
# 3   3   12   12 13
# 4   4   12   13 13
# 5  15   13   14 13
# 6   6   12   12 14
# 7   7   12   13 14
# 8   9   13   13 14
# 9  10   13   14 14
# 10 11   14   14 14
# 11 14   13   13 14
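
Note that the second call returns 11 rows rather than floor(0.45 * 16) = 7. slice_min() and slice_max() keep tied rows by default (with_ties = TRUE), so every row sharing the cutoff value of vt is included. If a fixed number of rows is wanted instead, a minimal sketch, assuming dplyr 1.0.0 or later and the same data rebuilt here as dat, is to pass with_ties = FALSE:

```r
library(dplyr)

# Same 16 rows as in the question, rebuilt so the example is self-contained
dat <- data.frame(
  id   = 1:16,
  val1 = c(14, 13, 12, 12, 12, 12, 12, 12, 13, 13, 14, 13, 13, 13, 13, 13),
  val2 = c(12, 13, 12, 13, 14, 12, 13, 14, 13, 14, 14, 14, 14, 13, 14, 14),
  vt   = c(19, 12, 13, 13, 22, 14, 14, 12, 14, 14, 14, 17, 31, 14, 13, 23)
)

# Default (with_ties = TRUE): all rows tied with the cutoff value are kept
bottom_ties <- dat %>% slice_min(order_by = vt, prop = 0.45)

# with_ties = FALSE: exactly floor(prop * nrow(dat)) rows are returned
bottom_exact <- dat %>% slice_min(order_by = vt, prop = 0.45, with_ties = FALSE)
top_exact    <- dat %>% slice_max(order_by = vt, prop = 0.25, with_ties = FALSE)

nrow(bottom_ties)   # 11
nrow(bottom_exact)  # 7
nrow(top_exact)     # 4
```

With with_ties = FALSE, which of the tied rows are kept depends on row order, so the choice among equal vt values is essentially arbitrary.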

Dunois

If you have similar values, it does not properly slice the top and bottom n% – user330 Jan 22 '21 at 14:51
@user330 Could you elaborate? – Dunois Jan 22 '21 at 15:19
For example, if 31 appears 5 times, you get 5 rows instead of 4 rows. – user330 Jan 22 '21 at 15:32
So you don't want identical `vt` values to be treated as distinct (values)? – Dunois Jan 22 '21 at 16:55
The identical values are not an issue. I want the code to behave correctly. For example, if we have 16 values, it needs to slice 4 at the top and 4 at the bottom – user330 Jan 22 '21 at 21:10
@user330 I'm sorry but I'm completely lost now. Could you please illustrate *clearly* what your expected output is by updating the OP with an example? From what I can parse of it as it stands right now, you have the expected outputs in there--and my solution here matches that. If I am mistaken, please do correct me. Your recent comment does not really compute w.r.t. your originally stated objective. – Dunois Jan 22 '21 at 23:47
@user330 wait, what? Why? The solution doesn't address your problem though? (Or does it?) You should not accept the answer if it hasn't actually solved your problem, IMHO. – Dunois Jan 23 '21 at 19:22

score 0 · Answer 2 · answered Jan 22 '21 at 14:48

Perhaps you can try findInterval() + quantile(), as below:

res <- with(dat, split(dat, findInterval(vt, quantile(vt, c(.45, .75)), left.open = TRUE)))
res_45bottom <- head(res, 1)[[1]]
res_25top <- tail(res, 1)[[1]]

such that

> res_45bottom
   id val1 val2 vt
2   2   13   13 12
3   3   12   12 13
4   4   12   13 13
6   6   12   12 14
7   7   12   13 14
8   8   12   14 12
9   9   13   13 14
10 10   13   14 14
11 11   14   14 14
14 14   13   13 14
15 15   13   14 13

> res_25top
   id val1 val2 vt
1   1   14   12 19
5   5   12   14 22
13 13   13   14 31
16 16   13   14 23
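
For comparison, here is a base R sketch that takes a fixed number of rows by sorting rather than by quantile cut points. It is not part of the answer above: the names n_top, n_bottom, top25 and bottom45 are illustrative, and rounding the row count down with floor() is an assumption about how fractional counts should be handled.

```r
# Same 16 rows as in the question, rebuilt so the example is self-contained
dat <- data.frame(
  id   = 1:16,
  val1 = c(14, 13, 12, 12, 12, 12, 12, 12, 13, 13, 14, 13, 13, 13, 13, 13),
  val2 = c(12, 13, 12, 13, 14, 12, 13, 14, 13, 14, 14, 14, 14, 13, 14, 14),
  vt   = c(19, 12, 13, 13, 22, 14, 14, 12, 14, 14, 14, 17, 31, 14, 13, 23)
)

# Convert the percentages to row counts (assumption: round down)
n_top    <- floor(0.25 * nrow(dat))
n_bottom <- floor(0.45 * nrow(dat))

# Sort by vt and take the first n rows from each end
top25    <- head(dat[order(dat$vt, decreasing = TRUE), ], n_top)
bottom45 <- head(dat[order(dat$vt), ], n_bottom)

top25$id         # 13 16 5 1
nrow(bottom45)   # 7
```

Unlike the quantile-based split, this always returns exactly n_top and n_bottom rows, at the cost of cutting through groups of tied vt values.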
