dplyr: making top_n break ties

Asked May 16 '19 at 12:55

Active May 16 '19 at 12:55

I'm migrating from plyr to dplyr. I replaced

ddply(data, c("sampleno"), function(s) s[which.max(s$voice_score),])

with

data %>% group_by(sampleno) %>% top_n(1, voice_score)

but ran into a problem because top_n includes multiple entries in case of a tie, which is not what I want. It doesn't matter how it breaks symmetry so long as I only get one result -- how should I do this?
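A minimal reproduction of the tie behaviour described above, as a sketch. The data frame here is made up for illustration (only the column names `sampleno` and `voice_score` come from the question), and it assumes a dplyr version where `top_n()` is still available:

```r
library(dplyr)

# Hypothetical toy data: sample 1 has two rows tied for the top voice_score.
data <- data.frame(
  sampleno    = c(1, 1, 1, 2, 2),
  voice_score = c(0.9, 0.9, 0.4, 0.7, 0.2)
)

tied <- data %>%
  group_by(sampleno) %>%
  top_n(1, voice_score)

# Rows returned per group: sample 1 contributes both of its tied rows.
count(tied, sampleno)
```

With this data, `tied` has three rows rather than one per `sampleno`, which is the behaviour the question wants to avoid.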

Mohan


You can `arrange()` in descending order and keep the first entry, i.e. `df %>% group_by(sampleno) %>% arrange(desc(voice_score)) %>% slice(1L)` or `df %>% group_by(sampleno) %>% slice(which.max(voice_score))` – Sotos May 16 '19 at 12:58
`top_n` is essentially a wrapper around `min_rank` and `filter`, as explained in the docs. `min_rank` is itself a wrapper around the base `rank`, whose docs explain the different ties options. You can split these tasks to filter for the lowest rank using whichever tie method you want. – camille May 16 '19 at 13:08
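
One way the suggestions in these comments might look in practice, as a rough sketch rather than a definitive answer. The toy data is hypothetical, and `ties.method = "first"` is just one of the options documented for base `rank()`; it breaks ties by row order, which fits the "any one row will do" requirement:

```r
library(dplyr)

# Hypothetical toy data: sample 1 has two rows tied for the top voice_score.
data <- data.frame(
  sampleno    = c(1, 1, 1, 2, 2),
  voice_score = c(0.9, 0.9, 0.4, 0.7, 0.2)
)

# Rank explicitly and filter: ties.method = "first" gives tied values
# distinct ranks in row order, so exactly one row per group has rank 1.
best_rank <- data %>%
  group_by(sampleno) %>%
  filter(rank(desc(voice_score), ties.method = "first") == 1) %>%
  ungroup()

# Alternative: which.max() returns the index of the first maximum,
# mirroring the original ddply() call.
best_slice <- data %>%
  group_by(sampleno) %>%
  slice(which.max(voice_score)) %>%
  ungroup()
```

Both versions keep a single row per `sampleno`. The `slice(which.max(...))` form is the closest translation of the plyr code, while the `rank()` form makes the tie-breaking rule explicit and easy to swap.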
