
I'm migrating from plyr to dplyr. I replaced

```r
ddply(data, c("sampleno"), function(s) s[which.max(s$voice_score),])
```

with

```r
data %>% group_by(sampleno) %>% top_n(1, voice_score)
```

but ran into a problem: `top_n` keeps every row that ties for the top score, which is not what I want. It doesn't matter how the tie is broken, as long as I get exactly one row per group. How should I do this?
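A minimal reproducible example of the tie, assuming a hypothetical `data` tibble (the column names match the question; the values are invented for illustration). Sample 1 has two rows sharing the top score:

```r
library(dplyr)

# Hypothetical toy data: sample 1 has two rows tied at the top score.
data <- tibble(
  sampleno    = c(1, 1, 1, 2, 2),
  voice_score = c(0.9, 0.9, 0.4, 0.7, 0.2)
)

# top_n() keeps every row ranked within the top 1 (min_rank), so both tied rows survive.
tied <- data %>% group_by(sampleno) %>% top_n(1, voice_score)
nrow(tied)  # 3 rows for only 2 groups
```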

Mohan
  • You can `arrange()` and keep the first entry, i.e. `df %>% group_by(sampleno) %>% arrange(desc(voice_score)) %>% slice(1L)` or `df %>% group_by(sampleno) %>% slice(which.max(voice_score))` – Sotos May 16 '19 at 12:58
  • `top_n` is essentially a wrapper around `min_rank` and `filter`, as explained in the docs. `min_rank` is itself a wrapper around the base `rank`, whose docs explain the different ties options. You can do those two steps yourself and filter for rank 1 using whichever ties method you want. – camille May 16 '19 at 13:08
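The approaches from the comments can be sketched on the same kind of hypothetical toy data (the `data` values below are made up). Each keeps exactly one row per `sampleno`. For the `arrange()` route the sort has to be descending (`desc()`) so that the first row in each group is the maximum. The last variant, `slice_max(..., with_ties = FALSE)`, is not from the comments: it was added later, in dplyr 1.0.0, so it assumes a newer dplyr than existed when this question was asked.

```r
library(dplyr)

# Hypothetical toy data: sample 1 has a tie at the top score.
data <- tibble(
  sampleno    = c(1, 1, 1, 2, 2),
  voice_score = c(0.9, 0.9, 0.4, 0.7, 0.2)
)

# 1. which.max() returns the index of the first maximum, so a tie resolves to the earliest row.
by_which_max <- data %>%
  group_by(sampleno) %>%
  slice(which.max(voice_score)) %>%
  ungroup()

# 2. Sort descending within each group, then keep the first row of each group.
by_arrange <- data %>%
  group_by(sampleno) %>%
  arrange(desc(voice_score), .by_group = TRUE) %>%
  slice(1L) %>%
  ungroup()

# 3. rank() with ties.method = "first" gives tied rows distinct ranks, so only one row has rank 1.
by_rank <- data %>%
  group_by(sampleno) %>%
  filter(rank(-voice_score, ties.method = "first") == 1) %>%
  ungroup()

# 4. Requires dplyr >= 1.0.0: slice_max() can drop ties directly.
by_slice_max <- data %>%
  group_by(sampleno) %>%
  slice_max(voice_score, n = 1, with_ties = FALSE) %>%
  ungroup()
```

All four return two rows here (scores 0.9 and 0.7). `which.max()` and `ties.method = "first"` both favour the earliest tied row; the question says any tie-break is acceptable, so the choice between them is mostly a matter of readability.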
