4

I'll use the built-in chickwts data as an example.

Here's the data; there are 6 feed types.

> head(chickwts)

  weight      feed
1    179 horsebean
2    160 horsebean
3    136 horsebean
4    227 horsebean
5    217 horsebean
6    168 horsebean

> table(chickwts$feed)

   casein horsebean   linseed  meatmeal   soybean sunflower 
       12        10        12        11        14        12 

What I want is the top rows by weight for every feed type, but with a different number of rows for each feed type. For example,

top_n_feed <-
  c(
    "casein" = 3,
    "horsebean" = 5,
    "linseed" = 3,
    "meatmeal" = 6,
    "soybean" = 3,
    "sunflower" = 2
  )

How can I do this using dplyr?

To get the top n rows of each feed type by weight I can use the code below, but I'm not sure how to extend it to a different n for each feed type.

chickwts %>%
  group_by(feed) %>% 
  slice_max(order_by = weight, n = 5)
876868587

4 Answers

6

This isn't really something that dplyr makes easy. I'd recommend merging in the per-feed counts and then filtering.


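library(dplyr)

# Join the per-feed row counts onto the data, then keep the top `topn` rows per feed
tibble(feed = names(top_n_feed), topn = top_n_feed) %>% 
  inner_join(chickwts, by = "feed") %>% 
  group_by(feed) %>% 
  arrange(desc(weight), .by_group = TRUE) %>% 
  filter(row_number() <= topn) %>%
  ungroup() %>%
  select(-topn)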
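
One difference from the slice_max() call in the question is worth checking: a filter on row_number() keeps exactly n rows, while slice_max() keeps ties by default (with_ties = TRUE). A minimal sketch on a made-up data frame (toy, g and w are hypothetical names, not part of chickwts):

```r
library(dplyr)

# Hypothetical toy data with a tie for second place (two rows with w == 8)
toy <- tibble(g = "a", w = c(10, 8, 8, 5))

# Sort, then keep the first two rows: always exactly 2 rows
exact <- toy %>% arrange(desc(w)) %>% filter(row_number() <= 2)

# slice_max() keeps every row tied with the 2nd-largest value by default
with_ties <- toy %>% slice_max(order_by = w, n = 2)

# with_ties = FALSE drops the extra tied row
no_ties <- toy %>% slice_max(order_by = w, n = 2, with_ties = FALSE)

c(exact = nrow(exact), with_ties = nrow(with_ties), no_ties = nrow(no_ties))
```

For the counts requested in the question, none of the cut-offs in chickwts falls on a tie, so both approaches should return the same 22 rows.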

MrFlick
2

Any time you have a named vector or list, think purrr::imap. Avoid joins if not required, particularly when working at scale.

library(dplyr)
library(purrr)

top_n_feed <- c(
    "casein" = 3,
    "horsebean" = 5,
    "linseed" = 3,
    "meatmeal" = 6,
    "soybean" = 3,
    "sunflower" = 2
  )

imap_dfr(top_n_feed, ~ filter(chickwts, feed %in% .y) %>% 
           slice_max(order_by = weight, n = .x))

   weight      feed
1     404    casein
2     390    casein
3     379    casein
4     227 horsebean
5     217 horsebean
6     179 horsebean
7     168 horsebean
8     160 horsebean
9     309   linseed
10    271   linseed
11    260   linseed
12    380  meatmeal
13    344  meatmeal
14    325  meatmeal
15    315  meatmeal
16    303  meatmeal
17    263  meatmeal
18    329   soybean
19    327   soybean
20    316   soybean
21    423 sunflower
22    392 sunflower
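
To make the .x / .y convention in the call above concrete: inside imap(), .x is each element's value and .y is its name. A small sketch using imap_chr() (the variable name described is just for illustration):

```r
library(purrr)

# Requested number of rows per feed type (copied from the question)
top_n_feed <- c(casein = 3, horsebean = 5, linseed = 3,
                meatmeal = 6, soybean = 3, sunflower = 2)

# .y is the element name (feed type), .x is the element value (row count)
described <- imap_chr(top_n_feed, ~ paste0(.y, ": ", .x))

described
```

Each element of described pairs a feed name with its count, which are the same two values the lambda above uses in filter() and slice_max().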
Ewen
1

Another way using split and map2:

library(dplyr)
library(purrr)

chickwts %>%
  filter(feed %in% names(top_n_feed)) %>%
  split(.$feed) %>% 
  map2_dfr(top_n_feed[names(.)], ~ slice_max(.x, order_by = weight, n = .y))
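
The top_n_feed[names(.)] indexing is what keeps each count paired with the right group, even if top_n_feed is written in a different order than the factor levels. A base R sketch of just that alignment step, assuming a deliberately shuffled top_n_feed (pieces and counts are illustrative names):

```r
# Same counts as in the question, deliberately listed in a different order
top_n_feed <- c(sunflower = 2, casein = 3, meatmeal = 6,
                horsebean = 5, soybean = 3, linseed = 3)

# split() returns one data frame per feed, named in factor-level order
pieces <- split(chickwts, chickwts$feed)

# Indexing by name reorders the counts to match the split pieces
counts <- top_n_feed[names(pieces)]

names(pieces)
counts
```

Because map2 pairs its two inputs by position, this reordering is what lets .x (a group) and .y (its count) line up.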
mt1022
0

Join top_n_feed onto the chickwts data frame, then select the top n rows for each group.

library(dplyr)

tibble::enframe(top_n_feed, name = 'feed') %>% 
        left_join(chickwts, by = 'feed') %>%
        group_by(feed) %>%
        top_n(first(value), weight)

#   feed      value weight
#   <chr>     <dbl>  <dbl>
# 1 casein        3    390
# 2 casein        3    379
# 3 casein        3    404
# 4 horsebean     5    179
# 5 horsebean     5    160
# 6 horsebean     5    227
# 7 horsebean     5    217
# 8 horsebean     5    168
# 9 linseed       3    309
#10 linseed       3    260
# … with 12 more rows

For some reason I was not able to make slice_sample work for this example.

Ronak Shah