How to get the top cases for each group using dplyr?

Question

I have a data table with 3 columns: ID, Type, and Count. For each ID, I want to get the Type with top 2 Count in this ID, and flatten the result into one row. For example, if my data table is like below:

ID   Type   Count
A    1      8
B    1      3
A    2      5
A    3      2
B    2      1
B    3      4

Then I want my output to be two rows like below:

ID   Top1Type   Top1TypeCount  Top2Type   Top2TypeCount
A    1          8              2          5
B    3          4              1          3

Can anyone tell me how to achieve this using the dplyr library in R? Thank you very much.

Use `top_n` to select the top `n` cases – Jaap Nov 13 '15 at 07:29 — Jaap, Nov 13 '15 at 07:29

Jaap · Answer 1 · 2015-11-13T08:42:18.213

It's mostly better to keep your data in a long/tidy format. To achieve that, you can use:

df1 %>% group_by(ID) %>% top_n(2, Count) %>% arrange(ID)

which gives:

      ID  Type Count
  (fctr) (int) (int)
1      A     1     8
2      A     2     5
3      B     1     3
4      B     3     4

When you have ties, you can use slice to select an equal number of observations for each group:

# some example data
df2 <- structure(list(ID = structure(c(1L, 2L, 1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
                      Type = c(1L, 1L, 2L, 3L, 2L, 3L), 
                      Count = c(8L, 3L, 8L, 8L, 1L, 4L)), 
                 .Names = c("ID", "Type", "Count"), class = "data.frame", row.names = c(NA, -6L))

Without slice():

df2 %>% group_by(ID) %>% top_n(2, Count) %>% arrange(ID)

gives:

      ID  Type Count
  (fctr) (int) (int)
1      A     1     8
2      A     2     8
3      A     3     8
4      B     1     3
5      B     3     4

With the use of slice():

df2 %>% group_by(ID) %>% top_n(2, Count) %>% arrange(ID) %>% slice(1:2)

gives:

      ID  Type Count
  (fctr) (int) (int)
1      A     1     8
2      A     2     8
3      B     1     3
4      B     3     4

With arrange you can determine the order of the cases and thus which are selected by slice. The following:

df2 %>% group_by(ID) %>% top_n(2, Count) %>% arrange(ID, -Type) %>% slice(1:2)

gives this result:

      ID  Type Count
  (fctr) (int) (int)
1      A     3     8
2      A     2     8
3      B     3     4
4      B     1     3

Thanks Jaap for your solution. By using top_n function if there are ties in the Count, then all ties will be returned, is there any way to only return exactly 2 types? — Carter, Nov 13 '15 at 08:05
@Carter I added an example illustrating the use of `slice()` which can be used to select an equal number of cases for each group — Jaap, Nov 13 '15 at 08:25

akrun · Accepted Answer · 2015-11-13T08:43:13.917

Using data.table, we convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'ID', we order the 'Count' in descending order, subset the first two rows (head(.SD, 2)). Then, we create a sequence column ('N') grouped by 'ID', and dcast from 'long' to 'wide'. The data.table dcast can take multiple value.var columns.

library(data.table)#v1.9.6+
DT <- setDT(df1)[order(-Count), head(.SD, 2) , by = ID]
DT[, N:= 1:.N, by = ID]

dcast(DT, ID~paste0('Top', N), 
            value.var=c('Type', 'Count'), fill = 0)
#    ID Type_Top1 Type_Top2 Count_Top1 Count_Top2
#1:  A         1         2          8          5
#2:  B         3         1          4          3

data

df1 <- structure(list(ID = c("A", "B", "A", "A", "B", "B"), 
Type = c(1L, 
1L, 2L, 3L, 2L, 3L), Count = c(8L, 3L, 5L, 2L, 1L, 4L)), 
.Names = c("ID", 
"Type", "Count"), class = "data.frame", row.names = c(NA, -6L))

thank you for your help akrun, it works – Carter Nov 13 '15 at 08:04 — Carter, Nov 13 '15 at 08:04

How to get the top cases for each group using dplyr?

2 Answers2

data