Subsetting a data frame according to recursive rows and creating a column for ordering

Question

Consider the sample data

df <-
  structure(
    list(
      id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
      A = c(20L, 12L, 13L, 8L, 11L, 21L, 17L),
      B = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
    ),
    .Names = c("id", "A", "B"),
    class = "data.frame",
    row.names = c(NA,-7L)
  )

Each id (stored in column 1) has a varying number of entries for columns A and B. In the example data, there are four observations with id = 1. I am looking for a way to subset this data in R so that there are at most 3 entries for each id, and then create another column (labelled C) that gives the order of each row within its id. The expected output would look like:

df <-
  structure(
    list(
      id = c(1L, 1L, 1L, 2L, 2L, 3L),
      A = c(20L, 12L, 13L, 11L, 21L, 17L),
      B = c(1L, 1L, 0L, 1L, 0L, 0L),
      C = c(1L, 2L, 3L, 1L, 2L, 1L)
    ),
    .Names = c("id", "A", "B","C"),
    class = "data.frame",
    row.names = c(NA,-6L)
  )

Your help is much appreciated.

How are the 3 entries selected? The top 3 `A` values or any 3 values? — Ronak Shah, Aug 08 '18 at 05:33
These two links should most probably answer your question: https://stackoverflow.com/questions/27766054/getting-the-top-values-by-group and https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame — Ronak Shah, Aug 08 '18 at 05:41

score 1 · Accepted Answer · answered Aug 08 '18 at 08:00

Like this?

library(data.table)
dt <- as.data.table(df)
dt[, C := seq(.N), by = id]
dt <- dt[C <= 3,]
dt
#    id  A B C
# 1:  1 20 1 1
# 2:  1 12 1 2
# 3:  1 13 0 3
# 4:  2 11 1 1
# 5:  2 21 0 2
# 6:  3 17 0 1
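
For readers who prefer to avoid an extra package, the same two steps (number the rows within each id, then keep those numbered 3 or less) can be sketched in base R. This is only a minimal sketch, assuming "at most 3 entries" means the first three rows of each id in their existing order; the name `res` is introduced here for illustration.

```r
# Sample data from the question
df <- data.frame(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  A  = c(20L, 12L, 13L, 8L, 11L, 21L, 17L),
  B  = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
)

# Number the rows within each id (1, 2, 3, ... in their current order)
df$C <- ave(seq_along(df$id), df$id, FUN = seq_along)

# Keep at most the first three rows per id (assumption: "first", not "largest")
res <- df[df$C <= 3, ]
rownames(res) <- NULL
res
```

Here `ave()` applies `seq_along` separately to the rows of each id, which plays the same role as `seq(.N)` with `by = id` above.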

answered Aug 08 '18 at 08:00

Matthew Hui


nghauran · Answer 2 · 2018-08-08T08:20:02.260

Here is one option with dplyr, taking the top 3 values of A within each id (following the comment by @Ronak Shah).

library(dplyr)
df %>%
  group_by(id) %>%
  top_n(n = 3, wt = A) %>%                      # top 3 values based on A
  mutate(C = rank(id, ties.method = "first"))   # C consists of the order of each id
# A tibble: 6 x 4
# Groups:   id [3]
#      id     A     B     C
#   <int> <int> <int> <int>
# 1     1    20     1     1
# 2     1    12     1     2
# 3     1    13     0     3
# 4     2    11     1     1
# 5     2    21     0     2
# 6     3    17     0     1
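
Note that `top_n()` selects rows by the value of A, and it can return more than three rows per id when A has ties. If the intent is instead to keep the first three rows of each id as they appear, something like the following may be closer. It is a sketch only, assuming dplyr 1.0.0 or later (for `slice_head()`); the name `res` is introduced here for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  A  = c(20L, 12L, 13L, 8L, 11L, 21L, 17L),
  B  = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
)

res <- df %>%
  group_by(id) %>%
  slice_head(n = 3) %>%         # first three rows per id, in existing order
  mutate(C = row_number()) %>%  # position of each row within its id
  ungroup()
res
```

For this sample data both approaches keep the same rows, because the three largest A values for id = 1 happen to be its first three rows.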
