Consider the sample data

df <-
  structure(
    list(
      id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
      A = c(20L, 12L, 13L, 8L, 11L, 21L, 17L),
      B = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
    ),
    .Names = c("id", "A", "B"),
    class = "data.frame",
    row.names = c(NA,-7L)
  )

Each id (stored in column 1) has a varying number of entries in columns A and B. In the example data, there are four observations with id = 1. I am looking for a way to subset this data in R so that there are at most 3 entries for each id, and then create another column (labelled C) that holds the order of each row within its id (1, 2, 3, ...). The expected output would look like:

df <-
  structure(
    list(
      id = c(1L, 1L, 1L, 2L, 2L, 3L),
      A = c(20L, 12L, 13L, 11L, 21L, 17L),
      B = c(1L, 1L, 0L, 1L, 0L, 0L),
      C = c(1L, 2L, 3L, 1L, 2L, 1L)
    ),
    .Names = c("id", "A", "B","C"),
    class = "data.frame",
    row.names = c(NA,-6L)
  )

Your help is much appreciated.
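A minimal base-R sketch of the "any 3 values" reading, assuming the rows to keep are simply the first three of each id in their current order (this reproduces the expected output above). It uses only `ave()` and `seq_along()`; the object names `df` and `out` are illustrative.

```r
# Sample data from the question (same values as the dput() above)
df <- data.frame(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  A  = c(20L, 12L, 13L, 8L, 11L, 21L, 17L),
  B  = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
)

# C: running position of each row within its id (1, 2, 3, ...),
# following the current row order
df$C <- ave(df$id, df$id, FUN = seq_along)

# Keep at most 3 rows per id and reset the row names to 1..n
out <- df[df$C <= 3, ]
rownames(out) <- NULL
out
```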

Mike H.
T Richard
  • How are the 3 entries selected: the top 3 `A` values or any 3 values? – Ronak Shah Aug 08 '18 at 05:33
  • Most probably these two links should solve your question: https://stackoverflow.com/questions/27766054/getting-the-top-values-by-group and https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame – Ronak Shah Aug 08 '18 at 05:41
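If instead the three largest `A` values per id should be kept (the first option in the comment above), a base-R sketch could rank `A` within each id before subsetting. This assumes ties are broken by row order, so that no id ever returns more than 3 rows; the helper vector `rank_A` is a hypothetical name. For this sample data both readings happen to select the same rows, because the only dropped row (id 1, A = 8) is both the smallest and the last.

```r
df <- data.frame(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  A  = c(20L, 12L, 13L, 8L, 11L, 21L, 17L),
  B  = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
)

# Rank A within each id, largest first; ties.method = "first" breaks
# ties by row order, so each id keeps at most 3 rows
rank_A <- ave(-df$A, df$id, FUN = function(a) rank(a, ties.method = "first"))

# Keep the 3 largest A values per id, still in their original row order
top3 <- df[rank_A <= 3, ]

# C: position of each kept row within its id
top3$C <- ave(top3$id, top3$id, FUN = seq_along)
rownames(top3) <- NULL
top3
```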

2 Answers

Like this? Number the rows within each id first, then keep only the rows where that number is at most 3:

library(data.table)
dt <- as.data.table(df)
# Number the rows within each id: 1, 2, 3, ...
dt[, C := seq_len(.N), by = id]
# Keep at most the first 3 rows per id
dt <- dt[C <= 3]
dt
#    id  A B C
# 1:  1 20 1 1
# 2:  1 12 1 2
# 3:  1 13 0 3
# 4:  2 11 1 1
# 5:  2 21 0 2
# 6:  3 17 0 1
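A possible variant of the same data.table approach, assuming a data.table version that provides `rowid()`: take the first 3 rows of each group with `head(.SD, 3L)` and number them afterwards, so `C` is only computed for the rows that are kept. This is a sketch of an alternative, not a correction; the answer above gives the same result.

```r
library(data.table)

dt <- data.table(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  A  = c(20L, 12L, 13L, 8L, 11L, 21L, 17L),
  B  = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
)

# Keep at most the first 3 rows of each id (shorter groups are kept whole)
res <- dt[, head(.SD, 3L), by = id]

# Number the kept rows within each id: 1, 2, 3, ...
res[, C := rowid(id)]
res
```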
Matthew Hui


Here is one option with dplyr that keeps the top 3 values of A within each id (following the comment from @Ronak Shah).

library(dplyr)
df %>%
  group_by(id) %>%
  top_n(n = 3, wt = A) %>%                     # top 3 values of A within each id
  mutate(C = rank(id, ties.method = "first"))  # C = position of each row within its id
# A tibble: 6 x 4
# Groups:   id [3]
     id     A     B     C
  <int> <int> <int> <int>
1     1    20     1     1
2     1    12     1     2
3     1    13     0     3
4     2    11     1     1
5     2    21     0     2
6     3    17     0     1
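One caveat: `top_n()` keeps ties, so an id whose third-largest `A` value is tied could return more than 3 rows, and it has been superseded by `slice_max()` since dplyr 1.0.0. `slice_max(A, n = 3, with_ties = FALSE)` would cap each id at 3 rows, but it sorts each group by `A`, so `C` would follow the ranking rather than the original row order. A sketch that enforces the "at most 3" limit while keeping the original order filters on `row_number(desc(A))`, which breaks ties by position:

```r
library(dplyr)

df <- data.frame(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  A  = c(20L, 12L, 13L, 8L, 11L, 21L, 17L),
  B  = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
)

res <- df %>%
  group_by(id) %>%
  # Rank A largest-first within each id, ties broken by position,
  # so each id keeps at most 3 rows (top_n() would keep ties)
  filter(row_number(desc(A)) <= 3) %>%
  # C: position of each kept row within its id, in original row order
  mutate(C = row_number()) %>%
  ungroup()
res
```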
nghauran