
I want to create an ID column that splits observations into two groups so that the sum of the values in each group is as close as possible to a target value.

Say I have this dataset:

id <- 1:5
val <- c(1, 2, 4, 5, 6)

dat <- data.frame(id, val)

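I calculate the sum of val (= 18) and divide it by 2 (= 9). I then want to create an ID column that assigns each observation to one of two groups so that each group's sum is equal to 9, or as close to 9 as possible. Here, rows 1, 2 and 5 sum to 1 + 2 + 6 = 9 and rows 3 and 4 sum to 4 + 5 = 9, so the new column would be: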

dat$group_id <- c("A", "A", "B", "B", "A")

Is there a good way to automate this process for many groups of observations, assuming that in some cases there is not an exact way to group observations to reach the target value?

  • Not only will there be cases where there is no exact solution, there may also be multiple solutions unless your data set is very small. Just two identical values create the potential for at least two solutions. – dcarlson May 05 '21 at 21:08
  • You're right. In my case, the groups are going to be around 5 - 6 rows, so fairly small. Though there are likely multiple solutions even in this case – Thomas J. Brailey May 05 '21 at 21:10
  • It's a bin packing problem, see: https://stackoverflow.com/questions/29424130/creating-groups-of-equal-sum-in-r – tmfmnk May 05 '21 at 21:16
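
Since the groups discussed here are only about 5 - 6 rows, one possible approach is simply to try every two-way split and keep the one whose sum lands closest to half the total. The following is a minimal sketch of that idea in base R, not a definitive solution: `split_two_groups()` is a hypothetical helper name (it does not come from any package), ties between equally good splits are broken by taking the first one found, and the number of splits tried doubles with every extra row, so it only suits small groups.

```r
# Hypothetical helper (not from any package): brute-force two-way split.
# Assumes `val` is a numeric vector with at least one element and that
# the group is small, because 2^(n - 1) splits are enumerated.
split_two_groups <- function(val) {
  stopifnot(is.numeric(val), length(val) >= 1)
  rest <- length(val) - 1
  target <- sum(val) / 2

  best_in_a <- rep(TRUE, length(val))
  best_gap <- Inf

  # The first element is always placed in group "A"; this skips the
  # mirror-image duplicates where the two labels are merely swapped.
  for (mask in 0:(2^rest - 1)) {
    # Decode `mask` into one TRUE/FALSE flag per remaining element.
    bits <- (mask %/% 2^(seq_len(rest) - 1)) %% 2 == 1
    in_a <- c(TRUE, bits)
    gap <- abs(sum(val[in_a]) - target)
    # Strict `<` keeps the first split found among equally good ones.
    if (gap < best_gap) {
      best_gap <- gap
      best_in_a <- in_a
    }
  }

  ifelse(best_in_a, "A", "B")
}

dat <- data.frame(id = 1:5, val = c(1, 2, 4, 5, 6))
dat$group_id <- split_two_groups(dat$val)
dat$group_id
# "A" "A" "B" "B" "A"
```

To apply this to many sets of observations, and assuming the data has a column identifying each set (called `set` here purely for illustration), something like `unsplit(lapply(split(dat$val, dat$set), split_two_groups), dat$set)` should return the labels in the original row order. When no exact split exists, the function still returns the closest one it found.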
