0

I'd like to add a series variables to a dataframe:

patent <- c(1,2,2)
temp1 <- c(TRUE,FALSE,FALSE)
temp2 <- c(FALSE,TRUE,TRUE)
df <- data.frame(patent,temp1,temp2)
df

  patent temp1 temp2
       1  TRUE FALSE
       2 FALSE  TRUE
       2 FALSE  TRUE

What I would like to do is to generate column sums for each temp var, by patent, in a loop / function / using dplyr / etc. Here is what I'm ultimately looking for:

  patent temp1 temp2 new1 new2
       1  TRUE FALSE    1    0
       2 FALSE  TRUE    0    2
       2 FALSE  TRUE    0    2
MrFlick
  • 163,738
  • 12
  • 226
  • 242
  • 1
    Possible duplicate of [Sum across multiple columns with dplyr](https://stackoverflow.com/questions/28873057/sum-across-multiple-columns-with-dplyr) – InfiniteFlash Jan 12 '18 at 21:23
  • 1
    Not a duplicate of that question (although still very possibly a dupe), since this one asks for sums within `patent` groupings. – IRTFM Jan 12 '18 at 23:02

3 Answers3

1

You can try the following without specifying each column individually

library(dplyr)
df %>%
  group_by(patent) %>%
  mutate_at(vars(contains("temp")), sum) %>%
  ungroup() %>%
  select(-patent) %>%
  setNames(paste0("new", seq_len(ncol(df)-1))) %>%
  cbind(df, .)

  # patent temp1 temp2 new1 new2
# 1      1  TRUE FALSE    1    0
# 2      2 FALSE  TRUE    0    2
# 3      2 FALSE  TRUE    0    2
CPak
  • 12,079
  • 2
  • 20
  • 38
1

Another solution. We can calculate the sum of each patent to each column first using summarise_all(funs(sum(.))), and then we joined the result to the original data frame using left_join. No need to specify individual column names.

library(dplyr)

df2 <- df %>%
  group_by(patent) %>%
  summarise_all(funs(sum(.))) %>%
  setNames(sub("temp", "new", names(.))) %>%
  left_join(df, ., by = "patent")
df2
#   patent temp1 temp2 new1 new2
# 1      1  TRUE FALSE    1    0
# 2      2 FALSE  TRUE    0    2
# 3      2 FALSE  TRUE    0    2

Or we can use mutate_all to calculate the sum and store the result to new columns.

df2 <- df %>%
  group_by(patent) %>%
  mutate_all(funs("sum" = sum(.))) %>%
  ungroup()
df2
# # A tibble: 3 x 5
#   patent temp1 temp2 temp1_sum temp2_sum
#    <dbl> <lgl> <lgl>     <int>     <int>
# 1   1.00 T     F             1         0
# 2   2.00 F     T             0         2
# 3   2.00 F     T             0         2

If the column names need to be exactly the same as the example, we can do the following.

df2 <- df %>%
  group_by(patent) %>%
  mutate_all(funs("sum" = sum(.))) %>%
  ungroup() %>%
  setNames(ifelse(grepl("_sum$", names(.)), sub("temp", "new", names(.)), names(.))) %>%
  setNames(sub("_sum$", "", names(.)))
df2
# # A tibble: 3 x 5
#   patent temp1 temp2  new1  new2
#    <dbl> <lgl> <lgl> <int> <int>
# 1   1.00 T     F         1     0
# 2   2.00 F     T         0     2
# 3   2.00 F     T         0     2
www
  • 35,154
  • 12
  • 33
  • 61
0

Using dplyr you can do

library(dplyr)
df %>% group_by(patent) %>%
  mutate(new1=sum(temp1), new2=sum(temp2))

#   patent temp1 temp2  new1  new2
#    <dbl> <lgl> <lgl> <int> <int>
# 1      1  TRUE FALSE     1     0
# 2      2 FALSE  TRUE     0     2
# 3      2 FALSE  TRUE     0     2
MrFlick
  • 163,738
  • 12
  • 226
  • 242