
I have a task that I'd like to accomplish in dplyr, but I haven't been able to sort out how to do it.

I have a data frame with a year, a factor, and a value column. I want to add a new column (mutate) that divides each value by the sum of all values within the same year (group_by). The table below shows the desired result; my df already has the first three columns.

year  factor    value    share
1977     a      564907   value / sum(value for year 1977)
1977     l     2852949   value / sum(value for year 1977)
1978     a      504028   value / sum(value for year 1978)
1978     1      413120   value / sum(value for year 1978)
1978     y     2553088   value / sum(value for year 1978)
1979     a      497766   value / sum(value for year 1979)
1979     c      789007   value / sum(value for year 1979)

As expected,

df %>% group_by(year) %>% summarize(year.total = sum(value))

drops the value column so I can't continue with creating the share column.

I think I need a conditional mutate, something like %>% mutate(share = value / (sum of value over all rows whose year matches the current row's year)). And yes, the number of rows per year varies.
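One possible approach, as a minimal sketch: mutate() after group_by(year) keeps every row, and sum(value) inside it is evaluated per group, so no explicit condition on the year should be needed. The data frame below is rebuilt from the table above and is assumed to be called df; the name result is only illustrative.

```r
library(dplyr)

# Sample data rebuilt from the table in the question (first three columns).
df <- data.frame(
  year   = c(1977, 1977, 1978, 1978, 1978, 1979, 1979),
  factor = c("a", "l", "a", "1", "y", "a", "c"),
  value  = c(564907, 2852949, 504028, 413120, 2553088, 497766, 789007)
)

result <- df %>%
  group_by(year) %>%                     # one group per year
  mutate(share = value / sum(value)) %>% # sum(value) is the per-year total
  ungroup()                              # drop grouping for later steps

print(result)
```

Because mutate() adds a column instead of collapsing each group to one row, the value column is still there, and a variable number of rows per year is handled by the grouping itself.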

David Arenburg
zazizoma
