I have a task that I'd like to accomplish in dplyr but haven't been able to sort out how to do so.
I have a data frame with a year, a factor, and a value column. I want to add a new column (mutate) that divides each value by the sum of all values within its year (group_by). The table below shows what I want to accomplish; my df already has the first three columns.
year  factor  value    share
1977  a       564907   value / sum(value for year 1977)
1977  l       2852949  value / sum(value for year 1977)
1978  a       504028   value / sum(value for year 1978)
1978  1       413120   value / sum(value for year 1978)
1978  y       2553088  value / sum(value for year 1978)
1979  a       497766   value / sum(value for year 1979)
1979  c       789007   value / sum(value for year 1979)
As expected,
df %>% group_by(year) %>% summarize(year.total = sum(value))
drops the value column, so I can't continue with creating the share column.
I think I need a conditional mutate, something like %>% mutate(share = value / (sum of value over all rows whose year matches the current row's year)). And yes, the number of rows per year varies.
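To make the goal concrete, here is a minimal sketch of what I'm after. It relies on my understanding that mutate() on a grouped data frame evaluates sum(value) within each group and keeps every row, so no explicit condition would be needed. The df below is rebuilt from the sample rows above, and the name result is just for illustration.

```r
library(dplyr)

# Sample data copied from the table above (first three columns only)
df <- data.frame(
  year   = c(1977, 1977, 1978, 1978, 1978, 1979, 1979),
  factor = c("a", "l", "a", "1", "y", "a", "c"),
  value  = c(564907, 2852949, 504028, 413120, 2553088, 497766, 789007)
)

result <- df %>%
  group_by(year) %>%                      # one group per year
  mutate(share = value / sum(value)) %>%  # sum(value) is computed per group
  ungroup()                               # drop the grouping afterwards

print(result)
```

If this works the way I think it does, the shares within each year should add up to 1, however many rows that year has.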