Calculating row percentages using Dplyr

Question

Okay so I have a dataframe like this:

ID Month Year DOW  Value
1  Jan   2019 Fri  20
1  Jan   2019 Sat  39
1  Feb   2019 Fri  30
1  Feb   2019 Sat  24
2  Jan   2019 Fri  20
2  Jan   2019 Sat  12
2  Feb   2019 Fri   1
2  Feb   2019 Sat   3

My goal is to determine the percent of each row in reference to the Month Year;

Manually calculating this the answer should be:

ID Month Year DOW  Value   Percent
1  Jan   2019 Fri  20      .338
1  Jan   2019 Sat  39      .661
1  Feb   2019 Fri  30      .554
1  Feb   2019 Sat  24      .444
2  Jan   2019 Fri  20      .625
2  Jan   2019 Sat  12      .375
2  Feb   2019 Fri   1      .25
2  Feb   2019 Sat   3      .75

Notice the Month Year combos always add to 1.

Lastly, I would like to take the average of the Percent just calculated of ID 1 and 2

Month Year DOW  Avg
Jan   2019 Fri  0.482
Jan   2019 Sat  0.518
Feb   2019 Fri  0.402
Feb   2019 Sat  0.597

The goal is to do so using dplyr

score 1 · Accepted Answer · answered Oct 23 '20 at 15:08

Does this work:

>library(dplyr)
> df %>% group_by(ID, Month, Year) %>% mutate(Percent = Value/sum(Value)) %>% 
group_by(Month, Year,DOW) %>% summarise(Avg = mean(Percent)) %>% as.data.frame()
`summarise()` regrouping output by 'Month', 'Year' (override with `.groups` argument)
  Month Year DOW       Avg
1   Feb 2019 Fri 0.4027778
2   Feb 2019 Sat 0.5972222
3   Jan 2019 Fri 0.4819915
4   Jan 2019 Sat 0.5180085
>

Data used:

structure(list(ID = c(1, 1, 1, 1, 2, 2, 2, 2), Month = c("Jan", 
"Jan", "Feb", "Feb", "Jan", "Jan", "Feb", "Feb"), Year = c(2019, 
2019, 2019, 2019, 2019, 2019, 2019, 2019), DOW = c("Fri", "Sat", 
"Fri", "Sat", "Fri", "Sat", "Fri", "Sat"), Value = c(20, 39, 
30, 24, 20, 12, 1, 3)), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -8L), spec = structure(list(
    cols = list(ID = structure(list(), class = c("collector_double", 
    "collector")), Month = structure(list(), class = c("collector_character", 
    "collector")), Year = structure(list(), class = c("collector_double", 
    "collector")), DOW = structure(list(), class = c("collector_character", 
    "collector")), Value = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

Gregor Thomas · Answer 2 · 2020-10-23T15:13:08.873

0

your_data = your_data %>%
  group_by(ID, Month, Year) %>%
  mutate(Percent = Value / sum(Value))

your_data %>% 
  filter(ID %in% c(1, 2)) %>%
  group_by(Month, Year, DOW) %>%
  summarize(Avg = mean(Percent)

edited Oct 23 '20 at 15:13

answered Oct 23 '20 at 15:06

Gregor Thomas

104,719
16
140
257

why does your first group_by not take ID into account? – John Thomas Oct 23 '20 at 15:08
@JohnThomas because you said *"My goal is to determine the percent of each row in reference to the Month Year;"* and didn't mention ID. I mistakenly added DOW, but looking more closely looks like `ID` needs to be added instead - just added. – Gregor Thomas Oct 23 '20 at 15:12

score 0 · Answer 3 · answered Oct 23 '20 at 15:36

use base

df$Percent <- with(df, ave(Value, list(ID, Month), FUN = prop.table))

  ID Month Year DOW Value   Percent
1  1   Jan 2019 Fri    20 0.3389831
2  1   Jan 2019 Sat    39 0.6610169
3  1   Feb 2019 Fri    30 0.5555556
4  1   Feb 2019 Sat    24 0.4444444
5  2   Jan 2019 Fri    20 0.6250000
6  2   Jan 2019 Sat    12 0.3750000
7  2   Feb 2019 Fri     1 0.2500000
8  2   Feb 2019 Sat     3 0.7500000

Calculating row percentages using Dplyr

3 Answers3