I am looking to return the corresponding value for a max value for each group in a dataframe in R. Searching I can only find solutions for python and excel.
I seem to get the right answers but in a strange format:
Example:
set.seed(423)
df = data.frame(week = c(rep(1, 7), rep(2, 7), rep(3, 7)),
day = c(1:7, 1:7, 1:7),
value = runif(21))
df
week day value
1 1 1 0.89368600
2 1 2 0.63863225
3 1 3 0.19254541
4 1 4 0.57557113
5 1 5 0.78458928
6 1 6 0.55080956
7 1 7 0.59388856
8 2 1 0.02040073
9 2 2 0.17663162
10 2 3 0.33647923
11 2 4 0.53304330
12 2 5 0.22939499
13 2 6 0.43232959
14 2 7 0.71889969
15 3 1 0.97318020
16 3 2 0.20320008
17 3 3 0.58991593
18 3 4 0.88450876
19 3 5 0.61154896
20 3 6 0.68123761
21 3 7 0.48162899
library('dplyr')
group_by(df, week) %>%
summarize(max.day = .[which(value == max(value, na.rm = T)), 'day'])
week max.day$ NA NA
<dbl> <int> <int> <int>
1 1 1 7 1
2 2 NA NA NA
3 3 NA NA NA
The value for max.day (1, 7, 1) appear correct, as can be seen if you match the values from this code to the original df:
group_by(df, week) %>%
summarise(value = max(value))
week value
<dbl> <dbl>
1 1 0.894
2 2 0.719
3 3 0.973
But what I want (and what I expected from the code) is a table that looks as follows:
week max.day
1 1 1
2 2 7
3 3 1
What am I doing wrong here?
Also, will this code work if i have a large dataset in which the max value might repeat for certain groups. Essentially will my .[which(value == max(value, na.rm = T)), 'day']
be applied group-wise, or is this just looking at the entire vector?