1

I am looking to return the corresponding value for a max value for each group in a dataframe in R. Searching I can only find solutions for python and excel.

I seem to get the right answers but in a strange format:

Example:

set.seed(423)
df = data.frame(week = c(rep(1, 7), rep(2, 7), rep(3, 7)), 
                day = c(1:7, 1:7, 1:7), 
                value = runif(21))
    df
   week day      value
1     1   1 0.89368600
2     1   2 0.63863225
3     1   3 0.19254541
4     1   4 0.57557113
5     1   5 0.78458928
6     1   6 0.55080956
7     1   7 0.59388856
8     2   1 0.02040073
9     2   2 0.17663162
10    2   3 0.33647923
11    2   4 0.53304330
12    2   5 0.22939499
13    2   6 0.43232959
14    2   7 0.71889969
15    3   1 0.97318020
16    3   2 0.20320008
17    3   3 0.58991593
18    3   4 0.88450876
19    3   5 0.61154896
20    3   6 0.68123761
21    3   7 0.48162899

library('dplyr')

group_by(df, week) %>%
  summarize(max.day = .[which(value == max(value, na.rm = T)), 'day'])

   week max.day$    NA    NA
  <dbl>    <int> <int> <int>
1     1        1     7     1
2     2       NA    NA    NA
3     3       NA    NA    NA

The value for max.day (1, 7, 1) appear correct, as can be seen if you match the values from this code to the original df:

group_by(df, week) %>%
  summarise(value = max(value))

   week value
  <dbl> <dbl>
1     1 0.894
2     2 0.719
3     3 0.973

But what I want (and what I expected from the code) is a table that looks as follows:

  week max.day
1    1       1
2    2       7
3    3       1

What am I doing wrong here?

Also, will this code work if i have a large dataset in which the max value might repeat for certain groups. Essentially will my .[which(value == max(value, na.rm = T)), 'day'] be applied group-wise, or is this just looking at the entire vector?

M--
  • 18,939
  • 7
  • 44
  • 76
MorrisseyJ
  • 675
  • 5
  • 11
  • setNames(do.call("rbind", lapply(split(df, df$week), function(x){x[which.max(x$value),c("week", "day")]})), c("week", "max_day")) – hello_friend Nov 05 '19 at 09:44

1 Answers1

3

We can use which.max. If there are ties for max 'value' i.e. more than one max value for each 'week', then which.max returns the index of the first max 'value', use that to subset the corresponding 'day'

library(dplyr)
df %>%
   group_by(week) %>% 
   summarise(max.day = day[which.max(value)])
# A tibble: 3 x 2
#   week max.day
#  <int>   <int>
#1     1       1
#2     2       7
#3     3       1

With ==, there is the possibility of matching multiple elements if there are ties and summarise can return only single row/group resulting in conflict of interest and ultimately error


Another option is to either filter or slice the rows if the intention is to return the row

df %>%
   group_by(week) %>%
   slice(which.max(value)) %>%
   select(week, max.day = day)

data

df <- structure(list(week = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), day = c(1L, 2L, 
3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L), value = c(0.893686, 0.63863225, 0.19254541, 0.57557113, 
0.78458928, 0.55080956, 0.59388856, 0.02040073, 0.17663162, 0.33647923, 
0.5330433, 0.22939499, 0.43232959, 0.71889969, 0.9731802, 0.20320008, 
0.58991593, 0.88450876, 0.61154896, 0.68123761, 0.48162899)), 
   class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21"))
akrun
  • 674,427
  • 24
  • 381
  • 486