I want to use ddply or group_by to mutate an existing dataframe based on the values in one of the columns in the dataframe.

I have a dataframe with ID, ROI, Condition and Value columns, plus a placeholder MaxROI column. I want to identify the ROI within each ID and Condition that has the maximum value in df$Value. So for the following df, ROI 3 would be labelled Max for the ID 1 + Match condition, ROI 4 would be Max for the ID 1 + NoMatch condition, and so on.

set.seed(1)
df <- data.frame("ID"=sort(rep_len(1:2, 12)), "ROI"=rep_len(1:6, 12), "Condition"=rep_len(c(rep_len("Match", 3), rep_len("NoMatch", 3)), 12), "Value"=runif(12), MaxROI="None")

I tried using some combinations of ddply and group_by. For instance:

ddply(df, c("ID", "Condition"), mutate, MaxROI[which.max(Value)]="Max")

#generates an error
#Error: unexpected '=' in "ddply(df, c("ID", "Condition"), mutate, MaxROI[which.max(Value)]="

I have looked here, but I don't want to filter the dataframe down to the rows with max values; I want to mutate the existing df.

Thank you,

Mrinmayi


1 Answer

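We can use dplyr. After grouping by 'ID' and 'Condition', create the column 'Max' with case_when: rows where 'Value' equals the group's max(Value) get the string "Max", and every other row gets NA, because case_when returns NA when no condition matches. The dplyr:: prefix on mutate guards against plyr's mutate masking it when both packages are loaded.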

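library(dplyr)

# Label the row holding the largest Value within each ID/Condition group;
# rows that match no case_when() condition are left as NA
df %>%
   group_by(ID, Condition) %>%
   dplyr::mutate(Max = case_when(Value == max(Value) ~ "Max"))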
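
The same grouping idea can also write into the existing MaxROI column and keep "None" as the default, which is closer to what the question asks for. The following is only a sketch under a few assumptions: ties for the group maximum are all labelled "Max", the `labelled` name is made up for illustration, and `stringsAsFactors = FALSE` is added so that MaxROI is a character column on older R versions too.

```r
library(dplyr)

# Rebuild the example data from the question
set.seed(1)
df <- data.frame(
  ID = sort(rep_len(1:2, 12)),
  ROI = rep_len(1:6, 12),
  Condition = rep_len(rep(c("Match", "NoMatch"), each = 3), 12),
  Value = runif(12),
  MaxROI = "None",
  stringsAsFactors = FALSE
)

# Within each ID/Condition group, overwrite MaxROI with "Max" on the row(s)
# whose Value equals the group maximum; all other rows keep "None".
# Assumption: tied maxima are all labelled "Max".
labelled <- df %>%
  group_by(ID, Condition) %>%
  dplyr::mutate(MaxROI = if_else(Value == max(Value), "Max", "None")) %>%
  ungroup()

# Show which ROI was labelled in each group
labelled %>%
  filter(MaxROI == "Max") %>%
  select(ID, Condition, ROI)
```

Using if_else rather than case_when keeps both labels explicit, and ungroup() returns a plain tibble so later steps are not silently computed per group.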
akrun
  • Thanks a lot for your response! Ideally, I would like to label an ROI as max, instead of importing the actual Value. I could do that with an additional step of: ` df[df$value==df$Max, "MaxROI" – Mrinmayi Jun 17 '20 at 21:11
  • Works perfectly for cases where there is a max! Thanks a lot! – Mrinmayi Jun 18 '20 at 16:56
  • @Mrinmayi thanks. If it works, please consider accepting the solution by clicking on the tick mark. It would also improve your reputation – akrun Jun 18 '20 at 18:43