
I would like to create a new dataset in R that stores, for each level of a grouping variable, the maximum value of a column together with the rest of the observation in which that maximum occurs.

Here is a fragment of the dataset with just 4 subjects, along with my code:

library(data.table)

my.data <- structure(list(
  Subject = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L),
  Supervisor = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    .Label = c("Emmi", "Pauli"), class = "factor"),
  Time = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 2L, 3L, 3L, 3L),
    .Label = c("01.02.2016 09:45", "01.02.2016 09:48", "01.03.2016 09:55"),
    class = "factor"),
  Trials = c(1L, 2L, 3L, 4L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 4L),
  Force = c(403.8, 464.6, 567.6, 572.9, 572.4, NA, 533.1, 547, 532.6,
    503.8, 464.6, 367.6, 372.9),
  ForceProduction = c(1073, 1149.6, 1944.7, 1906.4, 2260.9, NA, 2634.5,
    2471.6, 1187.9, 1073, 1149.6, 1944.7, 1906.4)),
  .Names = c("Subject", "Supervisor", "Time", "Trials", "Force", "ForceProduction"),
  class = "data.frame", row.names = c(NA, -13L))

DT <- as.data.table(my.data)
new.data <- DT[, .SD[which.max(Force)], by = Trials]

Each subject did 2-4 trials. For each subject, I need to select the trial with the maximum value in the Force column. All other columns of the row holding that maximum Force should be preserved; rows that do not hold the subject's maximum Force should be dropped.

The result of my code is strange: it does not give me one row per subject, and the rows it returns are not each subject's best trial. I suspect I am doing something completely wrong somewhere.

Can you please direct me to a better solution?

Uwe Keim
  • Please provide a minimal, reproducible example which can be easily copy&pasted into an R session. [Here are a few tips](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to do just that. – Roman Luštrik Nov 24 '16 at 20:34

1 Answer

Here's a simple dplyr chain that should give you what you want: group by subject, then keep only the rows where Force equals the maximum for that subject.

library(dplyr)

my.data %>% 
  group_by(Subject) %>% 
  filter(Force == max(Force, na.rm = TRUE))
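For comparison, here is a minimal data.table sketch of the same idea, assuming the goal is exactly one row per subject. It mirrors the code in the question but groups by Subject rather than Trials. The table below is a reduced copy of the question's sample data (only the Subject, Trials and Force columns), rebuilt here so the snippet runs on its own.

```r
library(data.table)

# Reduced copy of the question's sample data: three of the six columns.
DT <- data.table(
  Subject = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L),
  Trials  = c(1L, 2L, 3L, 4L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 4L),
  Force   = c(403.8, 464.6, 567.6, 572.9, 572.4, NA, 533.1, 547, 532.6,
              503.8, 464.6, 367.6, 372.9)
)

# Group by Subject (not Trials). which.max() ignores NA values and
# returns the position of the first maximum within each group.
new.data <- DT[, .SD[which.max(Force)], by = Subject]
print(new.data)
```

One difference from the dplyr chain: which.max() keeps only the first row when several trials tie for the maximum, whereas filter(Force == max(Force, na.rm = TRUE)) keeps every tied row.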
Jake Kaupp