I would like to extract the rows with pre.post == "pre" and the maximum value on "year" within each participant (ID) in a data frame.
My data looks like this:
df <- read.table(text="pre.post ID year
post 800033 2011
pre 800033 2009
post 800033 2011
pre 800033 2010
post 800076 2009
pre 800076 2008
post 800076 2009
pre 800076 2007
pre 800100 2018
post 800125 2009
pre 800125 2008
post 800222 2010
post 800223 2013
pre 800223 2011", header = T)
In the final data set I wish to have only one row per person. This row should satisfy pre.post == "pre"
and the value of year should be the maximum value for year within this person.
For example, for ID == 800033
, I aim to have only row for (with year == 2010
).
Individuals without any pre-value (e.g. ID == 800222
) should be deleted in the final data set. Individuals with only one pre-value (e.g. ID == 800223
and ID == 800100
) the only available row should be kept.
The final data set should look like this:
df.1 <- read.table(text="pre.post ID year pre.post outcome1 outcome2
1 pre 800033 2010 A 3 4
2 pre 800076 2008 B 2 7
3 pre 800100 2018 C 3 4
4 pre 800125 2008 A 2 8
5 pre 800223 2011 C 4 NA", header = T)
I already tried this code (using dplyr):
df %>% group_by(ID) %>% filter(pre.post == "pre") %>% summarise(year = max(year))
However, I then lose the other variables and my new data set just includes ID and year.
Do you have advice on how to get this done?