Select most recent date, by row in R

Question

I have a problem that I have reduced to the following task. For a data frame with IDs and dates:

set.seed(123)
myids <- sample(c('a001', 'a002', 'a003'), 12, replace = TRUE)
mydates <- as.Date(sample(c("2007-06-22", "2004-02-13", "2007-05-22", "2001-10-10", "2008-05-05", "2004-02-15"), 12, replace = TRUE))
mydf <- data.frame(myids, mydates)

I need to select only the row with the most recent date, for each subject. The result should be:

a001    5/5/08
a002    5/5/08
a003    2/15/04

Anyone know how to do this?

Try `library(dplyr); mydf %>% group_by(myids) %>% summarise(mydates=format(max(mydates), '%m/%d/%y'))` or if you have many columns `mydf %>% group_by(myids) %>% slice(which.max(mydates))` — akrun, Sep 09 '15 at 15:18
This is *NOT* a duplicate. OP does not ask specifically for a `dplyr` solution, and there are many other (IMO better) ways to do it. — jlhoward, Sep 09 '15 at 15:42

score 8 · Accepted Answer · answered Sep 09 '15 at 15:40

Here's a data.table solution.

library(data.table)
setDT(mydf)[,.SD[which.max(mydates)],keyby=myids]
#    myids    mydates
# 1:  a001 2008-05-05
# 2:  a002 2008-05-05
# 3:  a003 2004-02-15
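
For comparison, the same "newest row per ID" selection can be sketched in base R without extra packages. This is only an illustrative alternative, not part of the data.table approach above: `demo`, `ord`, `latest`, and `label` are hypothetical names, and the toy data is hand-written rather than the question's random sample, so the result does not depend on the random number generator.

```r
# Hypothetical toy data with the same column names as the question
demo <- data.frame(
  myids   = c("a001", "a001", "a002", "a002", "a003"),
  mydates = as.Date(c("2007-06-22", "2008-05-05", "2004-02-13",
                      "2008-05-05", "2004-02-15"))
)

# Sort by ID, then newest date first within each ID
ord <- demo[order(demo$myids, -as.numeric(demo$mydates)), ]

# duplicated() flags every row after the first for each ID,
# so negating it keeps only the newest row per ID
latest <- ord[!duplicated(ord$myids), ]

# Optional: a month/day/two-digit-year text label, close to the
# format asked for in the question (this version keeps leading zeros)
latest$label <- format(latest$mydates, "%m/%d/%y")

print(latest)
```

Because whole rows are kept, any additional columns in the data frame come along with the selected row, which is the same reason the data.table version uses `.SD`.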

jlhoward
