Filter data based on most recent date and ID in R

Question

I have data that is structured similarly to the following:

a<-data.frame(ID=c(1,2,2,2,3,3),Date=as.Date(c("2017-01-01","2017-01-02","2017-01-03","2017-01-04","2017-01-05","2017-01-06")))
print(a)
ID       Date
1   2017-01-01
2   2017-01-02
2   2017-01-03
2   2017-01-04
3   2017-01-05
3   2017-01-06

I want to remove any repeated IDs and keep only the most recent row for each ID, based on Date, to obtain the following:

b<-data.frame(ID=c(1,2,3),Date=as.Date(c("2017-01-01","2017-01-04","2017-01-06")))
print(b)
ID       Date
1   2017-01-01
2   2017-01-04
3   2017-01-06

Thank you!

Try `top_n` option discussed in the duplicate link: `a %>% group_by(ID) %>% top_n(1, Date)` — CPak, Jan 18 '18 at 18:56

score 2 · Accepted Answer · answered Jan 18 '18 at 18:53

With dplyr you can do:

library(dplyr)
a %>% group_by(ID) %>% filter(Date == max(Date))
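One thing to keep in mind: `filter(Date == max(Date))` keeps every row that shares an ID's latest date, so an ID with two rows on the same most recent day would still appear twice. If exactly one row per ID is required, a possible variant is `slice_max()` with `with_ties = FALSE`. This is a minimal sketch, assuming dplyr 1.0.0 or later (where `slice_max()` supersedes the `top_n()` mentioned in the comment above); the name `latest` is only illustrative.

```r
library(dplyr)

# Sample data from the question
a <- data.frame(
  ID = c(1, 2, 2, 2, 3, 3),
  Date = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03",
                   "2017-01-04", "2017-01-05", "2017-01-06"))
)

# Keep one row per ID: the row with the latest Date.
# with_ties = FALSE returns a single row even if several rows share that date.
latest <- a %>%
  group_by(ID) %>%
  slice_max(Date, n = 1, with_ties = FALSE) %>%
  ungroup()

print(latest)
```

For the sample data there are no tied dates, so both approaches should return the same three rows.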

spinodal

score 0 · Answer 2 · answered Jan 18 '18 at 18:56

Using data.table:

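  library(data.table)
  setDT(a)                              # convert a to a data.table by reference
  a[, max_date := max(Date), by = ID]   # helper column: latest Date within each ID
  a <- a[max_date == Date, ]            # keep only the rows that fall on that date
  a[, max_date := NULL]                 # drop the helper column again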

You'll get:

    ID       Date
1:  1 2017-01-01
2:  2 2017-01-04
3:  3 2017-01-06
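The same result can probably be reached without the temporary `max_date` column. The following is a sketch of one common data.table idiom, not the answerer's own code: it subsets each group with `.SD` and `which.max()`. It builds a fresh table rather than modifying `a` by reference, and the name `latest` is only illustrative.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
a <- data.table(
  ID = c(1, 2, 2, 2, 3, 3),
  Date = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03",
                   "2017-01-04", "2017-01-05", "2017-01-06"))
)

# .SD is the subset of rows for the current ID;
# which.max(Date) gives the position of the first row with the latest Date.
latest <- a[, .SD[which.max(Date)], by = ID]

print(latest)
```

Because `which.max()` returns only the first maximum, this version keeps a single row per ID even when dates are tied, whereas the `max_date == Date` filter keeps all tied rows.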

sm925
