
I have data that is structured similarly to the following:

a <- data.frame(ID = c(1, 2, 2, 2, 3, 3),
                Date = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03",
                                 "2017-01-04", "2017-01-05", "2017-01-06")))
print(a)
  ID       Date
1  1 2017-01-01
2  2 2017-01-02
3  2 2017-01-03
4  2 2017-01-04
5  3 2017-01-05
6  3 2017-01-06

I want to remove repeated IDs, keeping only the most recent row (by Date) for each ID, to obtain the following:

b <- data.frame(ID = c(1, 2, 3),
                Date = as.Date(c("2017-01-01", "2017-01-04", "2017-01-06")))
print(b)
  ID       Date
1  1 2017-01-01
2  2 2017-01-04
3  3 2017-01-06

Thank you!

costebk08
Try the `top_n` approach discussed in the duplicate link: `a %>% group_by(ID) %>% top_n(1, Date)` – CPak Jan 18 '18 at 18:56
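In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()`. A minimal sketch of the same idea, assuming dplyr >= 1.0.0 is installed; `with_ties = FALSE` is used here so that exactly one row per ID is returned even if two rows share the latest date (the name `latest` is just for illustration):

```r
library(dplyr)

a <- data.frame(
  ID = c(1, 2, 2, 2, 3, 3),
  Date = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03",
                   "2017-01-04", "2017-01-05", "2017-01-06"))
)

# Within each ID group, keep the single row with the largest Date.
# with_ties = FALSE returns one row per group even when dates tie.
latest <- a %>%
  group_by(ID) %>%
  slice_max(Date, n = 1, with_ties = FALSE) %>%
  ungroup()

print(latest)
```

Dropping `with_ties = FALSE` (the default is `TRUE`) would keep every tied row instead, which matches the behaviour of `top_n()`.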

2 Answers


With dplyr you can do:

library(dplyr)
a %>% group_by(ID) %>% filter(Date == max(Date))
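Note that `filter(Date == max(Date))` keeps every row that ties on the latest date for an ID. For comparison, a base R sketch (no packages) that always returns exactly one row per ID: sort by ID and Date, then keep the last occurrence of each ID with `duplicated(..., fromLast = TRUE)`. The intermediate name `sorted` is only illustrative:

```r
a <- data.frame(
  ID = c(1, 2, 2, 2, 3, 3),
  Date = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03",
                   "2017-01-04", "2017-01-05", "2017-01-06"))
)

# Sort by ID, then by Date ascending, so each ID's latest row comes last.
sorted <- a[order(a$ID, a$Date), ]

# duplicated(fromLast = TRUE) flags every row except the last one per ID;
# negating it keeps only the most recent row for each ID.
b <- sorted[!duplicated(sorted$ID, fromLast = TRUE), ]
rownames(b) <- NULL

print(b)
```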

spinodal

Using data.table:

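  library(data.table)
  setDT(a)                              # convert a to a data.table by reference
  a[, max_date := max(Date), by = ID]   # latest Date within each ID
  a <- a[max_date == Date, ]            # keep only the rows on that date
  a[, max_date := NULL]                 # drop the helper column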

You'll get:

   ID       Date
1:  1 2017-01-01
2:  2 2017-01-04
3:  3 2017-01-06
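A more compact data.table variant, offered as a sketch rather than as part of the answer above, uses the common `.I[which.max(...)]` idiom: for each ID it collects the row number of the latest Date and then subsets by those row numbers, without adding a helper column. Because `which.max()` returns the first maximum, ties yield one row per ID. The names `idx` and `latest` are illustrative:

```r
library(data.table)

a <- data.table(
  ID = c(1, 2, 2, 2, 3, 3),
  Date = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03",
                   "2017-01-04", "2017-01-05", "2017-01-06"))
)

# For each ID, .I[which.max(Date)] gives the row number (in the whole table)
# of that group's latest Date; data.table names the resulting column V1.
idx <- a[, .I[which.max(Date)], by = ID]$V1

# Subset the original table by those row numbers.
latest <- a[idx]
print(latest)
```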
sm925