Remove duplicates under a condition in R

Question

I want to remove duplicate rows (same id, name and position) and keep only the one where the year variable is largest. My data looks like this:

id  name    year    position
1   Jane    1990    Sales
1   Jane    1991    Sales
1   Jane    1992    Sales
1   Jane    1993    Boss
1   Jane    1994    CEO
2   Tom     1978    HR
2   Tom     1979    Sales
2   Tom     1980    PR
2   Tom     1981    Boss
3   Jim     1981    Sales
3   Jim     1982    Sales
3   Jim     1983    PR
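To make the example reproducible, the table can be typed in as a data frame. The name `df` matches the code in the question and answer. The column types (integer id and year, character name and position) are an assumption about the original data.

```r
# Sample data retyped from the table above (column types are assumed)
df <- data.frame(
  id       = rep(1:3, times = c(5, 4, 3)),
  name     = rep(c("Jane", "Tom", "Jim"), times = c(5, 4, 3)),
  year     = c(1990:1994, 1978:1981, 1981:1983),
  position = c("Sales", "Sales", "Sales", "Boss", "CEO",
               "HR", "Sales", "PR", "Boss",
               "Sales", "Sales", "PR"),
  stringsAsFactors = FALSE
)
str(df)
```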

The wanted output is:

id  name    year    position
1   Jane    1992    Sales
1   Jane    1993    Boss
1   Jane    1994    CEO
2   Tom     1978    HR
2   Tom     1979    Sales
2   Tom     1980    PR
2   Tom     1981    Boss
3   Jim     1982    Sales
3   Jim     1983    PR

Is there a way to code this? I tried the following, but it did not work:

new<-ddply(df, df$position=="Sales", function(df) return(df[df$year==max(df$year),]))

Answer (score 3)

library(plyr)
ddply(df, .(id, name, position), summarize, year = max(year))

If you want the result sorted by id and year:

arrange(ddply(df, .(id, name, position), summarize, year = max(year)), id, year)
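The question's own attempt was close: the second argument of `ddply` has to name the grouping columns (for example `.(id, name, position)`), not a logical condition such as `df$position == "Sales"`. A minimal sketch of that function-based form is below; unlike `summarize`, it returns the original rows with all their columns. The sample data is retyped from the question, and `new` is simply the asker's variable name.

```r
library(plyr)

# Sample data retyped from the question (column types are assumed)
df <- data.frame(
  id       = rep(1:3, times = c(5, 4, 3)),
  name     = rep(c("Jane", "Tom", "Jim"), times = c(5, 4, 3)),
  year     = c(1990:1994, 1978:1981, 1981:1983),
  position = c("Sales", "Sales", "Sales", "Boss", "CEO",
               "HR", "Sales", "PR", "Boss",
               "Sales", "Sales", "PR"),
  stringsAsFactors = FALSE
)

# Split by the columns that define a duplicate, then keep the row(s)
# with the latest year inside each piece
new <- ddply(df, .(id, name, position), function(d) d[d$year == max(d$year), ])

# ddply returns groups in id/name/position order; reorder like the wanted output
new <- arrange(new, id, year)
print(new)
```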

I recommend plyr's successor, dplyr:

library(dplyr)
df %>% group_by(id, name, position) %>% summarise(year=max(year)) %>% arrange(id, year)
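`summarise()` returns only the grouping columns plus `year`, so any extra columns in the real data would be dropped. A sketch of a variant that keeps whole rows uses `filter()` within each group instead (the sample data is retyped from the question; `res` is an illustrative name):

```r
library(dplyr)

# Sample data retyped from the question (column types are assumed)
df <- data.frame(
  id       = rep(1:3, times = c(5, 4, 3)),
  name     = rep(c("Jane", "Tom", "Jim"), times = c(5, 4, 3)),
  year     = c(1990:1994, 1978:1981, 1981:1983),
  position = c("Sales", "Sales", "Sales", "Boss", "CEO",
               "HR", "Sales", "PR", "Boss",
               "Sales", "Sales", "PR"),
  stringsAsFactors = FALSE
)

# Within each id/name/position group, keep the row(s) holding the latest year;
# filter() preserves every column, unlike summarise()
res <- df %>%
  group_by(id, name, position) %>%
  filter(year == max(year)) %>%
  ungroup() %>%
  arrange(id, year)
print(res)
```

If two rows in a group share the maximum year, `filter(year == max(year))` keeps both, whereas the `summarise()` call collapses them into one row.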

edited Oct 06 '17 at 19:14 by Jaap · answered Apr 06 '14 at 09:31 by Randy Lai
