
My data set (df) looks like this:

   ID    Name    Rating    Score  Ranking
   1     abc       3        NA      NA
   1     abc       3        12      13
   2     bcd       4        NA      NA
   2     bcd       4        19      20

I'm trying to remove duplicates using

   df <- df[!duplicated(df[1:2]),]

which gives,

   ID    Name    Rating    Score  Ranking
   1     abc       3        NA      NA
   2     bcd       4        NA      NA

but I'm trying to get,

   ID    Name    Rating    Score  Ranking
   1     abc       3        12      13
   2     bcd       4        19      20

How do I avoid keeping the rows that contain NAs when removing duplicates? Some help would be great, thanks.

epo3
Maddy
  • Did you try `complete.cases()`? You can first filter it via `complete.cases()` and then remove duplicates – Sotos Dec 16 '16 at 13:09
  • You can also use `order`. NAs will move to the bottom of the pile: `df – lmo Dec 16 '16 at 13:13
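
The `complete.cases()` suggestion above could look roughly like the following sketch. The `df` construction is a reconstruction of the question's sample data (column types are an assumption), and `complete` and `result` are illustrative names.

```r
# Sample data reconstructed from the question (column types assumed)
df <- data.frame(
  ID      = c(1, 1, 2, 2),
  Name    = c("abc", "abc", "bcd", "bcd"),
  Rating  = c(3, 3, 4, 4),
  Score   = c(NA, 12, NA, 19),
  Ranking = c(NA, 13, NA, 20)
)

# Keep only the rows that have no NA in any column
complete <- df[complete.cases(df), ]

# Then drop duplicates on the first two columns (ID and Name)
result <- complete[!duplicated(complete[1:2]), ]
result
```

One caveat with this approach: an ID/Name pair whose rows all contain an NA would be dropped entirely rather than reduced to one row.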

3 Answers

1

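First, push the NAs to the end of each group with na.last = TRUE (which is also the default for order()). Note that na.last must be passed to order(), not to with():

df <- df[with(df, order(ID, Name, Score, Ranking, na.last = TRUE)), ]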

Then remove the duplicates. With fromLast = FALSE (the default), duplicated() keeps the first row of each ID/Name group:

df <- df[!duplicated(df[1:2], fromLast = FALSE), ]
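
As a self-contained sketch of the sort-then-deduplicate idea, the following rebuilds the question's sample data (column types are an assumption) and applies both steps. `sorted` and `deduped` are illustrative names.

```r
# Sample data reconstructed from the question (column types assumed)
df <- data.frame(
  ID      = c(1, 1, 2, 2),
  Name    = c("abc", "abc", "bcd", "bcd"),
  Rating  = c(3, 3, 4, 4),
  Score   = c(NA, 12, NA, 19),
  Ranking = c(NA, 13, NA, 20)
)

# Sort so that, within each ID/Name group, rows with NA in Score come last
sorted <- df[order(df$ID, df$Name, df$Score, df$Ranking, na.last = TRUE), ]

# duplicated() flags every row after the first in each ID/Name group,
# so negating it keeps the first (non-NA, if one exists) row
deduped <- sorted[!duplicated(sorted[c("ID", "Name")]), ]
deduped
```

Because this only reorders rows before deduplicating, a group that has nothing but NA rows still keeps one row.
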
submartingale
1

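Using dplyr (this keeps the last row of each ID/Name pair, so it relies on the NA row coming first, as in the sample data):

library(dplyr)
df <- df %>% filter(!duplicated(.[, 1:2], fromLast = TRUE))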
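
If the row order cannot be relied on, one possible variant is to sort explicitly and then keep the first row per group. This is a sketch using a different technique from the line above (`arrange()` plus `distinct()` rather than `duplicated()`), with the sample data reconstructed from the question and `result` as an illustrative name.

```r
library(dplyr)

# Sample data reconstructed from the question (column types assumed)
df <- data.frame(
  ID      = c(1, 1, 2, 2),
  Name    = c("abc", "abc", "bcd", "bcd"),
  Rating  = c(3, 3, 4, 4),
  Score   = c(NA, 12, NA, 19),
  Ranking = c(NA, 13, NA, 20)
)

result <- df %>%
  # is.na(Score) is FALSE for real values, and FALSE sorts before TRUE,
  # so rows with a Score come first within each ID/Name group
  arrange(ID, Name, is.na(Score)) %>%
  # keep the first row per ID/Name pair, retaining all other columns
  distinct(ID, Name, .keep_all = TRUE)
result
```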

Azam Yahya
0

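You could just filter out the observations you don't want with which() and then use the unique() function. Use is.na() to test for missing values; comparing against the string "NA" only works because which() silently drops the NA results of the comparison:

a <- unique(c(which(!is.na(df[, 'Score'])), which(!is.na(df[, 'Ranking']))))

df2 <- unique(df[a, ])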

> df2
  ID Name Rating Score Ranking
2  1  abc      3    12      13
4  2  bcd      4    19      20
Andrew Haynes