Remove duplicates while taking NA values into account in R

Question

My data set (df) looks like this:

   ID    Name    Rating    Score  Ranking
   1     abc       3        NA      NA
   1     abc       3        12      13
   2     bcd       4        NA      NA
   2     bcd       4        19      20

I'm trying to remove duplicates using

   df <- df[!duplicated(df[1:2]),]

which gives,

   ID    Name    Rating    Score  Ranking
   1     abc       3        NA      NA
   2     bcd       4        NA      NA

but I'm trying to get,

   ID    Name    Rating    Score  Ranking
   1     abc       3        12      13
   2     bcd       4        19      20

How do I avoid keeping the rows that contain NAs when removing duplicates? Some help would be great, thanks.

did you try `complete.cases()`? You can first filter it via `complete.cases()` and then remove duplicates — Sotos, Dec 16 '16 at 13:09
You can also use `order`. NAs will move to the bottom of the pile: `df — lmo, Dec 16 '16 at 13:13
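The `complete.cases()` suggestion from the first comment could look roughly like the sketch below. It rebuilds the example data from the question and assumes that a row with an NA in any column should be discarded. One consequence of that assumption is that an ID whose only rows contain NAs disappears entirely.

```r
# Example data from the question.
df <- data.frame(
  ID      = c(1, 1, 2, 2),
  Name    = c("abc", "abc", "bcd", "bcd"),
  Rating  = c(3, 3, 4, 4),
  Score   = c(NA, 12, NA, 19),
  Ranking = c(NA, 13, NA, 20)
)

# Keep only rows with no NA in any column.
complete <- df[complete.cases(df), ]

# Then drop repeated ID/Name pairs among the remaining rows.
result <- complete[!duplicated(complete[c("ID", "Name")]), ]
```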

score 1 · Answer 1 · answered Dec 16 '16 at 13:41


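First, sort the data so that the NAs come last within each ID/Name group, using na.last = TRUE (it has to be an argument to order(), not to with()):

df <- df[with(df, order(ID, Name, Score, Ranking, na.last = TRUE)), ]

Then remove the duplicates with fromLast = FALSE (the default), which keeps the first row of each group:

df <- df[!duplicated(df[1:2], fromLast = FALSE), ]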
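As a self-contained sketch of the two steps above: the data below is the example from the question plus one hypothetical extra row (ID 3, which has only an NA row) to show that this approach keeps such an ID instead of dropping it. The names `sorted` and `result` are just illustrative.

```r
# Example data from the question, plus a hypothetical ID 3 that only has an NA row.
df <- data.frame(
  ID      = c(1, 1, 2, 2, 3),
  Name    = c("abc", "abc", "bcd", "bcd", "cde"),
  Rating  = c(3, 3, 4, 4, 5),
  Score   = c(NA, 12, NA, 19, NA),
  Ranking = c(NA, 13, NA, 20, NA)
)

# Sort so that, within each ID/Name group, rows with NA come last.
sorted <- df[order(df$ID, df$Name, df$Score, df$Ranking, na.last = TRUE), ]

# duplicated() flags every row after the first in each group,
# so the non-NA row is kept whenever one exists.
result <- sorted[!duplicated(sorted[c("ID", "Name")]), ]
```

Unlike filtering on complete rows first, this leaves ID 3 in the result with its NA values.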


submartingale


score 1 · Answer 2 · answered Oct 31 '17 at 13:45


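Using dplyr (this keeps the last row of each ID/Name pair, so it assumes the complete row comes after the NA row, as in the example data):

library(dplyr)
df <- df %>% filter(!duplicated(.[, 1:2], fromLast = TRUE))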
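If the row order cannot be relied on, one possible dplyr variant is to sort first and then keep the first row per pair. This is a sketch, not the answerer's method: it uses `arrange()`, which always places NAs at the end, followed by `distinct()` with `.keep_all = TRUE`. The data is the example from the question with the first two rows swapped (an assumption made here to show that the input order no longer matters).

```r
library(dplyr)

# Example data with the complete row for ID 1 placed before its NA row.
df <- data.frame(
  ID      = c(1, 1, 2, 2),
  Name    = c("abc", "abc", "bcd", "bcd"),
  Rating  = c(3, 3, 4, 4),
  Score   = c(12, NA, NA, 19),
  Ranking = c(13, NA, NA, 20)
)

result <- df %>%
  arrange(ID, Name, Score, Ranking) %>%   # NAs are sorted to the end of each group
  distinct(ID, Name, .keep_all = TRUE)    # keep the first row per ID/Name pair
```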


Azam Yahya


score 0 · Answer 3 · answered Dec 16 '16 at 13:44

You could just select the observations you want with is.na() and which(), and then use the unique() function:

a <- which(!is.na(df$Score) | !is.na(df$Ranking))

df2 <- unique(df[a, ])

> df2
  ID Name Rating Score Ranking
2  1  abc      3    12      13
4  2  bcd      4    19      20
