Remove rows with missing data in select columns, only if they don't have missing data in all columns (preferably use complete.cases)

Question

So I'm trying to remove rows that have missing data in some columns, but not those that have missing data in all columns.

using rowSums alongside !is.na() gave me 1000's of rows of NA at the bottom of my dataset. The top answer here provided a good way of solving my issue using complete.cases:

Remove rows with all or some NAs (missing values) in data.frame

i.e.

data_set1 <- data_set1[complete.cases(data_set1[11:103]), ]

However, that only allows me to remove rows with any missing data in the specified columns. I'm struggling to get complete.cases to play along with rowSums and stop it from removing rows with all missing data.

Any advice very much appreciated!

Ronak Shah · Accepted Answer · 2019-10-04T13:49:20.640

Try using rowSums like :

cols <- 11:103
vals <- rowSums(is.na(data_set1[cols]))
data_set2 <- data_set1[!(vals > 0 & vals < length(cols)), ]

Or with complete.cases and rowSums

data_set1[complete.cases(data_set1[cols]) | 
         rowSums(is.na(data_set1[cols])) == length(cols) , ]

With reproducible example,

df <- data.frame(a = c(1, 2, 3, NA, 1), b = c(NA, 2, 3, NA, NA), c = 1:5)
cols <- 1:2

vals <- rowSums(is.na(df[cols]))
df[!(vals > 0 & vals < length(cols)), ]

#   a  b c
#2  2  2 2
#3  3  3 3
#4 NA NA 4

Remove rows with missing data in select columns, only if they don't have missing data in all columns (preferably use complete.cases)

1 Answers1