1

So I'm trying to remove rows that have missing data in some columns, but not those that have missing data in all columns.

using rowSums alongside !is.na() gave me 1000's of rows of NA at the bottom of my dataset. The top answer here provided a good way of solving my issue using complete.cases:

Remove rows with all or some NAs (missing values) in data.frame

i.e.

data_set1 <- data_set1[complete.cases(data_set1[11:103]), ]

However, that only allows me to remove rows with any missing data in the specified columns. I'm struggling to get complete.cases to play along with rowSums and stop it from removing rows with all missing data.

Any advice very much appreciated!

M.L.
  • 65
  • 5

1 Answers1

2

Try using rowSums like :

cols <- 11:103
vals <- rowSums(is.na(data_set1[cols]))
data_set2 <- data_set1[!(vals > 0 & vals < length(cols)), ]

Or with complete.cases and rowSums

data_set1[complete.cases(data_set1[cols]) | 
         rowSums(is.na(data_set1[cols])) == length(cols) , ]

With reproducible example,

df <- data.frame(a = c(1, 2, 3, NA, 1), b = c(NA, 2, 3, NA, NA), c = 1:5)
cols <- 1:2

vals <- rowSums(is.na(df[cols]))
df[!(vals > 0 & vals < length(cols)), ]

#   a  b c
#2  2  2 2
#3  3  3 3
#4 NA NA 4
Ronak Shah
  • 286,338
  • 16
  • 97
  • 143