R - check if NA exists in any column of r dataframe row, then if so remove that row

Question

I have a large dataframe that has many rows and columns, and I would like to remove the rows for which at least 1 column is NA / NaN. Below is a small example of the dataframe I am working with:

  team_id athlete_id GP tm_STL tm_TOV   player_WS
1   13304      75047  1      2      8         NaN
2   13304      75048  1      2      8  0.28563827
3   13304      75049  1      2      8         NaN
4   13304      75050  1      2      8         NaN
5   13304      75053  1      2      8  0.03861989
6   13304      75060  1      2      8 -0.15530707

...albeit a bad example, because all of the NaNs show up in the last column in this case. I am familiar with which(is.na(df$column_name)) for finding the rows with NA values in an individual column, but I want to do something like this for rows where at least 1 column of the dataframe has an NA value.

Thanks!

`na.omit` or `complete.cases` – alistaire Aug 12 '16 at 18:06

score 26 · Accepted Answer · edited Aug 13 '16 at 02:18


Try using complete.cases.

> df <- data.frame(col1 = c(1, 2, 3, NA, 5), col2 = c('A', 'B', NA, 'C', 'D'),
             col3 = c(9, NaN, 8, 7, 6))
> df
  col1 col2 col3
1    1    A    9
2    2    B  NaN
3    3 <NA>    8
4   NA    C    7
5    5    D    6
> df[complete.cases(df), ]
  col1 col2 col3
1    1    A    9
5    5    D    6
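To see why this works as a row index: complete.cases returns one logical value per row. A minimal sketch using the same example data frame as above (the variable name keep is only for illustration):

```r
# Same example data as in the answer above
df <- data.frame(col1 = c(1, 2, 3, NA, 5),
                 col2 = c('A', 'B', NA, 'C', 'D'),
                 col3 = c(9, NaN, 8, 7, 6))

# One TRUE/FALSE per row: TRUE means the row contains no NA or NaN
keep <- complete.cases(df)
keep
# [1]  TRUE FALSE FALSE FALSE  TRUE

# Negating the vector shows the rows that would be dropped,
# which is handy for inspecting them before removing anything
df[!keep, ]
```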

edited Aug 13 '16 at 02:18

akrun


answered Aug 12 '16 at 18:06

Sam


The `complete.cases` should be faster than the rest – akrun Aug 13 '16 at 02:18
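If you want to check the speed claim on your own machine, a rough timing sketch could look like the following. The data size, seed, and variable names are assumptions made up for illustration, and timings will vary by machine and R version, so treat the result as indicative only.

```r
set.seed(1)   # arbitrary seed, only so the sketch is repeatable
n <- 1e5      # hypothetical number of rows

# Ten numeric columns; 10% of the rows get an NA in one column
big <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
big[sample(n, n / 10), 3] <- NA

# Time each of the three approaches mentioned in this thread
system.time(r1 <- big[complete.cases(big), ])
system.time(r2 <- big[rowSums(is.na(big)) == 0, ])
system.time(r3 <- na.omit(big))

# All three should keep exactly the same rows
identical(rownames(r1), rownames(r2))
identical(rownames(r1), rownames(r3))
```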

score 11 · Answer 2 · answered Aug 12 '16 at 18:11

You can use this.

df[rowSums(is.na(df))==0,]

#  team_id athlete_id GP tm_STL tm_TOV   player_WS
#2   13304      75048  1      2      8  0.28563827
#5   13304      75053  1      2      8  0.03861989
#6   13304      75060  1      2      8 -0.15530707

This way you count the number of NAs per row. You only keep the rows where that count is zero.
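Broken into steps, the one-liner above can be sketched like this. The data frame is rebuilt from the sample in the question; the intermediate names na_flags and na_per_row are only for illustration.

```r
# Rebuild the small example from the question
df <- data.frame(
  team_id    = rep(13304, 6),
  athlete_id = c(75047, 75048, 75049, 75050, 75053, 75060),
  GP         = 1,
  tm_STL     = 2,
  tm_TOV     = 8,
  player_WS  = c(NaN, 0.28563827, NaN, NaN, 0.03861989, -0.15530707)
)

# is.na() on a data frame gives a logical matrix; it is TRUE for NA and NaN
na_flags <- is.na(df)

# rowSums() counts the TRUE values in each row, i.e. the NAs per row
na_per_row <- rowSums(na_flags)
na_per_row   # counts for rows 1..6: 1 0 1 1 0 0

# Keep only the rows whose NA count is zero (rows 2, 5 and 6 here)
df[na_per_row == 0, ]
```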

score 9 · Answer 3 · edited Aug 12 '16 at 18:13

na.omit works:

na.omit(df)
##   team_id athlete_id GP tm_STL tm_TOV   player_WS
## 2   13304      75048  1      2      8  0.28563827
## 5   13304      75053  1      2      8  0.03861989
## 6   13304      75060  1      2      8 -0.15530707

It's a little more convenient than complete.cases if you're piping, as it doesn't require another function to subset like dplyr::filter, magrittr::extract, or [.
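A small sketch of the difference in a pipeline, assuming R >= 4.1 for the native pipe |> and the \(x) lambda shorthand (with magrittr's %>% the idea is the same). The data frame is the one from the accepted answer, and the result names are made up for illustration.

```r
# Assumes R >= 4.1 (native pipe |> and \(x) lambda syntax)
df <- data.frame(col1 = c(1, 2, 3, NA, 5),
                 col2 = c('A', 'B', NA, 'C', 'D'),
                 col3 = c(9, NaN, 8, 7, 6))

# na.omit() takes a data frame and returns the filtered data frame directly
via_na_omit <- df |> na.omit()

# complete.cases() only returns a logical vector, so a separate
# subsetting step (here an anonymous function around `[`) is needed
via_complete <- df |> (\(d) d[complete.cases(d), ])()

# Both keep the same rows; na.omit() additionally records which rows
# it dropped in an "na.action" attribute on the result
rownames(via_na_omit)
rownames(via_complete)
```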

edited Aug 12 '16 at 18:13

answered Aug 12 '16 at 18:08

alistaire

