7

I have a large dataframe that has many rows and columns, and I would like to remove the rows for which at least 1 column is NA / NaN. Below is a small example of the dataframe I am working with:

  team_id athlete_id GP tm_STL tm_TOV   player_WS
1   13304      75047  1      2      8         NaN
2   13304      75048  1      2      8  0.28563827
3   13304      75049  1      2      8         NaN
4   13304      75050  1      2      8         NaN
5   13304      75053  1      2      8  0.03861989
6   13304      75060  1      2      8 -0.15530707

...albeit a bad example, because all of the NaNs show up in the last column in this case. I am familiar with using which(is.na(df$column_name)) to get the rows with NA values in an individual column, but I want to do the same for rows where at least one column has an NA value.

Thanks!

Canovice

3 Answers

26

Try using complete.cases.

> df <- data.frame(col1 = c(1, 2, 3, NA, 5), col2 = c('A', 'B', NA, 'C', 'D'),
             col3 = c(9, NaN, 8, 7, 6))
> df
  col1 col2 col3
1    1    A    9
2    2    B  NaN
3    3 <NA>    8
4   NA    C    7
5    5    D    6
> df[complete.cases(df), ]
  col1 col2 col3
1    1    A    9
5    5    D    6
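
To make it clearer why the subsetting above works, here is a small sketch using the same example data frame. It only shows the intermediate logical vector that complete.cases returns; the variable name keep is just an illustrative choice.

```r
# Same example data as above
df <- data.frame(col1 = c(1, 2, 3, NA, 5),
                 col2 = c("A", "B", NA, "C", "D"),
                 col3 = c(9, NaN, 8, 7, 6))

# One TRUE/FALSE per row: TRUE only when the row has no NA or NaN
keep <- complete.cases(df)
print(keep)

# Using that vector as a row index drops the incomplete rows
clean <- df[keep, ]
print(clean)
```

Because keep is an ordinary logical vector, it can also be negated (df[!keep, ]) to inspect the rows that would be dropped.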
akrun
Sam
11

You can use this.

df[rowSums(is.na(df))==0,]

#  team_id athlete_id GP tm_STL tm_TOV   player_WS
#2   13304      75048  1      2      8  0.28563827
#5   13304      75053  1      2      8  0.03861989
#6   13304      75060  1      2      8 -0.15530707

This counts the number of NAs in each row and keeps only the rows where that count is zero.
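
The steps can be pulled apart as below. This is only a sketch on a made-up three-row data frame that borrows column names from the question; the names na_mask, na_count and clean are illustrative.

```r
# Hypothetical toy data, loosely modelled on the question's columns
df <- data.frame(
  athlete_id = c(75047, 75048, 75049),
  tm_STL     = c(2, NA, 2),
  player_WS  = c(NaN, 0.28563827, 0.03861989)
)

na_mask  <- is.na(df)            # logical matrix; TRUE for both NA and NaN
na_count <- rowSums(na_mask)     # number of missing values in each row
clean    <- df[na_count == 0, ]  # keep rows with no missing values

print(na_count)
print(clean)
```

The same count can be reused with a different threshold, for example na_count <= 1, if rows with a single missing value should be kept.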

milan
9

na.omit works:

na.omit(df)
##   team_id athlete_id GP tm_STL tm_TOV   player_WS
## 2   13304      75048  1      2      8  0.28563827
## 5   13304      75053  1      2      8  0.03861989
## 6   13304      75060  1      2      8 -0.15530707

It's a little more convenient than complete.cases if you're piping, as it doesn't require another function to subset like dplyr::filter, magrittr::extract, or [.
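
As a rough illustration of the piping point, the sketch below uses the base R pipe |> (which assumes R 4.1 or later) rather than magrittr, and compares the result with the complete.cases indexing form. The example data frame is the one from the complete.cases answer, not the asker's real data.

```r
df <- data.frame(col1 = c(1, 2, 3, NA, 5),
                 col2 = c("A", "B", NA, "C", "D"),
                 col3 = c(9, NaN, 8, 7, 6))

# na.omit slots straight into a pipeline
piped <- df |> na.omit()

# complete.cases needs a separate subsetting step
indexed <- df[complete.cases(df), ]

# Same rows either way; na.omit additionally records which rows
# were dropped in an "na.action" attribute
print(all.equal(piped, indexed, check.attributes = FALSE))
print(attr(piped, "na.action"))
```

That extra attribute is why identical(piped, indexed) would report FALSE even though the remaining rows match.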

alistaire