149

I want to know how to omit NA values in a data frame, but only in some columns I am interested in.

For example,

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

but I only want to omit the data where y is NA, therefore the result should be

  x  y  z
1 1  0 NA
2 2 10 33

na.omit seems to delete all rows that contain any NA.

Can somebody help me with this simple question?

Now suppose I change the question to:

DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

If I want to omit only the rows where x is NA or z is NA, where do I put the | in the function?

John Paul
user1489975

8 Answers

213

Use is.na

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
DF[!is.na(DF$y),]
mnel
91

You could use the complete.cases function and put it into a function thusly:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

completeFun <- function(data, desiredCols) {
  completeVec <- complete.cases(data[, desiredCols])
  return(data[completeVec, ])
}

completeFun(DF, "y")
#   x  y  z
# 1 1  0 NA
# 2 2 10 33

completeFun(DF, c("y", "z"))
#   x  y  z
# 2 2 10 33

EDIT: Only return rows with no NAs

If you want to eliminate all rows with at least one NA in any column, just use the complete.cases function straight up:

DF[complete.cases(DF), ]
#   x  y  z
# 2 2 10 33

Or if completeFun is already ingrained in your workflow ;)

completeFun(DF, names(DF))
BenBarnes
82

Hadley's tidyr just got this amazing function drop_na

library(tidyr)
DF %>% drop_na(y)
#   x  y  z
# 1 1  0 NA
# 2 2 10 33
amrrs
    This method also allows you to specify more than one column (for dropping NA values). For instance, one could use DF %>% drop_na(y,z) to remove NA values in both columns, y, and z. – SolingerMuc Sep 23 '20 at 09:59
34

Use 'subset'

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
subset(DF, !is.na(y))
Rnoob
12

For a data.table, na.omit accepts a cols argument that restricts the NA check to the named columns:

na.omit(data, cols = c("x", "z"))
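A self-contained sketch of the call above, assuming the data.table package is installed and using the follow-up data frame from the question (`DT` and `res` are illustrative names, not part of the original answer):

```r
library(data.table)

DT <- data.table(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

# Drop rows that have an NA in x or z; column y is not inspected
res <- na.omit(DT, cols = c("x", "z"))
print(res)
```

Only the rows in which both x and z are present remain.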
M--
Droney
    The `cols=` argument is available only in the `na.omit` method from data.table, not in the base `stats::na.omit`. – M.Viking Aug 21 '19 at 18:39
5

Omit a row if either of two specific columns contains NA.

DF[!is.na(DF$x) & !is.na(DF$z), ]
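As a sketch of how this applies to the follow-up data frame from the question (the names `keep`, `res` and `res_or` are only illustrative): the two `is.na` tests are combined with `&`, so a row is kept only when both x and z are present. Equivalently, a row is dropped when x is NA or z is NA, which is where the `|` would go.

```r
DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

# Keep rows where neither x nor z is NA
keep <- !is.na(DF$x) & !is.na(DF$z)
res <- DF[keep, ]

# Same filter written with |: drop rows where x is NA or z is NA
res_or <- DF[!(is.na(DF$x) | is.na(DF$z)), ]

print(res)
#   x  y  z
# 1 1  1 43
# 3 3 10 33
```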
M.Viking
3

Try this:

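# Logical vector: TRUE where y is NA
cc <- is.na(DF$y)
# Keep the rows where y is not NA; unlike DF[-which(cc), ],
# this also works when no value is NA
DF <- DF[!cc, ]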
rockswap
1

Just try this:

DF %>% t %>% na.omit %>% t

It transposes the data frame, omits the rows that contain NA (these were the columns before transposition), and then transposes the result back.
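A small check of what this pipeline returns, written with nested base R calls instead of `%>%` so that it needs no extra package (a sketch only; `res` is an illustrative name). It removes every column that contains an NA rather than filtering rows, and the result is a matrix, not a data frame:

```r
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c(NA, 33, 22))

# Equivalent to DF %>% t %>% na.omit %>% t
res <- t(na.omit(t(DF)))

is.matrix(res)   # TRUE: t() converts the data frame to a matrix
colnames(res)    # only "x" survives, since y and z each contain an NA
```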

M--
Luchao Qi