Omit rows containing NA in a specific column

Question

I want to know how to omit NA values in a data frame, but only in some columns I am interested in.

For example,

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

but I only want to omit the rows where y is NA, so the result should be

  x  y  z
1 1  0 NA
2 2 10 33

na.omit seems to delete all rows that contain any NA.

Can somebody help me with this simple question?

But what if I change the question to:

DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

If I want to omit only the rows where x is NA or z is NA, where do I put the | in the function?

score 213 · Answer 1 · answered Jun 29 '12 at 00:06

Use is.na

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
DF[!is.na(DF$y),]
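The same is.na idea extends to several columns at once. A minimal base-R sketch, assuming the columns of interest are given by name in a vector (`cols` and `keep` are hypothetical names chosen for illustration):

```r
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c(NA, 33, 22))

# Hypothetical vector of the columns that must be non-NA
cols <- c("y", "z")

# is.na() on a data frame returns a logical matrix; a row is kept
# only when none of the selected columns is NA in that row
keep <- rowSums(is.na(DF[cols])) == 0
DF[keep, ]
#   x  y  z
# 2 2 10 33
```

With `cols <- "y"` this reduces to the single-column case above.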

answered Jun 29 '12 at 00:06

mnel

How do you apply this approach greedily to all columns in the data set? If any column value is NA, skip the row, so the output here is the second row only. – Léo Léopold Hertz 준영 Jul 18 '17 at 15:35

Use `na.omit` to greedily remove all rows with NA in any column `na.omit(DF)` – M.Viking Aug 21 '19 at 18:50

score 91 · Accepted Answer · edited Jul 18 '17 at 15:49

You could use the complete.cases function and put it into a function thusly:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

completeFun <- function(data, desiredCols) {
  completeVec <- complete.cases(data[, desiredCols])
  return(data[completeVec, ])
}

completeFun(DF, "y")
#   x  y  z
# 1 1  0 NA
# 2 2 10 33

completeFun(DF, c("y", "z"))
#   x  y  z
# 2 2 10 33

EDIT: Only return rows with no NAs

If you want to eliminate all rows with at least one NA in any column, just use the complete.cases function straight up:

DF[complete.cases(DF), ]
#   x  y  z
# 2 2 10 33

Or if completeFun is already ingrained in your workflow ;)

completeFun(DF, names(DF))
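completeFun also covers the follow-up in the question (drop rows where x or z is NA) without any explicit |, because complete.cases already rejects a row when any of the selected columns is NA. A sketch on the question's second data frame, named DF2 here only to keep it apart from DF:

```r
DF2 <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

completeFun <- function(data, desiredCols) {
  completeVec <- complete.cases(data[, desiredCols])
  return(data[completeVec, ])
}

# Keep rows where both x and z are present, i.e. drop rows where x or z is NA
completeFun(DF2, c("x", "z"))
#   x  y  z
# 1 1  1 43
# 3 3 10 33
```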

edited Jul 18 '17 at 15:49

answered Jun 29 '12 at 08:08

BenBarnes

Can you make your approach greedy? Take all columns that do not have NAs at all. – Léo Léopold Hertz 준영 Jul 18 '17 at 15:33

You mean just return *rows* with no `NA`s? Like `completeFun(DF, names(DF))`? – BenBarnes Jul 18 '17 at 15:39
Correct! Please consider adding it to your answer, because it is a common need here. I think mnel's answer cannot be extended the way yours can. Your function approach is great! – Léo Léopold Hertz 준영 Jul 18 '17 at 15:43

Done! Thx for the tip @LéoLéopoldHertz준영 – BenBarnes Jul 18 '17 at 15:50
If you are viewing this after 2020, do yourself a favor and look at the more recent answers below. For example, the approach outlined by @amrrs using `drop_na()` from `tidyr` does the same thing and is, in my opinion, a better solution today. – Ricky Oct 18 '20 at 15:38

score 82 · Answer 3 · answered Aug 16 '16 at 18:37

Hadley's tidyr just got this amazing function, drop_na:

library(tidyr)
DF %>% drop_na(y)
#   x  y  z
# 1 1  0 NA
# 2 2 10 33

answered Aug 16 '16 at 18:37

amrrs

This method also allows you to specify more than one column. For instance, DF %>% drop_na(y, z) removes every row that has an NA in column y or in column z. – SolingerMuc Sep 23 '20 at 09:59

score 34 · Answer 4 · answered Jun 12 '13 at 19:00

Use 'subset'

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
subset(DF, !is.na(y))

answered Jun 12 '13 at 19:00

Rnoob

score 12 · Answer 5 · edited Jun 26 '20 at 16:57

It is possible to use na.omit for data.table:

library(data.table)
na.omit(as.data.table(DF), cols = c("x", "z"))

edited Jun 26 '20 at 16:57

M--

answered Feb 28 '19 at 11:48

Droney

The `cols=` argument is available in data.table's `na.omit` method for data.table objects, not in base `stats::na.omit`. – M.Viking Aug 21 '19 at 18:39

score 5 · Answer 6 · answered Aug 21 '19 at 18:44

Omit row if either of two specific columns contain <NA>.

DF[!is.na(DF$x) & !is.na(DF$z), ]
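This also answers where the | from the question goes. By De Morgan's law, !is.na(x) & !is.na(z) is the same as !(is.na(x) | is.na(z)), so the | sits inside the negation. A sketch on the question's second data frame (named DF2 here, and `drop_row` is a hypothetical helper variable):

```r
DF2 <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

# TRUE for rows where x is NA or z is NA
drop_row <- is.na(DF2$x) | is.na(DF2$z)

# Negate to keep the other rows; same result as !is.na(DF2$x) & !is.na(DF2$z)
DF2[!drop_row, ]
#   x  y  z
# 1 1  1 43
# 3 3 10 33
```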

answered Aug 21 '19 at 18:44

M.Viking

score 3 · Answer 7 · answered Jun 29 '12 at 01:33

Try this:

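cc <- is.na(DF$y)   # logical vector: TRUE where y is NA
m <- which(cc)      # row positions of the NAs
if (length(m) > 0) DF <- DF[-m, ]   # guard: DF[-integer(0), ] would drop every row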
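The guard matters because negative indexing with an empty which() result behaves unexpectedly: -integer(0) is still integer(0), so DF[-m, ] selects zero rows instead of all of them. A small sketch of the pitfall next to plain logical indexing, using a hypothetical data frame DF_noNA with no missing values:

```r
DF_noNA <- data.frame(x = c(1, 2, 3), y = c(0, 10, 20))

m <- which(is.na(DF_noNA$y))        # integer(0): there are no NAs
nrow(DF_noNA[-m, ])                 # 0: an empty index selects no rows
nrow(DF_noNA[!is.na(DF_noNA$y), ])  # 3: logical indexing keeps every row
```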

answered Jun 29 '12 at 01:33

rockswap

score 1 · Answer 8 · edited Jul 24 '20 at 23:19

Just try this:

library(magrittr)
DF %>% t %>% na.omit %>% t

It transposes the data frame, omits the rows containing NA (which were columns before the transposition), and then transposes the result back. Note that this removes columns containing NA, not rows.
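Because t() on a data frame returns a matrix, the transpose trick also hands back a matrix, coercing every value to character if the columns have different types. If the goal really is to drop columns that contain any NA while keeping a data frame, a base-R sketch:

```r
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c(NA, 33, 22))

# colSums(is.na(DF)) counts the NAs in each column; keep columns with none.
# drop = FALSE keeps a data frame even when only one column survives.
DF[, colSums(is.na(DF)) == 0, drop = FALSE]
#   x
# 1 1
# 2 2
# 3 3
```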

edited Jul 24 '20 at 23:19

M--

answered Aug 22 '19 at 19:59

Luchao Qi

Please explain a bit what is going on. – vonbrand Aug 22 '19 at 20:17