Exclude Blank and NA in R

Question

Possible Duplicate:
R - remove rows with NAs in data.frame

I have a data frame named sub.new with multiple columns, and I'm trying to exclude any row that has a cell containing NA or an empty string "".
I tried subset(), but it filters on a condition for a specific column. Is there any way to scan the whole data frame and create a subset in which no cell is NA or blank?

In the example below, only the first line should be kept:

# ID               SNP             ILMN_Strand   Customer_Strand
ID1234              [A/G]          TOP           BOT
Non-Specific        NSB (Bgnd)     Green
Non-Polymorphic     NP (A)         Red
Non-Polymorphic     NP (T)         Purple
Non-Polymorphic     NP (C)         Green
Non-Polymorphic     NP (G)         Blue
Restoration         Restore        Green

Any suggestions? Thanks

What didn't work out about them? Can you give us some example data? – Chase, Oct 06 '12 at 21:13
Are you working with vectors? Data frames? If data frames, what do you want to happen if only one element in the row is blank or NA? Please provide more details. – Dason, Oct 06 '12 at 21:13
@user1301840: Googling "R remove rows with NA", or searching StackOverflow with that phrase, both give the above question as the top result. – David Robinson, Oct 06 '12 at 21:21
[This question](http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame) has some good answers on additional ways to remove `NA`s from the dataset. – Bas, Mar 23 '16 at 07:21

score 58 · Accepted Answer · edited Oct 06 '12 at 21:44

A good idea is to set all of the "" (blank cells) to NA before any further analysis.

If you are reading your input from a file, you can convert every "" to NA as the file is read:

foo <- read.table(file="Your_file.txt", na.strings=c("", "NA"), sep="\t") # if your file is tab delimited

If your table is already loaded, you can do the following:

foo[foo==""] <- NA

Then to keep only rows with no NA you may just use na.omit():

foo <- na.omit(foo)

Or to keep columns with no NA:

foo <- foo[, colSums(is.na(foo)) == 0, drop = FALSE] # drop = FALSE keeps a data frame even if only one column remains
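
As a minimal sketch, the two steps above (recode blanks, then drop incomplete rows) can be tried on a small data frame shaped like the table in the question. The contents of `sub.new` are rebuilt by hand here, and which column is blank in each row is an assumption.

```r
# Hypothetical reconstruction of the question's data; "" marks a blank cell.
sub.new <- data.frame(
  ID              = c("ID1234", "Non-Specific", "Non-Polymorphic"),
  SNP             = c("[A/G]", "NSB (Bgnd)", "NP (A)"),
  ILMN_Strand     = c("TOP", "Green", "Red"),
  Customer_Strand = c("BOT", "", NA),
  stringsAsFactors = FALSE
)

# Step 1: recode blank strings as NA.
sub.new[sub.new == ""] <- NA

# Step 2: drop every row that still contains an NA.
clean <- na.omit(sub.new)

print(clean)
```

Only the `ID1234` row should survive, since it is the only row without a blank or NA cell.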

answered Oct 06 '12 at 21:29 by Ali
edited Oct 06 '12 at 21:44 by Andrej

It does the trick, but it removed all the rows. I guess I still have to narrow down to the columns that I want to do the NA check. Thanks for the help – lusketeer Oct 06 '12 at 21:48
Thanks for this answer! Indeed above question has been asked before but if one has blank cells as opposed to NA ones then this is very useful. – Simone Apr 29 '16 at 18:19
Can you please specify what is `na.omit`? There is no package `na` in R 3.1.1. – Léo Léopold Hertz 준영 Oct 30 '16 at 10:45
@LéoLéopoldHertz준영, `na.omit()` is a function in the stats package, which ships with R and is loaded by default. Unlike in Python, a period in an R name is just part of the name and doesn't denote a method call. – Matt Oct 04 '17 at 16:45
With dplyr, it's possible to write this in a short and readable way like this: `foo %>% na_if("") %>% na.omit` – Agile Bean Aug 13 '19 at 09:59
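
If dropping every incomplete row removes too much, as the asker reports in the comments above, the check can be limited to the columns that matter. A minimal sketch, assuming a hypothetical data frame `foo` and a hypothetical vector `check_cols` naming the columns to inspect:

```r
# Hypothetical example data; blanks and NAs are scattered across columns.
foo <- data.frame(
  id    = c("a", "b", "c", "d"),
  snp   = c("[A/G]", "", "[C/T]", NA),
  notes = c(NA, "x", "", "y"),
  stringsAsFactors = FALSE
)

# Columns that must be filled in; `notes` is allowed to be empty.
check_cols <- c("id", "snp")

# Treat blank strings as missing, but only in the columns being checked.
checked <- foo[, check_cols, drop = FALSE]
checked[checked == ""] <- NA

# Keep the rows of the full data frame that are complete in the checked columns.
kept <- foo[complete.cases(checked), , drop = FALSE]
print(kept)
```

Because the blanks are recoded in a copy (`checked`), the original values in `foo` stay untouched; only the row selection changes.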

score 11 · Answer 2 · answered Oct 06 '12 at 21:18

I don't know exactly what kind of dataset you have, so here is a general answer.

x <- c(1, 2, NA, 3, 4, 5)
y <- c(1, 2, 3, NA, 6, 8)
my.data <- data.frame(x, y)
my.data
#    x  y
# 1  1  1
# 2  2  2
# 3 NA  3
# 4  3 NA
# 5  4  6
# 6  5  8

# Exclude rows with NA values
my.data[complete.cases(my.data), ]
#   x y
# 1 1 1
# 2 2 2
# 5 4 6
# 6 5 8

answered Oct 06 '12 at 21:18 by Andrej

does this exclude "" as well? – lusketeer Oct 06 '12 at 21:20
No. I suggest you first define what kind of NA values you have in your dataset. – Andrej Oct 06 '12 at 21:22
You could first convert "" values to NA by doing something similar to this: `my.data[my.data == ""] <- NA` – Dason Oct 06 '12 at 21:23
@Andrej I don't quite understand "what kind of NA values"; all it displays is just NA. – lusketeer Oct 06 '12 at 21:25
You have to recode all "" occurrences to NA, as @Dason suggests – Andrej Oct 06 '12 at 21:27
thanks for all the help, I guess I'm good from here now. – lusketeer Oct 06 '12 at 21:48
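
To illustrate the exchange above: `complete.cases()` treats `""` as an ordinary value, so blanks have to be flagged separately. A sketch, assuming hypothetical data and also counting whitespace-only strings as blank (an assumption that goes beyond the original question):

```r
# Hypothetical data with one NA, one empty string and one whitespace-only cell.
my.data <- data.frame(
  x = c("1", "2", NA, "4", "5"),
  y = c("a", "", "c", "  ", "e"),
  stringsAsFactors = FALSE
)

# complete.cases() alone only catches the NA in row 3.
na_only <- my.data[complete.cases(my.data), ]

# Flag a cell as missing if it is NA or blank after trimming whitespace.
is_missing <- sapply(my.data, function(col) is.na(col) | trimws(col) == "")

# Keep rows in which no cell was flagged.
no_blank <- my.data[rowSums(is_missing) == 0, ]
print(no_blank)
```

This variant leaves `my.data` unchanged and builds a logical mask instead, which can be handy when the blanks need to be kept for later inspection.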
