
Possible duplicates:

Remove rows with NAs (missing values) in data.frame

How to remove "rows" with a NA value?

I have a nested data frame (a list of data frames) built from some 400 files of approx. 65k rows each. I want to remove all rows with NAs from the nested data frame df. I tried na.omit(df) but it does not seem to work. I am not sure if I am missing something here. Thanks.

 df

[[1]]
    V1              V2           V3
1   ID SignalIntensity          SNR
2  109    6.1823089314 0.8453576915
3  110   10.1727771385 4.3837077591
4  111    7.2922746927           NA
5  112    8.8984671629 2.3192184908
6  113              NA 3.7133402249
7  114    7.9850187685 1.5008899345
8  116    7.7893230124           NA
9  117    7.1948346495  1.134973824
10 118    6.5727729751 0.9041846475
11 119              NA 0.7098581049
12 120    9.3711264685 2.9968456969
13 121    6.1549436434 0.7777584058


[[2]]
    V1              V2           V3
1   ID SignalIntensity          SNR
2  118    6.5727729751 0.9041846475
3  119    5.3775194293           NA
4  120    9.3711264685 2.9968456969
5  121    6.1549436434 0.7777584058
6  123    5.7974462402 0.7235424803
7  124              NA 0.7019574482
8  125    7.0145371807  0.343334334
9  126    6.0891591319  0.797164982
10 127    6.3148197657 0.7845943688

[[3]]
    V1              V2           V3
1   ID SignalIntensity          SNR
2  109    6.1823089314 0.8453576915
3  110   10.1727771385 4.3837077591
4  111    7.2922746927 1.0725751161
5  112    8.8984671629           NA
6  113    9.5910338232 3.7133402249
7  114    7.9850187685 1.5008899345
8  116    7.7893230124 1.3636655582
9  117    7.1948346495           NA
10 118    6.5727729751 0.9041846475
11 119    5.3775194293 0.7098581049
12 120    9.3711264685 2.9968456969

My final data should look like this.

 df
[[1]]
    V1              V2           V3
1   ID SignalIntensity          SNR
2  109    6.1823089314 0.8453576915
3  110   10.1727771385 4.3837077591
5  112    8.8984671629 2.3192184908
6  113    9.5910338232 3.7133402249
7  114    7.9850187685 1.5008899345
9  117    7.1948346495  1.134973824
10 118    6.5727729751 0.9041846475
12 120    9.3711264685 2.9968456969
13 121    6.1549436434 0.7777584058


[[2]]
    V1              V2           V3
1   ID SignalIntensity          SNR
2  118    6.5727729751 0.9041846475
4  120    9.3711264685 2.9968456969
5  121    6.1549436434 0.7777584058
6  123    5.7974462402 0.7235424803
8  125    7.0145371807  0.343334334
9  126    6.0891591319  0.797164982
10 127    6.3148197657 0.7845943688

[[3]]
    V1              V2           V3
1   ID SignalIntensity          SNR
2  109    6.1823089314 0.8453576915
3  110   10.1727771385 4.3837077591
4  111    7.2922746927 1.0725751161
6  113    9.5910338232 3.7133402249
7  114    7.9850187685 1.5008899345
8  116    7.7893230124 1.3636655582
9  117    7.1948346495  1.134973824
10 118    6.5727729751 0.9041846475
11 119    5.3775194293 0.7098581049
12 120    9.3711264685 2.9968456969
Agaz Wani

2 Answers


df is a list of data.frames, so you can use lapply:

lapply(df, na.omit)
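For instance, here is a minimal sketch on made-up data (the values below are invented for illustration, not the asker's actual files):

```r
# A small list of data frames standing in for the nested structure
df <- list(
  data.frame(ID = c(109, 110, 111),
             SignalIntensity = c(6.18, NA, 7.29),
             SNR = c(0.85, 4.38, NA)),
  data.frame(ID = c(118, 119),
             SignalIntensity = c(6.57, 5.38),
             SNR = c(0.90, NA))
)

# na.omit() drops any row containing at least one NA;
# lapply() applies it to each data frame in the list
cleaned <- lapply(df, na.omit)
sapply(cleaned, nrow)  # 1 1 -- only the complete rows remain
```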

Another thing to note: the first row in each data frame ('ID', 'SignalIntensity', 'SNR') is character, which means every column was read in as character rather than numeric. I am assuming that you used read.table with header=FALSE while the header was actually there. You may need to read the files again using

files <- list.files() #if all the files are in the working directory
lst <- lapply(files, read.table, header=TRUE, stringsAsFactors=FALSE)
lapply(lst, na.omit)
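To see the effect of header=TRUE without files on disk, here is a sketch that uses read.table(text = ...) on an inlined snippet (hypothetical values):

```r
# Simulating one of the files with an in-memory string; with header=TRUE
# the first line becomes the column names and the data columns stay numeric
txt <- "ID SignalIntensity SNR
109 6.18 0.85
110 NA 4.38"

d <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(d)       # ID, SignalIntensity and SNR are all numeric
na.omit(d)   # drops the row with the NA
```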
akrun

The purrr library can also be used with na.omit:

map(df, na.omit)

An alternative is to use an anonymous function within map:

map(df, ~(.x %>% filter(complete.cases(.))))

This last form is useful if you'd like to preserve the NA records for later investigation (a good practice). Simply negate with the ! operator to keep only the rows that contain an NA:

map(df, ~(.x %>% filter(!complete.cases(.))))
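For example, the two filters split each data frame into its complete and incomplete rows (toy data, invented for illustration; requires dplyr and purrr):

```r
library(dplyr)
library(purrr)

# Toy list with one data frame; the row for ID 110 has an NA
df <- list(
  data.frame(ID = c(109, 110),
             SignalIntensity = c(6.18, NA),
             SNR = c(0.85, 4.38))
)

kept    <- map(df, ~(.x %>% filter(complete.cases(.))))   # rows with no NA
dropped <- map(df, ~(.x %>% filter(!complete.cases(.))))  # rows with an NA

kept[[1]]$ID     # 109
dropped[[1]]$ID  # 110
```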

Finally, you may consider replacing all NAs with 0s if your prime concern is doing calculations (bear in mind this changes sums and means, so it is only appropriate when 0 is a meaningful value):

map(df, ~replace(., is.na(.), 0) )
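A base-R equivalent of the replacement, sketched on toy data:

```r
# replace() with a logical matrix index sets every NA cell to 0;
# this works because d[is.na(d)] <- 0 is valid on a data frame
df <- list(data.frame(a = c(1, NA, 3), b = c(NA, 2, 3)))
filled <- lapply(df, function(d) replace(d, is.na(d), 0))
filled[[1]]
#   a b
# 1 1 0
# 2 0 2
# 3 3 3
```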
Nettle