Removing NA in dplyr pipe

Question

I tried to remove NAs from a subset using a dplyr pipe. Does my output indicate that I missed a step? I'm trying to learn how to write functions using dplyr:

> outcome.df%>%
+ group_by(Hospital,State)%>%
+ arrange(desc(HeartAttackDeath,na.rm=TRUE))%>%
+ head()
Source: local data frame [6 x 5]
Groups: Hospital, State

                           Hospital State HeartAttackDeath
1     ABBEVILLE AREA MEDICAL CENTER    SC               NA
2        ABBEVILLE GENERAL HOSPITAL    LA               NA
3      ABBOTT NORTHWESTERN HOSPITAL    MN             12.3
4   ABILENE REGIONAL MEDICAL CENTER    TX             17.2
5        ABINGTON MEMORIAL HOSPITAL    PA             14.3
6 ABRAHAM LINCOLN MEMORIAL HOSPITAL    IL               NA
Variables not shown: HeartFailureDeath (dbl), PneumoniaDeath
  (dbl)

I think you have the wrong library there. Where is the data? – Rich Scriven, Oct 30 '14 at 23:47
There is also http://stackoverflow.com/questions/22353633/filter-for-complete-cases-in-data-frame-using-dplyr-case-wise-deletion/37031161#37031161 which answers the same question. – Jan Katins, May 04 '16 at 17:00

Gregor Thomas · Accepted Answer · 2019-03-07T13:07:52.180

149

I don't think desc takes an na.rm argument... I'm actually surprised it doesn't throw an error when you give it one. If you just want to remove NAs, use na.omit (base) or tidyr::drop_na:

outcome.df %>%
  na.omit() %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()

library(tidyr)
outcome.df %>%
  drop_na() %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()

If you only want to remove NAs from the HeartAttackDeath column, filter with is.na, or use tidyr::drop_na:

outcome.df %>%
  filter(!is.na(HeartAttackDeath)) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()

outcome.df %>%
  drop_na(HeartAttackDeath) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()
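
To see how the all-columns and single-column approaches differ, here is a minimal sketch on a made-up data frame. The name `toy`, its hospital labels, and all of its values are assumptions for illustration and are not taken from outcome.df:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: one NA in HeartAttackDeath, one NA in PneumoniaDeath
toy <- tibble(
  Hospital = c("A", "B", "C", "D"),
  State = c("SC", "LA", "MN", "TX"),
  HeartAttackDeath = c(NA, 12.3, 17.2, 14.3),
  PneumoniaDeath = c(10.1, NA, 11.5, 9.8)
)

# Drops every row that has an NA in any column (rows A and B)
all_cols <- toy %>% na.omit()

# Drops only rows where HeartAttackDeath is NA (row A)
one_col <- toy %>% drop_na(HeartAttackDeath)

# Same rows as one_col, written with filter() and is.na()
filtered <- toy %>% filter(!is.na(HeartAttackDeath))

# Rank the remaining rows from highest to lowest HeartAttackDeath
ranked <- one_col %>% arrange(desc(HeartAttackDeath))
```

Row B survives in `one_col` and `filtered` even though its PneumoniaDeath is NA, which is the behavior you want when only one column matters for the ranking.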

As pointed out in the duplicate question, complete.cases can also be used, but it's a bit trickier to put in a chain because it takes a data frame as an argument but returns a logical vector (one TRUE/FALSE per row) rather than a data frame. So you could use it like this:

outcome.df %>%
  filter(complete.cases(.)) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()
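
A small sketch of what complete.cases returns and why it has to be wrapped in filter. The data frame `toy` and its values are made up for illustration:

```r
library(dplyr)

# Hypothetical data: only the first row has no missing values
toy <- data.frame(
  x = c(1, NA, 3),
  y = c("a", "b", NA)
)

# complete.cases() gives one logical value per row, not a data frame
keep <- complete.cases(toy)

# Inside the pipe, `.` stands for the data frame coming in from the left,
# so filter() receives the logical vector and keeps the TRUE rows
kept <- toy %>% filter(complete.cases(.))
```

Here `keep` flags only the first row as complete, so `kept` contains that single row.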

edited Mar 07 '19 at 13:07

answered Oct 31 '14 at 00:04

Thanks much. I used na.omit for all columns and it worked. outcome.df is a subset of large dataset. I'm trying to rank the conditions in order from best to worst. – ITCoderWhiz Nov 01 '14 at 12:23
When I am using na.omit in this manner it throws `Error in na.omit.default() argument "object" is missing, with no default` even if I feed it hflights. Same behavior with !is.na(hflights) at the second stage of the pipe...@ITCoderWhiz – d8aninja Feb 28 '15 at 01:40
@D8Amonk sounds like you have some function masking going on. From a fresh R session `library(dplyr); library(hflights); x = hflights %>% na.omit()` works just fine. Maybe you have loaded a package that has its own `na.omit` function? – Gregor Thomas Feb 28 '15 at 02:07
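
If masking is the suspected cause, one way to check is to look at where the unqualified name resolves and to call the stats version explicitly. This is only a sketch; the data frame `toy` is made up, and whether masking is actually the cause of the error above is an assumption:

```r
library(dplyr)

# Hypothetical data with one missing value
toy <- data.frame(x = c(1, NA, 3))

# Name of the environment that the unqualified na.omit comes from;
# in a clean session this is expected to be "stats"
where <- environmentName(environment(na.omit))

# Namespace-qualified call, which bypasses any masking definition
cleaned <- toy %>% stats::na.omit()
```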
