I'm trying to filter a data frame using filter() from the dplyr package. The filtering itself works exactly as I would hope, but when I draw charts from the filtered data, all of the levels that I filtered out still show up (albeit with no values), and they throw off my horizontal axis.

So two questions:

1) Why are these filtered levels still in the data?

2) How do I filter to make these no longer present?

Here is a small example you can run to see what I am talking about:

library(dplyr)
library(ggvis)

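# small example frame; stringsAsFactors = TRUE makes y a factor
# (this was the default before R 4.0.0, which the question relies on)
data <- data.frame(
  x = 1:10,
  y = rep(c("yes", "no"), 5),
  stringsAsFactors = TRUE
)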

# filtering to only include data with "yes" in y variable
new_data <- data %>%
  filter(y == "yes")

levels(new_data$y) ## Why is "no" still showing up as a level if I've filtered it out?

# Illustration of the filtered values still showing up on axis
new_data %>%
  ggvis(~y, ~x) %>%
  layer_bars()

Thanks for any help.

Nathan F
    Related: http://stackoverflow.com/questions/1195826/drop-factor-levels-in-a-subsetted-data-frame/4284931#4284931. – Henrik Aug 17 '15 at 21:08

1 Answer

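Factors in R do not automatically drop levels when filtered: the set of levels is stored as an attribute of the factor, separate from the values, and subsetting leaves it unchanged. You may think this is a silly default (I do), but it's easy to deal with -- just use the droplevels() function on the result.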

new_data <- data %>%
  filter(y == "yes") %>%
  droplevels()
levels(new_data$y)
## [1] "yes"
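
To see why the level survives the filter, here is a minimal base R sketch (no dplyr needed). The vector names y, y_yes and y_clean are just illustrative; the point is that subsetting a factor keeps its full levels attribute, and droplevels() rebuilds it from the values that remain.

```r
# A factor stores integer codes plus a separate "levels" attribute.
y <- factor(rep(c("yes", "no"), 5))
levels(y)

# Subsetting keeps the whole levels attribute, even for values
# that no longer occur in the result.
y_yes <- y[y == "yes"]
levels(y_yes)   # still contains "no"
table(y_yes)    # "no" is listed with a count of 0

# droplevels() keeps only the levels that actually occur.
y_clean <- droplevels(y_yes)
levels(y_clean)
```

Plotting functions generally build a discrete axis from the levels rather than from the observed values, which would explain the empty "no" slot in the chart.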

If you do this all the time, you could define a new function:

dfilter <- function(...) droplevels(filter(...))
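
As a rough usage sketch, the wrapper can be called like filter() itself. Two things here are my assumptions rather than part of the original code: stringsAsFactors = TRUE (so that y is a factor on R 4.0.0 and later) and the explicit dplyr:: prefix (so the wrapper cannot pick up stats::filter by accident). The second variant, with the illustrative name new_data2, drops unused levels in a single column only.

```r
library(dplyr)

# Example frame from the question; stringsAsFactors = TRUE is an assumption
# so that y is a factor on R >= 4.0.0 as well.
data <- data.frame(
  x = 1:10,
  y = rep(c("yes", "no"), 5),
  stringsAsFactors = TRUE
)

# Filter, then drop unused levels in every factor column of the result.
dfilter <- function(...) droplevels(dplyr::filter(...))

new_data <- dfilter(data, y == "yes")
levels(new_data$y)

# Variant: drop unused levels in one column only.
new_data2 <- data %>%
  filter(y == "yes") %>%
  mutate(y = droplevels(y))
levels(new_data2$y)
```

Calling droplevels() on the whole data frame affects every factor column, so the per-column form may be preferable when other factors should keep their full set of levels.
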
Ben Bolker