5

Here's a trivial example of what I'm trying to do:

library(dplyr)

iris %>%
  mutate(Species2 = ifelse(Species %in% c("setosa", "virginica"), "other", as.character(Species)) %>% as.factor) %>%
  str
# 'data.frame': 150 obs. of  6 variables:
# $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ Species2    : Factor w/ 2 levels "other","versicolor": 1 1 1 1 1 1 1 1 1 1 ...

However, if I want to do multiple merges, I'd end up with deeply nested ifelse statements, which I'm trying to avoid. What's the most elegant way to do this? Preferably I can incorporate the solution into a dplyr pipeline.

kevinykuo
  • 1
    The base R method would be to change the levels after checking the `levels(iris$Species)`, i.e. `levels(iris$Species) – akrun Mar 18 '15 at 21:23
  • 2
    so after some more research `plyr::revalue` seems to fit the bill, but if I'm combining many levels into one there's a bit of redundant typing, so i'm going to investigate coming up with a wrapper that takes a named list... – kevinykuo Mar 18 '15 at 21:47
  • 2
    I think you should put your classification rules into a data.frame, `data.frame(Species=c("setosa", "virginica"),Species2="other")` and merge it in. You mention "multiple merges", so maybe that is already what you meant... – Frank Mar 18 '15 at 21:51
  • 1
    Expanding on akrun's comment, you could use a "list" in `"levels – alexis_laz Mar 18 '15 at 22:16
  • also see `car::recode()`, although its interface is a little clunky – Ben Bolker Mar 18 '15 at 22:41
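
The base R suggestion in the comments (assigning a named list to the levels) can be sketched as follows. This is a minimal illustration, assuming the grouping from the question; the variable name `f` is made up for the example. Each list name becomes a new level, and any old level left out of the list would become NA.

```r
# Work on a copy of the factor so iris itself is left untouched.
f <- iris$Species

# Each list name is a new level; each value lists the old levels merged into it.
levels(f) <- list(other = c("setosa", "virginica"), versicolor = "versicolor")

levels(f)
table(iris$Species, f)
```

Merging more levels only means adding names to the vector on the right-hand side, so there is no nesting.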

2 Answers

1

You can use match:

species.keep <- c("setosa", "virginica", "other")
iris %>% mutate(Species2 = species.keep[match(Species, species.keep, nomatch=3)])

match returns the position of each species in species.keep. Any species that is not found gets the value of the nomatch argument, here 3, which is the position of "other" at the end of the vector. Note that this assumes "other" is not a valid species, and that with this vector "setosa" and "virginica" are kept while everything else becomes "other". You'll have to add the as.factor etc., but this should get you to what you want. match is the basic lookup function in base R.
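
For several merges at once, the same match idea can be driven by a named lookup vector. The following is a sketch in base R rather than part of the answer above; `species.map`, `idx` and `iris2` are names invented for the example, and the grouping follows the question.

```r
# Lookup vector: names are the original levels, values are the merged labels.
# The grouping is illustrative only.
species.map <- c(setosa = "other", virginica = "other", versicolor = "versicolor")

# match() gives each row's position among the names; indexing returns the new label.
# A level missing from species.map would yield NA here.
idx <- match(iris$Species, names(species.map))

iris2 <- iris
iris2$Species2 <- factor(unname(species.map[idx]))

table(iris2$Species, iris2$Species2)
```

The same two lines should also work inside a mutate call, since they only operate on the Species column.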

BrodieG
0

If you first need to build a lookup that covers every possible level and its replacement, you can use something like sapply over the factor levels. You can then use that lookup to populate Species2:

# Build a named list that maps each original level to its new label
s <- sapply(levels(iris$Species),
            function(x) if (x %in% c("setosa", "virginica")) "Other" else x,
            simplify = FALSE)

iris %>% 
  mutate(Species2 = (as.character(s[Species])) %>% as.factor) %>%
  str

mucio