tidyr is a package for reshaping data. Here, we use pivot_longer() to put the data into a long format, where the type column names (Type1, Type2) go into a column called "name" and the values (Grass, Poison, etc.) go into a column called "value". We filter out rows where is.na(value) is true, because those mean the pokemon has no second type. We then create an indicator variable set to 1, so each pokemon has indicator == 1 for the types it has. We drop the now-extraneous "name" column and use pivot_wider() to turn each unique value in value into its own column, filled with indicator's value for each row. Finally, we mutate all numeric columns to replace missing values with 0, since a missing value there means the pokemon is not of that type.
A better solution than mutate_if(is.numeric, ...) would be to compute the unique type values and use mutate_at(vars(pokemon_types), ...). That way other numeric columns, such as ID and HP, are not affected unintentionally.
library(tidyr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
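pokemon <- tibble(ID = c(1, 2), Name = c("Bulbasaur", "Squirtle"),
                  Type1 = c("Grass", "Water"),
                  Type2 = c("Poison", NA),
                  HP = c(40, 50))
pokemon %>%
  # Stack Type1/Type2 into "name" (column name) and "value" (type)
  pivot_longer(starts_with("Type")) %>%
  # A missing value means the pokemon has no second type
  filter(!is.na(value)) %>%
  mutate(indicator = 1) %>%
  select(-name) %>%
  # One column per type, filled with the indicator
  pivot_wider(names_from = value, values_from = indicator) %>%
  # Types a pokemon does not have come out as NA; set them to 0
  mutate_if(is.numeric, .funs = function(x) if_else(is.na(x), 0, x))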
#> # A tibble: 2 x 6
#>      ID Name         HP Grass Poison Water
#>   <dbl> <chr>     <dbl> <dbl>  <dbl> <dbl>
#> 1     1 Bulbasaur    40     1      1     0
#> 2     2 Squirtle     50     0      0     1
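
The safer variant mentioned above could be sketched roughly as follows. This is only one way to do it: how pokemon_types is computed here (collecting the non-missing values of Type1 and Type2) is my assumption, and the character vector is passed to mutate_at() directly rather than through vars().

```r
library(tidyr)
library(dplyr)

pokemon <- tibble(
  ID = c(1, 2),
  Name = c("Bulbasaur", "Squirtle"),
  Type1 = c("Grass", "Water"),
  Type2 = c("Poison", NA),
  HP = c(40, 50)
)

# Unique, non-missing type names across both type columns.
# These become the indicator column names after pivot_wider().
all_types <- c(pokemon$Type1, pokemon$Type2)
pokemon_types <- unique(all_types[!is.na(all_types)])

result <- pokemon %>%
  pivot_longer(starts_with("Type")) %>%
  filter(!is.na(value)) %>%
  mutate(indicator = 1) %>%
  select(-name) %>%
  pivot_wider(names_from = value, values_from = indicator) %>%
  # Only the type columns are touched; ID and HP are left alone,
  # so a genuinely missing HP would stay NA instead of becoming 0.
  mutate_at(pokemon_types, function(x) if_else(is.na(x), 0, x))

result
```

In dplyr 1.0.0 and later, the scoped verbs are superseded, so mutate(across(all_of(pokemon_types), ...)) would be the more current spelling of the same idea.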