I've got the following data:
dataset <- structure(list(id = structure(c(2L, 3L, 1L, 3L, 1L, 9L), .Label = c("215101",
"215559", "216566", "217284", "219435", "220209", "220249", "220250",
"225678", "225679", "225687", "225869", "228420", "228435", "230621",
"230623", "233063", "233097", "233098", "235546", "235560", "235567",
"236379"), class = "factor"), cat1 = c("A", "B", "B", "A", "A",
"A"), cat2 = c("item 1", "item 1", "item 2", "item 5", "item 3",
"item 28"), cat3 = c("theme 2", "theme 2", "theme 1", "theme 4",
"theme 10", "theme 40")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))
I would like to create kind of model matrix with one hot encoded columns features created from columns cat2
and cat3
. Therefore, my output would look like this:
structure(list(id = structure(c(1L, 1L, 2L, 3L, 3L, 9L), .Label = c("215101",
"215559", "216566", "217284", "219435", "220209", "220249", "220250",
"225678", "225679", "225687", "225869", "228420", "228435", "230621",
"230623", "233063", "233097", "233098", "235546", "235560", "235567",
"236379"), class = "factor"), cat1 = c("A", "B", "A", "A", "B",
"A"), `item 1` = c(0, 0, 1, 0, 1, 0), `item 2` = c(0, 1, 0, 0,
0, 0), `item 28` = c(0, 0, 0, 0, 0, 1), `item 3` = c(1, 0, 0,
0, 0, 0), `item 5` = c(0, 0, 0, 1, 0, 0), `theme 1` = c(0, 1,
0, 0, 0, 0), `theme 10` = c(1, 0, 0, 0, 0, 0), `theme 2` = c(0,
0, 1, 0, 1, 0), `theme 4` = c(0, 0, 0, 1, 0, 0), `theme 40` = c(0,
0, 0, 0, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L))
However, I don't have my independent variable in this dataset and I would like to preserve id
and cat1
columns. How can I do that?