Here's a simple solution to one-hot-encode your category using no packages.
Solution
model.matrix(~0+category)
It needs your categorical variable to be a factor. The factor levels must be the same in your training and test data, check with levels(train$category)
and levels(test$category)
. It doesn't matter if some levels don't occur in your test set.
Example
Here's an example using the iris dataset.
data(iris)
#Split into train and test sets.
train <- sample(1:nrow(iris),100)
test <- -1*train
iris[test,]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
34 5.5 4.2 1.4 0.2 setosa
106 7.6 3.0 6.6 2.1 virginica
112 6.4 2.7 5.3 1.9 virginica
127 6.2 2.8 4.8 1.8 virginica
132 7.9 3.8 6.4 2.0 virginica
model.matrix()
creates a column for each level of the factor, even if it is not present in the data. Zero indicates it is not that level, one indicates it is. Adding the zero specifies that you do not want an intercept or reference level and is equivalent to -1.
oh_train <- model.matrix(~0+iris[train,'Species'])
oh_test <- model.matrix(~0+iris[test,'Species'])
#Renaming the columns to be more concise.
attr(oh_test, "dimnames")[[2]] <- levels(iris$Species)
setosa versicolor virginica
1 1 0 0
2 0 0 1
3 0 0 1
4 0 0 1
5 0 0 1
P.S.
It's generally preferable to include all categories in training and test data. But that's none of my business.