
I am using a data frame called "rawData" which has a column called "Season" with values ranging from 1 to 4. I am trying to use a loop to perform one-hot encoding, i.e. create 4 new columns called "Season 1", "Season 2", "Season 3" and "Season 4", where each column holds a binary 1/0 indicator showing whether that season applies to each data point. So far I have tried this:

for (i in 1:4) {
  text <- paste("Season", toString(i), sep = " ")
  if (rawData$season == i) {
    rawData$text <- 1
  }
}

However, I just get one additional column called "text" in my data frame, with all values equal to 1. I understand why R is doing this, but I cannot figure out another way to make it do what I want. I tried changing the assignment inside the if statement from "rawData$text<-1" to "rawData$paste("Season", toString(i), sep = " ")<-1", but that gives me an error.
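For reference, here is a minimal sketch of why the loop misbehaves and one way around it. `$` takes the column name literally, so `rawData$text` always means a column called "text"; indexing with `[[ ]]` evaluates the variable instead. Also, `if` expects a single TRUE/FALSE value, not a whole column, so the comparison is better assigned directly as a 0/1 vector. The small `rawData` below is made-up sample data, and it uses `Season` as described in the question (R names are case-sensitive, so `rawData$season` would not find that column).

```r
# Hypothetical sample data standing in for the real rawData
rawData <- data.frame(Season = c(1, 3, 2, 4, 3, 4, 1, 1, 1))

for (i in 1:4) {
  col_name <- paste("Season", i)  # "Season 1", "Season 2", ...
  # [[ ]] uses the value of col_name; $ would create a column literally named "col_name"
  # as.integer() turns the TRUE/FALSE comparison into 1/0 for every row at once
  rawData[[col_name]] <- as.integer(rawData$Season == i)
}

print(rawData)
```

Each row ends up with exactly one 1 across the four new columns.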

stats_nerd

2 Answers

library(dummies)  # provides dummy.data.frame()

df <- data.frame(
  group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
  student = c('01', '01', '01', '02', '02', '01', '02'),
  exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
  subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
)

df1 <- dummy.data.frame(df, names = c("subject"), sep = "_")

This reproducible example does one-hot encoding of the "subject" column without a for loop.

The same approach works for the example you provided:

library(dummies)  # provides dummy.data.frame()

df1 <- data.frame(seasons = c(1,3,2,4,3,4,1,1,1))

df2 <- dummy.data.frame(df1, names = c("seasons"), sep = "_")
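If you prefer not to depend on an extra package, base R's `model.matrix()` can build the same kind of indicator columns. This is a sketch using the seasons data from the answer: `factor()` makes the numbers categorical, and `- 1` drops the intercept so every level gets its own column. The renaming to `seasons_1` and so on is an assumption meant to mimic the `sep = "_"` naming, not something `model.matrix()` does by itself.

```r
df1 <- data.frame(seasons = c(1, 3, 2, 4, 3, 4, 1, 1, 1))

# Treat seasons as categorical; "- 1" removes the intercept so all 4 levels get a column
indicators <- model.matrix(~ factor(seasons) - 1, data = df1)

# model.matrix names the columns like "factor(seasons)1"; rename them to seasons_1, ...
colnames(indicators) <- paste0("seasons_", levels(factor(df1$seasons)))

# Keep the original column and append the 0/1 indicator columns
df2 <- cbind(df1, indicators)
print(df2)
```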
Hunaidkhan
  • When I tried this I got the following error message: "Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list?" It is probably because my data frame column "seasons" has numeric values, whereas yours seems to have strings. Do you know what I can change to fix this? Thank you – stats_nerd Nov 28 '18 at 05:23
  • Have a look at the updated answer; I have added your seasons data and the code works fine. – Hunaidkhan Nov 28 '18 at 05:56

Someone else just showed me how to do it:

df <- data.frame(seasons = c(1,3,2,4,3,4,1,1,1))
# One new column per distinct season; [[ ]] accepts the computed column name
for(i in unique(df$seasons)) {
  df[[paste0("season_",i)]] <- ifelse(df$seasons==i,1,0)
}
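A small refinement of the loop above, offered as a sketch: `unique()` returns values in order of first appearance, so the columns are created as `season_1`, `season_3`, `season_2`, `season_4`. Wrapping it in `sort()` gives the columns in season order, and `as.integer()` on the logical comparison produces the same 0/1 values as `ifelse()` (stored as integers rather than doubles).

```r
df <- data.frame(seasons = c(1, 3, 2, 4, 3, 4, 1, 1, 1))

# sort() so the new columns come out as season_1, season_2, season_3, season_4
for (i in sort(unique(df$seasons))) {
  # df$seasons == i is a logical vector; as.integer() turns TRUE/FALSE into 1/0
  df[[paste0("season_", i)]] <- as.integer(df$seasons == i)
}

print(df)
```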
stats_nerd