I have created a prediction model for this dataset:
>>df.head()
  Service  Tasks  Difficulty     Hours
0     ABC     24           1  0.833333
1     CDE     77           1  1.750000
2     SDE     90           3  3.166667
3     QWE     47           1  1.083333
4     ASD     26           3  1.000000
>>df.shape
(998,4)
>>X = df.iloc[:,:-1]
>>y = df.iloc[:,-1].values
>>from sklearn.compose import ColumnTransformer
>>from sklearn.preprocessing import OneHotEncoder
>>ct = ColumnTransformer([("cat", OneHotEncoder(),[0])], remainder="passthrough")
>>X = ct.fit_transform(X)
>>x = X.toarray()
>>x = x[:,1:]
>>x.shape
(998,339)
>>from sklearn.ensemble import RandomForestRegressor
>>rf_model = RandomForestRegressor(random_state = 1)
>>rf_model.fit(x,y)
How can I use this model to predict Hours for user input in this format: [["SDE", 90, 3]]?
I tried:
>>test_input = [["SDE", 90, 3]]
>>test_input = ct.fit_transform(test_input)
>>test_input = test_input[:,1:]
>>test_input[0]
array([24, 1], dtype=object)
>>predict_hours = rf_model.predict(test_input)
ValueError
Since my dataset has many categorical values, it is not practical to enter the encoded value of "SDE" by hand as input. I need to convert "SDE" to its one-hot encoded form after receiving the input [["SDE", 90, 3]].
I don't know how to do this. Can anyone help?
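For reference, here is a minimal sketch of one possible approach. The likely cause of the ValueError is that calling ct.fit_transform on the new input refits the encoder on a single row, so the result has far fewer columns than the 339 the model was trained on. Calling ct.transform on the already fitted transformer should keep the column layout from training. The sketch uses only the five rows shown by df.head(), so its shapes differ from the real data. The sparse_threshold=0 and handle_unknown="ignore" settings are assumptions added for illustration and are not part of the code above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder

# Only the five rows shown by df.head(); the real df has 998 rows.
df = pd.DataFrame({
    "Service": ["ABC", "CDE", "SDE", "QWE", "ASD"],
    "Tasks": [24, 77, 90, 47, 26],
    "Difficulty": [1, 1, 3, 1, 3],
    "Hours": [0.833333, 1.750000, 3.166667, 1.083333, 1.000000],
})
X = df.iloc[:, :-1]
y = df.iloc[:, -1].values

# sparse_threshold=0 (assumption) forces a dense array, so .toarray() is not needed.
# handle_unknown="ignore" (assumption) encodes unseen services as all zeros.
ct = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
x = ct.fit_transform(X)[:, 1:]  # fit the encoder once, on the training data

rf_model = RandomForestRegressor(random_state=1)
rf_model.fit(x, y)

# New input: same column names as X, then transform() only (no refit).
test_input = pd.DataFrame([["SDE", 90, 3]], columns=X.columns)
x_new = ct.transform(test_input)[:, 1:]  # same column slice as in training
predict_hours = rf_model.predict(x_new)
print(x_new.shape, predict_hours)
```

The key point is that the encoder is fitted once and then reused, and that the same x[:,1:] slice is applied to the new input so the feature count matches what rf_model expects.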