My data consists of 50 columns and most of them are strings. I have a single multi-class variable which I have to predict. I tried using LabelEncoder in scikit-learn to convert the features (not classes) into whole numbers and feed them as input to the RandomForest model I am using. I am using RandomForest for classification.
Now, when new test data comes (stream of new data), for each column, how will I know what the label for each string will be since using LabelEncoder now will give me a new label independent of the labels I generated before. Am, I doing this wrong? Is there anything else I should use for consistent encoding?