Questions tagged [one-hot-encoding]

One-hot encoding is a method for converting categorical variables into numerical data that machine learning algorithms can work with.

Often also called dummy encoding (strictly, dummy encoding drops one of the binary columns), one-hot encoding converts categorical variables whose values have no natural ordering into numerical data that machine learning algorithms can work with. It is the most widespread approach and works well unless the categorical variable takes a large number of unique values, because it creates one new column per unique value. Each new binary column indicates the presence of one possible value from the original data: a row has a 1 in the column for its category and 0 in all the others.
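A minimal sketch of the idea in plain Python, assuming no particular library (for real work, pandas offers `get_dummies` and scikit-learn offers `OneHotEncoder`). The function name `one_hot` is illustrative only.

```python
# A minimal one-hot encoder in plain Python (no pandas or scikit-learn).
# Each unique category becomes its own binary column.

def one_hot(values):
    """Return (categories, rows): the sorted unique categories and one
    binary row per input value, with a 1 in the column of its category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the presence of this row's category
        rows.append(row)
    return categories, rows


colors = ["red", "green", "blue", "green"]
cats, encoded = one_hot(colors)
print(cats)  # ['blue', 'green', 'red']
for row in encoded:
    print(row)
# [0, 0, 1]
# [0, 1, 0]
# [1, 0, 0]
# [0, 1, 0]
```

The number of columns equals `len(categories)`, which is why high-cardinality variables produce very wide, sparse data. Dropping one of the columns (for example the first) gives the k-1 column dummy-coding variant; the dropped category is implied when all remaining columns are 0.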

951 questions
-3 votes, 0 answers

An efficient loop for one-hot encoding categorical variables

I am trying to shorten my code, which applies one-hot encoding to categorical variables. I tried one approach, but it failed. I am trying to store the encoders for the categorical variables in a dictionary and iterate those encoders for…
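The asker's code is truncated, so this is not a fix of it; it is a sketch of the general pattern described: fit one encoder per categorical column, store them in a dictionary keyed by column name, and loop over that dictionary. `SimpleOneHotEncoder`, `fit_encoders` and `encode_all` are hypothetical names; the class only mimics the fit/transform style of scikit-learn's `OneHotEncoder`, and the table is assumed to be a plain dict of column name to list of values.

```python
# Sketch: keep one fitted encoder per categorical column in a dict,
# then loop over the dict instead of writing one block per column.
# SimpleOneHotEncoder is a hypothetical stand-in for a library encoder.

class SimpleOneHotEncoder:
    def fit(self, values):
        # Remember the categories seen during fitting, in sorted order.
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values):
        # Unknown categories become an all-zero row instead of raising.
        return [[1 if v == c else 0 for c in self.categories_] for v in values]


def fit_encoders(table, columns):
    """table maps column name -> list of values; returns name -> fitted encoder."""
    return {col: SimpleOneHotEncoder().fit(table[col]) for col in columns}


def encode_all(table, encoders):
    """Apply every stored encoder and concatenate the binary columns row-wise."""
    n_rows = len(next(iter(table.values())))
    out = [[] for _ in range(n_rows)]
    for col, enc in encoders.items():
        for row, bits in zip(out, enc.transform(table[col])):
            row.extend(bits)
    return out


train = {"color": ["red", "blue", "red"], "size": ["S", "M", "S"]}
encoders = fit_encoders(train, ["color", "size"])
print(encode_all(train, encoders))
# [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
```

Because the fitted encoders are kept in the dictionary, the same `encoders` can be reused on test data so that training and test rows get identical columns.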
-3 votes, 1 answer

Why doesn't looping work in one-hot encoding?

    for i in data.columns:
        top_10 = [x for x in data.i.value_counts().sort_values(ascending=False).head(10).index]
        for label in top_10:
            data[label] = np.where(data['i'] == label, 1, 0)
        data[['i'] + top_10]

What is the mistake?
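The most likely mistake: inside the loop, `data.i` and `data['i']` both look up a column literally named `"i"`, not the column whose name is stored in the loop variable `i`. In pandas the lookups would be `data[i].value_counts()`, `np.where(data[i] == label, 1, 0)` and `data[[i] + top_10]`. The sketch below shows the same "one-hot encode only the N most frequent labels" technique in plain Python, assuming the data is a dict of column name to list of values; `top_n_one_hot` is a hypothetical helper, not the asker's code.

```python
from collections import Counter

# Plain-Python sketch of "one-hot encode only the N most frequent labels".
# The key point mirrors the pandas fix: use the loop variable `col`
# (the column name) for every lookup, never a literal string.

def top_n_one_hot(table, n=10):
    """table maps column name -> list of values.
    Returns a new dict with one 0/1 indicator column per frequent label."""
    out = {}
    for col, values in table.items():
        # most_common(n) orders labels by descending frequency
        top = [label for label, _ in Counter(values).most_common(n)]
        for label in top:
            # Prefix with the column name so labels shared by two columns
            # (e.g. "yes") do not overwrite each other.
            out[f"{col}_{label}"] = [1 if v == label else 0 for v in values]
    return out


data = {"city": ["A", "B", "A", "C", "A", "B"]}
print(top_n_one_hot(data, n=2))
# {'city_A': [1, 0, 1, 0, 1, 0], 'city_B': [0, 1, 0, 0, 0, 1]}
```

The column-name prefix is a design choice worth copying in the pandas version too: writing `data[label] = ...` with bare labels lets a value that appears in two columns silently overwrite the first indicator column.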
-3 votes, 1 answer

TypeError: argument must be a string or number on column with strings that are numbers

I have a dataset with categorical columns. Column 4 has two values, "two" and "four", both stored as strings. Do you know why I get this error and how to fix it? TypeError: argument must be a string or number Traceback (most recent call last): File "C:..".py",…
-3 votes, 1 answer

Indexer error in Apache Spark (Scala) code?

This exact code, copied and pasted from the Apache Spark documentation, gives me an error (please see the screenshot): import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer} val df = spark.createDataFrame(Seq( (0, "a"), (1, "b"), (2, "c"), (3,…
-3 votes, 2 answers

Categorical Data - One-hot encoding

I have a large list of strings. Each string is a separate example in the training dataset and contains a comma-separated list of categories. E.g. mesh = ['aligator, dog, cat', 'cat, mouse, aligator', ''] Some examples…
scutnex (553 reputation; 1 gold, 6 silver, 18 bronze badges)
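Because each example can carry several categories, this is multi-hot (multi-label) encoding rather than plain one-hot: a row may have several 1s, and an empty string gives an all-zero row. A sketch in plain Python, assuming the categories are always separated by commas with optional surrounding spaces; `multi_hot` is a hypothetical helper, and scikit-learn's `MultiLabelBinarizer` is a library option for the same idea.

```python
# Sketch of multi-hot ("multi-label one-hot") encoding for comma-separated
# category strings. Each row may switch on several columns at once.

def multi_hot(examples, sep=","):
    """Return (vocabulary, rows) for a list of separator-joined category strings."""
    # Split, strip whitespace, and drop empty pieces (so '' means "no categories").
    parsed = [[c.strip() for c in s.split(sep) if c.strip()] for s in examples]
    vocab = sorted({c for cats in parsed for c in cats})
    index = {c: i for i, c in enumerate(vocab)}
    rows = []
    for cats in parsed:
        row = [0] * len(vocab)
        for c in cats:
            row[index[c]] = 1  # switch on every category present in this example
        rows.append(row)
    return vocab, rows


mesh = ['aligator, dog, cat', 'cat, mouse, aligator', '']
vocab, rows = multi_hot(mesh)
print(vocab)  # ['aligator', 'cat', 'dog', 'mouse']
print(rows)   # [[1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 0, 0]]
```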