
Hello, I am using the following function to convert all categorical values in a dataset into numerical values, but I want to change it to use OneHotEncoder. How can I do it?

def categorical_to_numerical(dataframe):
    for col in dataframe.columns:
        if str(dataframe[col].dtype) == 'category':
            dataframe[col] = dataframe[col].astype("category").cat.codes
    return dataframe

Thanks

Tlaloc-ES

1 Answer


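If I understand you correctly, you can use DataFrame.select_dtypes to select the object (string) columns and convert each one to its category codes. If your columns already have the category dtype, as your function assumes, select those with select_dtypes('category') instead.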

import pandas as pd

# example dataframe
df = pd.DataFrame({'col1':[1,2,3],
                   'col2':['a','b','a'],
                   'col3':[4,5,6],
                   'col4':['aaa', 'bbb', 'bbb']})

   col1 col2  col3 col4
0     1    a     4  aaa
1     2    b     5  bbb
2     3    a     6  bbb

for col in df.select_dtypes('object'):
    df[col] = df[col].astype('category').cat.codes

   col1  col2  col3  col4
0     1     0     4     0
1     2     1     5     1
2     3     0     6     1
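Wrapped back into the shape of the function from the question, a minimal sketch could look like the following. It assumes you want to catch both object and category columns, and it works on a copy so the caller's DataFrame is not modified in place (the original function mutates its argument).

```python
import pandas as pd

def categorical_to_numerical(dataframe):
    """Replace every object/category column with its integer category codes."""
    out = dataframe.copy()  # work on a copy instead of mutating the input
    # select_dtypes picks string (object) and already-categorical columns
    for col in out.select_dtypes(include=['object', 'category']).columns:
        out[col] = out[col].astype('category').cat.codes
    return out
```

Note that these codes are ordinal integers (a=0, b=1, ...), which many models will interpret as an ordering. That is usually the reason to prefer one-hot encoding.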

Or, if you actually want one-hot encoding, you can use pd.get_dummies on the original df (not the one already converted to codes above):

df = pd.get_dummies(df, dtype=int)  # dtype=int gives 0/1; recent pandas defaults to True/False

   col1  col3  col2_a  col2_b  col4_aaa  col4_bbb
0     1     4       1       0         1         0
1     2     5       0       1         0         1
2     3     6       1       0         0         1
Erfan