I am working on a flight recommendation project where the airport code of each source is given along with some other data, and from that I have to predict the destination an airplane can reach.
I have to deal with 6+ million rows, and I am running into a problem while one-hot encoding the airport codes (of which there are more than 3000 in the present dataset) before fitting a model. Can anyone suggest how to one-hot encode, or otherwise deal with, this kind of problem?
from sklearn.preprocessing import OneHotEncoder

onehotencoder1 = OneHotEncoder()
X = onehotencoder1.fit_transform(X)
This fails with a memory error: unable to allocate 11.3 GiB. When I try it on a smaller subset of the data, it works fine.
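For reference, one thing I was considering is building the one-hot matrix directly in sparse (CSR) form instead of letting anything densify it, since each row has exactly one non-zero out of ~3000 columns. A minimal sketch with made-up airport codes (the real data would use the full column):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical small sample; the real column has ~3000 distinct codes.
codes = np.array(["JFK", "LAX", "JFK", "ORD", "LAX"])

# Map each code to an integer column index.
cats, col_idx = np.unique(codes, return_inverse=True)
n_rows, n_cols = codes.shape[0], cats.shape[0]

# One non-zero per row: a CSR matrix stores only those entries,
# so memory scales with the number of rows, not rows * categories.
onehot = csr_matrix(
    (np.ones(n_rows, dtype=np.float32), (np.arange(n_rows), col_idx)),
    shape=(n_rows, n_cols),
)

print(onehot.shape)  # (5, 3)
print(onehot.nnz)    # 5 stored values instead of 15 dense cells
```

Would a sparse matrix like this be accepted by most scikit-learn estimators, or do I need a different approach entirely (hashing, target encoding, etc.)?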