
I am working on a flight recommendation project where I am given the airport code of each source along with some other data, and from that I have to predict the destinations the airplane can reach.

I have to deal with 6+ million rows, so I am running into a problem while one-hot encoding the airport codes (more than 3000 distinct codes in the present dataset) before fitting a model. Can anyone suggest how to one-hot encode, or otherwise deal with, this kind of problem?

from sklearn.preprocessing import OneHotEncoder

# Fit the encoder on the airport-code column(s), then transform.
onehotencoder1 = OneHotEncoder()
onehotencoder1.fit(X)
X = onehotencoder1.transform(X)

and I am getting a memory error: "can't allocate 11.3 GiB".

It works when I try it on a smaller subset of the data.
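For what it's worth, the memory blow-up typically comes from materializing the encoded matrix densely: 6 million rows × 3000 columns of 8-byte floats is roughly 144 GB, while a sparse representation stores only the single non-zero entry per row. A minimal sketch, assuming scikit-learn and SciPy are available (the toy data below stands in for the real airport-code column):

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
# Toy stand-in for the airport-code column: 100k rows drawn from
# 3000 hypothetical category labels.
codes = rng.integers(0, 3000, size=100_000).astype(str).reshape(-1, 1)

# Sparse output is OneHotEncoder's default; the key is to NOT call
# .toarray() (or otherwise densify) on the result afterwards.
enc = OneHotEncoder(handle_unknown="ignore")
X_enc = enc.fit_transform(codes)

print(sparse.issparse(X_enc))  # the result stays a SciPy sparse matrix
print(X_enc.shape)
```

Many scikit-learn estimators (e.g. linear models) accept SciPy sparse input directly, so the matrix never needs to be densified.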

1 Answer


Have you tried with pandas? It has a similar `get_dummies` function which may work.
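A caveat: plain `get_dummies` builds a dense frame and would hit the same memory wall; passing `sparse=True` keeps the dummy columns in pandas' sparse dtype. A minimal sketch with a few made-up airport codes:

```python
import pandas as pd

# Hypothetical sample of source airport codes.
df = pd.DataFrame({"src": ["JFK", "LAX", "JFK", "ORD"]})

# sparse=True makes each dummy column a SparseDtype column,
# avoiding the dense 6M-rows x 3000-columns allocation.
dummies = pd.get_dummies(df["src"], sparse=True)

print(dummies.shape)   # one column per distinct code
print(dummies.dtypes)  # Sparse dtypes throughout
```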
