I am working on a flight recommendation project where the airport code of each source is given along with some other data, and from that I have to predict the destination an airplane can reach.
I have to deal with 6+ million rows, and I am running into a problem while one-hot encoding the airport codes (of which there are more than 3000 in the present dataset) before fitting a model. Can anyone suggest how to one-hot encode, or otherwise deal with, this kind of problem?
from sklearn.preprocessing import OneHotEncoder

onehotencoder1 = OneHotEncoder()
X = onehotencoder1.fit_transform(X)
This fails with a memory error: unable to allocate 11.3 GiB. When I try it on a smaller subset of the data, it works fine.
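For reference, one thing I was considering is building the one-hot matrix directly in sparse (CSR) form instead of letting anything densify it, since each row has exactly one non-zero out of ~3000 columns. A minimal sketch with made-up airport codes (the real data would use the full column):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical small sample; the real column has ~3000 distinct codes.
codes = np.array(["JFK", "LAX", "JFK", "ORD", "LAX"])

# Map each code to an integer column index.
cats, col_idx = np.unique(codes, return_inverse=True)
n_rows, n_cols = codes.shape[0], cats.shape[0]

# One non-zero per row: a CSR matrix stores only those entries,
# so memory scales with the number of rows, not rows * categories.
onehot = csr_matrix(
    (np.ones(n_rows, dtype=np.float32), (np.arange(n_rows), col_idx)),
    shape=(n_rows, n_cols),
)

print(onehot.shape)  # (5, 3)
print(onehot.nnz)    # 5 stored values instead of 15 dense cells
```

Would a sparse matrix like this be accepted by most scikit-learn estimators, or do I need a different approach entirely (hashing, target encoding, etc.)?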