Scikit's LabelEncoder uses `numpy.int64` instead of integers in `inverse_transform`

Question

If you fit an sklearn.preprocessing.LabelEncoder with labels of type int, inverse_transform returns labels of type numpy.int64 instead of int.

from sklearn.preprocessing import LabelEncoder
labels = [2, 4, 6]  # just a list of `int`s
e = LabelEncoder().fit(labels)
encoded = e.transform([4, 6, 2])
decoded = e.inverse_transform(encoded)
type(decoded[0])
# returns <class 'numpy.int64'>

So I have two questions:

Why would it do that?
How can someone avoid that without custom code?

(I ran into this problem when Flask's jsonify could not serialize np.int64 to JSON.)

score 2 · Accepted Answer · answered Jun 28 '19 at 07:46

Why would it do that?

Because transform and inverse_transform return NumPy arrays and, per the NumPy documentation:

An item extracted from an array, e.g., by indexing, will be a Python object whose type is the scalar type associated with the data type of the array.

In this case the array's dtype is int64, so the scalar type is numpy.int64.
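The behaviour can be reproduced with NumPy alone, independent of LabelEncoder. This is a minimal sketch; the array `arr` and its explicit `dtype=np.int64` are assumptions chosen so the result does not depend on the platform's default integer size.

```python
import numpy as np

# Assumed stand-in for the array that inverse_transform returns.
arr = np.array([2, 4, 6], dtype=np.int64)

# Indexing yields a NumPy scalar whose type matches the array's dtype.
element = arr[0]

print(arr.dtype)                 # the dtype of the whole array
print(type(element))             # the scalar type of one element
print(isinstance(element, int))  # a NumPy scalar is not a built-in int
print(element == 2)              # but it still compares equal to one
```

The element behaves like an integer in arithmetic and comparisons, which is why the difference often goes unnoticed until something inspects the type, as a JSON encoder does.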

How can someone avoid that without custom code?

If you need a single element as a native Python int, use decoded.item(0). If you need the entire array as a list of native Python ints, use decoded.tolist(). See the question "Converting numpy dtypes to native python types" for more.
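Here is a short sketch of both conversions, using the standard library's json module in place of Flask's jsonify. The `decoded` array is an assumed stand-in for the output of inverse_transform in the question, and the dictionary keys are illustrative only.

```python
import json

import numpy as np

# Assumed stand-in for the result of e.inverse_transform(encoded).
decoded = np.array([4, 6, 2], dtype=np.int64)

# Convert one element, or the whole array, to native Python types.
first = decoded.item(0)     # Python int
as_list = decoded.tolist()  # list of Python ints

# The raw NumPy scalar is rejected by the standard JSON encoder.
try:
    json.dumps({"label": decoded[0]})
    raw_serializable = True
except TypeError:
    raw_serializable = False

# The converted values serialize without a custom encoder.
payload = json.dumps({"first": first, "labels": as_list})
print(raw_serializable, payload)
```

Converting at the boundary, just before serialization, keeps the rest of the code working with NumPy arrays as usual.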

I just realized I never took the time to thank you for this concise and informative answer. Thanks — cmantas, Aug 27 '19 at 07:39
