
If you fit an sklearn.preprocessing.LabelEncoder on labels of type int, inverse_transform returns labels of type numpy.int64 instead of int.

from sklearn.preprocessing import LabelEncoder

labels = [2, 4, 6]  # just a list of `int`s
e = LabelEncoder().fit(labels)
encoded = e.transform([4, 6, 2])
decoded = e.inverse_transform(encoded)
type(decoded[0])
# returns <class 'numpy.int64'>

So I have two questions:

  1. Why would it do that?
  2. How can someone avoid that without custom code?

(I ran into this problem when Flask's jsonify could not serialize np.int64 to JSON.)

Vadim Kotov
cmantas

1 Answer


Why would it do that?

Because transform and inverse_transform return NumPy arrays, and, as the NumPy documentation states:

An item extracted from an array, e.g., by indexing, will be a Python object whose type is the scalar type associated with the data type of the array.

In this case the array's dtype is int64, so the scalar type is numpy.int64.
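To make the indexing behaviour concrete, here is a minimal sketch using NumPy alone. The array below is only a stand-in for the result of inverse_transform, and its int64 dtype is set explicitly as an assumption, since the default integer dtype can differ by platform and NumPy version.

```python
import numpy as np

# Stand-in for the array returned by inverse_transform (assumed dtype: int64).
decoded = np.array([4, 6, 2], dtype=np.int64)

# Indexing an array yields a NumPy scalar, not a built-in Python int.
first = decoded[0]

print(decoded.dtype)           # dtype of the whole array
print(type(first))             # the NumPy scalar type matching that dtype
print(isinstance(first, int))  # on Python 3, np.int64 is not a subclass of int
print(first == 4)              # it still compares equal to a plain int
```

So the value behaves like an integer in arithmetic and comparisons, but code that checks for the exact built-in type (as JSON encoders typically do) will not accept it.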

How can someone avoid that without custom code?

If you need a single element as a native Python int, use decoded.item(0). If you need the entire array, use decoded.tolist(), which returns a list of native Python values. See "Converting numpy dtypes to native python types" for more.
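A short sketch of both conversions, again using a stand-in int64 array rather than a fitted encoder. The standard json module is used here in place of Flask's jsonify purely for illustration; Flask's own behaviour may differ between versions.

```python
import json

import numpy as np

# Stand-in for the array returned by inverse_transform (assumed dtype: int64).
decoded = np.array([4, 6, 2], dtype=np.int64)

single = decoded.item(0)    # one element as a built-in int
as_list = decoded.tolist()  # whole array as a list of built-in ints

print(type(single))
print([type(x) for x in as_list])

# Built-in ints serialize with the standard json module...
payload = json.dumps({"labels": as_list})
print(payload)

# ...whereas the standard encoder rejects a raw NumPy scalar.
try:
    json.dumps({"label": decoded[0]})
    raw_scalar_ok = True
except TypeError as exc:
    raw_scalar_ok = False
    print("not serializable:", exc)
```

Converting once, right after inverse_transform, keeps NumPy types from leaking into the code that builds the response.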

Alexey Romanov
  • I just realized I never took the time to thank you for this concise and informative answer. Thanks – cmantas Aug 27 '19 at 07:39