288

If I have a numpy dtype, how do I automatically convert it to its closest python data type? For example,

numpy.float32 -> "python float"
numpy.float64 -> "python float"
numpy.uint32  -> "python int"
numpy.int16   -> "python int"

I could try to come up with a mapping of all of these cases, but does numpy provide some automatic way of converting its dtypes into the closest possible native python types? This mapping need not be exhaustive, but it should convert the common dtypes that have a close python analog. I think this already happens somewhere in numpy.

conradlee

13 Answers

400

Use val.item() to convert most NumPy values to a native Python type:

import numpy as np

# for example, numpy.float32 -> python float
val = np.float32(0)
pyval = val.item()
print(type(pyval))         # <class 'float'>

# and similar...
type(np.float64(0).item()) # <class 'float'>
type(np.uint32(0).item())  # <class 'int'>
type(np.int16(0).item())   # <class 'int'>
type(np.cfloat(0).item())  # <class 'complex'>
type(np.datetime64(0, 'D').item())  # <class 'datetime.date'>
type(np.datetime64('2001-01-01 00:00:00').item())  # <class 'datetime.datetime'>
type(np.timedelta64(0, 'D').item()) # <class 'datetime.timedelta'>
...

(Another method is np.asscalar(val); however, it has been deprecated since NumPy 1.16.)


For the curious, to build a table of conversions of NumPy array scalars for your system:

for name in dir(np):
    obj = getattr(np, name)
    if hasattr(obj, 'dtype'):
        try:
            if 'time' in name:
                npn = obj(0, 'D')
            else:
                npn = obj(0)
            nat = npn.item()
            print('{0} ({1!r}) -> {2}'.format(name, npn.dtype.char, type(nat)))
        except:
            pass

There are a few NumPy types that have no native Python equivalent on some systems, including: clongdouble, clongfloat, complex192, complex256, float128, longcomplex, longdouble and longfloat. These need to be converted to their nearest NumPy equivalent before using .item().
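For example, a minimal sketch of that last point (which of these extended-precision types exist, and whether they map exactly onto a native Python float, depends on your platform):

import numpy as np

# np.longdouble may have no exact native Python equivalent on this platform,
# so cast down to float64 first, then call .item()
val = np.longdouble(1.5)
pyval = np.float64(val).item()
print(type(pyval))  # <class 'float'>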

Mike T
  • I am using pandas (0.23.0). At least for that version, np.str doesn't have the .item() method so the only way I saw was to wrap .item() inside a try block. – Robert Lugg Jan 08 '19 at 19:51
  • @RobertLugg `np.str` is not a Numpy type, i.e. `np.str is str`, so it's just an alias to a standard Python type. Same with `np.float`, `np.int`, `np.bool`, `np.complex`, and `np.object`. The Numpy types have a trailing `_`, e.g. `np.str_`. – Mike T Jan 08 '19 at 20:28
  • I understand. So the issue is "it would be nice if" I could do: `np.float64(0).item()` and also `np.float(0).item()`. In other words, for the cases where it is known what to do, support the `.item()` method even if it simply returns the same value. That way I could apply `.item()` on far more numpy scalars without special casing. As it is, seemingly parallel concepts differ due to underlying implementation. I totally understand why this was done. But it is an annoyance to the library user. – Robert Lugg Jan 09 '19 at 20:47
52

I found myself with a mixed set of NumPy types and standard Python types. As all NumPy types derive from numpy.generic, here's how you can convert everything to Python standard types:

if isinstance(obj, numpy.generic):
    return numpy.asscalar(obj)
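Since np.asscalar was deprecated in NumPy 1.16 (see the accepted answer), here is a non-deprecated sketch of the same idea using .item() (the helper name to_native is just for illustration):

import numpy as np

def to_native(obj):
    # NumPy scalars all derive from np.generic; convert those with .item()
    # and pass everything else through unchanged.
    if isinstance(obj, np.generic):
        return obj.item()
    return obj

print(to_native(np.int16(3)), to_native("already native"))  # 3 already native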
tm_lv
  • As [the accepted answer notes](https://stackoverflow.com/a/11389998/2809027), **NumPy 1.16 deprecated the `np.asscalar()` method.** Why? Probably for no discernibly good reason. Despite a decade of relative stability, the NumPy API is now an unstable moving target mandating constant maintenance from downstream applications. At least they left us the `item()` method... *for now.* – Cecil Curry Feb 27 '19 at 09:21
  • The asscalar method has been deprecated since v1.16 of numpy – Eswar Sep 05 '19 at 05:19
  • You can easily replace the answer with `if isinstance(o, numpy.generic): return o.item() raise TypeError` and it turns into a non-deprecated answer again :D – Buggy Jan 09 '20 at 05:54
25

If you want to convert a numpy.ndarray, a NumPy scalar, or an already-native value to a native type, you can simply do:

converted_value = getattr(value, "tolist", lambda: value)()

tolist converts your scalar or array to native Python types. The fallback lambda takes care of the case where value is already native.
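A minimal usage sketch of this approach (the wrapper name as_native is just for illustration):

import numpy as np

def as_native(value):
    # tolist() converts both NumPy scalars and arrays to native Python types;
    # the fallback lambda returns already-native values unchanged.
    return getattr(value, "tolist", lambda: value)()

print(as_native(np.float32(1.5)))      # 1.5 (Python float)
print(as_native(np.array([1, 2, 3])))  # [1, 2, 3] (list of Python ints)
print(as_native(42))                   # 42 (already native)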

v.thorey
  • Cleanest approach for mixed types (native and non-native), well done! And for those that wonder, yes, tolist just returns a single value (the scalar) when you're calling it on a single value, not a list as you might think. Worth noting is that the simpler way to write the lambda is `lambda: value` since we don't want any inputs. – fgblomqvist Sep 12 '19 at 18:15
  • `getattr` + `tolist` combo is not only universal, but even vectorized! (unlike .item()) – mirekphd Mar 18 '20 at 18:16
15

tolist() is a more general approach to accomplish this. It works on any primitive dtype and also on arrays and matrices.

It doesn't actually yield a list if called on primitive types:

numpy == 1.15.2

>>> import numpy as np

>>> np_float = np.float64(1.23)
>>> print(type(np_float), np_float)
<class 'numpy.float64'> 1.23

>>> listed_np_float = np_float.tolist()
>>> print(type(listed_np_float), listed_np_float)
<class 'float'> 1.23

>>> np_array = np.array([[1,2,3.], [4,5,6.]])
>>> print(type(np_array), np_array)
<class 'numpy.ndarray'> [[1. 2. 3.]
 [4. 5. 6.]]

>>> listed_np_array = np_array.tolist()
>>> print(type(listed_np_array), listed_np_array)
<class 'list'> [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
Carlos Santos
12

How about:

In [51]: dict([(d, type(np.zeros(1,d).tolist()[0])) for d in (np.float32,np.float64,np.uint32, np.int16)])
Out[51]: 
{<type 'numpy.int16'>: <type 'int'>,
 <type 'numpy.uint32'>: <type 'long'>,
 <type 'numpy.float32'>: <type 'float'>,
 <type 'numpy.float64'>: <type 'float'>}
unutbu
  • I mention that type of solution as a possibility at the end of my question. But I'm looking for a systematic solution rather than a hard-coded one that just covers a few of the cases. For example, if numpy adds more dtypes in the future, your solution would break. So I'm not happy with that solution. – conradlee Feb 26 '12 at 13:51
  • The number of possible dtypes is unbounded. Consider `np.dtype('mint8')` for any positive integer `m`. There can not be an exhaustive mapping. (I also do not believe there is a builtin function to do this conversion for you. I could be wrong, but I don't think so :)) – unutbu Feb 26 '12 at 14:01
  • Python maps numpy dtypes to python types, I'm not sure how, but I'd like to use whatever method they do. I think this must happen to allow, for example, multiplication (and other operations) between numpy dtypes and python types. I guess their method does not exhaustively map all possible numpy types, but at least the most common ones where it makes sense. – conradlee Feb 26 '12 at 20:54
  • It does not work consistently: `>>> print([numpy.asscalar(x) for x in numpy.linspace(1.0, 0.0, 21)]) [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.6499999999999999, 0.6, 0.55, 0.5, 0.44999999999999996, 0.3999999999999999, 0.35, 0.29999999999999993, 0.25, 0.19999999999999996, 0.1499999999999999, 0.09999999999999998, 0.04999999999999993, 0.0]` As you see not all values were correctly converted. – Alex F Dec 20 '17 at 17:19
  • Following my previous comment, strangely this one works, though I would have thought you would need to put the round on the Python native type instead of the Numpy native type: `>>> print([numpy.asscalar(round(x,2)) for x in numpy.linspace(1.0, 0.0, 21)]) [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.0]` – Alex F Dec 20 '17 at 17:23
  • @AlexF it is consistent, you've just stumbled on floating point binary arithmetic. Some of those decimals e.g. 0.65 are recurring numbers in binary so can't be stored exactly. When displayed to you in decimal it looks like a rounding error https://softwareengineering.stackexchange.com/a/101170/290646 – Davos Mar 20 '18 at 11:50
9

You can also call the item() method of the object you want to convert:

>>> from numpy import float32, uint32
>>> type(float32(0).item())
<type 'float'>
>>> type(uint32(0).item())
<type 'long'>
Mike T
Aryeh Leib Taurog
7

Sorry to come late to the party, but I was looking at the problem of converting numpy.float64 to a regular Python float only. I saw 3 ways of doing that:

  1. npValue.item()
  2. npValue.astype(float)
  3. float(npValue)

Here are the relevant timings from IPython:

In [1]: import numpy as np

In [2]: aa = np.random.uniform(0, 1, 1000000)

In [3]: %timeit map(float, aa)
10 loops, best of 3: 117 ms per loop

In [4]: %timeit map(lambda x: x.astype(float), aa)
1 loop, best of 3: 780 ms per loop

In [5]: %timeit map(lambda x: x.item(), aa)
1 loop, best of 3: 475 ms per loop

It looks like float(npValue) is much faster.
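Note that those timings are from Python 2, where map() is eager; on Python 3, map() is lazy and would measure almost nothing. A sketch of how to repeat the comparison on Python 3 (using timeit instead of IPython's %timeit):

import timeit
import numpy as np

aa = np.random.uniform(0, 1, 1000000)

# Force the conversions with list comprehensions, since Python 3's map() is lazy.
print(timeit.timeit(lambda: [float(x) for x in aa], number=1))
print(timeit.timeit(lambda: [x.astype(float) for x in aa], number=1))
print(timeit.timeit(lambda: [x.item() for x in aa], number=1))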

gt6989b
6

I think you can just write a general type-conversion function like so:

import numpy as np

def get_type_convert(np_type):
    convert_type = type(np.zeros(1, np_type).tolist()[0])
    return (np_type, convert_type)

print get_type_convert(np.float32)
>> (<type 'numpy.float32'>, <type 'float'>)

print get_type_convert(np.float64)
>> (<type 'numpy.float64'>, <type 'float'>)

This means there are no fixed lists, and your code will scale as more types are added.

Matt Alcock
  • Do you know where the source code is for the part of the tolist() method that maps numpy types to python types? I took a quick look but couldn't find it. – conradlee Feb 26 '12 at 22:15
  • This is a bit of a hack: what I'm doing is generating a `numpy.ndarray` with one zero in it using `zeros()` and then calling the `ndarray`'s `tolist()` function to convert into native types. Once in native types I ask for the type and return it. `tolist()` is a function of the `ndarray`. – Matt Alcock Feb 26 '12 at 22:27
  • Yeah I see that---it works for what I want and so I've accepted your solution. But I wonder how tolist() does its job of deciding what type to cast into, and I'm not sure how to find the source. – conradlee Feb 26 '12 at 22:35
  • http://numpy.sourceforge.net/numdoc/HTML/numdoc.htm#pgfId-36588 is where the function is documented. I thought inspect might be able to help find more information but no joy. Next step I tried to clone https://github.com/numpy/numpy.git and run `grep -r 'tolist' numpy`. (still in progress, numpy is massive! ) – Matt Alcock Feb 26 '12 at 23:01
3

NumPy holds that information in a mapping exposed as typeDict, so you could do something like the below:

>>> import __builtin__
>>> import numpy as np
>>> {v: k for k, v in np.typeDict.items() if k in dir(__builtin__)}
{numpy.object_: 'object',
 numpy.bool_: 'bool',
 numpy.string_: 'str',
 numpy.unicode_: 'unicode',
 numpy.int64: 'int',
 numpy.float64: 'float',
 numpy.complex128: 'complex'}

If you want the actual Python types rather than their names, you can do:

>>> {v: getattr(__builtin__, k) for k, v in np.typeDict.items() if k in vars(__builtin__)}
{numpy.object_: object,
 numpy.bool_: bool,
 numpy.string_: str,
 numpy.unicode_: unicode,
 numpy.int64: int,
 numpy.float64: float,
 numpy.complex128: complex}
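Note that this is Python 2 code (__builtin__, unicode). Here is a rough Python 3 sketch of the same idea, assuming your NumPy version exposes sctypeDict (newer releases deprecate np.typeDict as an alias of np.sctypeDict):

import builtins
import numpy as np

# Map NumPy scalar types to the Python builtins that share their name.
mapping = {v: getattr(builtins, k)
           for k, v in np.sctypeDict.items()
           if isinstance(k, str) and k in vars(builtins)}
print(mapping)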
Meitham
1

My approach is a bit forceful, but seems to play nice for all cases:

import numpy as np

def type_np2py(dtype=None, arr=None):
    '''Return the closest python type for a given numpy dtype'''

    if ((dtype is None and arr is None) or
        (dtype is not None and arr is not None)):
        raise ValueError(
            "Provide either keyword argument `dtype` or `arr`: a numpy dtype or a numpy array.")

    if dtype is None:
        dtype = arr.dtype

    #1) Make a single-entry numpy array of the same dtype
    #2) force the array into a python 'object' dtype
    #3) the array entry should now be the closest python type
    single_entry = np.empty([1], dtype=dtype).astype(object)

    return type(single_entry[0])

Usage:

>>> type_np2py(int)
<class 'int'>

>>> type_np2py(np.int)
<class 'int'>

>>> type_np2py(str)
<class 'str'>

>>> type_np2py(arr=np.array(['hello']))
<class 'str'>

>>> type_np2py(arr=np.array([1,2,3]))
<class 'int'>

>>> type_np2py(arr=np.array([1.,2.,3.]))
<class 'float'>
Simon Streicher
1

A side note about array scalars for those who don't need automatic conversion and know the numpy dtype of the value:

Array scalars differ from Python scalars, but for the most part they can be used interchangeably (the primary exception is for versions of Python older than v2.x, where integer array scalars cannot act as indices for lists and tuples). There are some exceptions, such as when code requires very specific attributes of a scalar or when it checks specifically whether a value is a Python scalar. Generally, problems are easily fixed by explicitly converting array scalars to Python scalars, using the corresponding Python type function (e.g., int, float, complex, str, unicode).

Source

Thus, for most cases conversion might not be needed at all, and the array scalar can be used directly. The effect should be identical to using a Python scalar:

>>> np.issubdtype(np.int64, int)
True
>>> np.int64(0) == 0
True
>>> np.issubdtype(np.float64, float)
True
>>> np.float64(1.1) == 1.1
True

But if, for some reason, explicit conversion is needed, using the corresponding Python built-in function is the way to go. As shown in the other answer, it's also faster than the array scalar's item() method.
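For example, a quick sketch of the explicit conversion with the corresponding built-in type functions:

import numpy as np

print(type(int(np.int64(0))))               # <class 'int'>
print(type(float(np.float64(1.1))))         # <class 'float'>
print(type(complex(np.complex128(1 + 2j)))) # <class 'complex'>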

wombatonfire
0

If you have a list or array list_numpy_numbers of NumPy values, do the following:

list_native_numbers = [i.item() for i in list_numpy_numbers]
XronoX
-1

Translate the whole structure (here, a pandas DataFrame) instead of one data object at a time:

def trans(data):
    """
    Translate numpy int/float values in a pandas DataFrame into Python native types.
    """
    result = []
    for i in data.index:
        d0 = data.iloc[i].values
        d = []
        for j in d0:
            if 'int' in str(type(j)) or 'float' in str(type(j)):
                # NumPy scalars expose .item(); native ints and floats do not.
                res = j.item() if 'item' in dir(j) else j
            else:
                res = j
            d.append(res)
        result.append(tuple(d))
    return tuple(result)

However, it takes several minutes when handling large DataFrames. I am also looking for a more efficient solution; I hope there is a better answer.
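A sketch of a likely faster alternative (assuming data is a pandas DataFrame with numeric columns; trans_fast is just an illustrative name): let the underlying ndarray's tolist() do the per-element conversion in one pass.

def trans_fast(data):
    # Assumes `data` is a pandas DataFrame with numeric columns.
    # data.values gives the underlying ndarray; tolist() converts every element
    # to its closest native Python type in one pass, and the rows are re-wrapped
    # as tuples to match the output shape of trans() above.
    return tuple(tuple(row) for row in data.values.tolist())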