
The issue

I have a very simple function which uses numpy.where() for a very simple calculation.

  1. If the input is a scalar, the output is a numpy array of size ().
  2. If I multiply it by one, it becomes a numpy int32 of the same size.

My questions are:

What is the difference between 1 and 2?

  • Are they both scalars? Does such a thing as a numpy scalar even exist?
  • Why does multiplying it by one change the type? Is this a known, documented feature/bug?
  • Why do other numpy functions, e.g. np.arange(5,6), return an array of size (1,) instead?

I doubt I am the first one to come across this but I haven't found much online.

I have found questions on the difference between an array of shape (n,) and one of shape (n,1), but that's a different matter.

Toy example:

import numpy as np

def my_find(a):
    return np.where(a == 0 , 1, 0)

out_scalar = my_find(5)
out_scalar_times_1 = 1 * out_scalar

print("A scalar input returns an output of type:")
print(type(out_scalar))
print("and of shape")
print(out_scalar.shape)

print("")
print("Multiplying it by 1 returns a:")
print(type(out_scalar_times_1))

out_array = my_find(np.arange(0,5))

[Spyder screenshot of the output]

Pythonista anonymous

1 Answer


Yes, there is such a thing as a numpy scalar:

https://numpy.org/doc/stable/reference/arrays.scalars.html

A numpy array can have 0, 1, 2 or more dimensions. There's a lot of overlap between the following:

np.int64(3)          # numpy int
np.array(3)          # 0d array
np.array([3])        # 1d array with 1 element
np.int(3)            # python int (np.int was an alias for int; removed in NumPy 1.24)
3                    # python int

The first 3 have array attributes like shape and dtype. The differences between the first two are minor.
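
As a quick check of that overlap, here is a minimal sketch that prints the array-style attributes of the first three objects. The variable names are only for illustration, and the printed dtype of the arrays depends on the platform's default integer.

```python
import numpy as np

np_scalar = np.int64(3)      # numpy scalar
zero_d = np.array(3)         # 0d array
one_d = np.array([3])        # 1d array with 1 element

for obj in (np_scalar, zero_d, one_d):
    # All three expose the array-style attributes.
    print(type(obj).__name__, obj.shape, obj.ndim, obj.dtype)

# A plain Python int has none of them.
print(hasattr(3, "shape"))   # False
```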

In a function like where, numpy first converts the arguments to arrays, e.g. np.array(5), np.array(1):

In [161]: np.where(5, 1, 0)
Out[161]: array(1)
In [162]: _.shape
Out[162]: ()
In [163]: np.array(5)
Out[163]: array(5)
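
Applied to the toy function from the question, this is roughly what happens with a scalar input. The sketch only reuses my_find as posted; nothing else is assumed.

```python
import numpy as np

def my_find(a):
    # Same toy function as in the question.
    return np.where(a == 0, 1, 0)

out = my_find(5)
print(type(out), out.shape)          # an ndarray with shape ()
print(isinstance(out, np.ndarray))   # True
print(out.ndim)                      # 0
```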

But math like addition with a scalar may return a numpy scalar:

In [164]: np.array(5) + 1
Out[164]: 6
In [165]: type(_)
Out[165]: numpy.int64
In [166]: np.array(5) * 1
Out[166]: 5
In [167]: type(_)
Out[167]: numpy.int64

Indexing an array can also produce such a scalar:

In [182]: np.arange(3)[1]
Out[182]: 1
In [183]: type(_)
Out[183]: numpy.int64
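
If you need to tell the two apart in your own code, one option is an isinstance check. The helper below is hypothetical (describe is not a numpy function); it relies on np.generic being the base class of all numpy scalar types.

```python
import numpy as np

def describe(obj):
    """Classify obj as a numpy scalar, a 0d array, or something else (hypothetical helper)."""
    if isinstance(obj, np.generic):
        return "numpy scalar"
    if isinstance(obj, np.ndarray) and obj.ndim == 0:
        return "0d array"
    return "other"

print(describe(np.array(5)))        # 0d array
print(describe(np.array(5) * 1))    # numpy scalar
print(describe(np.arange(3)[1]))    # numpy scalar
print(describe(5))                  # other
```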

where broadcasts its three arguments against each other, so the result has their common broadcast shape, the "largest" in the broadcasting sense:

In [168]: np.where(np.arange(5),1,0)
Out[168]: array([0, 1, 1, 1, 1])
In [173]: np.where(5, [1],0)
Out[173]: array([1])
In [174]: np.where(0, [1],0)
Out[174]: array([0])
In [175]: np.where([[0]], [1],0)
Out[175]: array([[0]])
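
The same shapes can be predicted without calling where at all. This is a sketch with an illustrative helper (where_shape is my own name) that broadcasts the three arguments with np.broadcast and reads off the resulting shape.

```python
import numpy as np

def where_shape(cond, x, y):
    """Predict np.where's output shape by broadcasting its arguments (illustrative helper)."""
    return np.broadcast(np.asarray(cond), np.asarray(x), np.asarray(y)).shape

print(where_shape(5, 1, 0))              # ()
print(where_shape(5, [1], 0))            # (1,)
print(where_shape([[0]], [1], 0))        # (1, 1)
print(where_shape(np.arange(5), 1, 0))   # (5,)
```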

If Spyder has tab completion like IPython, you can get a list of all the methods attached to an object. The methods for np.int64(3) will look a lot like those for np.array(3), but very different from those for 3.

There are also arrays with 0 elements, which happens when one of the dimensions is 0:

Out[184]: array([], dtype=int64)
In [185]: _.shape
Out[185]: (0,)
In [186]: np.arange(1)
Out[186]: array([0])
In [187]: _.shape
Out[187]: (1,)

A 0d array can't have 0 elements: its shape is (), so there is no dimension that could be 0, and it always holds exactly one element.

Indexing a 0d array (or numpy scalar) is a bit trickier (but still logical):

In [189]: np.array(3)[()]      # 0 element indexing tuple
Out[189]: 3
In [190]: type(_)
Out[190]: numpy.int64
In [191]: np.array(3).item()
Out[191]: 3
In [192]: type(_)
Out[192]: int
In [193]: np.array(3)[()][()]
Out[193]: 3
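
To pull the value out of a 0d result, such as the one where returns in the question, the two routes above can be put side by side. A minimal sketch with illustrative variable names:

```python
import numpy as np

x = np.array(3)

as_numpy_scalar = x[()]      # empty-tuple index gives a numpy scalar
as_python_int = x.item()     # .item() gives a built-in Python int

print(type(as_numpy_scalar))
print(type(as_python_int))

# The numpy scalar accepts the same calls, so chaining them is harmless.
print(type(as_numpy_scalar[()]), type(as_numpy_scalar.item()))
```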

The scalar returned by the addition might be explained by __array_priority__ (examined further down).

The dtype is not necessarily preserved in operations like this: add a float to an int and you get a float.

In [203]: type(np.array(3, np.int16) + 3)
Out[203]: numpy.int64
In [204]: type(np.array(3, np.int16) + 3.0)
Out[204]: numpy.float64
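
A hedged sketch of the same promotion, using np.result_type to ask numpy which dtype two explicit dtypes combine to. One caveat: the exact integer width when a Python int is involved depends on the NumPy version (as far as I know, NumPy 2 changed how Python scalars are promoted, so a + 3 may stay int16 there), so the code does not claim a width for that case.

```python
import numpy as np

a = np.array(3, np.int16)

int_result = a + 3        # still an integer; its width depends on the NumPy version
float_result = a + 3.0    # int combined with float gives a float

print(type(int_result), type(float_result))

# For two explicit dtypes, np.result_type reports the promoted dtype.
print(np.result_type(np.int16, np.float64))   # float64
print(np.result_type(np.int16, np.int64))     # int64
```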

ufunc casting

+ is actually a call to the np.add ufunc. ufuncs take keywords like casting that give finer control over what the result can be:

In [214]: np.add(np.array(3, np.int16), 3)
Out[214]: 6
In [215]: np.add(np.array(3, np.int16), 3, casting='no')
Traceback (most recent call last):
  File "<ipython-input-215-631cb3a3b303>", line 1, in <module>
    np.add(np.array(3, np.int16), 3, casting='no')
UFuncTypeError: Cannot cast ufunc 'add' input 0 from dtype('int16') to dtype('int64') with casting rule 'no'
    
In [217]: np.add(np.array(3, np.int16), 3, casting='safe')
Out[217]: 6

https://numpy.org/doc/stable/reference/ufuncs.html#output-type-determination
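
As a small illustration of that finer control, this sketch asks np.add for an int16 result through the dtype keyword, and uses np.can_cast to show which conversions each casting rule permits. It is one way to keep the dtype, not the only one.

```python
import numpy as np

a = np.array(3, np.int16)

# Request an int16 result explicitly instead of relying on promotion.
res = np.add(a, 3, dtype=np.int16)
print(res, res.dtype)          # 6 int16

# np.can_cast shows which conversions each casting rule allows.
print(np.can_cast(np.int16, np.int64, casting="no"))         # False
print(np.can_cast(np.int16, np.int64, casting="safe"))       # True
print(np.can_cast(np.int64, np.int16, casting="safe"))       # False
print(np.can_cast(np.int64, np.int16, casting="same_kind"))  # True
```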

I was speculating that __array_priority__ played a role in returning a np.int64, but priorities go the wrong way.

In [194]: np.array(3).__array_priority__
Out[194]: 0.0
In [195]: np.int64(3).__array_priority__
Out[195]: -1000000.0
In [196]: np.array(3) + np.int64(3)
Out[196]: 6
In [197]: type(_)
Out[197]: numpy.int64

I don't know where it's documented, but often an operation will return a numpy scalar rather than a 0d array.

I just remembered/discovered one difference between a 0d array and a numpy scalar, mutability:

In [222]: x
Out[222]: array(3)
In [223]: x[...] = 4
In [224]: x
Out[224]: array(4)
In [225]: x = np.int64(3)
In [226]: x[...] = 4
Traceback (most recent call last):
  File "<ipython-input-226-f7dca2cc5565>", line 1, in <module>
    x[...] = 4
TypeError: 'numpy.int64' object does not support item assignment
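
The same contrast as a self-contained script, catching the error instead of letting it propagate. The variable names are only illustrative.

```python
import numpy as np

arr0d = np.array(3)
arr0d[...] = 4               # in-place assignment works on a 0d array
print(arr0d)                 # 4

scalar = np.int64(3)
try:
    scalar[...] = 4          # numpy scalars do not support item assignment
except TypeError as exc:
    print("TypeError:", exc)
    scalar_is_immutable = True
else:
    scalar_is_immutable = False

print(scalar)                # 3
```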

Python classes can share a lot of behaviors/methods, but differ in others.

hpaulj