What is the difference between a numpy array of size (100, 1) and (100,)?

Question

I have two variables coming from different functions. The first one, a, is:

<class 'numpy.ndarray'>
(100,)

while the other one b is:

<class 'numpy.ndarray'>
(100, 1)

If I try to correlate them via:

from scipy.stats import pearsonr
p, r= pearsonr(a, b)

I get:

    r = max(min(r, 1.0), -1.0)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

My questions are:

What is the difference between a and b?
How do I fix this?

Possible duplicate of [Difference between these array shapes in numpy](https://stackoverflow.com/questions/27570756/difference-between-these-array-shapes-in-numpy) — MrFuppes, Jul 31 '19 at 15:24
Possible duplicate of [Difference between numpy.array shape (R, 1) and (R,)](https://stackoverflow.com/questions/22053050/difference-between-numpy-array-shape-r-1-and-r) — NVS Abhilash, Jul 31 '19 at 15:35

score 4 · Answer 1 · edited Jul 31 '19 at 15:29

(100, 1) is a 2D array with 100 rows of length 1, like [[1], [2], [3], [4]]. (100,) is a 1D array, like [1, 2, 3, 4].

import numpy as np
a1 = np.array([[1], [2], [3], [4]])  # shape (4, 1)
a2 = np.array([1, 2, 3, 4])          # shape (4,)
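A minimal sketch of how the two shapes behave differently, using 4-element arrays in place of the 100-element ones (the names `flat`, `col` and `product` are illustrative only). It suggests why a function written for 1D inputs can end up with an array where it expects a scalar: combining the two shapes broadcasts to a 2D grid.

```python
import numpy as np

flat = np.array([1, 2, 3, 4])          # shape (4,): a single axis
col = np.array([[1], [2], [3], [4]])   # shape (4, 1): four rows, one column

print(flat.ndim, col.ndim)    # 1 2
print(flat[2], col[2])        # 3 [3]

# Elementwise operations broadcast (4,) against (4, 1) to a (4, 4) grid
product = flat * col
print(product.shape)          # (4, 4)
```

Indexing `flat` with one index gives a scalar, while indexing `col` with one index gives a length-1 row.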

answered Jul 31 '19 at 15:24 by user8426627 · edited Jul 31 '19 at 15:29 by mattsap

NVS Abhilash · Accepted Answer · 2019-07-31T15:34:40.200

Answer to the first question: a is a vector (a 1D array) and b is a matrix (a 2D array with a single column). For more details, see [Difference between numpy.array shape (R, 1) and (R,)](https://stackoverflow.com/questions/22053050/difference-between-numpy-array-shape-r-1-and-r).

Answer to the second question:

Converting one array to the other's form should work. The function you are calling appears to expect vectors, so reshape b with b = b.reshape(-1), which converts it to a single dimension (a vector). See the example below:

>>> import numpy as np
>>> from scipy.stats import pearsonr
>>> a = np.random.random((100,))
>>> b = np.random.random((100,1))
>>> print(a.shape, b.shape)
(100,) (100, 1)
>>> r, p = pearsonr(a, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\xyz\Appdata\Local\Continuum\Anaconda3\lib\site-packages\scipy\stats\stats.py", line 3042, in pearsonr
    r = max(min(r, 1.0), -1.0)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> b = b.reshape(-1)
>>> r, p = pearsonr(a, b)  # pearsonr returns (correlation, p-value), in that order
>>> print(r, p)
0.10899671932026986 0.280372238354364
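Besides `reshape(-1)`, NumPy offers a few other ways to turn an (N, 1) array into an (N,) array. The sketch below uses a 6-element stand-in for `b`; the variable names are illustrative only.

```python
import numpy as np

b = np.arange(6.0).reshape(6, 1)   # stand-in for the (100, 1) array

flat_reshape = b.reshape(-1)    # -1 lets NumPy infer the length
flat_ravel = b.ravel()          # flatten to 1D
flat_column = b[:, 0]           # select the only column
flat_squeeze = b.squeeze()      # drop every axis of length 1

for arr in (flat_reshape, flat_ravel, flat_column, flat_squeeze):
    print(arr.shape)            # (6,) each time
```

Note that `squeeze()` removes all length-1 axes, so a (1, 1) array would become 0-dimensional; `reshape(-1)` or `ravel()` always gives a 1D result.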

mattsap · Answer 3 · score 0 · 2019-07-31T17:06:36.197

You'll need to call reshape on the first one: a = a.reshape((100, 1)). reshape returns an array with the new shape (it does not modify a in place), turning the 1D array [1, 2, 3, ..., 100] into the 2D array [[1], [2], [3], ..., [100]].
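A short sketch of this direction, going from 1D to a column, using a 5-element stand-in for `a` (the names `col_reshape` and `col_newaxis` are illustrative only). It also shows that the original array keeps its shape unless the result is assigned back.

```python
import numpy as np

a = np.arange(1, 6)                 # shape (5,)

col_reshape = a.reshape(-1, 1)      # -1 lets NumPy infer the row count
col_newaxis = a[:, np.newaxis]      # equivalent result via indexing

print(a.shape)                      # (5,): a itself is unchanged
print(col_reshape.shape)            # (5, 1)
print(np.shares_memory(a, col_reshape))  # True: the result is a view of a
```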

answered Jul 31 '19 at 15:22 · edited Jul 31 '19 at 17:06

In order to provide a better answer, you should explain why. – Dorian Turba Jul 31 '19 at 15:32
