2

I have two variables coming from different functions. The first one, a, is:

<class 'numpy.ndarray'>
(100,)

while the other one, b, is:

<class 'numpy.ndarray'>
(100, 1)

If I try to correlate them via:

from scipy.stats import pearsonr
r, p = pearsonr(a, b)  # pearsonr returns (correlation coefficient, p-value)

I get:

    r = max(min(r, 1.0), -1.0)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

My questions are:

  1. What is the difference between a and b?
  2. How do I fix this?
lordy
  • 2
    Possible duplicate of [Difference between these array shapes in numpy](https://stackoverflow.com/questions/27570756/difference-between-these-array-shapes-in-numpy) – MrFuppes Jul 31 '19 at 15:24
  • 2
    Possible duplicate of [Difference between numpy.array shape (R, 1) and (R,)](https://stackoverflow.com/questions/22053050/difference-between-numpy-array-shape-r-1-and-r) – NVS Abhilash Jul 31 '19 at 15:35

3 Answers

4

Shape (100, 1) is a 2D array with 100 rows of length 1, like [[1], [2], [3], [4]], while shape (100,) is a 1D array, like [1, 2, 3, 4].

import numpy as np
a1 = np.array([[1], [2], [3], [4]])  # 2D, shape (4, 1)
a2 = np.array([1, 2, 3, 4])          # 1D, shape (4,)
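To see why mixing the two shapes causes trouble, here is a minimal sketch (the small arrays are illustrative, not the asker's data). Elementwise arithmetic between a shape (n,) array and a shape (n, 1) array broadcasts to (n, n) instead of pairing elements one to one, which is a plausible reason the intermediate value inside pearsonr ends up as an array rather than a scalar.

```python
import numpy as np

a = np.arange(4)                 # 1D, shape (4,)
b = np.arange(4).reshape(4, 1)   # 2D, shape (4, 1)

print(a.ndim, b.ndim)            # number of dimensions of each array

# Broadcasting combines every element of a with every row of b,
# so the result is a 4x4 grid rather than 4 paired products.
product = a * b
print(product.shape)
```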
mattsap
user8426627
3

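First question's answer: a is a vector (a 1D array), and b is a matrix (a 2D array with a single column). See this Stack Overflow question for more details: [Difference between numpy.array shape (R, 1) and (R,)](https://stackoverflow.com/questions/22053050/difference-between-numpy-array-shape-r-1-and-r)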

Second question's answer:

Converting one to the other's shape should work. The function you are calling appears to expect vectors, so reshape b with b = b.reshape(-1), which flattens it to a single dimension (a vector). See the example below for reference:

>>> import numpy as np
>>> from scipy.stats import pearsonr
>>> a = np.random.random((100,))
>>> b = np.random.random((100,1))
>>> print(a.shape, b.shape)
(100,) (100, 1)
>>> r, p = pearsonr(a, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\xyz\Appdata\Local\Continuum\Anaconda3\lib\site-packages\scipy\stats\stats.py", line 3042, in pearsonr
    r = max(min(r, 1.0), -1.0)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> b = b.reshape(-1)  # flatten (100, 1) to (100,)
>>> r, p = pearsonr(a, b)  # pearsonr returns (correlation coefficient, p-value)
>>> print(r, p)
0.10899671932026986 0.280372238354364
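As a side note, reshape(-1) is not the only way to get a 1D array from a single-column 2D array. The sketch below (variable names are illustrative) shows a few NumPy equivalents; any of them should produce an array that can be paired with a shape (100,) array.

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.random((100, 1))          # single-column 2D array, shape (100, 1)

flat_reshape = b.reshape(-1)      # -1 lets NumPy infer the length
flat_ravel = b.ravel()            # flatten to 1D
flat_column = b[:, 0]             # select the first (only) column
flat_squeeze = b.squeeze(axis=1)  # drop the length-1 axis

for flat in (flat_reshape, flat_ravel, flat_column, flat_squeeze):
    print(flat.shape)
```

Selecting the column with b[:, 0] or using squeeze(axis=1) has the advantage of failing loudly or leaving extra columns out explicitly if b unexpectedly has more than one column, whereas reshape(-1) would silently flatten everything.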
NVS Abhilash
0

You'll need to call reshape on the first one: a.reshape((100, 1)). reshape returns an array with the new shape, turning the 1D array [1, 2, 3, ..., 100] into the 2D array [[1], [2], [3], ..., [100]].
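A minimal sketch of that reshape, using illustrative data. Note that reshape does not modify the array in place, so the result has to be assigned to a name.

```python
import numpy as np

a = np.arange(1, 101)          # 1D array [1, 2, ..., 100], shape (100,)
a_col = a.reshape((100, 1))    # new 2D array; a itself keeps shape (100,)
a_col_inferred = a.reshape(-1, 1)  # -1 lets NumPy infer the row count

print(a.shape, a_col.shape, a_col_inferred.shape)
```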

mattsap