I have two 2D arrays whose column vectors are feature vectors. One array is of size F x A, the other F x B, where A << B. For example, with A = 2 and F = 3 (B can be anything):
arr1 = np.array([[1, 4],
                 [2, 5],
                 [3, 6]])
arr2 = np.array([[1, 4, 7, 10, ..],
                 [2, 5, 8, 11, ..],
                 [3, 6, 9, 12, ..]])
I want to calculate the distance between arr1 and every fragment of arr2 of the same size (in this case, 3x2), for each possible fragment of arr2. Since the column vectors are independent of each other, I believe I should calculate the distance between each column vector in arr1 and the corresponding column vector of the fragment arr2[:, i:i + A], and then take the sum of these distances (though I'm not sure about this).
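Written out, with 0-based column indices and assuming "distance" means the Euclidean norm of each column difference, the quantity described above for a fragment starting at column i would be:

```latex
d_i = \sum_{j=0}^{A-1} \left\lVert \mathrm{arr1}[:, j] - \mathrm{arr2}[:, i + j] \right\rVert_2,
\qquad i = 0, 1, \dots, B - A
```

This gives B - A + 1 values, one per possible fragment.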
Does NumPy offer an efficient way of doing this, or will I have to take slices of the second array and, in another loop, calculate the distance between each column vector in arr1 and the corresponding column vector in the slice?
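A minimal sketch of avoiding the Python loop, assuming NumPy 1.20 or newer (for `sliding_window_view`) and the "sum of per-column Euclidean norms" definition above. `loop_distance` and `sliding_distance` are hypothetical names; the loop version is kept only as a reference to check the vectorized one against.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20


def loop_distance(arr1, arr2):
    """Reference version: explicit loop over every window start position."""
    A = arr1.shape[1]
    out = []
    for i in range(arr2.shape[1] - A + 1):
        diff = arr1 - arr2[:, i:i + A]
        # One Euclidean norm per column, summed over the A columns
        out.append(np.linalg.norm(diff, axis=0).sum())
    return np.array(out)


def sliding_distance(arr1, arr2):
    """Vectorized version: no Python loop over windows."""
    A = arr1.shape[1]
    # Read-only view of shape (F, B - A + 1, A); windows[:, i, :] equals arr2[:, i:i + A]
    windows = sliding_window_view(arr2, A, axis=1)
    # Broadcast arr1, reshaped to (F, 1, A), against all windows at once
    diff = windows - arr1[:, None, :]
    # Norm over the feature axis -> (B - A + 1, A), then sum over the A columns
    return np.linalg.norm(diff, axis=0).sum(axis=1)


arr1 = np.array([[1, 4], [2, 5], [3, 6]])
arr2 = np.array([[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]])
fast = sliding_distance(arr1, arr2)
slow = loop_distance(arr1, arr2)
print(fast)
```

The view itself costs no extra memory, but `diff` materializes an F x (B - A + 1) x A array, roughly A times the size of arr2; with A << B that is usually acceptable.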
Example for clarity, using the arrays stated above:
>>> magical_distance_func(arr1, arr2[:,:3])
[0, 10.3923..]
>>> # First, distance between arr2[:,:2] and arr1, which equals 0.
>>> # Second, distance between arr2[:,1:3] and arr1, which is computed as:
>>> diff = arr1 - np.array( [[4,7],[5,8],[6,9]] )
>>> diff
[[-3, -3], [-3, -3], [-3, -3]]
>>> # this happens to consist only of -3s. The norm of each column vector is:
>>> norm1 = np.linalg.norm(diff[:,0])
>>> norm2 = np.linalg.norm(diff[:,1])
>>> # ideally this would work for an arbitrary number of norms (i.e. any A)
>>> totaldist = norm1 + norm2
>>> totaldist
10.3923...
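For a single fragment, the two separate `norm` calls can be replaced by one call with `axis=0`, which returns one norm per column and therefore works for any number of columns A. A short sketch using the fragment from the example:

```python
import numpy as np

arr1 = np.array([[1, 4], [2, 5], [3, 6]])
fragment = np.array([[4, 7], [5, 8], [6, 9]])  # arr2[:, 1:3] in the example

diff = arr1 - fragment
# axis=0 computes the Euclidean norm of each column, whatever A is
column_norms = np.linalg.norm(diff, axis=0)
totaldist = column_norms.sum()
print(totaldist)  # about 10.3923
```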
Of course, transposing the arrays is fine too, if that means SciPy's cdist can be used here.
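One possible way to use `scipy.spatial.distance.cdist`, sketched under the same sum-of-Euclidean-norms assumption: `cdist` compares rows, so passing the transposed arrays gives a matrix D with D[j, k] = distance between column j of arr1 and column k of arr2. The distance for the fragment starting at i is then the sum of D[j, i + j] over j, a diagonal of D, which fancy indexing can extract for all i at once. `sliding_distance_cdist` is a hypothetical name.

```python
import numpy as np
from scipy.spatial.distance import cdist


def sliding_distance_cdist(arr1, arr2):
    """Sum of column distances for every fragment, via a single cdist call."""
    A = arr1.shape[1]
    B = arr2.shape[1]
    # cdist works on rows, so pass columns as rows: (A x F) vs (B x F) -> D is (A, B)
    D = cdist(arr1.T, arr2.T)  # Euclidean metric by default
    rows = np.arange(A)[:, None]                  # shape (A, 1)
    cols = rows + np.arange(B - A + 1)[None, :]   # shape (A, B - A + 1), cols[j, i] = i + j
    # D[rows, cols][j, i] == D[j, i + j]; summing over j gives one value per start i
    return D[rows, cols].sum(axis=0)


arr1 = np.array([[1, 4], [2, 5], [3, 6]])
arr2 = np.array([[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]])
result = sliding_distance_cdist(arr1, arr2)
print(result)
```

This computes each column-to-column distance exactly once (an A x B matrix), so memory stays at A x B rather than growing with F.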