
Prerequisite

This question is derived from this post, so part of the problem introduction will be similar to that post.

Problem

Let's say result is a 2D array and values is a 1D array. values holds values associated with elements of result; the mapping from an element of values to a position in result is stored in x_mapping and y_mapping. A single position in result can be associated with multiple values. I need to find the minimum and maximum of the values, grouped by these associations.

An example for better clarification.

min_result array:

[[0, 0],
[0, 0],
[0, 0],
[0, 0]]

max_result array:

[[0, 0],
[0, 0],
[0, 0],
[0, 0]]

values array:

[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.]

Note: Here the result arrays and values happen to have the same number of elements, but that need not be the case; there is no relation between their sizes at all.

x_mapping and y_mapping hold the mappings from the 1D values to the 2D result (both min and max). The sizes of x_mapping, y_mapping and values are the same.

x_mapping - [0, 1, 0, 0, 0, 0, 0, 0]

y_mapping - [0, 3, 2, 2, 0, 3, 2, 1]

Here, the 1st value (values[0]) and the 5th value (values[4]) have x as 0 and y as 0 (x_mapping[0] and y_mapping[0]), and hence are associated with result[0, 0]. Computing the minimum and maximum of this group gives 1 and 5 respectively, so min_result[0, 0] will hold 1 and max_result[0, 0] will hold 5.

Note that if a position has no association at all, its value in result defaults to zero.
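For clarity, the grouping described above can be verified with a plain dictionary-based sketch (this is just for illustration, not the performant solution being asked for):

```python
import numpy as np

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=np.float32)

# Collect the values belonging to each (x, y) association
groups = {}
for x, y, v in zip(x_mapping, y_mapping, values):
    groups.setdefault((x, y), []).append(v)

print(groups[(0, 0)])                             # [1.0, 5.0]
print(min(groups[(0, 0)]), max(groups[(0, 0)]))   # 1.0 5.0
```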

Current working solution

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.], dtype=np.float32)
max_result = np.zeros([4, 2], dtype=np.float32)
min_result = np.zeros([4, 2], dtype=np.float32) 
min_result[-y_mapping, x_mapping] = values # seed each mapped cell with a value from its own group
for i in range(values.size):
    x = x_mapping[i]
    y = y_mapping[i]
    # maximum
    if values[i] > max_result[-y, x]:
        max_result[-y, x] = values[i]
    # minimum
    if values[i] < min_result[-y, x]:
        min_result[-y, x] = values[i]

min_result:

[[1., 0.],
[6., 2.],
[3., 0.],
[8., 0.]]

max_result:

[[5., 0.],
[6., 2.],
[7., 0.],
[8., 0.]]

Failed solutions

#1

min_result = np.zeros([4, 2], dtype=np.float32)
np.minimum.reduceat(values, [-y_mapping, x_mapping], out=min_result)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-126de899a90e> in <module>()
1 min_result = np.zeros([4, 2], dtype=np.float32)
----> 2 np.minimum.reduceat(values, [-y_mapping, x_mapping], out=min_result)

ValueError: object too deep for desired array

#2

min_result = np.zeros([4, 2], dtype=np.float32)
np.minimum.reduceat(values, lidx, out=min_result)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-07e8c75ccaa5> in <module>()
1 min_result = np.zeros([4, 2], dtype=np.float32)
----> 2 np.minimum.reduceat(values, lidx, out= min_result)

ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (4,2)->(4,) (8,)->() (8,)->(8,) 

#3

lidx = ((-y_mapping) % 4) * 2 + x_mapping # from mentioned post
min_result = np.zeros([8], dtype=np.float32)
np.minimum.reduceat(values, lidx, out=min_result).reshape(4,2)

[[1., 4.],
[5., 5.],
[1., 3.],
[5., 7.]]
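For reference, the reason #3 produces wrong numbers is that reduceat does not group by label: each index is treated as the *start of a slice* that ends at the next index (or as a single element when the next index is not larger), reducing over the unsorted values array. A minimal sketch of what it actually computes with the example data:

```python
import numpy as np

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=np.float32)

lidx = ((-y_mapping) % 4) * 2 + x_mapping
print(lidx)  # [0 3 4 4 0 2 4 6]

# Each lidx[i] starts a slice values[lidx[i]:lidx[i+1]] -- e.g. the first
# entry reduces values[0:3], not "all values mapped to flat index 0".
out = np.minimum.reduceat(values, lidx)
print(out)   # [1. 4. 5. 5. 1. 3. 5. 7.]
```

This is exactly the wrong output shown above, which is why the data must first be sorted so that equal indices become contiguous runs.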

Question

How can np.minimum.reduceat and np.maximum.reduceat be used to solve this problem? I'm looking for a solution that is optimised for runtime.

Side note

I'm using NumPy version 1.14.3 with Python 3.5.2.

tpk
1 Answer


Approach #1

Again, the most intuitive approach would be with numpy.ufunc.at. Since these reductions are performed against existing values, we need to initialise the output with the maximum value for minimum reductions and the minimum value for maximum ones. Hence, the implementation would be -

min_result[-y_mapping, x_mapping] = values.max()
max_result[-y_mapping, x_mapping] = values.min()

np.minimum.at(min_result, [-y_mapping, x_mapping], values)
np.maximum.at(max_result, [-y_mapping, x_mapping], values)
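Putting Approach #1 together with the example data from the question, a minimal self-contained sketch (assuming min_result and max_result start as zero arrays, so unmapped cells keep the default 0):

```python
import numpy as np

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=np.float32)

min_result = np.zeros([4, 2], dtype=np.float32)
max_result = np.zeros([4, 2], dtype=np.float32)

# Seed only the mapped cells, so unmapped cells keep the default 0
min_result[-y_mapping, x_mapping] = values.max()
max_result[-y_mapping, x_mapping] = values.min()

# Unbuffered in-place reductions handle repeated indices correctly
np.minimum.at(min_result, (-y_mapping, x_mapping), values)
np.maximum.at(max_result, (-y_mapping, x_mapping), values)

print(min_result)  # [[1. 0.] [6. 2.] [3. 0.] [8. 0.]]
print(max_result)  # [[5. 0.] [6. 2.] [7. 0.] [8. 0.]]
```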

Approach #2

To leverage np.ufunc.reduceat, we need to sort the data -

m,n = max_result.shape
out_dtype = max_result.dtype
lidx = ((-y_mapping)%m)*n + x_mapping

sidx = lidx.argsort()
idx = lidx[sidx]
val = values[sidx]

m_idx = np.flatnonzero(np.r_[True,idx[:-1] != idx[1:]])
unq_ids = idx[m_idx]

max_result.flat[unq_ids] = np.maximum.reduceat(val, m_idx)
min_result.flat[unq_ids] = np.minimum.reduceat(val, m_idx)
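Assembled into a runnable sketch with the question's data (assuming the output arrays are pre-initialised to zeros; reduceat writes every mapped cell, so unmapped cells keep the default 0):

```python
import numpy as np

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=np.float32)

m, n = 4, 2
min_result = np.zeros([m, n], dtype=np.float32)
max_result = np.zeros([m, n], dtype=np.float32)

# Linearise the 2D targets, then sort so equal targets become contiguous runs
lidx = ((-y_mapping) % m) * n + x_mapping
sidx = lidx.argsort()
idx = lidx[sidx]
val = values[sidx]

# Start position of each run of equal linear indices
m_idx = np.flatnonzero(np.r_[True, idx[:-1] != idx[1:]])
unq_ids = idx[m_idx]

max_result.flat[unq_ids] = np.maximum.reduceat(val, m_idx)
min_result.flat[unq_ids] = np.minimum.reduceat(val, m_idx)

print(min_result)  # [[1. 0.] [6. 2.] [3. 0.] [8. 0.]]
print(max_result)  # [[5. 0.] [6. 2.] [7. 0.] [8. 0.]]
```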
Divakar
  • `np.minimum.at(min_result, [-y_mapping, x_mapping], values)` will result in zero array if we initialise the `min_result` as zero. Any workaround other than `min_result[min_result == MAX_VALUE] = 0.0`? – tpk Aug 06 '18 at 10:58
  • The second approach is much faster than first. Any insights? – tpk Aug 06 '18 at 11:18
  • 1
    @tpk Same as in the previous question. `ufunc.at` ones are probably looping under the hoods as opposed to `ufunc.reduceat` which aren't. – Divakar Aug 06 '18 at 11:19
  • can you look into this question as well? https://stackoverflow.com/questions/53513497/numpy-finding-count-of-distinct-values-from-associations-through-binning – tpk Nov 28 '18 at 06:37