The quick problem
I would like to compare specific dtype fields from two numpy structured arrays that are guaranteed to have the same dtype. The fields being compared need to be able to differ each time a function is called, based on the given inputs (i.e. I can't easily hard code the comparisons for each individual field).
The long problem with examples
I am trying to compare specific fields from two numpy structured arrays with the same dtype. For instance, say we have
import numpy as np
from io import BytesIO
a = np.genfromtxt(BytesIO('12 23 0|23.2|17.9|0\n12 23 1|13.4|16.9|0'.encode()),dtype=[('id','U7'),('pos',[('x',float),('y',float)]),('flag','U1')],delimiter='|')
b = np.genfromtxt(BytesIO(' |23.0|17.91|0'.encode()),dtype=[('id','U7'),('pos',[('x',float),('y',float)]),('flag','U1')],delimiter='|')
which gives
In[156]: a
Out[154]:
array([('12 23 0', (23.2, 17.9), '0'), ('12 23 1', (13.4, 16.9), '0')],
dtype=[('id', '<U7'), ('pos', [('x', '<f8'), ('y', '<f8')]), ('flag', '<U1')])
and
In[153]: b
Out[151]:
array([('', (23.0, 17.91), '0')],
dtype=[('id', '<U7'), ('pos', [('x', '<f8'), ('y', '<f8')]), ('flag', '<U1')])
Now let's say that I want to find any entries in a whose a['pos']['x'] field is greater than the b['pos']['x'] field and return these entries in a new numpy array. Something like this would work:
newArr = a[a["pos"]["x"]>b["pos"]["x"]]
Now imagine we want to keep only entries in a where both the x and y fields are greater than their counterparts in b. This is fairly simple, as we could again do
newArr = a[np.array([a['pos']['x']>b['pos']['x'], a['pos']['y']>b['pos']['y']]).all(axis=0)]
which returns an empty array, which is the correct answer.
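For reference, the same two-field test can also be written as a reduction over a list of boolean masks, which is a little easier to extend as more fields are added. This is only a minimal sketch: it rebuilds a and b directly with np.array instead of genfromtxt, and the names dt, masks and keep are illustrative.

```python
import numpy as np

# Same dtype as in the question, with a nested 'pos' field.
dt = [('id', 'U7'), ('pos', [('x', float), ('y', float)]), ('flag', 'U1')]
a = np.array([('12 23 0', (23.2, 17.9), '0'),
              ('12 23 1', (13.4, 16.9), '0')], dtype=dt)
b = np.array([('', (23.0, 17.91), '0')], dtype=dt)

# One boolean mask per field comparison; b (shape (1,)) broadcasts against a.
masks = [a['pos']['x'] > b['pos']['x'],
         a['pos']['y'] > b['pos']['y']]

# AND all masks together element-wise, then index a with the result.
keep = np.logical_and.reduce(masks)
newArr = a[keep]
```

With the sample data no row passes both tests, so newArr is empty, matching the result above.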
Now, however, imagine that we have a very complicated dtype for these arrays (say with 34 fields; see here for an example of the dtype I'm working with) and we want to be able to compare any of them, but likely not all of them (similar to the previous example, but with more dtype fields overall and more of them to compare). Further, what if the fields we want to compare can change from run to run, so we can't really hard code it the way I did above? That is the problem I am trying to solve.
My current (unfinished) attempts at solutions
Using masked arrays
My first thought for solving this problem was to use masked arrays to select the dtype fields that we want to compare. Something like this (assuming we can make all our comparisons the same):
mask = np.ones(a.shape,dtype=[('id',bool),('pos',[('x',bool),('y',bool)]),('flag',bool)])
# unmask the x and y fields so we can compare them
mask['pos']['x']=0
mask['pos']['y']=0
maskedA = np.ma.masked_array(a, mask=mask)
# We need to do this or the masked array gets angry (at least in python 3)
b.shape = (1,)
maskedB = np.ma.masked_array(b, mask=mask)
Now I would want to do something like
test = (maskedA>maskedB).any(axis=1)
but this doesn't work because you can't compare structured arrays like this:
TypeError: unorderable types: MaskedArray() > MaskedArray()
I've also tried compressing the masked arrays
test = (maskedA.compressed()>maskedB.compressed()).any(axis=1)
which results in a different error
TypeError: ufunc 'logical_not' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Now, I realize that the above errors are likely because I don't fully understand how structured and masked arrays work, but that is partly why I am asking this question. Is there any way to do something like this using masked arrays?
The solution I just thought of that will probably work and is probably better overall...
So the other option that I just thought of while writing this up is to do the comparisons while I am parsing the user's input to form array b anyway. It would really just be adding a couple of lines to each conditional in the parser to do the comparison and tack the results into a numpy boolean array that I could then use to extract the proper entries from a. Now that I think about it, this is probably the way to go.
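A minimal sketch of that idea, assuming the parser can hand over each requested comparison as a (field path, operator) pair. The helpers get_field and select_rows and the criteria format are hypothetical names made up for this illustration, not part of numpy.

```python
import operator
import numpy as np

def get_field(arr, path):
    """Follow a sequence of field names into a (possibly nested) structured array."""
    for name in path:
        arr = arr[name]
    return arr

def select_rows(a, b, criteria):
    """Return the rows of a that satisfy every (path, op) comparison against b."""
    keep = np.ones(a.shape, dtype=bool)
    for path, op in criteria:
        # Each comparison is AND-ed into the running boolean mask.
        keep &= op(get_field(a, path), get_field(b, path))
    return a[keep]

# Sample data from the question, built directly instead of via genfromtxt.
dt = [('id', 'U7'), ('pos', [('x', float), ('y', float)]), ('flag', 'U1')]
a = np.array([('12 23 0', (23.2, 17.9), '0'),
              ('12 23 1', (13.4, 16.9), '0')], dtype=dt)
b = np.array([('', (23.0, 17.91), '0')], dtype=dt)

# The list of criteria can be assembled at run time from the user's input.
only_x = select_rows(a, b, [(('pos', 'x'), operator.gt)])
x_and_y = select_rows(a, b, [(('pos', 'x'), operator.gt),
                             (('pos', 'y'), operator.gt)])
```

Here only_x keeps the first row of a, while x_and_y is empty. Because the fields and operators are plain data, nothing has to be hard coded per field.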
The conclusion to my long and rambling problem.
Although I think I have found a solution to this problem, I am still going to post this question, at least for a little while, to see (a) if anyone has any ideas about how to do logical comparisons with structured/masked numpy arrays, because I think it would be a useful thing to know, and (b) if anyone has a better idea than what I came up with. Note that you can very easily form a MWE by copying, line by line, the snippets in the "The long problem with examples" section, so I don't see any reason to take up more space by repeating them here.