6

I have two symmetric (item co-occurrence) matrices A and B and want to find out if they describe the same co-occurrence, only with the row/column labels permuted. (The same permutation has to be applied to rows and columns to keep the symmetry/co-occurrence property)

For example these two matrices should be equal in my test:

a = np.array([
    #1 #2 #3 #4 #5 #6 #7
    [0, 1, 1, 0, 0, 0, 1], #1
    [1, 0, 1, 2, 1, 1, 2], #2
    [1, 1, 0, 0, 0, 0, 1], #3
    [0, 2, 0, 0, 4, 0, 4], #4
    [0, 1, 0, 4, 0, 1, 2], #5
    [0, 1, 0, 0, 1, 0, 0], #6
    [1, 2, 1, 4, 2, 0, 0]  #7
])
b = np.array([
    #5 #7 #1,3#3,1#2 #4 #6
    [0, 2, 0, 0, 1, 4, 1], #5
    [2, 0, 1, 1, 2, 4, 0], #7
    [0, 1, 0, 1, 1, 0, 0], #1,3 could be either
    [0, 1, 1, 0, 1, 0, 0], #1,3 could be either
    [1, 2, 1, 1, 0, 2, 1], #2
    [4, 4, 0, 0, 2, 0, 0], #4
    [1, 0, 0, 0, 1, 0, 0]  #6
])

I currently test if the eigenvalues are the same using numpy.linalg.eigvals (I am not even sure this is a sufficient condition), but I would like to find a test which doesn't involve numerical accuracy, since I am dealing with integers here.

C. Yduqoli
  • 1,160
  • 10
  • 14
  • 2
    This problem is equivalent to graph isomorphism: exact solutions are likely to be slow. See e.g. [this question](https://math.stackexchange.com/questions/331233/showing-two-graphs-isomorphic-using-their-adjacency-matrices) – Maxim Oct 29 '18 at 11:23
  • I was wondering how did you create `b` without knowing the indices `[5, 7, 1, 3, 2, 4, 6]` in the first place. – Andreas K. Oct 29 '18 at 12:44
  • I calculate the co-occurrence of objects based on a list of lists of objects. These objects get random indices assigned (out of my control) before the co-occurrence matrix is created. This way a different co-oc matrix is created every time. For my example I used two of these matrices and assigned the indices for the second matrix by hand. – C. Yduqoli Oct 30 '18 at 02:41
  • I'd recommend reading up on the [graph isomorphism problem](https://en.wikipedia.org/wiki/Graph_isomorphism_problem) and checking to see if your particular flavor of graph is a solved problem. If not, you're probably in for a lot of brute force. – Daniel F Oct 30 '18 at 12:09
  • If in many cases your matrices won't be equivalent, it might be worth first doing a test that wil tell you if they are not, but won't tell you if they are. For example you could, for each matrix, compute the multiset of each row (eg the first row of a would be {(4,0), (3,1)}) and then forming the multiset of the row multisets. If these two multisets (one for a, one for b) are not equal then the matrices are not equivalent. – dmuir Oct 30 '18 at 19:28

3 Answers3

2

Here's a vectorized solution based on sorting and leveraging searchsorted -

import pandas as pd

# Sort rows for a and b
aS = np.sort(a,axis=1)
bS = np.sort(b,axis=1)

# Scale down each row to a scalar each
scale = np.r_[(np.maximum(aS.max(0),bS.max(0))+1)[::-1].cumprod()[::-1][1:],1]
aS1D = aS.dot(scale)
bS1D = bS.dot(scale)

# Use searchsorted to get the correspondence on indexing
sidx = aS1D.argsort()
searchsorted_idx = np.searchsorted(aS1D,bS1D,sorter=sidx)
searchsorted_idx[searchsorted_idx==len(aS1D)] = len(aS1D)-1
df = pd.DataFrame({'A':searchsorted_idx})
new_order = sidx[df.groupby('A').cumcount().values+searchsorted_idx]
# new_order is the permuted order, i.e. [5, 7, 1, 3, 2, 4, 6]

# Finally index into a with the new_order and compare against b
out = np.array_equal(a[new_order[:,None], new_order],b)
Divakar
  • 204,109
  • 15
  • 192
  • 292
  • Nice one, but a lot slower than the solution with eigenvalues. – Andreas K. Oct 29 '18 at 12:02
  • @AndyK What size are you testing it on? – Divakar Oct 29 '18 at 12:03
  • That doesn't seem to be correct. I tried `a = scipy.linalg.toeplitz(np.arange(8))` and `b` some shuffled version of that, and am getting `False` all the time. – Paul Panzer Oct 29 '18 at 12:14
  • @AndyK I'm pretty sure eigenvalues are not sufficient. for example `[[4 0] [0 0]]` and `[[2 2] [2 2]]` have the same eigenvalues but obviously cannot be shuffled into each other. – Paul Panzer Oct 29 '18 at 12:18
  • What about `np.array([[12, 8, 11],[8, 0, 12],[11, 12, 8]])` `np.array([[0, 12, 8],[12, 8, 11],[8, 11, 12]])`. This should return True, but your method return False in my testing. – C. Yduqoli Oct 30 '18 at 03:14
  • Also it raised an exception for `[[12 8] [ 8 4]]` `[[ 6 5] [ 5 14]]`: `new_order = sidx[df.groupby('A').cumcount().values+searchsorted_idx] IndexError: index 2 is out of bounds for axis 1 with size 2` – C. Yduqoli Oct 30 '18 at 03:19
1

I will assume that you have the list of permutation of the rows/columns of a which gives the b, e.g. something like this

p = np.array([5, 7, 1, 3, 2, 4, 6]) - 1

Then you can simply do the following on a

a_p = a[p]
a_p = a_p[:, p]

and check if b and the permuted a_p are equal:

(a_p == b).all()

Edit: since you don't have a list like the above p, you could (at least for small arrays a and b) generate the permutations of the indices and check for each one:

from itertools import permutations

def a_p(a, b, p):
    p = np.array(p)
    a_p = a[p]
    a_p = a_p[:, p]
    return a_p

for p in permutations(range(a.shape[0])):
    if (a_p(a, b, p) == b).all():
        print('True')
        break
else:
    print('False')

Note that this brute-force method works for non-symmetric matrices as well. But since the number of permutations is huge for large arrays a and b, this method can be very slow. So your solution with computing eigenvalues is much better.

Here's a benchmark:

def Yduqoli(a, b):
    ''' I suppose your solution is similar'''
    if (np.array(np.unique(a, return_counts=True)) == np.array(np.unique(b, return_counts=True))).all():
        a_eigs = np.sort(np.linalg.eigvals(a))
        b_eigs = np.sort(np.linalg.eigvals(b))
        return np.allclose(a_eigs, b_eigs)
    else:
        return False

def AndyK(a, b):
    for p in permutations(range(a.shape[0])):
        if (a_p(a, b, p) == b).all():
            return True
    return False  

%timeit AndyK(a,b)
103 ms ± 4.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit Yduqoli(a,b)
408 µs ± 65.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

where I used the symmetric matrices a and b provided by the OP.

Update: As mentioned by Paul Panzer, simply checking the eigenvalues can give incorrect result in some cases, e.g. a = np.array([[4, 0], [0, 0]]), b = np.array([[2, 2], [2, 2]]) have the same eigenvalues, but cannot be shuffled one into the other. So we first need to check if the arrays a and b have the same elements (regardless their positions).

Andreas K.
  • 6,869
  • 2
  • 28
  • 40
1

You could always sort the matrix by row-norm and see if they are different. If two rows have the same norm, you'd have to check permutations of the rows that have the same norm though. But this reduces the problem to only rows with the same norm. In many cases you can first sort by 2-norm, then 1-norm and finally brute force the remaining permutations.

import numpy as np

def get_row_norm(a):
    """
    Sort by 2-norm
    """
    row_norms = np.sum(a**2, axis=1)
    return row_norms

def sort(a):
    """
    Return the matrix a sorted by 2-norm
    """
    n = a.shape[0]
    # Get the norms
    row_norms = get_row_norm(a)
    # Get the order
    order = np.argsort(row_norms)[::-1]

    sorted_a = a.copy()

    for m in range(n):
        i = order[m]
        for k in range(m+1): 
            j = order[k]
            sorted_a[m, k] = a[i, j]
            sorted_a[k, m] = a[i, j]

    return sorted_a


a = np.array([
    #1 #2 #3 #4 #5 #6 #7
    [0, 1, 1, 0, 0, 0, 1], #1
    [1, 0, 1, 2, 1, 1, 2], #2
    [1, 1, 0, 0, 0, 0, 1], #3
    [0, 2, 0, 0, 4, 0, 4], #4
    [0, 1, 0, 4, 0, 1, 2], #5
    [0, 1, 0, 0, 1, 0, 0], #6
    [1, 2, 1, 4, 2, 0, 0]  #7
])  
b = np.array([
    #5 #7 #1,3#3,1#2 #4 #6 
    [0, 2, 0, 0, 1, 4, 1], #5
    [2, 0, 1, 1, 2, 4, 0], #7
    [0, 1, 0, 1, 1, 0, 0], #1,3 could be either
    [0, 1, 1, 0, 1, 0, 0], #1,3 could be either
    [1, 2, 1, 1, 0, 2, 1], #2
    [4, 4, 0, 0, 2, 0, 0], #4
    [1, 0, 0, 0, 1, 0, 0]  #6
])

# Sort a and b
A = sort(a)
B = sort(b)
# Print the norms
print(get_row_norm(a)) # [ 3. 12.  3. 36. 22.  2. 26.]
print(get_row_norm(A)) # [36. 26. 22. 12.  3.  3.  2.]
print(get_row_norm(B)) # [36. 26. 22. 12.  3.  3.  2.]
# Assert that they are equal
print( (A == B).all())

Note that if they are not equal, you still have to check permutation of the fifth and sixth row, since their norms are equal.

user2653663
  • 2,573
  • 1
  • 14
  • 20