
I am looking for help, since I can't seem to find a solution anywhere.

The task I am interested in is diagonalizing (large) matrices that depend on a parameter (let's call it m). I first did the calculations on my Mac laptop using a simple for loop, and it seemed to use all the available cores (4) by default. Now I want to do the same thing on my Linux machine (Ubuntu 18.04) with 16 cores, but due to the GIL, the program only uses one core.

First question: is the difference coming from the Mac Python interpreter, or from the linear algebra libraries (LAPACK, BLAS) used by NumPy/SciPy being automatically parallelized on Mac but not on Linux? For the moment, I haven't looked into the library side of the problem.
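One way to check the library side (a sketch, assuming a NumPy build linked against OpenBLAS or MKL, which honor these environment variables): print the linked BLAS/LAPACK configuration, and pin the BLAS thread count before NumPy is imported so that any parallelism you see comes from your own code rather than the library.

```python
import os

# These must be set BEFORE numpy is imported; which one applies depends on
# the BLAS backend NumPy was built against (OpenBLAS, MKL, or reference BLAS).
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np

# Prints which BLAS/LAPACK libraries this NumPy build is linked against.
np.show_config()
```

If the Mac build is linked against Apple's Accelerate (multithreaded by default) and the Linux build against single-threaded reference BLAS, that alone would explain the difference in core usage.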

Then, using Python's multiprocessing, the parallelization itself is no problem; where I am stuck is the data structure. For each of these matrices I need both the eigenvalues and the eigenvectors (and the m label, though the order of the inputs is automatically preserved), but given the structure of pool.map (or pool.starmap in the case of multiple arguments), I can't seem to find an efficient way to collect the whole set of eigenvalues and eigenvectors.
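One pattern that addresses exactly this (a sketch, assuming each worker returns an `(eigenvalues, eigenvectors)` tuple): since `pool.map` preserves input order, the returned list of pairs can be transposed with `zip(*results)` and stacked into two arrays, one indexed by m for all eigenvalues and one for all eigenvectors.

```python
from multiprocessing import Pool

import numpy as np

def diagonalize(m):
    # Toy 2x2 example standing in for the real (large) matrices.
    mat = np.array([[0, m + 1], [m + 1, 0]])
    return np.linalg.eigh(mat)  # (eigenvalues, eigenvectors) tuple

if __name__ == "__main__":
    m_values = range(4)  # placeholder list of parameter values
    with Pool(processes=4) as pool:
        results = pool.map(diagonalize, m_values)  # ordered list of pairs
    # Transpose the list of (vals, vecs) pairs into two tuples, then stack.
    all_vals, all_vecs = zip(*results)
    eigvals = np.array(all_vals)  # shape (len(m_values), n)
    eigvecs = np.array(all_vecs)  # shape (len(m_values), n, n)
```

After stacking, `eigvals[i]` and `eigvecs[i]` belong to `m_values[i]`, so plotting or reshaping over m is straightforward.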

Do you have any idea how to do this, or a good reference? I would appreciate help with the BLAS/LAPACK issue too.

Thanks in advance,

PS: see below for a naive code sample.

import numpy as np
from multiprocessing import Pool

def Diagonalize(m):

    Mat = np.array([[0, m + 1], [m + 1, 0]])
    eigenvalues, eigenvectors = np.linalg.eigh(Mat)

    return [eigenvalues, eigenvectors]  # using list() was one of my attempts

Nprocesses = 4  # or whatever number

if __name__ == '__main__':

    with Pool(processes=Nprocesses) as pool:

        pool.map(Diagonalize, m_values)  # EDIT: m_values being whatever list
Tazwiie
  • Are you asking how the eigenvalue algorithm works? It isn't clear what you mean: you say your matrices are large, but then you simply show a small matrix like that. –  May 31 '18 at 18:54
  • No. My small matrix here is a trivial example of what I want to do. In practice these matrices are big. I am not interested in the diagonalisation algorithm but rather in what data structure to use to achieve my goal. – Tazwiie May 31 '18 at 19:26
  • https://stackoverflow.com/questions/17468054/the-fastest-way-to-calculate-eigenvalues-of-large-matrices. The problem is Python or the algorithm. Very large matrices typically call for something like the Lanczos algorithm, but you haven't said anything about the structure of the matrix. Python is extremely slow for this. Without specific knowledge of the type of matrix you won't get a speed-up. There are many different types of eigenvalue algorithms. –  May 31 '18 at 19:33
  • https://github.com/trantalaiho/Cuda-Arnoldi –  May 31 '18 at 19:35
  • https://stackoverflow.com/questions/25004564/parallel-exact-matrix-diagonalization-with-python –  May 31 '18 at 19:37
  • I am not interested in the diagonalization algorithm, but yes, I plan to use `scipy.sparse.linalg.eigsh`. I just want to carry it out concurrently for different matrices using multiprocessing. I just can't figure out the best way to gather all the eigenvalues in one array and all the eigenvectors in another, so that I can manipulate them (plot, reshape, ...) easily afterwards. The main question is really about finding the best Python syntax. – Tazwiie May 31 '18 at 19:48
  • I didn't understand the question at first; give me a second to think. –  May 31 '18 at 19:53
  • What would be your solution then? – Tazwiie May 31 '18 at 21:46
  • Can't get it to run –  May 31 '18 at 22:21
  • Why is that? What's not working? – Tazwiie Jun 01 '18 at 09:07
  • as for the performance - https://stackoverflow.com/q/26511430/5351549 – ewcz Jun 01 '18 at 09:11
  • I think it works –  Jun 01 '18 at 15:44
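Since the matrices are Hermitian and `scipy.sparse.linalg.eigsh` is the planned routine, the ordered-`pool.map` pattern carries over directly. A sketch, with a hypothetical m-dependent sparse tridiagonal matrix standing in for the real one (the matrix size, `k`, and `which` values are placeholder choices):

```python
from multiprocessing import Pool

import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import eigsh

def diagonalize(m):
    # Hypothetical Hermitian sparse matrix depending on m; replace with
    # the real construction. Here: 100x100 tridiagonal with (m+1) couplings.
    n = 100
    off = np.full(n - 1, m + 1.0)
    mat = sparse.diags([off, off], offsets=[-1, 1], format="csr")
    # eigsh handles Hermitian/symmetric problems; returns k < n eigenpairs.
    vals, vecs = eigsh(mat, k=6, which="SA")  # six smallest-algebraic
    return vals, vecs

if __name__ == "__main__":
    m_values = [0, 1, 2, 3]
    with Pool(processes=4) as pool:
        results = pool.map(diagonalize, m_values)
    all_vals, all_vecs = zip(*results)
    eigvals = np.array(all_vals)  # shape (len(m_values), 6)
    eigvecs = np.array(all_vecs)  # shape (len(m_values), 100, 6)
```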

1 Answer


This is what I did as a test; check it out.

from multiprocessing import Pool
import scipy.sparse as sparse
from scipy.sparse.linalg import eigs, eigsh
import time

def diagon(X):
    vals, vecs = eigs(X)
    return vals, vecs

if __name__ == "__main__":
    n = 10
    density = .5
    A = sparse.random(n, n, density)

    dataset = [A, A, A, A]
    n1 = len(dataset)
    eigvals = []
    eigvecs = []
    t1 = time.time()

    pool = Pool(processes=4)

    result = pool.map(diagon, dataset)
    pool.close()
    pool.join()

    print("Pool took:", time.time() - t1)
    for i in range(n1):
        eigvals.append(result[i][0])
        eigvecs.append(result[i][1])

OK, so a minor speed comparison. I haven't gotten the multiprocessing aspect working in Julia yet, but the serial decomposition was taking over 100x longer in Python.

function gen_matrix(n, m)
    # sparse diagonal matrix whose entries depend on m
    A = sparse((m + 1) * eye(n))
    return A
end
n = 10;
m =6;

A = gen_matrix(n,m)

10×10 SparseMatrixCSC{Float64,Int64} with 10 stored entries:
  [1 ,  1]  =  7.0
  [2 ,  2]  =  7.0
  [3 ,  3]  =  7.0
  [4 ,  4]  =  7.0
  [5 ,  5]  =  7.0
  [6 ,  6]  =  7.0
  [7 ,  7]  =  7.0
  [8 ,  8]  =  7.0
  [9 ,  9]  =  7.0
  [10, 10]  =  7.0

t1 = time_ns()
(eigvals, eigvecs) = eigs(A)
t2 = time_ns()
elapsedTime = (t2-t1)/1.0e9
0.001311208

I wasn't able to get the PySparse part running yet. That solves part of the efficiency question; next would be testing the map function in Julia, to get around the GIL in Python.

  • Yeah, but I am interested in finding the best way to gather all the eigenvalues and eigenvectors separately (see main post). – Tazwiie Jun 01 '18 at 17:45
  • Thanks! That wasn't the case for me; I was returned a list of (eigenvalue, eigenvector) tuples. I will try to see what happens with your code. What does the 1 mean in your p.map? – Tazwiie Jun 01 '18 at 18:56
  • I thought it mattered, it doesn't. It appeared in other examples to be the size of the blocks. I don't know much about this module. You can remove it and it still runs. –  Jun 01 '18 at 19:04
  • Ok. Now what I want is to recover efficiently the eigenvalues of the other matrices. Preferably using list comprehension but result[:][0] doesn’t do the job... – Tazwiie Jun 01 '18 at 19:53
  • Some minor edit here, the eigenvalues and eigenvectors are separated into lists. –  Jun 01 '18 at 21:19
  • Yes but do you think there is a more efficient way? – Tazwiie Jun 01 '18 at 21:21
  • If I had understood that the structure of your matrix depends on a parameter m and that you cared about the performance of the eigenvalue algorithms, yes. However, my area is more numerical linear algebra, not which data structure is better. I don't see how changing the data structure you return with will matter much. Realize that the structure of your matrix can drastically alter the time complexity of an eigenvalue decomposition. –  Jun 01 '18 at 21:27
  • Is m an integer or a floating-point number? Are they all symmetric like that? –  Jun 01 '18 at 21:42
  • Yes m is an integer and they are hermitian. – Tazwiie Jun 02 '18 at 08:43
  • Just found this library, testing in a few hours: http://pysparse.sourceforge.net/contents.html. Your matrix is over a finite field; the best method is apparently block Lanczos. –  Jun 02 '18 at 16:33