
I am trying to parallelize a for loop in Python that calls a function with several arguments, only one of which changes from one iteration to the next. The loop itself needs to live inside a function. I have already looked here, here and here on Stack Overflow and beyond (here and here), but I just cannot make it work :(

Below is an MWE:

import time
import numpy as np
from multiprocessing import Pool
from functools import partial

def mytestFun(otherStuff, myparams):
    return myparams[0]*otherStuff - myparams[1]

def myfun1(extraParams, mylist):
    [myMat, otherStuff] = extraParams
    
    for ivals in mylist:
        myparams = myMat[ivals,:]
        result = mytestFun(otherStuff, myparams)
    return result

if __name__ == '__main__':
    a_list = [0, 1, 2, 3, 4, 5]

    myMat = np.random.uniform(0,1,(6,2))
    extraParams = [myMat, 5]
    print(myfun1(extraParams, a_list))
    pool = Pool()
    func = partial(myfun1, extraParams)
    pool.map(func, a_list)
    pool.close()
    pool.join()

And I keep getting this error, which I don't know how to interpret:

Traceback (most recent call last):
  File "exampleMultiProcessing.py", line 61, in <module>
    pool.map(func, a_list)
  File "/Users/laurama/miniconda3/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Users/laurama/miniconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
TypeError: cannot unpack non-iterable int object

Thanks in advance!

Laura
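A note on the traceback: partial(myfun1, extraParams) fixes only the first argument, and pool.map(func, a_list) then calls func once per element of a_list. Each worker therefore runs myfun1(extraParams, 0), myfun1(extraParams, 1), and so on, with a single int as mylist instead of the list. The sketch below reproduces this serially without a Pool; call_like_pool_map is a hypothetical helper that imitates what pool.map hands to each worker. In the MWE as posted, the int trips the for loop ("'int' object is not iterable"). The "cannot unpack non-iterable int object" message presumably comes from the longer script (the traceback cites line 61), where the int reaches the unpacking line instead.

```python
import numpy as np
from functools import partial


def mytestFun(otherStuff, myparams):
    # Same per-row computation as in the question.
    return myparams[0] * otherStuff - myparams[1]


def myfun1(extraParams, mylist):
    # Same as in the question: mylist must be an iterable of row indices.
    [myMat, otherStuff] = extraParams
    for ivals in mylist:  # raises TypeError if mylist is a single int
        myparams = myMat[ivals, :]
        result = mytestFun(otherStuff, myparams)
    return result  # only the last row's result survives the loop


def call_like_pool_map(extraParams, a_list):
    # Hypothetical helper: a serial imitation of pool.map(func, a_list),
    # which calls func(item) once for each item in a_list.
    func = partial(myfun1, extraParams)
    return [func(item) for item in a_list]


if __name__ == "__main__":
    myMat = np.random.uniform(0, 1, (6, 2))
    extraParams = [myMat, 5]
    print(myfun1(extraParams, [0, 1, 2, 3, 4, 5]))  # works: mylist is a list
    try:
        call_like_pool_map(extraParams, [0, 1, 2, 3, 4, 5])
    except TypeError as err:
        # Each call received a single int as mylist.
        print("TypeError:", err)
```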

1 Answer


You can read about joblib here. Basically, joblib expects you to pass it the arguments for the function you want to parallelize. Here I am passing the arguments directly to the function, which is why the loop uses the underscore variable (_). You can use any name there; _ is just a convention meaning that the loop value is ignored.

And yes, Parallel automatically distributes the calls it receives over n_jobs workers; each delayed(...) call is one task.
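To make the points above concrete, here is a plain-Python sketch of what the generator passed to Parallel contains. fake_delayed is a hypothetical stand-in so the snippet runs without joblib; it returns a (function, args, kwargs) description of the call, which as far as I know roughly mirrors what joblib.delayed records. myfun1 here is only a placeholder with the question's signature.

```python
def fake_delayed(func):
    # Hypothetical stand-in for joblib.delayed: instead of calling func,
    # return a description of the call.
    def capture(*args, **kwargs):
        return func, args, kwargs
    return capture


def myfun1(extraParams, mylist):
    # Placeholder with the question's signature; its body does not matter here.
    return None


def build_tasks(extraParams, a_list):
    # "_" is an ordinary variable name; by convention it means "value unused".
    # range(1) has a single element, so this yields exactly ONE task,
    # and that one task runs the whole loop over a_list.
    as_answered = [fake_delayed(myfun1)(extraParams, a_list) for _ in range(1)]
    # Looping over the inputs instead yields one task per index, which is
    # what gives Parallel something to spread across its workers.
    per_index = [fake_delayed(myfun1)(extraParams, [i]) for i in a_list]
    return as_answered, per_index


if __name__ == "__main__":
    as_answered, per_index = build_tasks(["myMat", 5], [0, 1, 2, 3, 4, 5])
    print(len(as_answered))  # 1
    print(len(per_index))    # 6
```

With only one task, at most one worker does any work, so the [0] in the code below simply picks that single result out of the list Parallel returns.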

Try this:

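import numpy as np
from joblib import Parallel, delayed

# myfun1 (and mytestFun) as defined in the question

if __name__ == '__main__':
    a_list = [0, 1, 2, 3, 4, 5]
    myMat = np.random.uniform(0, 1, (6, 2))
    extraParams = [myMat, 5]
    print(myfun1(extraParams, a_list))
    # range(1) has one element, so this submits a single delayed call;
    # [0] takes its result from the list that Parallel returns
    result = Parallel(n_jobs=8)(delayed(myfun1)(extraParams, a_list) for _ in range(1))[0]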
Aditya
  • It does, thank you, but could you please provide more context? I don't really understand what the solution is doing, in particular this part: for _ in range(1). In fact, I have never seen an underscore used as a variable. Also, would Parallel automatically distribute the loop among the n_jobs? Thanks! – Laura Jun 26 '20 at 17:57
  • @Laura: Apologies, I have updated the answer now. Let me know! – Aditya Jun 26 '20 at 18:30
  • Actually, I don't understand what happens, but a quick check shows that this solution is significantly slower than just running the loop, both with this trivial example and with my real one... Any ideas? – Laura Jun 26 '20 at 21:33
  • Ahh, now I see why it's slower: you have a loop over the inputs inside the function, which I didn't notice. In that case we need to change the looping strategy; try modifying it so that the parallel loop runs over that array of yours (i.e. over your inputs). – Aditya Jun 27 '20 at 00:46
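Following that suggestion, a minimal sketch of looping over the inputs with the question's own multiprocessing.Pool, assuming numpy as in the question. Each task handles a single row index, so each worker receives exactly what the function expects. The helper names row_task, run_serial and run_parallel are hypothetical.

```python
import numpy as np
from functools import partial
from multiprocessing import Pool


def mytestFun(otherStuff, myparams):
    # Same per-row computation as in the question.
    return myparams[0] * otherStuff - myparams[1]


def row_task(myMat, otherStuff, ivals):
    # Hypothetical helper: the work for ONE index, so each call can go to a
    # different worker. The varying argument comes last so that
    # functools.partial can fix the other two.
    return mytestFun(otherStuff, myMat[ivals, :])


def run_serial(myMat, otherStuff, indices):
    # Plain loop for comparison; keeps one result per index.
    return [row_task(myMat, otherStuff, i) for i in indices]


def run_parallel(myMat, otherStuff, indices, processes=None):
    # pool.map calls func(i) for each i, i.e. row_task(myMat, otherStuff, i),
    # and returns the results in the same order as indices.
    func = partial(row_task, myMat, otherStuff)
    with Pool(processes=processes) as pool:
        return pool.map(func, indices)


if __name__ == "__main__":
    a_list = [0, 1, 2, 3, 4, 5]
    myMat = np.random.uniform(0, 1, (6, 2))
    print(run_serial(myMat, 5, a_list))
    print(run_parallel(myMat, 5, a_list))
```

Unlike myfun1 in the question, which returns only the last row's value, both functions return one result per index. For a toy workload like this, starting worker processes and shipping myMat to them costs far more than the arithmetic, so the parallel version can still be slower than the plain loop; the pattern pays off only when each task does substantial work. With joblib, the same per-index idea would be Parallel(n_jobs=...)(delayed(row_task)(myMat, otherStuff, i) for i in a_list).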