
I have the following code:

data = [2,5,3,16,2,5]        

def f(x):       
    return 2*x

f_total = 0
for x in data:
    f_total += f(x)

print(f_total/len(data))

in which I want to speed up the for loop. (In reality the code is more complex, and I want to run it on a supercomputer with many processing cores.) I have read that I can do this with the multiprocessing library, which lets Python 3 run different chunks of the loop simultaneously, but I am a bit lost with it.

Could you explain to me how to do it with this minimal version of my program?

Thanks!

  • Does this answer your question? [How do I parallelize a simple Python loop?](https://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop) – ranka47 Nov 24 '20 at 17:00
  • The supercomputer is from my research institution. @ranka47 It might answer my question, but I can't fully understand it; perhaps a more detailed/simpler answer would work for me? – FriendlyLagrangian Nov 24 '20 at 17:08

1 Answer

import multiprocessing
from numpy import random

"""
This mentions the number of worker threads that you want to run in parallel.
Depending on the number of cores in your system you should choose the appropriate
number of threads. When you call 'map' function it will distribute the input
values in that many parts
"""
NUM_CORES = 6
data = random.rand(100, 1)

"""
+2 so that the cores are not left idle in case a thread is waiting for I/O. 
Choose by performing an empirical analysis depending on the function you are trying to compute.
It could match up to NUM_CORES as well. You can vary the chunksize as well depending on the size of 'data' that you have. 
"""
NUM_THREADS = NUM_CORES+2
CHUNKSIZE = int(len(data)/(NUM_THREADS))    


def f(x):       
    return 2*x

# This creates the pool of worker processes to which the jobs will be assigned
pool = multiprocessing.Pool(NUM_THREADS)

# map vs imap: imap yields results lazily, which helps when the data/results are large; for small inputs map is just as good.
it = pool.imap(f, data, chunksize=CHUNKSIZE)

f_total = 0
# Iterate and sum up the results. Each element of 'data' is a length-1 array
# (data was generated with shape (100, 1)), so f(x) returns a length-1 array
# and sum(value) extracts the scalar.
for value in it:
    f_total += sum(value)

print(f_total/len(data))
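A side note not covered in the answer: on platforms where multiprocessing starts its workers with the "spawn" method (Windows, and macOS on recent Python versions), the pool setup has to live under an `if __name__ == '__main__':` guard, otherwise the child processes re-import the module and try to create pools of their own. A minimal sketch of the question's program written that way (the pool size of 4 is just an example value):

import multiprocessing

def f(x):
    return 2*x

if __name__ == '__main__':
    data = [2, 5, 3, 16, 2, 5]
    # 'with' closes the pool and joins the workers when the block exits
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(f, data)   # one result per element of data
    print(sum(results) / len(data))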

Why choose imap over map?
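Roughly: `map` blocks until every result has been computed and returns them all as one list, while `imap` returns an iterator that yields results as chunks finish, so the full result list never has to sit in memory at once. A sketch of the difference, assuming `data` is a flat list of numbers as in the question (so each result is a scalar and no `sum(value)` is needed):

# map: waits for everything and builds the full result list in memory
results = pool.map(f, data, chunksize=CHUNKSIZE)
f_total = sum(results)

# imap: yields results lazily, one chunk at a time
f_total = 0
for value in pool.imap(f, data, chunksize=CHUNKSIZE):
    f_total += value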

ranka47
  • Thanks a lot for such detailed answer! So I guess I can imagine a worker being a single core in my computer working at a particular independent task, or can it be any number I like? If so, how does one choose wisely the number of workers? – FriendlyLagrangian Nov 25 '20 at 10:29
  • Also, if `int(len(data)/(NUM_CORES-2))` is not equal to `len(data)` will python know that it needs to assign some workers some extra iterations to exhaust completely `data`? – FriendlyLagrangian Nov 25 '20 at 10:32
  • Finally, I don't see the need for `sum(value)`; wouldn't it be sufficient to simply do `f_total += value`, as `value` is already a number? I might be missing something. – FriendlyLagrangian Nov 25 '20 at 11:10
  • Bonus: I have been playing around with `NUM_CORES` and found that even though my PC has 8 cores (via `os.cpu_count()`), if I use a somewhat bigger number, say `NUM_CORES=10`, I get better performance (at least for this silly example, albeit with a larger data size). How does one choose the best value for `NUM_CORES`? (I guess this is also related to my first question.) – FriendlyLagrangian Nov 25 '20 at 11:21
  • It is `sum(value)` because each element of `data` here is a length-1 array (the data was generated with shape (100, 1)), so `f(x)` returns a length-1 array; you could replace it with `value[0]` as well. About the choice of NUM_THREADS I was wrong: you can use a larger value, but only up to a point. I am not aware of a formula for choosing the number of workers, so I would suggest an empirical analysis (see the timing sketch below). Extra workers can keep a core busy while another worker is waiting on I/O, but a very high value adds overhead due to context switching. – ranka47 Nov 25 '20 at 16:04
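To make the empirical analysis mentioned in the comments concrete, here is a hedged sketch that simply times the same workload with a few candidate pool sizes; the helper name `measure` and the candidate values are illustrative, not a recommendation:

import time
import multiprocessing

def f(x):
    return 2*x

def measure(num_workers, data):
    # Time one full pass of pool.map with the given number of workers
    start = time.perf_counter()
    with multiprocessing.Pool(num_workers) as pool:
        pool.map(f, data)
    return time.perf_counter() - start

if __name__ == '__main__':
    data = list(range(1_000_000))
    for n in (2, 4, 6, 8, 10, 12):
        print(n, "workers:", round(measure(n, data), 3), "s")

On the chunk-size question from the comments: `Pool` splits the iterable itself, so a chunksize that does not divide `len(data)` evenly is fine; some workers simply receive one smaller chunk, and nothing is dropped.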