I am processing some ASCII data, performing some operations on it, and then writing everything back to another file (this job is done by post_processing_0.main, which returns nothing). I want to parallelize the code with the multiprocessing module; see the following code snippet:

from multiprocessing import Pool
import post_processing_0

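def chunks(lst,n):
    # Split lst into n interleaved sublists (every n-th element, starting at offset i).
    # range is used instead of xrange so the snippet runs on both Python 2 and Python 3.
    return [ lst[i::n] for i in range(n) ]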

def main():
    pool = Pool(processes=proc_num)
    P={}
    for i in range(0,proc_num):
        P['process_'+str(i)]=pool.apply_async(post_processing_0.main, [split_list[i]])
    pool.close()
    pool.join()


proc_num=8
timesteps=100
list_to_do=range(0,timesteps)
split_list=chunks(list_to_do,proc_num)

main()

I read about the difference between map_async and apply_async, but I don't understand it very well. Is my use of the multiprocessing module correct?

In this case, should I use map_async or apply_async? And why?

Edit:

I don't think this is a duplicate of the question Python multiprocessing.Pool: when to use apply, apply_async or map?. In that question, the answers focus on the order of the results obtained with the two functions. Here I am asking: what is the difference when nothing is returned?

Pierpaolo
  • possible duplicate of [Python multiprocessing.Pool: when to use apply, apply\_async or map?](http://stackoverflow.com/questions/8533318/python-multiprocessing-pool-when-to-use-apply-apply-async-or-map) – user4815162342 Dec 15 '14 at 07:35
  • I read that question, but it doesn't clear up my doubts. I am asking a narrower question: what is the difference in this case, where the parallelized function returns no result? – Pierpaolo Dec 15 '14 at 07:38

2 Answers

I would recommend map_async for three reasons:

  1. It's cleaner looking code. This:

    pool = Pool(processes=proc_num)
    async_result = pool.map_async(post_processing_0.main, split_list)
    pool.close()
    pool.join()
    

    looks nicer than this:

    pool = Pool(processes=proc_num)
    P={}
    for i in range(0,proc_num):
        P['process_'+str(i)]=pool.apply_async(post_processing_0.main, [split_list[i]])
    pool.close()
    pool.join()
    
  2. With apply_async, if an exception occurs inside of post_processing_0.main, you won't know about it unless you explicitly call P['process_x'].get() on the failing AsyncResult object, which would require iterating over all of P. With map_async the exception will be raised if you call async_result.get() - no iteration required.

  3. map_async has built-in chunking functionality, which will make your code perform noticeably better if split_list is very large.
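
The second point above can be illustrated with a minimal sketch. Here `flaky_worker` is a hypothetical stand-in for post_processing_0.main that returns nothing and fails on one input, and the helper names are invented for illustration. The pool is passed in as a parameter only to keep the helpers easy to reuse.

```python
from multiprocessing import Pool


def flaky_worker(chunk):
    """Hypothetical stand-in for post_processing_0.main: returns nothing, fails on chunk 2."""
    if chunk == 2:
        raise ValueError("cannot process chunk %d" % chunk)


def failures_with_apply_async(pool, chunks):
    """Submit one job per chunk; every AsyncResult has to be checked individually."""
    results = [pool.apply_async(flaky_worker, [chunk]) for chunk in chunks]
    failed = []
    for chunk, result in zip(chunks, results):
        try:
            result.get()  # re-raises the worker's exception, if there was one
        except ValueError:
            failed.append(chunk)
    return failed


def fails_with_map_async(pool, chunks):
    """Submit all chunks at once; a single get() is enough to surface a failure."""
    async_result = pool.map_async(flaky_worker, chunks)
    try:
        async_result.get()
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    pool = Pool(processes=2)
    print(failures_with_apply_async(pool, [0, 1, 2, 3]))
    print(fails_with_map_async(pool, [0, 1, 2, 3]))
    pool.close()
    pool.join()
```

If neither `get()` is called, the `ValueError` in this sketch goes unnoticed in both variants, since `pool.join()` on its own does not re-raise worker exceptions.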

Other than that, the behavior is basically the same if you don't care about the results.
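
The chunking mentioned in the third point is exposed through the optional chunksize argument of map_async. The following is a rough sketch, assuming a hypothetical no-result worker `process_item` and an arbitrary chunk size of 250; a sensible value depends on how expensive each item is.

```python
from multiprocessing import Pool


def process_item(item):
    """Hypothetical worker that returns nothing, like post_processing_0.main."""
    pass


def run_in_chunks(pool, items, chunksize):
    """Hand the items to the workers in batches of `chunksize` instead of one by one."""
    async_result = pool.map_async(process_item, items, chunksize=chunksize)
    return async_result.get()  # one entry per item; blocks until all are done


if __name__ == "__main__":
    pool = Pool(processes=4)
    results = run_in_chunks(pool, range(10000), chunksize=250)
    pool.close()
    pool.join()
    print(len(results))
```

When chunksize is left out, the pool picks a value itself, so passing it explicitly is only a tuning knob.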

dano

apply_async submits a single job to the pool. map_async submits multiple jobs, calling the same function with different arguments. The former takes a function plus an argument list; the latter takes a function plus an iterable (e.g. a list) that supplies the arguments, one per job. map_async can only call unary functions (i.e. functions taking one argument).
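
The two call shapes can be compared side by side in a small sketch. `square`, `scale` and `demo` are hypothetical names used only for illustration, and using functools.partial to turn a two-argument function into a unary one is one possible workaround, not something the question's code does.

```python
from functools import partial
from multiprocessing import Pool


def square(x):
    return x * x


def scale(x, factor):
    return x * factor


def demo(pool):
    # apply_async: one job; the arguments are given as a list (or tuple).
    single = pool.apply_async(scale, [3, 10]).get()
    # map_async: one job per element; each element is the function's only argument.
    squares = pool.map_async(square, [1, 2, 3]).get()
    # A two-argument function can be made unary by fixing one argument first.
    scaled = pool.map_async(partial(scale, factor=10), [1, 2, 3]).get()
    return single, squares, scaled


if __name__ == "__main__":
    pool = Pool(processes=2)
    print(demo(pool))
    pool.close()
    pool.join()
```

On Python 3.3 and later, Pool.starmap_async is another option for functions that take several arguments.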

In your case, it might be better to restructure the code slightly to put all your arguments in a single list and just call map_async once with that list.
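
One way this restructuring might look is sketched below. It assumes the worker can be rewritten to handle a single timestep; `process_timestep` and `run_all` are hypothetical names, not part of the original post_processing_0 module.

```python
from multiprocessing import Pool


def process_timestep(step):
    """Hypothetical per-timestep worker; stands in for a reworked post_processing_0.main."""
    pass


def run_all(pool, timesteps):
    # One call submits every timestep; the pool spreads them over its workers.
    async_result = pool.map_async(process_timestep, range(timesteps))
    # get() waits for completion and surfaces any worker exception.
    return len(async_result.get())


if __name__ == "__main__":
    pool = Pool(processes=8)
    print(run_all(pool, 100))
    pool.close()
    pool.join()
```

With this layout the manual `chunks` helper is no longer needed, because the pool distributes the list itself.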

John Zwinck
  • So if we consider 4 processes and 16 files to post-process, will map "create" 16 instances that all run at the same time? – Pierpaolo Dec 15 '14 at 07:55
  • You mean the pool has size 4 and the arguments list has size 16? In that case, only 4 will run at a time; when the first completes, the fifth will start, etc. – John Zwinck Dec 15 '14 at 07:57
  • OK! Then I understand there are no differences (apart from modifying the code). Is that correct? – Pierpaolo Dec 15 '14 at 08:00
  • I said what the differences are in my answer. For example, `map_async` can only call unary functions. – John Zwinck Dec 15 '14 at 08:01
  • Whereas other answers I read on SO still left me with questions, this answer clarified the difference between "apply_async" and "map_async" in a beautifully succinct way. Thank you! – PeterByte Jun 13 '16 at 13:56
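
The scheduling described in the comments above (a pool of size 4 given 16 arguments) can be checked with a small sketch. `report_pid` and `worker_pids` are hypothetical helpers, and the short sleep only simulates work.

```python
import os
import time
from multiprocessing import Pool


def report_pid(task_id):
    """Hypothetical task: pretend to work, then report which process ran it."""
    time.sleep(0.05)
    return os.getpid()


def worker_pids(pool, n_tasks):
    """Run n_tasks jobs and collect the process ID that handled each one."""
    return pool.map_async(report_pid, range(n_tasks)).get()


if __name__ == "__main__":
    pool = Pool(processes=4)
    pids = worker_pids(pool, 16)
    pool.close()
    pool.join()
    # 16 results, produced by at most 4 distinct worker processes
    print(len(pids), len(set(pids)))
```

All 16 tasks complete, but they are handled by no more than the 4 worker processes, so at most 4 run at any moment.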