
I have not seen clear examples with use-cases for Pool.apply, Pool.apply_async and Pool.map. I am mainly using Pool.map; what are the advantages of others?

– Phyo Arkar Lwin

3 Answers


Back in the old days of Python, to call a function with arbitrary arguments, you would use apply:

apply(f,args,kwargs)

apply still exists in Python 2.7, though not in Python 3, and is generally not used anymore. Nowadays,

f(*args,**kwargs)

is preferred. The multiprocessing.Pool class tries to provide a similar interface.

Pool.apply is like Python apply, except that the function call is performed in a separate process. Pool.apply blocks until the function is completed.

Pool.apply_async is also like Python's built-in apply, except that the call returns immediately instead of waiting for the result. An AsyncResult object is returned. You call its get() method to retrieve the result of the function call. The get() method blocks until the function is completed. Thus, pool.apply(func, args, kwargs) is equivalent to pool.apply_async(func, args, kwargs).get().
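
As a minimal sketch of that equivalence (the square worker below is a made-up example, not part of the original answer):

import multiprocessing as mp

def square(x):
    # Trivial worker; defined at module level so it can be pickled and sent to a worker process.
    return x * x

if __name__ == '__main__':
    with mp.Pool(2) as pool:
        a = pool.apply(square, (3,))              # blocks until the worker returns
        b = pool.apply_async(square, (3,)).get()  # same value, obtained via an AsyncResult
        assert a == b == 9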

In contrast to Pool.apply, the Pool.apply_async method also has a callback which, if supplied, is called when the function is complete. This can be used instead of calling get().

For example:

import multiprocessing as mp
import time

def foo_pool(x):
    time.sleep(2)
    return x*x

result_list = []
def log_result(result):
    # This is called whenever foo_pool(i) returns a result.
    # result_list is modified only by the main process, not the pool workers.
    result_list.append(result)

def apply_async_with_callback():
    pool = mp.Pool()
    for i in range(10):
        pool.apply_async(foo_pool, args = (i, ), callback = log_result)
    pool.close()
    pool.join()
    print(result_list)

if __name__ == '__main__':
    apply_async_with_callback()

may yield a result such as

[1, 0, 4, 9, 25, 16, 49, 36, 81, 64]

Notice that, unlike pool.map, the order of the results may not correspond to the order in which the pool.apply_async calls were made.


So, if you need to run a function in a separate process, but want the current process to block until that function returns, use Pool.apply. Like Pool.apply, Pool.map blocks until the complete result is returned.

If you want the Pool of worker processes to perform many function calls asynchronously, use Pool.apply_async. The order of the results is not guaranteed to be the same as the order of the calls to Pool.apply_async.

Notice also that you could call a number of different functions with Pool.apply_async (not all calls need to use the same function).

In contrast, Pool.map applies the same function to many arguments. However, unlike Pool.apply_async, the results are returned in an order corresponding to the order of the arguments.
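
A short sketch of both points, using two made-up worker functions (add_one and double) that are not part of the answer above:

import multiprocessing as mp

def add_one(x):
    return x + 1

def double(x):
    return 2 * x

if __name__ == '__main__':
    with mp.Pool() as pool:
        # apply_async calls may use different functions; each returns an AsyncResult.
        r1 = pool.apply_async(add_one, (10,))
        r2 = pool.apply_async(double, (10,))
        print(r1.get(), r2.get())             # 11 20

        # map applies one function to many arguments and preserves argument order.
        print(pool.map(double, [3, 1, 2]))    # [6, 2, 4]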

– unutbu
  • Should there be `if __name__=="__main__"` before `apply_async_with_callback()` on Windows? – jfs Dec 16 '11 at 12:38
  • Thanks a lot. How about map_async? – Phyo Arkar Lwin Dec 17 '11 at 08:53
  • Look inside [multiprocessing/pool.py](http://hg.python.org/cpython/file/ea421c534305/Lib/multiprocessing/pool.py#l245) and you will see that `Pool.map(func,iterable)` is equivalent to `Pool.map_async(func,iterable).get()`. So the relationship between `Pool.map` and `Pool.map_async` is similar to that of `Pool.apply` and `Pool.apply_async`. The `async` commands return immediately, while the non-`async` commands block. The `async` commands also have a callback. – unutbu Dec 17 '11 at 11:38
  • Deciding between using `Pool.map` and `Pool.apply` is similar to deciding when to use `map` or `apply` in Python. You just use the tool that fits the job. Deciding between using the `async` and non-`async` version depends on whether you want the call to block the current process and/or whether you want to use the callback. – unutbu Dec 17 '11 at 11:39
  • Callback sounds more truly async! I'm gonna test it out! – Phyo Arkar Lwin Dec 17 '11 at 12:12
  • Is the order guaranteed if you use a list comprehension like [this example in the official docs](https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers)? e.g. `multiple_results = [pool.apply_async(os.getpid, ()) for i in range(4)]` `print [res.get(timeout=1) for res in multiple_results]` – falsePockets May 22 '17 at 07:31
  • @falsePockets: Yes. Each call to `apply_async` returns an `ApplyResult` object. Calling that `ApplyResult`'s `get` method will return the associated function's return value (or raise `mp.TimeoutError` if the call times out). So if you put the `ApplyResult`s in an ordered list, then calling their `get` methods will return the results in the same order. You could just use `pool.map` in this situation however. – unutbu May 22 '17 at 10:11
  • Thanks. I considered pool.map, but I want to use a `for` loop instead of list comprehension to set up the arguments to each process, since I'm passing multiple arguments to each process. Some arguments are unique to each process, some are constant among all, and some are constant among only a subset of all processes. Using starmap and others gets quite messy for this case. Also, I want it to work in both python 2 and 3. – falsePockets May 23 '17 at 01:26
  • @unutbu I want to run a function in the background but I have some resource limitations and cannot run the function as many times that I want and want to queue the extra executions of the function. Do you have any idea on how I should do that? I have my question [here](https://stackoverflow.com/questions/49081260/executing-a-function-in-the-background-while-using-limited-number-of-cores-threa). Could you please take a look at my question and see if you can give me some hints (or even better, an answer) on how I should do that? – Amir Mar 03 '18 at 19:05
  • If we use `callback` in `pool.apply_async` as in this example, do we have guarantee of writing protection on the shared variable `result_list`? – galactica Jul 29 '19 at 22:18
  • @galactica: Each time the worker function ends successfully (without raising an exception), the callback function is called *in the main process*. The worker functions put return values in a queue, and the `pool._result_handler` thread in the main process handles the returned values one at a time, passing the returned value to the callback function. So you are guaranteed that the callback function will be called once for each returned value and there is no concurrency problem here because the callback is being called sequentially by a single thread in the main process. – unutbu Jul 29 '19 at 22:46
  • @galactica: The only caveat is that you need call `pool.join()` before inspecting `result_list` or else it might not contain all the results. (After all, `pool.join()` is telling the main process to wait until all the tasks have finished.) – unutbu Jul 29 '19 at 22:46
  • @galactica: `result_list` is not a shared variable. Although it may exist in the worker processes, it is only meant to be modified and accessed by the main process. If you do modify `result_list` in a worker process, the value will not be seen by the main process. – unutbu Jul 29 '19 at 22:51
  • @unutbu, thanks for the detailed clarification! Concerns relieved. This callback based approach seems easier to aggregate results from multiple worker processes. One thing I noticed is the restriction of only being able to pass one parameter to callback function. Thus, it seems I can't place this callback function and the result list as class members, but only place them as global ones. Any workaround for this? – galactica Jul 29 '19 at 23:15
  • @galactica: [It is possible](https://ideone.com/FaWOoQ) to make the callback and `result_list` members of a class. In the linked example, `foo.log_result` is what is called a *bound method*. Accessing `foo.log_result` returns a function with the instance `foo` "bound" to it, in the sense that calling `foo.log_result(x)` is equivalent to `Foo.log_result(foo, x)`. Notice that `foo` gets passed as the first argument. – unutbu Jul 30 '19 at 00:05

Regarding apply vs map:

pool.apply(f, args): f is only executed in ONE of the workers of the pool. So ONE of the processes in the pool will run f(*args).

pool.map(f, iterable): This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. So you take advantage of all the processes in the pool.
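
A small sketch of that difference, using a made-up report_pid helper that just returns the id of the worker process which ran the task:

import multiprocessing as mp
import os

def report_pid(_):
    return os.getpid()

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        # A single apply() call runs in exactly one worker process.
        print(pool.apply(report_pid, (None,)))

        # map() chops range(8) into chunks and hands them to the pool's workers,
        # so the results typically come from several different process ids.
        print(sorted(set(pool.map(report_pid, range(8)))))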

– kakhkAtion
  • What if the iterable is a generator? – RustyShackleford Jun 21 '17 at 19:36
  • Hmm... Good question. To be honest I haven't ever used pools with generators, but this thread might be helpful: https://stackoverflow.com/questions/5318936/python-multiprocessing-pool-lazy-iteration – kakhkAtion Jun 21 '17 at 20:10
  • @kakhkAtion Regarding apply, if only one of the workers execute the function, what do the rest of the workers do? Do I have to call apply multiple times to have the rest of the workers perform a task? – Moondra Jul 27 '17 at 17:33
  • True. Also take a look at pool.apply_async if you want to launch workers asynchronously. "pool_apply blocks until the result is ready, so apply_async() is better suited for performing work in parallel" – kakhkAtion Jul 27 '17 at 18:50
  • What happens when I have 4 processes but have called `apply_async()` 8 times? Will it automatically handle it with a queue? – Saravanabalagi Ramachandran Dec 23 '19 at 14:19
  • @SaravanabalagiRamachandran When I tested it using the map() function, I think it works as you said. I put 13 arguments in one iterable and ran it with 12 cores. 12 jobs ran simultaneously and the remaining one ran after the first batch was done. – JunKim Feb 18 '20 at 06:40

Here is an overview in a table format showing the differences between Pool.map, Pool.map_async, Pool.apply, Pool.apply_async, Pool.starmap and Pool.starmap_async. When choosing one, you have to take multi-args, concurrency, blocking, and ordering of the results into account:

                   | Multi-args | Concurrency | Blocking | Ordered-results
-------------------+------------+-------------+----------+----------------
Pool.map           | no         | yes         | yes      | yes
Pool.map_async     | no         | yes         | no       | yes
Pool.apply         | yes        | no          | yes      | no
Pool.apply_async   | yes        | yes         | no       | no
Pool.starmap       | yes        | yes         | yes      | yes
Pool.starmap_async | yes        | yes         | no       | yes

Notes:

  • Pool.imap and Pool.imap_unordered – lazier versions of Pool.map that return an iterator instead of a list; imap_unordered additionally yields each result in whatever order the workers finish them (a short sketch follows these notes).

  • Pool.starmap is very much like Pool.map, except that it accepts multiple arguments per job.

  • The async methods submit all the jobs at once and return immediately; use the returned result object's get method to obtain the results once they are finished.

  • The Pool.map (and Pool.apply) methods are very much like Python's built-in map (and apply). They block the main process until all the submitted jobs complete, then return the result.
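
A brief sketch of the lazy variants mentioned in the first note (slow_square is a made-up worker, not from the answer):

import multiprocessing as mp
import time

def slow_square(x):
    time.sleep(0.1)
    return x * x

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        # imap yields results lazily, in input order, as each one becomes ready.
        for r in pool.imap(slow_square, range(8)):
            print(r)

        # imap_unordered yields each result as soon as any worker finishes it,
        # so the output order may differ from the input order.
        for r in pool.imap_unordered(slow_square, range(8)):
            print(r)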

Examples:

map

Is called for a whole list of jobs at one time

results = pool.map(func, [1, 2, 3])

apply

Can only be called for one job at a time

for x, y in [[1, 1], [2, 2]]:
    results.append(pool.apply(func, (x, y)))

# Callback used by the map_async and apply_async examples below.
def collect_result(result):
    results.append(result)

map_async

Is called for a whole list of jobs at one time; the callback receives the complete list of results

pool.map_async(func, jobs, callback=collect_result)

apply_async

Can only be called for one job at a time, but the job is executed in the background, so many calls can run in parallel

for x, y in [[1, 1], [2, 2]]:
    pool.apply_async(worker, (x, y), callback=collect_result)

starmap

Is a variant of pool.map which supports multiple arguments

pool.starmap(func, [(1, 1), (2, 1), (3, 1)])

starmap_async

A combination of starmap() and map_async() that iterates over iterable of iterables and calls func with the iterables unpacked. Returns a result object.

pool.starmap_async(calculate_worker, [(1, 1), (2, 1), (3, 1)], callback=collect_result)
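
For completeness, a minimal sketch of retrieving the results through the returned result object instead of a callback (the built-in pow is used here as a stand-in worker):

import multiprocessing as mp

if __name__ == '__main__':
    with mp.Pool() as pool:
        async_result = pool.starmap_async(pow, [(2, 1), (2, 2), (2, 3)])
        print(async_result.get())   # [2, 4, 8] – results come back in argument order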

Reference:

Find complete documentation here: https://docs.python.org/3/library/multiprocessing.html

– Rene B.