Fastest way to perform Multiprocessing of a loop in a function?

Question

1. I have a function var. I want to know the best possible way to run the loop within this function quickly by multiprocessing/parallel processing by utilizing all the processors, cores, threads, and RAM memory the system has.

import numpy
from pysheds.grid import Grid

xs = 82.1206, 72.4542, 65.0431, 83.8056, 35.6744
ys = 25.2111, 17.9458, 13.8844, 10.0833, 24.8306

a = r'/home/test/image1.tif'
b = r'/home/test/image2.tif'

def var(interest):
    
    variable_avg = []
    for (x,y) in zip(xs,ys):
        grid = Grid.from_raster(interest, data_name='map')

        grid.catchment(data='map', x=x, y=y, out_name='catch')

        variable = grid.view('catch', nodata=np.nan)
        variable = numpy.array(variable)
        variablemean = (variable).mean()
        variable_avg.append(variablemean)
    return(variable_avg)

2. It would be great if I can run both function var and loop in it parallelly for the given multiple parameters of the function. ex:var(a)and var(b) at the same time. Since it will consume much less time then just parallelizing the loop alone.

Ignore 2, if it does not make sense.

btw don't use the name of your function `variable` inside as the name of a variable. Perhaps don't use `variable`, use more descriptive words instead. — quamrana, Jul 31 '20 at 08:44
If I use ````var(a)````, the code takes ````r'/home/test/image1.tif'```` in the place of ````interest```` in ````grid.image(data=interest, x=x, y=y, out_name='catch')````. Same applies to the parameter ````b```` as well to use ````r'/home/test/image2.tif'```` in place of interest. — Gun, Jul 31 '20 at 09:07
@Daniser Thanks for the suggestion. However, ````numba```` does not work in this case. — Gun, Jul 31 '20 at 11:11
What is `grid`? Why does numba not work? `prange` seems like what you want. — yao99, Jul 31 '20 at 15:04
@yao99 ````grid```` is a function from ````pysheds```` library. I want the answer to be general. — Gun, Jul 31 '20 at 17:08
I am not sure parallelism could provide any speed up here. Indeed, the only purely computational function is a *mean* which is very *cheap* compared to the operations performed on `grid` (main *I/O operations* I think). — Jérôme Richard, Jul 31 '20 at 17:23
@JérômeRichard You are right. ````grid```` handles most of the processing in my script. Besides, I am wondering whether the script is utilizing all the processors, cores, threads, and RAM memory the system has or not. — Gun, Jul 31 '20 at 17:28
Maybe this will be helpful, https://stackoverflow.com/questions/11368486/openmp-and-python. — yao99, Jul 31 '20 at 17:54
Does this answer your question? [How do I parallelize a simple Python loop?](https://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop) — mkrieger1, Jul 31 '20 at 21:31

score 4 · Answer 1 · answered Aug 02 '20 at 15:53

TLDR: You can use the multiprocessing library to run your var function in parallel. However, as written you likely don't make enough calls to var for multiprocessing to have a performance benefit because of its overhead. If all you need to do is run those two calls, running in serial is likely the fastest you'll get. However, if you need to make a lot of calls, multiprocessing can help you out.

We'll need to use a process pool to run this in parallel, threads won't work here because Python's global interpreter lock will prevent us from true parallelism. The drawback of process pools is that processes are heavyweight to spin up. In the example of just running two calls to var the time to create the pool overwhelms the time spent running var itself.

To illiustrate this, let's use a process pool and use asyncio to run calls to var in parallel and compare it to just running things sequentially. Note to run this example I used an image from the Pysheds library https://github.com/mdbartos/pysheds/tree/master/data - if your image is much larger the below may not hold true.

import functools
import time
from concurrent.futures.process import ProcessPoolExecutor
import asyncio

a = 'diem.tif'
xs = 10, 20, 30, 40, 50
ys = 10, 20, 30, 40, 50

async def main():
    loop = asyncio.get_event_loop()
    pool_start = time.time()
    with ProcessPoolExecutor() as pool:
        task_one = loop.run_in_executor(pool, functools.partial(var, a))
        task_two = loop.run_in_executor(pool, functools.partial(var, a))
        results = await asyncio.gather(task_one, task_two)
        pool_end = time.time()
        print(f'Process pool took {pool_end-pool_start}')

    serial_start = time.time()

    result_one = var(a)
    result_two = var(a)

    serial_end = time.time()
    print(f'Running in serial took {serial_end - serial_start}')


if __name__ == "__main__":
    asyncio.run(main())

Running the above on my machine (a 2.4 GHz 8-Core Intel Core i9) I get the following output:

Process pool took 1.7581260204315186
Running in serial took 0.32335805892944336

In this example, a process pool is over five times slower! This is due to the overhead of creating and managing multiple processes. That said, if you need to call var more than just a few times, a process pool may make more sense. Let's adapt this to run var 100 times and compare the results:

async def main():
    loop = asyncio.get_event_loop()
    pool_start = time.time()
    tasks = []
    with ProcessPoolExecutor() as pool:
        for _ in range(100):
            tasks.append(loop.run_in_executor(pool, functools.partial(var, a)))
        results = await asyncio.gather(*tasks)
        pool_end = time.time()
        print(f'Process pool took {pool_end-pool_start}')

    serial_start = time.time()

    for _ in range(100):
        result = var(a)

    serial_end = time.time()
    print(f'Running in serial took {serial_end - serial_start}')

Running 100 times, I get the following output:

Process pool took 3.442288875579834
Running in serial took 13.769982099533081

In this case, running in a process pool is about 4x faster. You may also wish to try running each iteration of your loop concurrently. You can do this by creating a function that processes one x,y coordinate at a time and then run each point you want to examine in a process pool:

def process_poi(interest, x, y):
    grid = Grid.from_raster(interest, data_name='map')

    grid.catchment(data='map', x=x, y=y, out_name='catch')

    variable = grid.view('catch', nodata=np.nan)
    variable = np.array(variable)
    return variable.mean()

async def var_loop_async(interest, pool, loop):
    tasks = []
    for (x,y) in zip(xs,ys):
        function_call = functools.partial(process_poi, interest, x, y)
        tasks.append(loop.run_in_executor(pool, function_call))

    return await asyncio.gather(*tasks)

async def main():
    loop = asyncio.get_event_loop()
    pool_start = time.time()
    tasks = []
    with ProcessPoolExecutor() as pool:
        for _ in range(100):
            tasks.append(var_loop_async(a, pool, loop))
        results = await asyncio.gather(*tasks)
        pool_end = time.time()
        print(f'Process pool took {pool_end-pool_start}')

    serial_start = time.time()

In this case I get Process pool took 3.2950568199157715 - so not really any faster than our first version with one process per each call of var. This is likely because the limiting factor at this point is how many cores we have available on our CPU, splitting our work into smaller increments does not add much value.

That said, if you have 1000 x and y coordinates you wish to examine across two images, this last approach may yield a performance gain.

Thank you for the detailed answer. I have a total of 20000 ````x```` and ````y```` coordinates and 20 unique ````var```` parameters(like ````a```` and ````b```` in the above example code). What would be the best approach to get the maximum speed with this numbers? — Gun, Aug 02 '20 at 17:18
@Ganesh In this case since there are so many coordinates, the method described in `var_loop_async` is likely to perform the best. I tried this approach with 1000 x and y coordinates across 20 var parameters and was able to run in about 80 seconds as opposed to 95 by just running each `var` call in a process. I would benchmark on your machine however to determine the best approach. — Matt Fowler, Aug 02 '20 at 19:44
May I know what should I pass at the parameters ````pool```` and ````loop```` in ````var_loop_async(interest, pool, loop)````? And also, In ````for _ in range(100):````, the number 100 would always be constant, or it changes depending on the inputs? May I know exactly what is happening there and which is a final function to be called and what are its parameters, since in the original code I posted I have only ````interest```` to pass to the only function ````var(interest)````, but here I see two different functions ````process_poi(), var_loop_async()````. Unknown parameters x, y, pool, loop . — Gun, Aug 03 '20 at 05:51
@Ganesh you should pass in the process pool for `pool` and asyncio event loop for `loop` - this is done in the `main` method in the answer. You should be able to remove the `for _ in range(100)` - this was just to demonstrate how things perform when calling with 100 var parameters. You should call `var_loop_async` once per each `var` parameter, `process_poi` is for processing one x,y coordinate at a time. — Matt Fowler, Aug 03 '20 at 12:11
@MattFowler I tried calling function ````var_loop_async(a, pool, loop)````, but it is not working. The system says ````pool```` and ````loop```` are not defined. Can you please tell with an example by passing parameters ````pool```` and ````loop```` in its place. — Gun, Aug 06 '20 at 16:20

alec_djinn · Answer 2 · 2020-09-16T07:52:06.997

1

I think this is a reasonable and straightforward way of speeding up your code by merely parallelizing only the main loop. You can saturate your cores with this, so there is no need to parallelize also for the interest variable. I can't test the code, so I assume that your function is correct, I have just encoded the loop in a new function and parallelized it in var().

from multiprocessing import Pool


def var(interest,xs,ys):
    grid = Grid.from_raster(interest, data_name='map')
    with Pool(4) as p: #uses 4 cores, adjust this as you need
        variable_avg = p.starmap(loop, [(x,y,grid) for x,y in zip(xs,ys)])
    return variable_avg
    

def loop(x, y, grid):
    grid.catchment(data='map', x=x, y=y, out_name='catch')
    variable = grid.view('catch', nodata=np.nan)
    variable = numpy.array(variable)
    return variable.mean()

edited Sep 16 '20 at 07:52

answered Aug 06 '20 at 13:53

alec_djinn

7,288
8
29
58

Thanks for the effort. I got an error ````NameError: nodata value for 'map' not found in instance.```` when I ran your code. However, the original code runs smoothly without any error. – Gun Aug 06 '20 at 15:52
I don’t know how grid works, you probably must pass `map` as well since you have `data=map`. Unfortunately, from your code it is not clear where `map` is coming from. You have to adjust it a little. – alec_djinn Aug 06 '20 at 16:33
Unfortunately, if you can't give us a minimal reproducible example https://stackoverflow.com/help/minimal-reproducible-example we can't help much. We need to be able to run the code ourselves to fix it. – alec_djinn Sep 16 '20 at 07:56

Fastest way to perform Multiprocessing of a loop in a function?

2 Answers2

Linked