How do I find the optimal chunk size for multiprocessing.Pool instances?
I used this before to create a generator of n sudoku objects:
import multiprocessing

processes = multiprocessing.cpu_count()
worker_pool = multiprocessing.Pool(processes)
# the third positional argument is the chunksize
sudokus = worker_pool.imap_unordered(create_sudoku, range(n), n // processes + 1)
To measure the time, I call time.time() before the snippet above, then I initialize the pool as described, then I convert the generator into a list (list(sudokus)) to force all items to be generated (only for the time measurement; I know this makes no sense in the final program), and then I call time.time() again and output the difference.
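
A minimal, self-contained version of that measurement looks roughly like this. The create_sudoku here is a dummy CPU-bound stand-in (the real generator isn't relevant to the timing setup), and n = 20000 is just an illustrative value:

import multiprocessing
import time

def create_sudoku(seed):
    # dummy CPU-bound stand-in for the real sudoku generator
    return sum(i * i for i in range(10000))

if __name__ == '__main__':
    n = 20000
    processes = multiprocessing.cpu_count()
    worker_pool = multiprocessing.Pool(processes)

    start = time.time()
    sudokus = worker_pool.imap_unordered(create_sudoku, range(n), n // processes + 1)
    list(sudokus)  # force all items to be generated, for timing only
    elapsed = time.time() - start
    print('%.3f ms per object' % (elapsed / n * 1000))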
I observed that a chunk size of n // processes + 1 results in times of around 0.425 ms per object. But I also observed that the CPU is only fully loaded during the first half of the run; towards the end, the usage drops to 25% (on an i3 with 2 cores and hyper-threading).
If I use a smaller chunk size of n // (processes ** 2) + 1 instead, I get times of around 0.355 ms per object, and the CPU load is distributed much better. It just has some small dips to about 75%, but it stays high for a much longer portion of the run before it drops to 25%.
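
To compare formulas, I have been timing different chunk sizes with a loop along these lines. The candidate values below (including the processes * 16 divisor) are just guesses I am trying out, not recommendations, and create_sudoku is the same dummy stand-in as above:

import multiprocessing
import time

def create_sudoku(seed):
    # dummy CPU-bound stand-in for the real sudoku generator
    return sum(i * i for i in range(10000))

def measure(pool, n, chunksize):
    # time one full run and return the average cost per object in ms
    start = time.time()
    list(pool.imap_unordered(create_sudoku, range(n), chunksize))
    return (time.time() - start) / n * 1000

if __name__ == '__main__':
    n = 20000
    processes = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes)
    for chunksize in (n // processes + 1,
                      n // (processes ** 2) + 1,
                      n // (processes * 16) + 1):
        print('chunksize %5d: %.3f ms per object' % (chunksize, measure(pool, n, chunksize)))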
Is there an even better formula for calculating the chunk size, or an otherwise better method to use the CPU more effectively? Please help me improve this multiprocessing pool's effectiveness.