So I've been messing around with Python's multiprocessing library for the last few days and I really like the processing pool. It's easy to implement and I can see a lot of uses for it. I've done a couple of projects I'd heard about before to familiarize myself with it, and recently finished a program that brute-forces games of hangman.

Anywho, I was doing an execution time comparison of summing all the prime numbers between 1 million and 2 million, both single-threaded and through a processing pool. Now, for the hangman cruncher, putting the games in a processing pool improved execution time by about 8 times (i7 with 8 cores), but when grinding out these primes, it actually increased processing time by almost a factor of 4.

Can anyone tell me why this is? Here is the code for anyone interested in looking at or testing it:

#!/user/bin/python.exe
import math
from multiprocessing import Pool

global primes
primes = []

def log(result):
    global primes

    if result:
        primes.append(result[1])

def isPrime( n ):
    if n < 2:
        return False
    if n == 2:
        return True, n

    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True, n


def main():

   global primes

   #pool = Pool()

   for i in range(1000000, 2000000):
       #pool.apply_async(isPrime,(i,), callback = log)
       temp = isPrime(i)
       log(temp)

   #pool.close()
   #pool.join()

   print sum(primes)

   return

if __name__ == "__main__":
    main()

It currently runs in a single thread. To run it through the processing pool, uncomment the pool statements and comment out the other lines in the main for loop.

Laharah
  • Actually something that will speed your code up a hell of a lot is remove the global parts. If you have variables locally lookup is much faster. – Jakob Bowyer Aug 26 '11 at 19:38
  • can't remove the global variables without adding classes, as I can only pass one variable to the callback function. – Laharah Aug 27 '11 at 01:16
  • related: [Fastest way to list all primes below N in python](http://stackoverflow.com/q/2068372/4279) – jfs Mar 06 '14 at 21:14

1 Answer

The most efficient way to use multiprocessing is to divide the work into n equal-sized chunks, where n is the size of the pool, which should be approximately the number of cores on your system. The reason is that starting subprocesses and communicating between them is quite expensive. If each chunk does little work relative to how many chunks there are, the overhead of IPC becomes significant.

In your case, you're asking multiprocessing to process each candidate number individually. A better way to deal with the problem is to pass each worker a range of values (probably just a start and end value) and have it return all of the primes it found in that range.
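
As an aside, one way to get batching without writing a range worker is the `chunksize` argument of `Pool.map`, which groups the items of the iterable into batches before handing them to the workers. The sketch below is an alternative to the approach described here, not a restatement of it. It is written in Python 3 syntax (the listings in this post are Python 2), and `is_prime` and `primes_between` are illustrative names; `is_prime` returns a plain bool rather than the `(True, n)` tuple used in the question.

```python
from multiprocessing import Pool


def is_prime(n):
    """Trial division; returns a plain bool so it can be used with map()."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def primes_between(start, stop, chunksize=10000):
    """Return the primes in [start, stop), testing numbers in batches."""
    numbers = range(start, stop)
    with Pool() as pool:
        # chunksize makes each task a batch of numbers instead of one number,
        # so far fewer messages cross the process boundary.
        flags = pool.map(is_prime, numbers, chunksize)
    return [n for n, flag in zip(numbers, flags) if flag]


if __name__ == "__main__":
    # A tenth of the question's range, to keep the demo quick.
    print(sum(primes_between(1000000, 1100000)))
```

The value 10000 for `chunksize` is an arbitrary starting point, not a tuned one.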

In the case of identifying large-ish primes, the work done grows with the starting value, so you probably don't want to divide the total range into exactly n chunks, but rather into n*k equal chunks, with k some reasonable, small number, say 10 to 100. That way, when some workers finish before others, there's more work left to do and it can be balanced efficiently across all workers.
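
The n*k split can be sketched as a small helper. This is only an illustration, in Python 3 syntax (the listings in this post are Python 2); `make_chunks`, `workers`, and `k` are hypothetical names and not part of multiprocessing.

```python
import os


def make_chunks(start, stop, workers=None, k=10):
    """Split [start, stop) into about workers * k contiguous (lo, hi) ranges."""
    if workers is None:
        # Assume one worker per core when the caller does not say otherwise.
        workers = os.cpu_count() or 1
    n_chunks = max(1, workers * k)
    total = stop - start
    # Ceiling division, so the chunks cover the whole range.
    step = max(1, -(-total // n_chunks))
    return [(lo, min(lo + step, stop)) for lo in range(start, stop, step)]


if __name__ == "__main__":
    chunks = make_chunks(1000000, 2000000, workers=8, k=10)
    print(len(chunks), chunks[0], chunks[-1])
```

Each `(lo, hi)` pair could then be passed to a range-based worker. The last chunk may be shorter than the others when the range does not divide evenly.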

Edit: Here's an improved example to show what that solution might look like. I've changed as little as possible so you can compare apples to apples.

#!/user/bin/python.exe
import math
from multiprocessing import Pool

global primes
primes = set()

def log(result):
    global primes

    if result:
        # since the result is a batch of primes, we have to use 
        # update instead of add (or for a list, extend instead of append)
        primes.update(result)

def isPrime( n ):
    if n < 2:
        return False
    if n == 2:
        return True, n

    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True, n

def isPrimeWorker(start, stop):
    """
    find a batch of primes
    """
    primes = set()
    for i in xrange(start, stop):
        if isPrime(i):
            primes.add(i)

    return primes



def main():

    global primes

    pool = Pool()

    # pick an arbitrary chunk size, this will give us 100 different 
    # chunks, but another value might be optimal
    step = 10000

    # use xrange instead of range, we don't actually need a list, just
    # the values in that range.
    for i in xrange(1000000, 2000000, step):
        # call the *worker* function with start and stop values.
        pool.apply_async(isPrimeWorker,(i, i+step,), callback = log)

    pool.close()
    pool.join()

    print sum(primes)

    return

if __name__ == "__main__":
    main()
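
Since only the sum is needed in the end, a further variation (not part of the listing above, and offered only as a sketch) is to have each worker send back one partial sum instead of a set of primes, which shrinks the data crossing the process boundary even more. This version uses Python 3 syntax and assumes Python 3.8+ for `math.isqrt`; `is_prime`, `sum_primes_in`, and `sum_primes` are illustrative names.

```python
import math
from multiprocessing import Pool


def is_prime(n):
    """Trial division over odd candidates; returns a plain bool."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True


def sum_primes_in(bounds):
    """Worker: sum the primes in [lo, hi) and return a single integer."""
    lo, hi = bounds
    return sum(n for n in range(lo, hi) if is_prime(n))


def sum_primes(start, stop, step=10000, processes=None):
    """Sum the primes in [start, stop) using one task per chunk of `step` numbers."""
    chunks = [(lo, min(lo + step, stop)) for lo in range(start, stop, step)]
    with Pool(processes) as pool:
        # Order does not matter for a sum, so take results as they finish.
        return sum(pool.imap_unordered(sum_primes_in, chunks))


if __name__ == "__main__":
    print(sum_primes(1000000, 2000000))
```

No callback or global is needed here, because the partial sums are combined directly in the parent process.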
SingleNegationElimination
  • Thank you very much, this answer was exactly what I was looking for. I knew that starting processes cost a lot of overhead, but not that adding new work to them was also so expensive. – Laharah Aug 26 '11 at 21:37