
I have a function that uses multiprocessing (specifically joblib) to speed up a slow routine using multiple cores. It works great; no questions there.

I have a test suite that uses multiprocessing (currently just the multiprocessing.Pool() system, but can change it to joblib) to run each module's test functions independently. It works great; no questions there.

The problem is that I've now integrated the multiprocessing function into the module's test suite, so that the pool process runs the multiprocessing function. I would like to make it so that the inner function knows that it is already being multiprocessed and not spin up more forks of itself. Currently the inner process sometimes hangs, but even if it doesn't, obviously there are no gains to multiprocessing within an already parallel routine.

I can think of several ways (with lock files, setting some sort of global variable, etc.) to determine the state we're in, but I'm wondering if there is some standard way of figuring this out (either in Python's multiprocessing or in joblib). If it only works in Python 3, that'd be fine, though obviously solutions that also work on 2.7 or lower would be better. Thanks!

Michael Scott Cuthbert

3 Answers


Parallel in joblib should be able to sort these things out:
http://pydoc.net/Python/joblib/0.8.3-r1/joblib.parallel/

Two pieces from 0.8.3-r1:

# Set an environment variable to avoid infinite loops
os.environ[JOBLIB_SPAWNED_PROCESS] = '1'

I don't know why the code refers to the environment variable through the constant JOBLIB_SPAWNED_PROCESS in one place and by its literal name in the other, but as you can see, the feature is already implemented in joblib.

# We can now allow subprocesses again
os.environ.pop('__JOBLIB_SPAWNED_PARALLEL__', 0)
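A minimal sketch of the same environment-variable guard applied to your own code, using the standard library's multiprocessing.Pool rather than joblib and assuming Python 3.3+ (for `with Pool(...)`). The variable name MY_TESTS_SPAWNED_WORKER and the functions slow_routine, _square, _mark_worker and run_module_tests are hypothetical stand-ins: the Pool initializer sets the flag inside each test worker, and the routine stays serial when it sees it.

```python
import os
from multiprocessing import Pool

# Hypothetical marker name. joblib 0.8 used its own variable for the same
# purpose; this sketch defines a separate one for the test suite.
SPAWNED_FLAG = "MY_TESTS_SPAWNED_WORKER"


def _mark_worker():
    """Pool initializer: runs once inside each test-suite worker process."""
    os.environ[SPAWNED_FLAG] = "1"


def _square(v):
    return v * v


def slow_routine(values):
    """Hypothetical stand-in for the function that parallelises itself."""
    if os.environ.get(SPAWNED_FLAG):
        # Already inside a test worker: stay serial instead of forking again.
        return ("serial", [_square(v) for v in values])
    with Pool(2) as pool:
        return ("parallel", pool.map(_square, values))


def run_module_tests(_):
    """Hypothetical test-suite job that calls slow_routine from a worker."""
    return slow_routine([1, 2, 3])


if __name__ == "__main__":
    with Pool(2, initializer=_mark_worker) as pool:
        print(pool.map(run_module_tests, range(2)))
    print(slow_routine([1, 2, 3]))  # main process: flag unset, runs in parallel
```

Because the flag lives in os.environ, any process the worker starts inherits it as well, which is presumably why joblib chose an environment variable over a module-level global.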


Here you can select other versions, if that's more relevant:
http://pydoc.net/Python/joblib/0.8.3-r1/

Payne

The answer to the specific question is: I don't know of a ready-made utility.

A minimal(*) core refactoring would be to add a named parameter to the function that currently creates child processes. The default value would keep your current behavior, and another value would switch to a behavior compatible with how you are running tests(**).

(*: there might be other, maybe better, design alternatives to consider, but we do not have enough information.) (**: one may say that introducing conditional behavior would require testing that as well, and we are back to square one...)
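A minimal sketch of that refactoring, assuming Python 3.3+ and using multiprocessing.Pool in place of joblib; slow_routine, its parallel keyword, _square and run_module_tests are hypothetical names. The default keeps the parallel path, and the test-suite worker opts out explicitly.

```python
from multiprocessing import Pool


def _square(v):
    return v * v


def slow_routine(values, parallel=True):
    """Hypothetical routine: parallel=True keeps the current behaviour,
    parallel=False runs serially so a test-suite worker can call it safely."""
    if not parallel:
        return [_square(v) for v in values]
    with Pool() as pool:
        return pool.map(_square, values)


def run_module_tests(_):
    """Hypothetical test-suite job: opts out of the inner pool explicitly."""
    return slow_routine([1, 2, 3], parallel=False)


if __name__ == "__main__":
    print(slow_routine([1, 2, 3]))  # default: parallel path
    with Pool(2) as pool:
        print(pool.map(run_module_tests, range(2)))
```

Since both branches compute the same thing, one way to address the (**) concern is to check, from the main process, that slow_routine(values) and slow_routine(values, parallel=False) return equal results.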

lgautier

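Check multiprocessing.current_process().daemon: it is True inside a multiprocessing.Pool worker, because Pool starts its workers as daemonic processes (and daemonic processes are not allowed to create children of their own). (Answering own question)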
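A minimal sketch of that check, assuming Python 3.3+; in_pool_worker, slow_routine, _square and run_module_tests are hypothetical names. Note that it only detects daemonic processes such as Pool workers: a multiprocessing.Process started without daemon=True reports False, so it is worth confirming that the test runner's workers really are daemonic.

```python
import multiprocessing


def _square(v):
    return v * v


def in_pool_worker():
    """True inside multiprocessing.Pool workers, which Pool starts as daemons."""
    return multiprocessing.current_process().daemon


def slow_routine(values):
    """Hypothetical routine that only parallelises when it is allowed to."""
    if in_pool_worker():
        # Daemonic processes may not start children, so run serially.
        return [_square(v) for v in values]
    with multiprocessing.Pool() as pool:
        return pool.map(_square, values)


def run_module_tests(_):
    """Hypothetical test-suite job: reports the flag and the routine's result."""
    return in_pool_worker(), slow_routine([1, 2, 3])


if __name__ == "__main__":
    print(in_pool_worker())  # False in the main process
    with multiprocessing.Pool(2) as pool:
        print(pool.map(run_module_tests, range(2)))
```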

Michael Scott Cuthbert