
When I run the following script on a Windows computer, I do not see any of the log messages from the log_pid function; however, I do when I run it on Unix/Mac. I've read before that multiprocessing is different on Windows compared to Mac, but it's not clear to me what changes I should make to get this script to work on Windows. I'm running Python 3.6.

import logging
import sys
from concurrent.futures import ProcessPoolExecutor
import os


def log_pid(x):
    logger.info('Executing on process: %s' % os.getpid())


def do_stuff():
    logger.info('this is the do stuff function.')
    with ProcessPoolExecutor(max_workers=4) as executor:
        executor.map(log_pid, range(0, 10))


def main():
    logger.info('this is the main function.')
    do_stuff()


if __name__ == '__main__':
    logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
    logger = logging.getLogger(__name__)

    logger.info('Start of script ...')

    main()

    logger.info('End of script ...')

1 Answer


Unix processes are created via the fork strategy: the child is cloned from the parent and continues its execution right from the point where the parent forked.

On Windows things are quite different: a blank process is created and a new Python interpreter is launched. The interpreter then loads the module where the log_pid function is located and executes it.
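A minimal sketch (using a hypothetical file name, say demo.py) that makes the difference visible: under Windows' spawn strategy the module-level print runs once per worker, because every worker re-imports the module; under Unix's fork strategy it runs only in the parent.

import os
from concurrent.futures import ProcessPoolExecutor

# Runs in the parent on import, and again in every spawned worker on Windows,
# because each spawned worker re-imports this module.
print('module imported in process %s' % os.getpid())


def noop(_):
    pass


if __name__ == '__main__':
    # The guarded block runs only in the parent process.
    print('main guard runs in process %s' % os.getpid())
    with ProcessPoolExecutor(max_workers=2) as executor:
        list(executor.map(noop, range(2)))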

This means the __main__ guarded section is not executed by the newly spawned child process. Hence, the logger object is never created in the child and the log_pid function crashes accordingly. You don't see the error because you ignore the result of your computation. Try modifying the logic as follows.

def do_stuff():
    logger.info('this is the do stuff function.')
    with ProcessPoolExecutor(max_workers=4) as executor:
        iterator = executor.map(log_pid, range(0, 10))
        list(iterator)  # collect the results in a list

And the issue will become evident.

Traceback (most recent call last):
  File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 153, in _process_chunk
    return [fn(*args) for args in chunk]
  File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 153, in <listcomp>
    return [fn(*args) for args in chunk]
  File "C:\Users\cafama\Desktop\pool.py", line 8, in log_pid
    logger.info('Executing on process: %s' % os.getpid())
NameError: name 'logger' is not defined

When dealing with process pools (whether concurrent.futures or multiprocessing ones), always collect the results of the computation to avoid silent bugs causing confusion.
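If you prefer submit over map, the same principle applies. A sketch reusing the question's logger and log_pid: calling result() on each future re-raises any exception that occurred in the worker, so nothing fails silently.

from concurrent.futures import ProcessPoolExecutor, as_completed


def do_stuff():
    logger.info('this is the do stuff function.')
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(log_pid, x) for x in range(0, 10)]
        for future in as_completed(futures):
            future.result()  # re-raises any exception raised in the worker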

To fix the problem, just move the logger creation to the top level of the module and everything will work on all platforms.

import logging
import sys
from concurrent.futures import ProcessPoolExecutor
import os

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logger = logging.getLogger(__name__)   

def log_pid(x):
    logger.info('Executing on process: %s' % os.getpid())

...
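With the logger created at module level, each spawned worker configures it again when it re-imports the module, so log_pid finds a working logger. Keep the ProcessPoolExecutor creation inside the if __name__ == '__main__' guard, though, otherwise the spawned children would try to create pools of their own.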