I'm using Python 3.5 and want to use the multiprocessing module to run a function that parses some files, with each file going to a different CPU core. I'm sending a lot of parameters to the function, so I'm using kwargs.

The following code works using concurrent.futures (threaded), but I want to convert it to the multiprocessing equivalent (concurrent.futures.ProcessPoolExecutor is too heavyweight for me, and gives me some pickling errors anyway).

from concurrent import futures
with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # A dictionary mapping each future (key) to the filename it was submitted for (value)
    jobs = {}

    # Loop through the files, and run the parse function for each file, sending the file-name to it, along with the kwargs of parser_variables.
    # The results of the functions can come back in any order.
    for this_file in files_list:
        job = executor.submit(parse_log_file.parse, this_file, **parser_variables)
        jobs[job] = this_file

    # Get the completed jobs whenever they are done
    for job in futures.as_completed(jobs):
        debug.checkpointer("Multi-threaded Parsing File finishing")

        # Get the job's result (job.result()) and the file the job was based on (jobs[job])
        result_content = job.result()
        this_file = jobs[job]

It doesn't matter what order the results come back in.

When I tried pool.apply, pool.apply_async, and pool.map (despite reading this question, I'm not much clearer on the differences), I got kwarg-related errors:

TypeError: apply() got an unexpected keyword argument 'variable_list'

How do I convert the above to its multiprocessing equivalent?

GIS-Jonathan
  • Can you post the `multiprocessing` code that isn't working instead of the thread code that is? Part of it is likely the use of `**parser_variables` ... you probably want `kwds=parser_variables`. – tdelaney Jan 02 '16 at 19:58
  • @tdelaney - the kwds=parser_variables seems to resolve that problem. I get another one now, but that's a pickling error, the same one that happened with concurrent.futures when I changed from ThreadPoolExecutor to ProcessPoolExecutor. – GIS-Jonathan Jan 03 '16 at 13:02
  • Progress! All threads see the same memory space so `multiprocessing` only needs to pass object references between them. With processes, `multiprocessing` serializes the objects via `pickle` and builds new copies in the other process. You've got an object that `pickle` doesn't know how to serialize and you should focus on that problem. The exception should tell you exactly which object is the problem. If you can't solve it, write a new question here specifically about pickling that object. – tdelaney Jan 03 '16 at 16:53
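
The two points from the comments above can be sketched roughly as follows. This is only an illustration under some assumptions: `parse` is a hypothetical stand-in for `parse_log_file.parse`, and `unpicklable_keys`, `parse_all`, `prefix` and `suffix` are names invented for the example. `apply_async` takes the keyword arguments as a dict through `kwds=`; writing `**parser_variables` instead hands them to `apply_async` itself, which is what produces the "unexpected keyword argument" error. The helper tries to pickle each kwarg value on its own, which may help narrow down which object triggers the pickling error.

```python
import multiprocessing
import pickle


def parse(file_name, prefix="", suffix=""):
    """Hypothetical stand-in for parse_log_file.parse."""
    return prefix + file_name + suffix


def unpicklable_keys(kwargs):
    """Return the keys whose values pickle cannot serialize."""
    bad = []
    for key, value in kwargs.items():
        try:
            pickle.dumps(value)
        except Exception:  # pickle can raise several exception types
            bad.append(key)
    return bad


def parse_all(files_list, parser_variables, processes=4):
    """Run parse() on every file in a process pool; return {file: result}."""
    results = {}
    with multiprocessing.Pool(processes=processes) as pool:
        # kwds= receives the dict itself; **parser_variables would be
        # interpreted as keyword arguments to apply_async.
        jobs = {
            this_file: pool.apply_async(
                parse, args=(this_file,), kwds=parser_variables
            )
            for this_file in files_list
        }
        # Collect inside the with-block: leaving it terminates the pool.
        for this_file, job in jobs.items():
            results[this_file] = job.get()
    return results
```

A call such as `parse_all(files_list, parser_variables)` would normally sit under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module on some platforms. Results are collected here in submission order rather than completion order, which should be acceptable given that the order of results does not matter in the question.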

0 Answers