I'm using Python 3.5 and want to use the multiprocessing module to run a function that parses some files, with each file going to a different CPU core. I'm sending a lot of parameters to the function, so I'm using kwargs.
The following code works using concurrent.futures (threaded), but I want to convert it to the multiprocessing equivalent (the concurrent.futures ProcessPoolExecutor is too heavyweight for me, and gives me some pickling errors anyway).
from concurrent import futures
with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # A dictionary mapping each future (key) to the filename it was submitted for (value)
    jobs = {}
    # Loop through the files, and run the parse function for each file, sending the file-name to it, along with the kwargs of parser_variables.
    # The results of the functions can come back in any order.
    for this_file in files_list:
        job = executor.submit(parse_log_file.parse, this_file, **parser_variables)
        jobs[job] = this_file
    # Get the completed jobs whenever they are done
    for job in futures.as_completed(jobs):
        debug.checkpointer("Multi-threaded Parsing File finishing")
        # Retrieve the job's result (job.result()) and the file the job was based on (jobs[job])
        result_content = job.result()
        this_file = jobs[job]
It doesn't matter what order the results come back in.
When I tried pool.apply, pool.apply_async, and pool.map (despite reading this question, I'm not much clearer on the differences), I got kwarg-related errors:
TypeError: apply() got an unexpected keyword argument 'variable_list'
How do I convert the above to its multiprocessing equivalent?
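For reference, here is a minimal sketch of one possible conversion, not a definitive answer. It relies on the fact that `Pool.apply` and `Pool.apply_async` take keyword arguments as a dict through their `kwds` parameter rather than as `**kwargs`, which would explain the TypeError above. In the sketch, `parse` is a hypothetical stand-in for `parse_log_file.parse`, and `parse_all`, the file names, and the `verbose` key are made up for illustration.

```python
import multiprocessing


def parse(file_name, variable_list=None, verbose=False):
    """Hypothetical stand-in for parse_log_file.parse."""
    return "{}:{}:{}".format(file_name, len(variable_list or []), verbose)


def parse_all(pool, files_list, parser_variables):
    """Submit one parse job per file and return {file_name: result}."""
    # apply_async takes positional arguments as a tuple (args) and keyword
    # arguments as a dict (kwds); passing **parser_variables directly would
    # hand those names to apply_async itself and raise a TypeError.
    jobs = {
        this_file: pool.apply_async(parse, args=(this_file,), kwds=parser_variables)
        for this_file in files_list
    }
    # get() blocks until that job has finished and re-raises any exception
    # raised in the worker. All jobs are already running at this point.
    return {this_file: job.get() for this_file, job in jobs.items()}


if __name__ == "__main__":
    # Illustrative inputs only (assumptions, not from the original code).
    files_list = ["a.log", "b.log", "c.log"]
    parser_variables = {"variable_list": ["x", "y"], "verbose": True}
    # Results must be collected inside the with block, because leaving it
    # terminates the pool.
    with multiprocessing.Pool(processes=4) as pool:
        results = parse_all(pool, files_list, parser_variables)
    for this_file, result_content in results.items():
        print(this_file, result_content)
```

Because each job is keyed by its filename, the order in which workers finish does not matter here. The function passed to the pool and everything in `parser_variables` still have to be picklable, so this may run into the same pickling errors as ProcessPoolExecutor.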