Let `work_one()` be a function like this:

```python
import subprocess
import time

def work_one(program_path):
    task = subprocess.Popen("./" + program_path, shell=True)
    t = 0
    while True:
        ret = task.poll()
        if ret is not None or t >= T:  # T is the max time we allow for 'task'
            break
        else:
            t += 0.05
            time.sleep(0.05)
    return t
```
There are many such programs to feed to `work_one()`. When these programs run sequentially, the time reported by each `work_one()` call is a reliable coarse measure of each program's runtime. However, suppose we have a `multiprocessing.Pool()` instance comprising 20 workers, `pool`, and we call a function like this:
```python
import multiprocessing

def work_all(programs):
    pool = multiprocessing.Pool(20)
    ts = pool.map(work_one, programs)
    return ts
```
Now the runtime measures reported by `work_all()` are approximately 20 times larger than what a sequential `work_one()` would have reported. This is reasonable: inside `work_one()`, when a worker's process adds 0.05 to `t` and yields (by calling `time.sleep()`), it might not yield to the subprocess it is managing (`task`), unlike when there is only one worker; instead, the OS might decide to give the CPU to another concurrent worker's process. Therefore the loop inside `work_one()` may iterate 20 times more before `task` completes.
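To make the measurement at least internally consistent, the counter-based timing can be replaced with a monotonic clock, so the loop reports actual elapsed wall time rather than a count of sleep increments. This is a minimal sketch under the same assumptions as above (`T` is a hypothetical timeout value; like the original, it returns early on timeout without killing `task`); note it still reports wall time, which is inherently inflated when 20 CPU-bound programs share fewer cores:

```python
import subprocess
import time

T = 10.0  # assumed timeout in seconds (hypothetical value)

def work_one_wall(program_path):
    # Same polling structure as the original, but measure elapsed time
    # with a monotonic clock instead of accumulating 0.05 per iteration.
    start = time.monotonic()
    task = subprocess.Popen("./" + program_path, shell=True)
    while task.poll() is None and time.monotonic() - start < T:
        time.sleep(0.05)
    return time.monotonic() - start
```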
Questions:

- How can `work_one()` be implemented correctly to get a good runtime measure, knowing that `work_one()` might be running concurrently?
- I also want `work_one()` to return early if the subprocess `task` doesn't complete in `T` seconds, so the `os.wait*()` functions seem not to be a good solution because they block the parent process.
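One possible direction, sketched under assumptions (Unix-only `resource` module; `T` is a hypothetical timeout value; each pool worker manages one subprocess at a time, which `pool.map` guarantees): measure the child's CPU time via a `resource.getrusage(resource.RUSAGE_CHILDREN)` delta instead of wall time, since CPU time is not inflated by other workers competing for cores. `Popen.wait(timeout=T)` blocks only that worker's own process, not the other pool workers, and raises `subprocess.TimeoutExpired` for the early-return case:

```python
import resource
import subprocess
import time

T = 10.0  # assumed timeout in seconds (hypothetical value)

def work_one(program_path):
    # CPU time already accumulated by previously reaped children
    # of this worker process.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    # Argument list instead of shell=True, so we time the program itself.
    task = subprocess.Popen(["./" + program_path])
    try:
        task.wait(timeout=T)   # blocks this worker only; returns on exit
    except subprocess.TimeoutExpired:
        task.kill()
        task.wait()            # reap the child so its rusage is recorded
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return cpu, wall
```

The delta trick works here because each worker is a separate process, so `RUSAGE_CHILDREN` only aggregates that worker's own reaped children.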